The learning rate (λ) is a hyper-parameter that controls how much the weights of our network are adjusted with respect to the loss gradient. It determines how fast or slow we will move towards the optimal weights.
The Gradient Descent Algorithm estimates the weights of the model over many iterations, taking a step at each iteration that reduces a cost function.
Here is the algorithm:
Repeat until convergence {
Wj = Wj - λ ∂F(Wj)/∂Wj
}
Where:
Wj is the weight
λ is the learning rate
∂ denotes the partial derivative
F(Wj) is the cost function
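To make the update rule concrete, here is a minimal Python sketch for a single weight. The cost function F(w) = (w - 3)^2, the starting weight, the tolerance, and the names gradient_descent and grad_F are all assumptions chosen for illustration; they are not part of the algorithm itself.

```python
def gradient_descent(grad, w0, lr, tol=1e-8, max_iters=10_000):
    """Repeat w = w - lr * grad(w) until the step size falls below tol."""
    w = w0
    for i in range(max_iters):
        step = lr * grad(w)   # learning rate times the gradient of the cost
        w = w - step          # the update rule: Wj = Wj - lr * dF/dWj
        if abs(step) < tol:   # treat a tiny step as convergence
            return w, i + 1
    return w, max_iters


# Hypothetical cost F(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
# Its minimum is at w = 3.
def grad_F(w):
    return 2.0 * (w - 3.0)


w_opt, iters = gradient_descent(grad_F, w0=0.0, lr=0.1)
print(w_opt, iters)
```

With these assumed values the weight settles very close to 3, the minimum of the example cost function.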
For Gradient Descent to work, we must set the learning rate to an appropriate value. If the learning rate is too large, each step overshoots the optimal solution, and the weights may oscillate or even diverge. If it is too small, we will need too many iterations to converge to the best values. So choosing a good learning rate is crucial.
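The effect of the learning rate can be seen in a small experiment. The sketch below assumes a toy cost function F(w) = w^2 (gradient 2w, minimum at w = 0), a starting weight of 1.0, and 50 update steps; the three learning rates are arbitrary picks meant to show the "too small", "reasonable", and "too large" cases.

```python
def run(lr, w0=1.0, steps=50):
    """Apply w = w - lr * dF/dw for the toy cost F(w) = w^2."""
    w = w0
    for _ in range(steps):
        w = w - lr * 2.0 * w  # gradient of w^2 is 2w
    return w


# Final weight after 50 steps for each assumed learning rate.
results = {lr: run(lr) for lr in (0.001, 0.1, 1.1)}

for lr, w in results.items():
    print(f"lr={lr}: final w = {w:.6g}")
```

With the smallest rate the weight has barely moved from its starting value after 50 steps, with 0.1 it ends up very close to the minimum at 0, and with 1.1 every step overshoots so the weight grows in magnitude instead of converging.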
In simple language, we can define learning rate as how quickly our network abandons the concepts it has learned up until now for new ones.
References:
https://towardsdatascience.com/https-medium-com-dashingaditya-rakhecha-understanding-learning-rate-dd5da26bb6de