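Gradient descent is used together with back-propagation: back-propagation works out the gradient of the loss with respect to each weight, and gradient descent uses that gradient to decide how much to adjust each weight by. There are two common types of gradient descent: Gradient Descent, and Stochastic Gradient Descent.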
Gradient descent determines how much to change the weights by on each iteration. The adjustment is proportional to the slope (gradient) of the loss at the current weights, scaled by a learning rate: the steeper the slope, which usually means the further you are from the best weights, the bigger the adjustment value will be. You can think of it as a ball rolling down a hill: the hill is the loss for every possible weight value, the ball's position is the current weight, and the ball's velocity is the adjustment value. Essentially, you want the ball to end up as close to the bottom of the hill as possible, because the bottom of the hill is the best possible value. Unlike a real ball, the adjustment shrinks as the slope flattens out near the bottom, so with a suitably small learning rate the ball settles there rather than rolling past.
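To make the ball-and-hill picture concrete, here is a minimal sketch of gradient descent on a one-weight "hill". The loss function (w - 3)^2, the starting weight, the learning rate and the function names are all assumptions chosen for illustration; they are not taken from any particular network.

```python
def loss(w):
    # The "hill": a bowl whose lowest point is at w = 3 (illustrative choice).
    return (w - 3.0) ** 2


def grad(w):
    # Slope of the hill at w, i.e. the derivative of loss(w).
    return 2.0 * (w - 3.0)


def gradient_descent(w0, learning_rate=0.1, steps=50):
    w = w0
    step_sizes = []
    for _ in range(steps):
        step = learning_rate * grad(w)  # adjustment value for this iteration
        step_sizes.append(abs(step))
        w -= step                       # move downhill
    return w, step_sizes


w_final, step_sizes = gradient_descent(w0=10.0)
print(w_final)                         # close to 3, the bottom of the hill
print(step_sizes[0], step_sizes[-1])   # first step is large, last step is tiny
```

The recorded step sizes shrink on every iteration here, which is the "smaller adjustment as you get closer" behaviour described above.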
Stochastic gradient descent is a variation of gradient descent that estimates the gradient from a single randomly chosen training example (or a small batch of them) on each step, instead of from the whole training set. Each step is cheaper but noisier. It is useful in a neural network that may have a false-best adjustment value (a local minimum), where regular gradient descent won't find the best value, but a value it thinks is the best. This can be analogised as the ball rolling down two hills of different heights. It rolls down the first hill and reaches the bottom of the first valley, thinking that it has reached the best possible answer, when in reality the best position is the bottom of the second, deeper valley. With stochastic gradient descent, the random noise in each step can jostle the ball out of a shallow valley, giving it a chance of carrying on to the deeper one, although this is not guaranteed.
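As a rough sketch of the "one example at a time" idea, the code below fits a straight line with stochastic gradient descent. The toy data (points on y = 2x + 1), the squared-error loss, the hyperparameters and the names make_data and sgd are all assumptions made for this example. It shows the noisy per-example updates only, not an escape from a local minimum, since this particular loss has a single valley.

```python
import random


def make_data(n=100, seed=0):
    # Toy data on the line y = 2x + 1 (illustrative, noise-free).
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + 1.0) for x in xs]


def sgd(data, learning_rate=0.1, epochs=50, seed=1):
    rng = random.Random(seed)
    samples = list(data)  # copy so the caller's list is not reordered
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(samples)  # visit the examples in a random order
        for x, y in samples:
            error = (w * x + b) - y
            # Gradient of 0.5 * error**2 for this single example only.
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b


w, b = sgd(make_data())
print(w, b)  # should end up near the slope 2 and intercept 1 used above
```

Regular gradient descent would instead average the gradient over every example before making a single adjustment.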
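In back-propagation you start with the furthest-right weight matrix (the output layer, layer l), calculate its gradient and adjust its weights accordingly. Then you move one layer to the left, to layer l−1 and the next weight matrix, and repeat the step. In other words, you determine the gradient, adjust accordingly and then move to the left, until you reach the first layer. The gradient that is passed to the left should be worked out with the weights as they were during the forward pass, which is why many implementations calculate every layer's gradient first and only then adjust all the weights.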
The gradient of the loss L with respect to layer l−1 is calculated using the gradient with respect to layer l; this is the chain rule applied one layer at a time.
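The right-to-left pass can be sketched on the smallest possible network: one input, one hidden unit and one output. The tanh activation, the squared-error loss, the scalar weights w1 and w2 and all function names are assumptions for illustration. This sketch also computes both gradients before adjusting either weight, which is one common way to organise the step.

```python
import math


def forward(x, w1, w2):
    h = math.tanh(w1 * x)  # hidden layer (layer l-1)
    y = w2 * h             # output layer (layer l)
    return h, y


def loss(x, target, w1, w2):
    _, y = forward(x, w1, w2)
    return 0.5 * (y - target) ** 2


def backward(x, target, w1, w2):
    h, y = forward(x, w1, w2)
    d_y = y - target            # dL/dy at the output
    d_w2 = d_y * h              # gradient for the furthest-right weight
    d_h = d_y * w2              # gradient handed one layer to the left
    d_z1 = d_h * (1.0 - h * h)  # through tanh: d tanh(z)/dz = 1 - tanh(z)^2
    d_w1 = d_z1 * x             # gradient for the left-hand weight
    return d_w1, d_w2


def train(x, target, w1=0.5, w2=0.5, learning_rate=0.1, steps=200):
    for _ in range(steps):
        d_w1, d_w2 = backward(x, target, w1, w2)
        w1 -= learning_rate * d_w1  # adjust only after both gradients are known
        w2 -= learning_rate * d_w2
    return w1, w2


w1, w2 = train(x=1.0, target=0.5)
print(loss(1.0, 0.5, 0.5, 0.5), loss(1.0, 0.5, w1, w2))  # loss before and after
```

Note how d_h, the gradient for layer l−1, is built from d_y, the gradient for layer l; a deeper network repeats the same hand-off once per layer.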