Tuesday, August 9, 2022

Backpropagation in simple terms

Backpropagation is an algorithm used in machine learning that calculates the gradient of the loss function; the negative of this gradient points in the direction that decreases the loss most quickly. It relies on the chain rule of calculus to propagate the gradient backward through the layers of a neural network. Using gradient descent, we can then iteratively move closer to a minimum of the loss by taking small steps in the direction of the negative gradient.
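
To see the chain rule at work, here is a small Python sketch; the functions f and g are made-up examples chosen only for illustration, not anything specific to neural networks.

# Chain rule illustration: if y = f(g(x)), then dy/dx = f'(g(x)) * g'(x).

def g(x):
    return 3 * x + 1              # inner function (arbitrary example)

def g_prime(x):
    return 3.0                    # derivative of g

def f(u):
    return u ** 2                 # outer function (arbitrary example)

def f_prime(u):
    return 2 * u                  # derivative of f

def dy_dx(x):
    # Chain rule: derivative of the outer function evaluated at g(x),
    # multiplied by the derivative of the inner function.
    return f_prime(g(x)) * g_prime(x)

x = 2.0
print(dy_dx(x))                   # analytic result: 2 * (3*2 + 1) * 3 = 42

# Quick numerical check with a small step dx
dx = 1e-6
print((f(g(x + dx)) - f(g(x))) / dx)   # approximately 42

Backpropagation applies this same rule layer by layer, from the output back toward the input.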


During forward propagation, we use weights, biases, and nonlinear activation functions to calculate a prediction y hat from the input x. This prediction should match the expected output y, which is given together with the input data x, as closely as possible. We use a cost function to quantify the difference between the expected output y and the calculated output y hat.
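
As a minimal sketch of a forward pass, assuming a single neuron with a sigmoid activation and a squared-error cost (specifics the post does not prescribe), this could look as follows:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: one input x and its expected output y (values chosen arbitrarily)
x = np.array([0.5, 1.5])
y = 1.0

# Parameters of a single neuron (initialized arbitrarily)
w = np.array([0.1, -0.2])         # weights
b = 0.05                          # bias

# Forward propagation: weighted sum, then nonlinear activation
z = np.dot(w, x) + b
y_hat = sigmoid(z)                # the prediction

# Cost: quantify the difference between y and y_hat (squared error here)
cost = 0.5 * (y_hat - y) ** 2
print(y_hat, cost)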

The goal of backpropagation is to adjust the weights and biases throughout the neural network based on the calculated cost so that the cost will be lower in the next iteration. Ultimately, we want to find a minimum value for the cost function.
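
Continuing the single-neuron sketch from above (still just an illustrative assumption, not a full network), the backward pass applies the chain rule to get the gradients of the cost with respect to the parameters and then takes one gradient descent step:

import numpy as np

# Values carried over from the forward-pass sketch above
x = np.array([0.5, 1.5])
y = 1.0
w = np.array([0.1, -0.2])
b = 0.05
y_hat = 0.450166                  # sigmoid(w.x + b) from the forward pass

# Backward propagation via the chain rule:
# dcost/dw = dcost/dy_hat * dy_hat/dz * dz/dw
dcost_dyhat = y_hat - y           # derivative of 0.5 * (y_hat - y)^2
dyhat_dz = y_hat * (1.0 - y_hat)  # derivative of the sigmoid
dz_dw = x                         # derivative of w.x + b with respect to w
dz_db = 1.0                       # derivative of w.x + b with respect to b

grad_w = dcost_dyhat * dyhat_dz * dz_dw
grad_b = dcost_dyhat * dyhat_dz * dz_db

# Gradient descent update: a small step against the gradient lowers the cost
learning_rate = 0.1
w = w - learning_rate * grad_w
b = b - learning_rate * grad_b
print(w, b)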

With calculus, we can calculate how much the value of one variable changes in response to a change in another variable. If we want to find out how a small change dx in a variable x affects a related variable y, calculus gives us the answer: the change dx in x changes y by dy.

In calculus notation, this relationship is expressed by the derivative dy/dx, the rate of change of y with respect to x: for a small change dx, y changes by approximately dy = (dy/dx) * dx.
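
As a concrete (made-up) example, take y = x^2, whose derivative dy/dx is 2x. Near x = 3, a small change dx therefore changes y by roughly 6 * dx:

def y(x):
    return x ** 2                 # example function, chosen only for illustration

x = 3.0
dx = 0.001
dy = y(x + dx) - y(x)             # actual change in y
print(dy)                         # approximately 0.006
print(2 * x * dx)                 # the derivative dy/dx = 2x predicts 0.006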
The first derivative of a function gives you the slope of that function at the evaluated point. If a function has several variables, you can take the partial derivative with respect to each variable and stack them in a vector. This vector, which contains the slope with respect to each variable, is known as the gradient of the function, and it points in the direction of the steepest ascent along the function. The negative gradient therefore gives us the direction of the steepest descent. Following the direction of steepest descent step by step, we eventually end up at a (local) minimum of the function.
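
As a small illustration (the function below is an arbitrary example, not something from a neural network), we can stack the partial derivatives into a gradient vector and repeatedly step in the direction of the negative gradient:

import numpy as np

def f(v):
    # Example function of two variables: f(x, y) = x^2 + 3*y^2
    x, y = v
    return x ** 2 + 3 * y ** 2

def gradient(v):
    # Partial derivatives with respect to x and y, stacked into a vector
    x, y = v
    return np.array([2 * x, 6 * y])

v = np.array([2.0, -1.0])         # arbitrary starting point
learning_rate = 0.1

# Gradient descent: small steps in the direction of steepest descent
for _ in range(100):
    v = v - learning_rate * gradient(v)

print(v, f(v))                    # approaches the minimum at (0, 0)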



