Sunday, October 2, 2022

AI/ML Neural Network: The Dead Neuron

Choosing an activation function for the hidden layers is not an easy task. The configuration of the hidden layers is an extremely active area of research, and there is still no theory that tells us how many neurons, how many layers, or which activation function to use for a given dataset. For a long time, the sigmoid was the most popular activation function because of its non-linearity. As neural networks grew into deeper architectures, the sigmoid gave rise to the vanishing gradient problem. The rectified linear unit (ReLU) has since become the default choice for the hidden layers' activation function, since it avoids the vanishing gradient problem by having a larger gradient than the sigmoid.
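To make the gradient comparison concrete, here is a minimal NumPy sketch (the function names are mine, not from the referenced article): the sigmoid's gradient never exceeds 0.25 and shrinks toward zero for large inputs, while ReLU's gradient is exactly 1 for any positive input.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # peaks at 0.25, shrinks toward 0 for large |x|

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    return (x > 0).astype(float)  # exactly 1 for any positive input

z = np.array([-5.0, -1.0, 0.5, 5.0])
print("sigmoid grad:", sigmoid_grad(z))  # all values <= 0.25
print("relu grad:   ", relu_grad(z))     # 1.0 wherever z > 0

When many sigmoid layers are stacked, these small per-layer gradients multiply and shrink toward zero, which is the vanishing gradient problem the paragraph above refers to.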


The drawback of ReLU is that it cannot learn on examples for which its activation is zero. This usually happens if you initialize the entire neural network with zeros and place ReLU on the hidden layers. Another cause is a large gradient flowing through: the ReLU neuron updates its weights and may end up with a large negative weight and bias. If this happens, the neuron will always produce 0 during forward propagation, and the gradient flowing through it will be zero forever, irrespective of the input.
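The mechanics can be illustrated with a small NumPy sketch. Assume a single ReLU neuron whose bias has already been pushed to a large negative value by an oversized update (the variable names and the bias value are illustrative, not from the referenced article): every pre-activation is negative, the output is always 0, and the gradient through the ReLU gate is always 0, so the weights never move again.

import numpy as np

rng = np.random.default_rng(0)

# A single ReLU neuron whose bias was pushed to a large negative value,
# e.g. after one oversized gradient update.
w = rng.normal(size=3)
b = -100.0

for _ in range(5):
    x = rng.normal(size=3)                    # any reasonable input
    z = w @ x + b                             # pre-activation is always << 0
    a = max(0.0, z)                           # ReLU output: always 0
    upstream = 1.0                            # gradient arriving from the loss
    dz = upstream * (1.0 if z > 0 else 0.0)   # ReLU gate zeroes the gradient
    dw, db = dz * x, dz                       # both exactly 0 -> no update, ever
    print(a, dw, db)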


In other words, the weights of this neuron will never be updated again. Such a neuron can be considered a dead neuron, a kind of permanent "brain damage" in biological terms. A dead neuron can be thought of as a natural form of Dropout. The real problem is that if every neuron in a particular hidden layer is dead, it cuts off the gradient to the previous layer, leaving zero gradients for all the layers behind it. One fix is to use a smaller learning rate, so that a large gradient does not push a ReLU neuron into a large negative weight and bias. Another fix is to use the Leaky ReLU.
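For reference, here is a minimal sketch of Leaky ReLU, assuming a negative slope of 0.01 (a common default, not specified in the post): because the gradient for negative inputs is a small non-zero constant instead of 0, a neuron stuck in the negative regime still receives updates and can potentially recover.

import numpy as np

def leaky_relu(x, alpha=0.01):
    # Small negative slope instead of a hard zero for x <= 0.
    return np.where(x > 0, x, alpha * x)

def leaky_relu_grad(x, alpha=0.01):
    # Gradient is alpha (not 0) for negative inputs, so the neuron can recover.
    return np.where(x > 0, 1.0, alpha)

z = np.array([-100.0, -1.0, 2.0])
print(leaky_relu(z))        # [-1.   -0.01  2.  ]
print(leaky_relu_grad(z))   # [ 0.01  0.01  1.  ]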

References:

https://towardsdatascience.com/neural-network-the-dead-neuron-eaa92e575748

