Saturday, May 24, 2025

What is Regularization and what is the Dropout technique?

Regularization is a set of techniques used in machine learning to prevent overfitting.

Overfitting occurs when a model learns the training data too well, including the noise and irrelevant details, leading to poor performance on unseen data. A model that overfits has high variance. Regularization helps the model generalize better to new data by adding constraints or penalties to the model's complexity. This typically involves modifying the learning algorithm or the model architecture.

Various Regularization Techniques:

1. L1 Regularization (Lasso Regularization):

- Adds a penalty term to the loss function proportional to the absolute value of the weights.

- Penalty = λ * Σ|w_i|

- λ is the regularization strength hyperparameter.

- Tends to push some weights to exactly zero, effectively performing feature selection by eliminating features that are less important.
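As a quick sketch, the following uses scikit-learn's Lasso, which fits an L1-regularized linear regression; its `alpha` argument plays the role of λ above, and the toy data is made up purely for illustration:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Toy data: only the first two of ten features actually influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# alpha is the regularization strength (the λ in the penalty above).
model = Lasso(alpha=0.1)
model.fit(X, y)

# Most coefficients of the irrelevant features end up exactly zero,
# which is the implicit feature selection described above.
print(model.coef_)
```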


2. L2 Regularization (Ridge Regularization):

- Adds a penalty term to the loss function proportional to the square of the magnitude of the weights.

- Penalty = λ * Σ(w_i)^2

- Tends to shrink weights towards zero but rarely makes them exactly zero.

- Discourages large weights, leading to a smoother decision boundary and reducing sensitivity to individual data points.
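As a minimal sketch, here is one way to add the L2 penalty by hand to a PyTorch loss; the tiny model, random data, and λ value are arbitrary and only show where the penalty term enters:

```python
import torch
import torch.nn as nn

# Arbitrary tiny model and random data, just to illustrate the penalty term.
model = nn.Linear(10, 1)
x, y = torch.randn(32, 10), torch.randn(32, 1)

lam = 1e-3                      # regularization strength λ
data_loss = nn.MSELoss()(model(x), y)

# Σ(w_i)^2 over all trainable parameters.
l2_penalty = sum((w ** 2).sum() for w in model.parameters())

loss = data_loss + lam * l2_penalty
loss.backward()
```

In practice the `weight_decay` argument of PyTorch optimizers applies an equivalent L2-style penalty for you.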


3. Dropout:

- A regularization technique specifically for neural networks.

- During training, randomly sets a fraction of neurons in a layer to zero for each training sample.

- The 'rate' (e.g., 0.5) is the probability of a neuron being dropped out.

- Forces the network to learn more robust features that are not reliant on the presence of any single neuron. It can be seen as training an ensemble of sub-networks.
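A small sketch of dropout in PyTorch, assuming a toy feed-forward network (the layer sizes are arbitrary); `model.train()` and `model.eval()` control whether dropout is active:

```python
import torch
import torch.nn as nn

# Toy network with dropout applied to the hidden layer's activations.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),   # each hidden unit is zeroed with probability 0.5
    nn.Linear(256, 10),
)

x = torch.randn(8, 784)

model.train()            # training mode: random units are dropped each forward pass
out_train = model(x)

model.eval()             # evaluation mode: dropout is disabled, all units are used
out_eval = model(x)
```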


4. Early Stopping:

- Monitors the model's performance on a validation set during training.

- Training is stopped when the performance on the validation set starts to degrade, even if the performance on the training set is still improving.

- Prevents the model from training for too long and overfitting the training data.
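A rough sketch of the early-stopping logic; `train_one_epoch`, `validate`, `model`, `train_loader`, and `val_loader` are hypothetical placeholders standing in for your own training code:

```python
# Hypothetical helpers are assumed: train_one_epoch(...) runs one epoch,
# validate(...) returns the validation loss.
best_val_loss = float("inf")
patience = 5                    # epochs to wait without improvement
epochs_without_improvement = 0

for epoch in range(100):
    train_one_epoch(model, train_loader)
    val_loss = validate(model, val_loader)

    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_without_improvement = 0
        # (usually the best model weights are checkpointed here)
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f"Stopping early at epoch {epoch}")
            break
```

Most frameworks also ship this as a ready-made utility, e.g. an early-stopping callback that monitors the validation loss.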


5. Data Augmentation:

- Creates new training data by applying transformations to the existing data (e.g., rotating images, adding noise to text, scaling sensor readings).

- Increases the size and diversity of the training set, making the model more robust to variations in the input data.
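For image data, one common option is torchvision's transform pipeline; a minimal sketch (the specific transforms and parameter values are just examples):

```python
from torchvision import transforms

# Each training image is randomly flipped, rotated, and colour-jittered
# before being converted to a tensor, so the model rarely sees the exact
# same input twice.
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=15),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

# Typically passed to a dataset, e.g.
# torchvision.datasets.CIFAR10(root="data", train=True, transform=train_transform)
```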


6. Batch Normalization:

- A technique applied to the output of a layer's activation function (or before the activation).

- Normalizes the activations of each mini-batch by subtracting the batch mean and dividing by the batch standard deviation.

- Helps stabilize the learning process, allows for higher learning rates, and can act as a regularizer by adding a small amount of noise.
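A minimal sketch placing batch normalization in a small PyTorch model, here between the linear layer and its activation (layer sizes are arbitrary):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 256),
    nn.BatchNorm1d(256),   # normalizes each mini-batch; has learnable scale and shift
    nn.ReLU(),
    nn.Linear(256, 10),
)

# In training mode BatchNorm uses the statistics of the current mini-batch,
# so it needs more than one sample per batch.
x = torch.randn(64, 784)
out = model(x)
```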

Why is Dropout one of them?

Dropout is considered a regularization technique because it directly addresses the problem of overfitting in neural networks by reducing the model's reliance on specific neurons and their correlations.

How Dropout Helps:

- **Prevents Co-adaptation:** Without dropout, neurons might co-adapt, meaning they become overly dependent on specific combinations of other neurons' activations. This can lead to a network that only works well for the exact patterns in the training data. Dropout breaks these dependencies by randomly switching off neurons, forcing the remaining neurons to learn more independent and robust features.


- **Ensemble Effect:** Training with dropout can be seen as training an ensemble of many different smaller neural networks. Each time a different set of neurons is dropped out, a slightly different network is trained. At test time (when dropout is typically turned off), the predictions are effectively an average over the predictions of these different sub-networks, which generally leads to better generalization and reduced variance.
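A tiny NumPy sketch of the "inverted dropout" convention behind this: activations are rescaled during training so that the plain, dropout-free forward pass at test time matches the expected output over all the random sub-networks:

```python
import numpy as np

rng = np.random.default_rng(0)
p_drop = 0.5
activations = rng.normal(size=(1, 1000))

# Training: zero each unit with probability p_drop, then rescale the rest.
mask = rng.random(activations.shape) >= p_drop
train_out = activations * mask / (1.0 - p_drop)

# Test: no mask and no rescaling; all units are used.
test_out = activations

# Averaged over many random masks, train_out matches test_out, so the
# test-time network behaves like an average of the trained sub-networks.
print(train_out.mean(), test_out.mean())  # close, and equal in expectation
```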


- **Reduces Sensitivity to Noise:** By forcing the network to learn features that are useful even when some inputs are missing (due to dropout), the model becomes less sensitive to noise in the training data.


- **Simplified Model (Effectively):** While the total number of parameters remains the same, at any given training step a smaller, "thinned" network is being used. This effectively reduces the complexity of the model being trained at that moment, which can help prevent overfitting.

