When it comes to training models, there are two major problems one can encounter: overfitting and underfitting.
Overfitting happens when the model performs well on the training set but poorly on unseen (test) data.
Underfitting happens when the model performs poorly on both the training set and the test set.
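Both failure modes are easy to see with a small experiment. The sketch below (my own illustration, not from the tutorial cited at the end) fits polynomials of two degrees to a noisy sine wave using scikit-learn; the data, the degrees, and the noise level are all made-up assumptions chosen to expose the two behaviors.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Synthetic data: a noisy sine wave, split into alternating train/test halves.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=30)
X_train, y_train = X[::2], y[::2]
X_test, y_test = X[1::2], y[1::2]

scores = {}
for degree in (1, 15):  # degree 1 underfits the sine; degree 15 overfits
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    scores[degree] = (model.score(X_train, y_train),
                      model.score(X_test, y_test))
    print(f"degree {degree:2d}: train R^2 = {scores[degree][0]:.2f}, "
          f"test R^2 = {scores[degree][1]:.2f}")
```

The degree-1 model scores poorly on both splits (underfitting), while the degree-15 model scores near-perfectly on the training split but much worse on the test split (overfitting).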
Regularization is used to reduce overfitting, especially when there is a large gap between training and test performance. With regularization, the number of features used in training stays the same, but the magnitudes of the coefficients (the m values in the equation below) are shrunk toward zero.

The linear model's prediction is

y(hat) = m1 * x1 + m2 * x2 + ... + mn * xn + b
There are different ways to reduce model complexity and prevent overfitting in linear models; ridge and lasso regression are two of the most common.
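A minimal sketch of the shrinking effect, assuming scikit-learn and made-up data (the feature count, alpha values, and noise level are all illustrative, not prescriptive):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

# Made-up data: 5 informative features plus 15 pure-noise features,
# with relatively few samples so ordinary least squares overfits.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 20))
true_m = np.zeros(20)
true_m[:5] = [3.0, -2.0, 1.5, 4.0, -1.0]
y = X @ true_m + rng.normal(scale=0.5, size=40)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # L2 penalty: shrinks all coefficients
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: can zero some out entirely

print("||m|| OLS  :", np.linalg.norm(ols.coef_))
print("||m|| Ridge:", np.linalg.norm(ridge.coef_))
print("coefficients zeroed by Lasso:", int(np.sum(lasso.coef_ == 0)))
```

Ridge keeps every feature but shrinks the overall size of the coefficient vector; lasso can push some coefficients exactly to zero, effectively dropping the noise features.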
References:
https://www.datacamp.com/tutorial/tutorial-lasso-ridge-regression#data%20importation%20and%20eda