Sunday, October 2, 2022

AI/ML ways to avoid overfitting in Decision trees

If a decision tree is allowed to keep growing until it fits the training data as closely as it can, the model will overfit. There are several techniques to prevent a decision tree from overfitting.

Unlike linear models, a decision tree does not fight overfitting with a penalty on its coefficients (such as L1 or L2 regularization). Instead, it relies on tree pruning. Selecting the right hyperparameters (for example tree depth and leaf size) also requires experimentation, e.g. running cross-validation over a grid of hyperparameter values.
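As a rough illustration of cross-validating over a hyperparameter grid, here is a minimal sketch using scikit-learn's GridSearchCV. The dataset (scikit-learn's built-in breast cancer data) and the grid values are assumptions chosen only for the example, not recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Assumption: a built-in toy dataset stands in for real data.
X, y = load_breast_cancer(return_X_y=True)

# Hypothetical grid over tree depth and leaf size; values are illustrative.
param_grid = {
    "max_depth": [2, 4, 6, None],      # None lets the tree grow fully
    "min_samples_leaf": [1, 5, 20],
}

# 5-fold cross-validation for every combination in the grid.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,
)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print("best hyperparameters:", best_params)
print("mean cross-validated accuracy:", round(best_score, 3))
```

The best combination depends on the data, so the grid usually needs to be adjusted after a first run.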

Pruning

* Pre-pruning

* Post-pruning

Ensemble

* Random Forest

By default, the decision tree model is allowed to grow to its full depth. Pruning refers to removing parts of the tree, or stopping them from being grown in the first place, so that the tree does not reach its full depth. In practice, the tree is pruned by tuning the hyperparameters of the decision tree model, which prevents it from overfitting.

There are two types of pruning: pre-pruning and post-pruning.

Pre-Pruning:

Pre-pruning means stopping the growth of the decision tree early. It is done by setting the hyperparameters of the decision tree model before training. Hyperparameters such as max_depth, min_samples_leaf and min_samples_split can be tuned to stop the tree from growing further and so prevent the model from overfitting.
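A minimal sketch of pre-pruning with scikit-learn's DecisionTreeClassifier, comparing an unconstrained tree with one limited by max_depth, min_samples_leaf and min_samples_split. The dataset and the specific limits are assumptions picked for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: a built-in toy dataset stands in for real data.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unconstrained tree: grows until it can no longer split.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Pre-pruned tree: growth is stopped early by the limits below.
# The values are illustrative, not tuned.
pruned_tree = DecisionTreeClassifier(
    max_depth=3,
    min_samples_leaf=5,
    min_samples_split=10,
    random_state=0,
).fit(X_train, y_train)

for name, tree in [("full", full_tree), ("pre-pruned", pruned_tree)]:
    print(
        name,
        "depth:", tree.get_depth(),
        "leaves:", tree.get_n_leaves(),
        "train acc:", round(tree.score(X_train, y_train), 3),
        "test acc:", round(tree.score(X_test, y_test), 3),
    )
```

A large gap between training and test accuracy for the full tree is the usual sign of overfitting; the pre-pruned tree trades some training accuracy for a smaller, simpler model.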


Post-Pruning:

Post-pruning lets the decision tree grow to its full depth first, then removes branches to prevent the model from overfitting. Cost complexity pruning (ccp) is one type of post-pruning. With cost complexity pruning, the ccp_alpha hyperparameter controls how much of the tree is removed (larger values prune more), and it can be tuned to find the best-fitting model.
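A minimal sketch of tuning ccp_alpha with scikit-learn: cost_complexity_pruning_path lists the candidate alpha values for a fully grown tree, and cross-validation picks one of them. The dataset and the 5-fold setup are assumptions made for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: a built-in toy dataset stands in for real data.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grow the full tree and compute the candidate alphas along its pruning path.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

# Drop the last alpha (it prunes the tree down to the root alone) and
# clip at zero to guard against tiny negative values from rounding.
ccp_alphas = [max(float(a), 0.0) for a in path.ccp_alphas[:-1]]

# Score each candidate alpha with 5-fold cross-validation on the training set.
cv_scores = [
    cross_val_score(
        DecisionTreeClassifier(random_state=0, ccp_alpha=alpha),
        X_train, y_train, cv=5,
    ).mean()
    for alpha in ccp_alphas
]
best_index = max(range(len(cv_scores)), key=cv_scores.__getitem__)
best_alpha = ccp_alphas[best_index]

# Refit with the chosen alpha; the result is a pruned version of the full tree.
pruned_tree = DecisionTreeClassifier(
    random_state=0, ccp_alpha=best_alpha
).fit(X_train, y_train)

print("best ccp_alpha:", best_alpha)
print("leaves (full vs pruned):",
      full_tree.get_n_leaves(), pruned_tree.get_n_leaves())
print("test accuracy (pruned):", round(pruned_tree.score(X_test, y_test), 3))
```

Selecting alpha on cross-validation folds of the training set, rather than on the test set, keeps the final test score an honest estimate.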


Ensemble — Random Forest:

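Random Forest is an ensemble technique for classification and regression that trains multiple decision trees, each on a bootstrap sample of the training data, and combines their predictions. This combination of bootstrap sampling and aggregation (bagging) is what helps it avoid the overfitting of a single tree.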
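A minimal sketch comparing a single decision tree with scikit-learn's RandomForestClassifier under cross-validation. The dataset and the number of trees are assumptions for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Assumption: a built-in toy dataset stands in for real data.
X, y = load_breast_cancer(return_X_y=True)

# A single, fully grown tree.
tree = DecisionTreeClassifier(random_state=0)

# A forest of 100 trees; bootstrap=True (the default) trains each tree
# on a bootstrap sample, and predictions are aggregated across trees.
forest = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=0)

# 5-fold cross-validated accuracy for each model.
tree_score = cross_val_score(tree, X, y, cv=5).mean()
forest_score = cross_val_score(forest, X, y, cv=5).mean()

print("single tree:  ", round(tree_score, 3))
print("random forest:", round(forest_score, 3))
```

How much the forest improves on the single tree varies by dataset, so the comparison is worth rerunning on your own data.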

references:

https://towardsdatascience.com/3-techniques-to-avoid-overfitting-of-decision-trees-1e7d3d985a09#:~:text=Pruning%20refers%20to%20a%20technique,%2Dpruning%20and%20Post%2Dpruning.
