As its name suggests, SMOTE (Synthetic Minority Over-sampling Technique) generates synthetic data for the minority class.
SMOTE proceeds by joining points of the minority class with line segments and then placing artificial points on these segments.
Under the hood, the SMOTE algorithm works in 4 simple steps:
1. Choose a minority class input vector
2. Find its k nearest neighbors within the minority class (k_neighbors is passed as an argument when creating the SMOTE() object)
3. Choose one of these neighbors at random and place a synthetic point at a random position on the line segment joining the point under consideration and the chosen neighbor
4. Repeat the steps until data is balanced
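The four steps above can be sketched in plain Python. This is only a minimal illustration, not the imblearn implementation: the function name smote_sketch, its parameters, and the toy data are all made up for this example, and it assumes Euclidean distance and purely numeric features.

```python
import math
import random


def smote_sketch(minority, n_new, k_neighbors=5, seed=0):
    """Return n_new synthetic points built from minority-class vectors.

    Hypothetical helper for illustration; not part of any library.
    """
    rng = random.Random(seed)
    # A point cannot be its own neighbor, so k is capped at len - 1.
    k = min(k_neighbors, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")

    synthetic = []
    # Step 4: repeat until the requested number of points exists.
    # To balance a dataset, the caller would pass
    # n_new = (majority count) - (minority count).
    for _ in range(n_new):
        # Step 1: choose a minority-class input vector.
        i = rng.randrange(len(minority))
        x = minority[i]

        # Step 2: find its k nearest neighbors (Euclidean distance).
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(x, minority[j]))
        neighbors = others[:k]

        # Step 3: pick one neighbor and interpolate at a random
        # position on the segment between x and that neighbor.
        nb = minority[rng.choice(neighbors)]
        gap = rng.random()  # value in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Toy minority class with four 2-D points (made-up data).
minority = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [2.5, 2.5]]
new_points = smote_sketch(minority, n_new=6, k_neighbors=2)
print(len(new_points))
```

Because every synthetic point lies on a segment between two existing minority points, the new points always stay inside the region already covered by the minority class.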
In Python, SMOTE is available in the imbalanced-learn library (imported as imblearn).
references:
https://medium.com/analytics-vidhya/balance-your-data-using-smote-98e4d79fcddb