
Forest-based classification: compensate for sparse categories same as fixing imbalanced datasets?

05-03-2024 01:18 PM
PatriciaCarbajales-Dale
Regular Contributor

I have an imbalanced dataset: in the variable to predict, about 83% of features are non-events and 16% are events. If we were to run forest-based classification in Python, for example, we would combine oversampling the events with undersampling the non-events.
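To make the Python approach concrete, here is a minimal sketch of random oversampling combined with random undersampling, using only the standard library. The function name `rebalance`, the `target_per_class` parameter, and the toy data are all hypothetical and chosen for illustration; this is plain random resampling, not SMOTE, and not a description of what the ArcGIS tool does.

```python
import random
from collections import Counter

def rebalance(rows, label_of, minority, target_per_class, seed=0):
    """Randomly undersample the majority class and oversample the minority class."""
    rng = random.Random(seed)
    events = [r for r in rows if label_of(r) == minority]
    non_events = [r for r in rows if label_of(r) != minority]
    if not events:
        raise ValueError("no minority rows to oversample")
    # Undersample: draw without replacement from the majority class.
    kept = rng.sample(non_events, min(target_per_class, len(non_events)))
    # Oversample: keep every event, then draw the shortfall with replacement.
    extra = max(0, target_per_class - len(events))
    boosted = events + rng.choices(events, k=extra)
    out = kept + boosted
    rng.shuffle(out)
    return out

# Toy data mimicking the split described above (hypothetical values):
# 83 non-events (label 0) and 16 events (label 1).
rows = [(i, 0) for i in range(83)] + [(i, 1) for i in range(83, 99)]
balanced = rebalance(rows, label_of=lambda r: r[1], minority=1, target_per_class=40)
counts = Counter(label for _, label in balanced)
```

Here both classes end up with 40 rows each; the resampled rows would then be used only for training, with validation done on untouched data.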

The Compensate for Sparse Categories option seems like it would do this for the explanatory training variables. But when the box is checked, does it also apply to the variable to predict? In other words, is checking this box similar to using SMOTE (Synthetic Minority Oversampling Technique) or a comparable technique in Python?
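For reference, this is roughly what I mean by SMOTE: new minority samples are synthesized by interpolating between an existing minority point and one of its nearest minority neighbors. The sketch below is a simplified, standard-library-only illustration; `smote_like`, its parameters, and the sample points are hypothetical, and it is not meant to describe how Compensate for Sparse Categories works internally.

```python
import math
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority points")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Find the k nearest other minority points by Euclidean distance.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        mate = rng.choice(neighbours)
        # Place the new point at a random position on the segment base -> mate.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (m - b) for b, m in zip(base, mate)))
    return synthetic

# Hypothetical minority (event) samples with two explanatory variables.
events = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote_like(events, n_new=5)
```

Unlike plain oversampling, the synthetic points are new combinations of attribute values rather than duplicates of existing events.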

0 Replies