Does anyone know how robust Forest-based Regression is to non-normally distributed data? I've tried every data transformation possible and settled on a square root transformation, which turned out to be the "least unsuccessful" transformation. The resulting model was still able to get a good R² of 0.91, but I'm wondering how robust/sensitive the forest-based algorithm is.
@JustinLee Great question! Forest-based Classification and Regression does not make any normal distribution assumption about the data. Generally speaking, outliers and extreme values will be most problematic for the model. Ideally, you'll have a roughly even spread of values between the minimum and maximum, but there's no requirement that the distribution be bell-shaped.
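To illustrate why the shape of the distribution matters so little, here is a minimal sketch in plain Python. It is not the tool's actual implementation: `best_split` is a hypothetical helper that finds the single best squared-error split of the kind a regression tree makes, and the skewed data are simulated. The idea is that a tree split only depends on the ordering of an explanatory variable, so an order-preserving transformation such as a square root should leave the chosen split unchanged.

```python
import math
import random


def best_split(xs, ys):
    """Hypothetical helper: find the one split on xs that minimises the
    summed squared error of ys, as a single regression-tree node would.
    Returns (set of indices sent to the left branch, squared error)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best_sse, best_left = float("inf"), None
    for k in range(1, len(order)):
        # A split can only fall between two distinct x values.
        if xs[order[k - 1]] == xs[order[k]]:
            continue
        left, right = order[:k], order[k:]
        sse = 0.0
        for group in (left, right):
            mean = sum(ys[i] for i in group) / len(group)
            sse += sum((ys[i] - mean) ** 2 for i in group)
        if sse < best_sse:
            best_sse, best_left = sse, frozenset(left)
    return best_left, best_sse


random.seed(0)
# Simulated, strongly right-skewed explanatory variable (log-normal).
x = [random.lognormvariate(0, 1) for _ in range(200)]
# Simulated response: a step at x = 1 plus some noise.
y = [(5.0 if xi > 1.0 else 0.0) + random.gauss(0, 0.5) for xi in x]

raw_left, raw_sse = best_split(x, y)
sqrt_left, sqrt_sse = best_split([math.sqrt(xi) for xi in x], y)

# Compare which observations go left with and without the transformation.
print("Same partition:", raw_left == sqrt_left)
print("Same squared error:", raw_sse == sqrt_sse)
```

Note that this only applies to the explanatory variables. Transforming the variable being predicted changes the squared-error criterion itself, so it can change the fitted model, and extreme values there can still pull predictions around, which fits the point about outliers above.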