Forest-Based regression with non-normally distributed data?

05-13-2022 08:04 PM
JustinLee
New Contributor

Does anyone know how robust Forest-based Regression is to non-normally distributed data? I've tried every data transformation I could think of and settled on a square root transformation, which turned out to be the "least unsuccessful" transformation. The resulting model still achieved a good R² of 0.91, but I'm wondering how robust/sensitive the forest-based algorithm is to the distribution of the data.


Accepted Solutions
EricKrause
Esri Regular Contributor

@JustinLee Great question! Forest-based Classification and Regression does not assume that the data are normally distributed. Generally speaking, outliers and extreme values will be most problematic for the model. Ideally, you'll have a roughly even spread of values between the minimum and maximum, but there's no requirement that the distribution be bell-shaped.
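
To illustrate why the shape of the distribution matters so little, here is a minimal sketch in plain Python. It is not the ArcGIS tool or its implementation; `best_split` is a hypothetical helper that mimics a single variance-reducing split of a regression tree, and the data are synthetic. The point it tries to show is that a split depends only on the rank order of an explanatory variable, so a monotonic transformation such as a square root leaves the partition of the samples unchanged. Note that this applies to explanatory variables; transforming the dependent variable does change what the trees fit.

```python
import math
import random


def best_split(x, y):
    """Return the indices sent to the left branch by the split on x
    that minimizes the summed squared error of y in the two branches.
    Hypothetical helper for illustration only."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    best_sse, best_left = float("inf"), None
    for k in range(1, len(order)):
        # A threshold cannot separate two identical x values.
        if x[order[k - 1]] == x[order[k]]:
            continue
        left, right = order[:k], order[k:]
        sse = 0.0
        for part in (left, right):
            mean = sum(y[i] for i in part) / len(part)
            sse += sum((y[i] - mean) ** 2 for i in part)
        if sse < best_sse:
            best_sse, best_left = sse, frozenset(left)
    return best_left


random.seed(0)

# Synthetic, strongly right-skewed (lognormal) explanatory variable.
x = [random.lognormvariate(0, 1) for _ in range(200)]
# Synthetic response: a step in x plus a little Gaussian noise.
y = [(5.0 if xi > 1.5 else 1.0) + random.gauss(0, 0.3) for xi in x]

# Square-root transform of the explanatory variable (monotonic for x > 0).
x_sqrt = [math.sqrt(xi) for xi in x]

left_raw = best_split(x, y)
left_sqrt = best_split(x_sqrt, y)

# The same samples should end up on the same side of the split either way.
same_partition = left_raw == left_sqrt
print(same_partition)
```

Outliers are a different matter: an extreme value in the dependent variable pulls the mean of whichever leaf it lands in, which is one reason they tend to be more problematic than skewness.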
