I am using 5 rasters in my model to predict a soil property from point data. What does it mean when the variance explained is low, 22% (Model Out of Bag Errors), but in my training data regression diagnostics the R2 is 89% and the SMSE and SE are also low? Also, when I look at my residuals, the model appears to have predicted very well.
Hi, it sounds like your model might be overfitting to the training data. So it looks like it's doing a really good job on the data the model was trained on, but then the model won't perform well when predicting to new data.
There are a couple of ways to avoid this. Start by looking at the Validation Options accordion in the Forest-based and Boosted Classification and Regression tool, and make sure there is some data set aside for evaluation (Training data excluded for validation %). Then, in the output, you can evaluate your R^2, errors, etc. on both the training and the validation data. If the metrics are much better for training than for validation, your model is overfitting.
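To illustrate the training-versus-validation comparison outside the tool, here is a minimal Python sketch. It does not use ArcGIS: the data are synthetic, and the "model" is a hypothetical nearest-neighbor lookup chosen only because it memorizes its training data, so the function and variable names are assumptions made for illustration.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def nearest_neighbor_predict(train_x, train_y, x):
    """Return the y of the closest training x (a model that memorizes its data)."""
    idx = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[idx]


# Synthetic, noisy data (an assumption for illustration, not real soil data).
random.seed(0)
xs = [random.uniform(0.0, 1.0) for _ in range(200)]
ys = [2.0 * x + random.gauss(0.0, 1.0) for x in xs]

# Set aside 20% of the points for validation; the model never sees them.
n_val = int(0.2 * len(xs))
val_x, val_y = xs[:n_val], ys[:n_val]
train_x, train_y = xs[n_val:], ys[n_val:]

train_pred = [nearest_neighbor_predict(train_x, train_y, x) for x in train_x]
val_pred = [nearest_neighbor_predict(train_x, train_y, x) for x in val_x]

train_r2 = r_squared(train_y, train_pred)
val_r2 = r_squared(val_y, val_pred)

print(f"Training R^2:   {train_r2:.2f}")
print(f"Validation R^2: {val_r2:.2f}")
```

A training R^2 far above the validation R^2, as this memorizing model produces, is the pattern that signals overfitting.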
There is also a checkbox in the tool to Optimize Parameters. This will choose the parameters (such as tree depth, etc.) that give you the highest, say, R^2 specifically for your testing data. See more here: https://pro.arcgis.com/en/pro-app/latest/tool-reference/spatial-statistics/how-forest-works.htm#:~:t...
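The general idea of picking a parameter by its score on held-out data can be sketched in plain Python. This is not how the tool implements Optimize Parameters; it is a generic illustration in which a hypothetical k-nearest-neighbor averager stands in for the forest, the number of neighbors k stands in for a setting like tree depth, and the data are synthetic.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def knn_predict(train_x, train_y, x, k):
    """Average the y values of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


# Synthetic, noisy data (an assumption for illustration).
random.seed(1)
xs = [random.uniform(0.0, 1.0) for _ in range(200)]
ys = [2.0 * x + random.gauss(0.0, 1.0) for x in xs]

# Hold out 20% of the points; candidates are scored only on this held-out set.
n_val = int(0.2 * len(xs))
val_x, val_y = xs[:n_val], ys[:n_val]
train_x, train_y = xs[n_val:], ys[n_val:]

candidate_ks = [1, 5, 15, 40]  # hypothetical parameter values to try
scores = {}
for k in candidate_ks:
    preds = [knn_predict(train_x, train_y, x, k) for x in val_x]
    scores[k] = r_squared(val_y, preds)

# Keep the parameter value with the highest held-out R^2.
best_k = max(scores, key=scores.get)

for k in candidate_ks:
    print(f"k={k:>2}  held-out R^2 = {scores[k]:.2f}")
print(f"Selected k: {best_k}")
```

Because the selected value is tuned to the held-out points, its score on those same points tends to be optimistic, so a further untouched subset is useful for a final check.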
--Catherine McSorley
Hi Catherine,
Thanks for helping out.
My next questions are:
Arnie Waddell, M.A.
GIS Specialist
Agriculture and Agri-Food Canada / Government of Canada