<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to perform Validation? in ArcGIS GeoStatistical Analyst Questions</title>
    <link>https://community.esri.com/t5/arcgis-geostatistical-analyst-questions/how-to-perform-validation/m-p/84318#M215</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;The regression line in cross validation excludes some of the extreme values when fitting the line.&amp;nbsp; This is why the line differs from Excel.&amp;nbsp; Sorry that I had forgotten to mention this earlier.&amp;nbsp; From the help:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;"&lt;SPAN style="color: #4d4d4d; background-color: #fefefe;"&gt;This procedure first fits a standard linear regression line to the scatterplot. Next, any points that are more than two standard deviations above or below the regression line are removed, and a new regression equation is calculated. This procedure ensures that a few outliers will not corrupt the entire regression equation."&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #4d4d4d; background-color: #fefefe;"&gt;Regarding whether to do validation or cross validation, you have a few choices.&amp;nbsp; Validation is the most statistically defensible methodology (because it validates against data that was completely withheld), but it requires not using some of your data.&amp;nbsp; Cross validation, on the other hand, uses all data to build the model, but it then validates against the same data used to build the model, so there is a bit of data double-dipping.&amp;nbsp; The double-dipping isn't usually a problem because the influence of any individual point should not be too extreme.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #4d4d4d; background-color: #fefefe;"&gt;The third option is to do a validation workflow to decide the parameters of the model.&amp;nbsp; You can then apply this model to the entire data.&amp;nbsp; To do this, perform the entire validation workflow.&amp;nbsp; Then use the Create Geostatistical Layer tool, and provide the geostatistical layer used for validation and the entire dataset.&amp;nbsp; This will apply the parameters of the validation model to all 
data.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #4d4d4d; background-color: #fefefe;"&gt;If, as you say, you're going to choose your model by cross validation statistics, then I would probably just do cross validation and not do a full validation workflow.&amp;nbsp; But it's up to you.&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Wed, 29 Apr 2020 13:33:11 GMT</pubDate>
    <dc:creator>EricKrause</dc:creator>
    <dc:date>2020-04-29T13:33:11Z</dc:date>
    <item>
      <title>How to perform Validation?</title>
      <link>https://community.esri.com/t5/arcgis-geostatistical-analyst-questions/how-to-perform-validation/m-p/84315#M212</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;&lt;STRONG&gt;Okay, I got through step one&lt;/STRONG&gt;, i.e. 'Subset Features' in the GA toolbox.&amp;nbsp;&lt;/P&gt;&lt;P&gt;The cross-validation documentation says &lt;EM&gt;that the cross-validation and validation diagnostics are similar, except that the input models are built over the entire dataset and the training portion, respectively.&amp;nbsp;&lt;/EM&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;How do I build a new model on the training portion only, and how do I obtain graphs similar to the cross-validation ones?&lt;/STRONG&gt; Right now, I just have two output point feature classes generated from my input point dataset using 'Subset Features'.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.esri.com/migrated-users/11953"&gt;Emily Norton&lt;/A&gt;&amp;nbsp;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 27 Apr 2020 07:59:00 GMT</pubDate>
      <guid>https://community.esri.com/t5/arcgis-geostatistical-analyst-questions/how-to-perform-validation/m-p/84315#M212</guid>
      <dc:creator>BankimYadav</dc:creator>
      <dc:date>2020-04-27T07:59:00Z</dc:date>
    </item>
    <item>
      <title>Re: How to perform Validation?</title>
      <link>https://community.esri.com/t5/arcgis-geostatistical-analyst-questions/how-to-perform-validation/m-p/84316#M213</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Subset Features splits the data into "training" and "test" subsets.&amp;nbsp; Build the interpolation model as normal on the training subset, with whichever interpolation method and parameters you choose.&amp;nbsp; Then run the GA Layer To Points tool to predict to, and validate against, the test subset: specify the field with the measured values in the test subset, and run the tool.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The output will be a feature class with all of the usual validation statistics for each individual feature.&amp;nbsp; The Predicted and Error fields will always appear, but some models will also create Standard Error, Standardized Error, and Normal Value fields.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You can then create Scatter Plot charts from these fields.&amp;nbsp; The charts are not created automatically as they are for cross validation, but each can be built as a simple scatter plot:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Predicted: Use the field of measured values and the Predicted field.&lt;/P&gt;&lt;P&gt;Error: Use the field of measured values and the Error field.&lt;/P&gt;&lt;P&gt;Standardized Error: Use the field of measured values and the Standardized Error field.&lt;/P&gt;&lt;P&gt;Normal QQ Plot: Use the Normal Value and Standardized Error fields.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;These charts do not appear in a pop-up window like the cross validation results because that pop-up is a property of geostatistical layers.&amp;nbsp; Feature classes cannot display it.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
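The per-feature fields described in this reply can also be summarized numerically once the output table is exported. The following is only a minimal sketch, not part of the original reply: it assumes the measured values, the Predicted field and (where the model provides it) the Standard Error field have been copied into plain Python lists, and it assumes the Error field is predicted minus measured. The function name validation_summary is hypothetical.

```python
import numpy as np


def validation_summary(measured, predicted, standard_error=None):
    """Summarize validation errors for a withheld test subset."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)

    # Assumption: the error is defined as predicted minus measured.
    error = predicted - measured

    summary = {
        "mean_error": float(error.mean()),
        "rmse": float(np.sqrt(np.mean(error ** 2))),
    }

    # Standardized errors only exist for models that report a standard error.
    if standard_error is not None:
        standardized = error / np.asarray(standard_error, dtype=float)
        summary["rms_standardized"] = float(np.sqrt(np.mean(standardized ** 2)))

    return summary


# Illustrative values only, not taken from the thread.
measured = [10.0, 12.0, 9.0, 15.0]
predicted = [11.0, 11.0, 10.0, 14.0]
print(validation_summary(measured, predicted, standard_error=[1.0, 2.0, 1.0, 2.0]))
```

A mean error near zero and a small root-mean-square error are the usual signs of a well-behaved model; the scatter plots listed in the reply show the same quantities feature by feature.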
      <pubDate>Mon, 27 Apr 2020 14:55:34 GMT</pubDate>
      <guid>https://community.esri.com/t5/arcgis-geostatistical-analyst-questions/how-to-perform-validation/m-p/84316#M213</guid>
      <dc:creator>EricKrause</dc:creator>
      <dc:date>2020-04-27T14:55:34Z</dc:date>
    </item>
    <item>
      <title>Re: How to perform Validation?</title>
      <link>https://community.esri.com/t5/arcgis-geostatistical-analyst-questions/how-to-perform-validation/m-p/84317#M214</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;&lt;SPAN&gt;&lt;EM&gt;Thank you Mr. Eric.&amp;nbsp;&lt;/EM&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;EM&gt;I did as you suggested and got what I wanted. I am happy with it. I have two points:&lt;/EM&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="text-decoration: underline; "&gt;Regarding the linear regression equation in cross-validation:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="text-decoration: underline; "&gt;&lt;SPAN style="text-decoration: none;"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;The regression equation between measured and predicted values is not the same as the one I get from Python or Excel. &lt;A href="https://1drv.ms/x/s!ApnFGtCbIkTPsWUphedI1VeIfuWX?e=P0p9nC"&gt;This&lt;/A&gt;&amp;nbsp;is the Excel file containing the columns of measured and predicted values as copied from the Cross-Validation (CV) Stats. &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="text-decoration: underline; "&gt;Regarding cross-validation and validation: &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="text-decoration: underline; "&gt;&lt;SPAN style="text-decoration: none;"&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Since I am using the entire dataset to find the best model, ranked by CV stats, should I even do the validation part, i.e. subsetting the data into training and test portions and measuring prediction performance on the test portion? According to the GA documentation, validation is like a preliminary step: if the model works well on the training portion, then a ‘similar’ model will work well on the entire dataset. &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Most probably, I would be using the same model on the training set. Would that just provide more statistics about how it performed on the train/test portions, when I already know how it performs on the entire dataset?&amp;nbsp;&amp;nbsp; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Please provide your wonderful insights. &lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Wed, 29 Apr 2020 03:33:07 GMT</pubDate>
      <guid>https://community.esri.com/t5/arcgis-geostatistical-analyst-questions/how-to-perform-validation/m-p/84317#M214</guid>
      <dc:creator>BankimYadav</dc:creator>
      <dc:date>2020-04-29T03:33:07Z</dc:date>
    </item>
    <item>
      <title>Re: How to perform Validation?</title>
      <link>https://community.esri.com/t5/arcgis-geostatistical-analyst-questions/how-to-perform-validation/m-p/84318#M215</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;The regression line in cross validation excludes some of the extreme values when fitting the line.&amp;nbsp; This is why the line differs from Excel.&amp;nbsp; Sorry that I had forgotten to mention this earlier.&amp;nbsp; From the help:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;"&lt;SPAN style="color: #4d4d4d; background-color: #fefefe;"&gt;This procedure first fits a standard linear regression line to the scatterplot. Next, any points that are more than two standard deviations above or below the regression line are removed, and a new regression equation is calculated. This procedure ensures that a few outliers will not corrupt the entire regression equation."&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #4d4d4d; background-color: #fefefe;"&gt;Regarding whether to do validation or cross validation, you have a few choices.&amp;nbsp; Validation is the most statistically defensible methodology (because it validates against data that was completely withheld), but it requires not using some of your data.&amp;nbsp; Cross validation, on the other hand, uses all data to build the model, but it then validates against the same data used to build the model, so there is a bit of data double-dipping.&amp;nbsp; The double-dipping isn't usually a problem because the influence of any individual point should not be too extreme.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #4d4d4d; background-color: #fefefe;"&gt;The third option is to do a validation workflow to decide the parameters of the model.&amp;nbsp; You can then apply this model to the entire data.&amp;nbsp; To do this, perform the entire validation workflow.&amp;nbsp; Then use the Create Geostatistical Layer tool, and provide the geostatistical layer used for validation and the entire dataset.&amp;nbsp; This will apply the parameters of the validation model to all 
data.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="color: #4d4d4d; background-color: #fefefe;"&gt;If, as you say, you're going to choose your model by cross validation statistics, then I would probably just do cross validation and not do a full validation workflow.&amp;nbsp; But it's up to you.&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
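The two-pass regression quoted from the help in this reply can be sketched outside ArcGIS to see why the line differs from a plain Excel or Python fit. This is only an illustration under stated assumptions, not Esri's implementation: it assumes "two standard deviations" refers to the standard deviation of the residuals from the first fit, and the function name trimmed_regression and the sample data are hypothetical.

```python
import numpy as np


def trimmed_regression(measured, predicted, n_sd=2.0):
    """Fit a line, drop points far from it, then refit on the rest."""
    x = np.asarray(measured, dtype=float)
    y = np.asarray(predicted, dtype=float)

    # Pass 1: ordinary least-squares line through all points.
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)

    # Assumption: the cutoff is n_sd times the residual standard deviation.
    limit = n_sd * residuals.std(ddof=1)
    keep = ~(np.abs(residuals) > limit)

    # Pass 2: refit using only the points that were kept.
    slope, intercept = np.polyfit(x[keep], y[keep], 1)
    return float(slope), float(intercept), keep


# Illustrative data: an exact line y = 2x + 1 with one extreme value.
x = np.arange(20, dtype=float)
y = 2.0 * x + 1.0
y[10] += 50.0

print("plain fit:  ", np.polyfit(x, y, 1))
print("trimmed fit:", trimmed_regression(x, y)[:2])
```

A plain least-squares fit, as in Excel, uses every point, so a single extreme value shifts the line; the two-pass version removes that point before the second fit, which is the behavior the help text describes.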
      <pubDate>Wed, 29 Apr 2020 13:33:11 GMT</pubDate>
      <guid>https://community.esri.com/t5/arcgis-geostatistical-analyst-questions/how-to-perform-validation/m-p/84318#M215</guid>
      <dc:creator>EricKrause</dc:creator>
      <dc:date>2020-04-29T13:33:11Z</dc:date>
    </item>
    <item>
      <title>Re: How to perform Validation?</title>
      <link>https://community.esri.com/t5/arcgis-geostatistical-analyst-questions/how-to-perform-validation/m-p/84319#M216</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thank you. You have nice insights and vast experience of geostatistics.&amp;nbsp;&lt;/P&gt;&lt;P&gt;The third option is interesting, although there is a bit of double-dipping in this method too, through the repeated use of the training set. I will try it and include it in my write-up if I can theoretically defend its usage.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;I am closing the thread here. It's answered, and thank you.&amp;nbsp;&lt;/STRONG&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 01 May 2020 09:48:51 GMT</pubDate>
      <guid>https://community.esri.com/t5/arcgis-geostatistical-analyst-questions/how-to-perform-validation/m-p/84319#M216</guid>
      <dc:creator>BankimYadav</dc:creator>
      <dc:date>2020-05-01T09:48:51Z</dc:date>
    </item>
  </channel>
</rss>

