
Kriging Cross Validation Approaches

05-09-2012 01:01 PM
StephenVitoria
Emerging Contributor
Hi

I perform ordinary kriging (or co-kriging) on a dataset and in the final cross validation step of the geostatistical wizard, I can see a table which shows actual vs predicted values for each point in the dataset.

When I finish the Geostatistical Wizard, I create a prediction surface which is added as a layer to my project. I can then right-click on the prediction layer and select the "Validation/Prediction" option, which opens the "GA Layer to Points" tool. I select the input dataset and the field on which I had originally performed kriging. This creates a new shapefile which I can add to my project.

When I examine the attribute table for the new layer that I have just created using the "Validation/Prediction" option, it also shows actual and predicted values for each point in the dataset. However, the predicted values are substantially different from those that were shown in the final cross validation screen of the Geostatistical Wizard.

Can anyone tell me why they are different, and which cross validation table is correct? I don't understand 😞

Many thanks for all your help

Stephen Vitoria
8 Replies
EricKrause
Esri Regular Contributor
The results are different because these two methods are doing slightly different things.

On the cross-validation page of the Geostatistical Wizard, you're seeing statistics calculated in the following way:
1.  Remove the first point in the dataset, then use the remaining (n-1) points to predict the value at the location of the point you removed.
2.  Repeat step 1 for all n points in the dataset, and calculate the statistics.

The idea is that if your model is good, you should be able to use (n-1) points to closely predict the value of the nth point.

When you do GA Layer to Points with a field to validate on, you will not be removing the points before the prediction.  In other words, when you predict at an input point location, you will use the measured value at that location to make the prediction.  Clearly, including the measured value will provide more accurate predictions than not using it (in fact, if you turn measurement error off, it will do a perfect prediction every time).
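To make the difference concrete, here is a minimal NumPy sketch of the two procedures. It is not what Geostatistical Analyst runs internally: the exponential semivariogram, its sill and range, and the synthetic data are all assumptions chosen for illustration, and the absence of a nugget stands in for "measurement error off".

```python
import numpy as np

def semivariogram(h, sill=1.0, range_=0.3):
    """Exponential semivariogram with no nugget, so gamma(0) == 0 (assumed model)."""
    return sill * (1.0 - np.exp(-h / range_))

def ordinary_kriging(xy, z, targets):
    """Ordinary kriging predictions at `targets` from the data (xy, z)."""
    n = len(z)
    # Left-hand side: semivariances between data points, bordered by ones
    # for the unbiasedness constraint (weights sum to 1).
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = semivariogram(np.linalg.norm(xy[:, None] - xy[None, :], axis=2))
    lhs[n, n] = 0.0
    # Right-hand side: semivariances between data points and each target.
    rhs = np.ones((n + 1, len(targets)))
    rhs[:n] = semivariogram(np.linalg.norm(xy[:, None] - targets[None, :], axis=2))
    weights = np.linalg.solve(lhs, rhs)[:n]  # drop the Lagrange multiplier row
    return weights.T @ z

# Synthetic dataset (hypothetical): 40 random points on the unit square.
rng = np.random.default_rng(0)
xy = rng.random((40, 2))
z = np.sin(4 * xy[:, 0]) + np.cos(3 * xy[:, 1])

# Cross-validation: remove point i, predict it from the other n-1 points.
loo_pred = np.array([
    ordinary_kriging(np.delete(xy, i, axis=0), np.delete(z, i), xy[i:i + 1])[0]
    for i in range(len(z))
])

# Predicting at the input locations without removing anything.
full_pred = ordinary_kriging(xy, z, xy)

def rmse(pred):
    return np.sqrt(np.mean((pred - z) ** 2))

print("leave-one-out RMSE:", rmse(loo_pred))
print("same-data RMSE:    ", rmse(full_pred))
```

With no nugget the second set of predictions reproduces the measured values up to rounding, so its errors say nothing about how well the model predicts at unsampled locations.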

Validation with GA Layer to Points is designed to let you use an entirely different dataset for your validation (it doesn't really make sense to validate on the same dataset that you used to build the model).  It's a common practice to split your data in half and build the model on one half and validate (via GA Layer to Points) on the other half.  The idea here is that if your model is good, it should be able to accurately predict the values of the dataset that you didn't use to build the model.
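The split-in-half idea can be sketched the same way. This is an illustration under stated assumptions only: the exponential semivariogram and its parameters, the synthetic data, and the helper names `gamma` and `krige` are hypothetical, and in practice the model would be fitted on the training half rather than fixed in advance.

```python
import numpy as np

def gamma(h, sill=1.0, range_=0.3):
    """Exponential semivariogram without a nugget (assumed model)."""
    return sill * (1.0 - np.exp(-h / range_))

def krige(xy, z, targets):
    """Ordinary kriging predictions at `targets` from the data (xy, z)."""
    n = len(z)
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = gamma(np.linalg.norm(xy[:, None] - xy[None, :], axis=2))
    lhs[n, n] = 0.0
    rhs = np.ones((n + 1, len(targets)))
    rhs[:n] = gamma(np.linalg.norm(xy[:, None] - targets[None, :], axis=2))
    return np.linalg.solve(lhs, rhs)[:n].T @ z

# Synthetic dataset (hypothetical): 60 random points on the unit square.
rng = np.random.default_rng(1)
xy = rng.random((60, 2))
z = np.sin(4 * xy[:, 0]) + np.cos(3 * xy[:, 1])

# Randomly split the points into a training half and a validation half.
order = rng.permutation(len(z))
train, test = order[:30], order[30:]

# Predict the validation half using only the training half.
pred = krige(xy[train], z[train], xy[test])
errors = pred - z[test]

mean_error = errors.mean()
rmse = np.sqrt(np.mean(errors ** 2))
print("mean error:", mean_error)
print("RMSE:      ", rmse)
```

Because none of the validation points were used to make the predictions, these errors are a fair measure of how the model performs at locations it has not seen.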

Was that explanation clear?  I'm happy to clarify anything.
StephenVitoria
Emerging Contributor
Dear Eric

Many thanks for all your help, and for the advice. Is it therefore true to say that, when evaluating actual versus predicted differences (i.e. errors in prediction), I should be using the "GA Layer to Points" values and not the values produced by the Geostatistical Wizard, because the "GA Layer to Points" values are based on the complete model and not the n-1 model?

Thanks for the advice regarding splitting the data in half and comparing the results from each half to each other. Do you happen to have any links that you could post (possibly to blogs, etc.) that explain this in more detail, or indeed to any articles that might help me improve my kriging modelling (I'm a beginner)?

Thanks again for such a quick reply

Stephen
EricKrause
Esri Regular Contributor
If you're interested in looking at residuals (the difference between the predicted value and the actual value), then use GA Layer to Points.  If you're interested in seeing if your kriging model actually fits the data, use cross-validation (or validation by splitting your data in half).

Getting good at geostatistical modeling requires study, practice, and often a bit of luck.  The best place to start is our help documentation.  Here's a quick list of topics to review:  Gaussian distributions, transformations, semivariograms, histograms, QQ plots, Voronoi diagrams, stationarity, trend removal, spatial autocorrelation, searching neighborhoods, cross-validation, and output types (prediction, standard error, probability, quantile).

Needless to say, that is too much to cover in a blog or forum post.

If you read the help and decide you want more information, a colleague published a book last year on performing spatial statistics (and geostatistics) in ArcGIS:
http://esripress.esri.com/display/index.cfm?fuseaction=display&websiteID=194
MarcoBoeringa
MVP Regular Contributor
EricKrause wrote: "Getting good at geostatistical modeling requires study, practice, and often a bit of luck."

Seconded. I think many people underestimate the need to delve into these kinds of subjects, whether it is geostatistical or plain statistical modelling on numbers. A lot goes wrong in practice, and unfortunately many datasets are not only misinterpreted but also under-used, because people don't know what to do with them to get interpretable results.

EricKrause wrote: "If you read the help and decide you want more information, a colleague published a book last year on performing spatial statistics (and geostatistics) in ArcGIS:
http://esripress.esri.com/display/index.cfm?fuseaction=display&websiteID=194"

Don't forget this small but helpful introductory whitepaper by the same author:
http://www.esri.com/library/whitepapers/pdfs/intro-modeling.pdf
EricScherff
Emerging Contributor
Hi All,

I am experiencing problems exporting cross validation results at the end of the Geostatistical Wizard, and would like to ask what I might be overlooking.  After interpolating the surface using Ordinary Kriging on a dataset with around 412,000 points, I would like to export the cross validation results as a file geodatabase feature class; a personal geodatabase feature class would also be OK (I have tried both).  The problem is that when I click on Export Result Table, navigate to the geodatabase, name the file, and then click Save, nothing happens.  I have tried clicking Finish in the Wizard, but still get no feature class.  It does work if I export as a shapefile, but I would rather not go that route.  The other frustration is that I used this function successfully on the same machine with similar datasets just a few months ago.  The only difference is that I have since installed SP4 (ArcGIS 10, Windows 7).  Please help!  Thanks.

Eric
SteveLynch
Esri Regular Contributor
Eric

Does the green progress bar at the bottom of the dialog go all the way to the right when exporting the cross validation results?

I tried to reproduce what you did, but could not reproduce the "no output" problem with either a file geodatabase (fGDB) or a personal geodatabase (pGDB).

Thanks
Steve
EricScherff
Emerging Contributor
Thanks for your response, Steve.  I recall that when I first had this problem, the progress bar did one of two things: 1) absolutely nothing, or 2) flashed across from left to right almost too fast to track (in < 1 second).  However, I decided to return to the GA layers to be sure.  By using right-click > Method Properties, I cycled through the Wizard to get the Cross Validation Results, and I was then able to export successfully to a file geodatabase!  I cannot think of anything I was doing wrong or differently before, but my best guess is that the machine or software either had a hiccup, or that it paused for so long that I decided it was not working.  I wish I knew what could cause this, but right now I am glad to be getting the results out.

On a related topic, is there a way to export model statistics such as Prediction Error, etc., that would eliminate the need to go through ArcMap via the Cross Validation Comparison tool?  I am having difficulty with Cross Validation Comparison right now: I get an empty window when I open two recent models (screenshot attached to the original post).  I have cross-validation (XV) results and grids for both models, so I am confident they should be available for comparison.  Exporting these statistics would also be helpful because I would like to compare more than two models side by side rather than selecting only two at a time.

A final general question: does this type of inconsistent behavior (i.e. empty comparison windows, and failure to export feature classes reliably) point to any common cause or fix for either my ArcGIS installation or machine settings?  For example, the IT department has already tried a video card driver update.  Thanks again!

Eric
SteveLynch
Esri Regular Contributor
Eric

Have you looked at the Crossvalidation geoprocessing tool, and also the GALayerToPoints geoprocessing tool?
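As a rough sketch of how the Crossvalidation tool could be scripted to tabulate statistics for several models at once: the call `arcpy.CrossValidation_ga`, the `"GeoStats"` extension code, and the result property names below are given as I recall them from the ArcGIS help and should be checked against your version. The helper names and the layer paths in the usage note are hypothetical.

```python
# Property names of the Cross Validation result object (assumed; verify
# against the help for your ArcGIS version).
STAT_NAMES = ["count", "meanError", "rootMeanSquare", "averageStandard",
              "meanStandardized", "rootMeanSquareStandardized"]

def run_cross_validation(layer_path):
    """Run the Cross Validation tool on a saved geostatistical layer."""
    import arcpy  # imported lazily: only available with an ArcGIS install
    arcpy.CheckOutExtension("GeoStats")  # Geostatistical Analyst license
    return arcpy.CrossValidation_ga(layer_path)

def stats_table(layer_paths, run_cv=run_cross_validation):
    """Return one row per model: [path, stat1, stat2, ...]."""
    rows = []
    for path in layer_paths:
        result = run_cv(path)
        # A statistic the result object does not expose is left blank.
        rows.append([path] + [getattr(result, name, "") for name in STAT_NAMES])
    return rows

def to_csv_lines(rows):
    """Format the table as comma-separated lines, header first."""
    lines = [",".join(["model"] + STAT_NAMES)]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return lines
```

Usage would look something like `for line in to_csv_lines(stats_table(["C:/models/kriging_a.lyr", "C:/models/kriging_b.lyr"])): print(line)`, where the two layer files are placeholders for your own saved geostatistical layers. Any number of models can be listed, which avoids comparing them two at a time.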

As far as the blank screens go, I'm afraid I cannot help you. Please contact Esri Support.

You may also want to look at
http://blogs.esri.com/esri/arcgis/2012/05/07/dealing-with-extreme-values-in-kriging/

Steve