
Kriging: Choice of 'Maximum Distance'

09-01-2024 11:06 AM
MasoodShaikh
Emerging Contributor

Using point data, I ran Moran's I with a distance band of 100,000 meters to ensure that each point had at least one neighbor, and obtained a statistically significant result. Now, when trying to run Kriging, I am unsure whether to use the 'fixed' or 'variable' option for the 'Search Radius Settings.' I am particularly concerned about what the 'Maximum Distance' should be, given that I used a distance band of 100,000 meters for Moran's I. Second, would it be appropriate to use the default of 12 for the 'Number of Points' in the 'Search Radius Settings' for Kriging?

Thanks

2 Replies
DanPatterson
MVP Esteemed Contributor

Kriging in Geostatistical Analyst—ArcGIS Pro | Documentation

discusses and provides links to the various kriging options. Have you chosen the appropriate method based on your data and underlying assumptions?


... sort of retired...
MarcoBoeringa
MVP Regular Contributor

If I remember correctly, the "Number of Points" setting determines the minimum number of data points used to calculate a grid cell value, and the "Maximum Distance" setting can limit this:

- If you set a 'fixed' distance for the search radius and only 3 points are found within that distance, the cell's value will be based solely on those 3 points, not on the set "Number of Points" (e.g. the default of 12).

- If you set a 'variable' distance, the search for the nearest data points continues until "Number of Points" (e.g. 12) are found, irrespective of the distance.
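To make the difference concrete, here is a toy sketch in plain Python (outside ArcGIS; the sample coordinates and values are made up) of the two neighbor-selection strategies:

```python
import math

def neighbors_fixed(points, cell, radius):
    """'Fixed' search radius: use every point within `radius` of the cell,
    however few (or many) that turns out to be."""
    return [p for p in points if math.dist(p[:2], cell) <= radius]

def neighbors_variable(points, cell, n_points):
    """'Variable' search radius: take the `n_points` nearest points,
    however far away they happen to lie."""
    return sorted(points, key=lambda p: math.dist(p[:2], cell))[:n_points]

# Hypothetical sample points: (x, y, value). Three near the cell, two far away.
pts = [(0, 0, 5.0), (1, 0, 6.0), (0, 2, 4.5), (10, 10, 9.0), (12, 11, 8.5)]
cell = (0.5, 0.5)

# Fixed radius of 3: only the 3 nearby points qualify, even if we wanted more.
print(len(neighbors_fixed(pts, cell, 3)))     # 3 points used
# Variable search asking for 4 points keeps going until it has 4,
# pulling in one of the distant points.
print(len(neighbors_variable(pts, cell, 4)))  # 4 points used
```

The actual Kriging tool of course does much more than this, but the selection logic behind the two options is essentially the above.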

So you have to answer the questions:

- Do I mind potentially adding data points beyond the set search radius? You likely shouldn't worry too much about data points beyond the search radius. Since Kriging is in a sense a form of IDW (Inverse Distance Weighted) interpolation, points further away will influence the grid cell's value less anyway, and shouldn't overly contribute to or distort the results even if they are less appropriate (unless some major break in data values is visible due to e.g. geological factors, in which case you might wish to set barriers).
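The "distant points matter less" argument can be seen in a minimal inverse-distance weighting sketch (plain Python; the coordinates are made up and the power of 2 is just the common IDW default, not what Kriging itself uses):

```python
import math

def idw_weights(points, cell, power=2):
    """Inverse-distance weights: raw weight ~ 1 / d**power,
    normalised so the weights sum to 1."""
    dists = [math.dist(p, cell) for p in points]
    raw = [1.0 / d**power for d in dists]
    total = sum(raw)
    return [w / total for w in raw]

# Hypothetical points: two near the cell, one far away.
pts = [(1, 0), (0, 1), (50, 50)]
cell = (0, 0)

w = idw_weights(pts, cell)
# The two near points split almost all the weight between them;
# the distant point's contribution is negligible.
print([round(x, 4) for x in w])
```

Kriging derives its weights from the fitted semivariogram rather than a simple power of distance, but the same intuition holds: a far-away point picked up by a variable search radius carries little weight.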

- Do I care if fewer data points are included to calculate a cell's value? If the data is erratic (which already means it is less suitable for interpolation), having more data points included might be better for capturing the overall picture.

Either way, I think the differences between the options will be limited. Try both to find out, and explore the error surface after interpolation.

A last question you always need to ask yourself: is my data suitable for interpolation in the first place?

Sometimes other statistical methods are better suited for certain types of data, and classification of data points plus correlation with environmental factors based on ordinary statistics is the more appropriate way to extract value from your dataset. E.g. if you have statistically proven that certain values correlate with certain geological strata, classifying a geological map based on this knowledge could also be a valid method of generating a space-filling dataset, instead of interpolation.

Of course, with all the options for data exploration and post-result evaluation in a tool like Geostatistical Analyst, you should be able to tell whether your data is suitable for interpolation or not.

But sometimes people forget that the data should in fact have spatial auto-correlation, and stubbornly ignore indications otherwise, in a desperate attempt to "create a surface" from a set of data points, because the data "must be interpolated!" (no, it doesn't always, and if you've sampled at the wrong spatial scale to capture the actual phenomenon you're trying to get a handle on, spatial auto-correlation may also be virtually absent).

If your data exploration says there is no real spatial auto-correlation, don't attempt to interpolate; find other ways to process your data. It may well still be suitable for some other type of statistical analysis with proper input of environmental factors.
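For readers who want to see what the Moran's I check from the original question actually computes, here is a toy global Moran's I with binary distance-band weights (plain Python, not the ArcGIS tool; the clustered sample data is made up to produce clearly positive auto-correlation):

```python
import math

def morans_i(coords, values, band):
    """Global Moran's I with binary distance-band weights:
    w_ij = 1 if points i and j are within `band` of each other, else 0."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]          # deviations from the mean
    num = 0.0                                 # cross-product term
    w_sum = 0.0                               # sum of all weights W
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= band:
                num += dev[i] * dev[j]
                w_sum += 1.0
    den = sum(d * d for d in dev)             # variance term
    return (n / w_sum) * (num / den)

# Hypothetical clustered data: low values sit together, high values sit
# together, so Moran's I should come out strongly positive.
coords = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
values = [2.0, 2.2, 1.9, 8.0, 8.3, 7.8]
print(round(morans_i(coords, values, band=2.0), 3))
```

A value near +1 indicates strong positive spatial auto-correlation (interpolation is plausible); a value near the expectation of -1/(n-1) under randomness suggests there is no spatial structure to interpolate. The ArcGIS Spatial Autocorrelation tool additionally reports a z-score and p-value for significance, which this sketch omits.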