I am working on a dataset using empirical Bayesian kriging. EDA suggests using the Log Empirical transformation and a K-Bessel Detrended semivariogram type. Without further adjustment, I achieved the "best" results (in terms of error statistics) when using a smooth circular neighborhood type.
I am now trying to make my results more meaningful, especially regarding the spatial range of the prediction, by using an appropriate search radius.
The only quote on this issue I could find so far is in Sumner, M. E. (1999): Handbook of soil science, which says: "the search radius should not exceed the range of the semivariogram, and is typically less than ½ the range".
Is this a good starting point or are there other suggestions?
If you are working with similar soils data, it would be appropriate. Are there any other suggestions in the help files?
Unfortunately, I could not find any useful tips in the help files.
I am not working specifically on soil data, but on clastic sediments and associated parameters like transmissivity. So, the quote from the Handbook of soil science was the closest I could get.
Nevertheless, any suggestions would be appreciated.
There aren't many criteria for what makes a good search radius, but there is plenty of guidance on what makes a bad one. It definitely should not be larger than the range of the semivariogram (as you noted), and it should be large enough that it captures at least 10 points everywhere in the data domain. Beyond that, the main recommendation is to compare validation and cross-validation statistics for different search radii.
In the Geostatistical Wizard, the default search radius for smooth interpolation is calculated such that it attempts to use at least 32 neighbors in each location. It isn't a perfect algorithm, but we have found it to be reliable and robust.
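To make the "at least 10 points everywhere" check concrete, here is a minimal Python sketch that counts the samples inside a circular search radius at each prediction location. It is not part of Geostatistical Analyst; the function names, the random sample points, and the assumed semivariogram range of 60 map units are all hypothetical and only illustrate the idea.

```python
import math
import random

def neighbor_counts(samples, targets, radius):
    """Count how many sample points lie within `radius` of each target location."""
    return [
        sum(1 for s in samples if math.dist(s, t) <= radius)
        for t in targets
    ]

def smallest_adequate_radius(samples, targets, candidate_radii, min_neighbors=10):
    """Return the smallest candidate radius that gives every target at least
    `min_neighbors` samples, or None if no candidate is large enough."""
    for r in sorted(candidate_radii):
        if min(neighbor_counts(samples, targets, r)) >= min_neighbors:
            return r
    return None

# Hypothetical data: 200 random samples and a regular grid of prediction locations.
random.seed(0)
samples = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(200)]
targets = [(x, y) for x in range(0, 101, 10) for y in range(0, 101, 10)]

semivariogram_range = 60.0  # assumed value, read off your own fitted model
# Candidates never exceed the range, following the rule of thumb above.
candidates = [f * semivariogram_range for f in (0.25, 0.5, 0.75, 1.0)]

for r in candidates:
    counts = neighbor_counts(samples, targets, r)
    print(f"radius={r:5.1f}  min={min(counts):3d}  max={max(counts):3d}")

print("smallest adequate radius:", smallest_adequate_radius(samples, targets, candidates))
```

Locations near the edge of the data domain usually have the fewest neighbors, so the minimum count is the number to watch.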
Once again, thank you very much for the insight!
I should also say that it depends on how fast you need the method to process. In general, it is better to use a search radius that is too big than one that is too small. If it is too small, you are missing relevant information in the neighboring points, and the quality of your predictions will decline. If it is too large, you are pulling in information that is not useful, but these non-informative neighbors tend to get very small weights, so they have little impact on the quality of the interpolation. However, the more neighbors you use, the longer the method takes to calculate.
If you aren't concerned with processing speed, you should err on the side of a larger radius rather than a small one. That being said, the default of ~32 neighbors is almost always more than enough, and you generally won't see improvement in predictions by adding more neighbors. But, as always, it depends on your data.
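If it helps, the comparison of cross-validation statistics across search radii can be sketched in a few lines of Python. Note the big assumption: this uses inverse distance weighting as a simple stand-in predictor, not empirical Bayesian kriging, so it only shows the mechanics of leave-one-out comparison. The data, the function names, and the candidate radii are hypothetical.

```python
import math
import random

def idw_predict(samples, values, target, radius, power=2.0):
    """Inverse-distance-weighted prediction from samples within `radius` of target.
    Stand-in predictor only (NOT kriging). Returns None if no sample is in range."""
    num = den = 0.0
    for s, v in zip(samples, values):
        d = math.dist(s, target)
        if d > radius:
            continue
        if d == 0.0:
            return v  # coincident sample: use its value directly
        w = 1.0 / d ** power  # distant neighbors receive small weights
        num += w * v
        den += w
    return num / den if den > 0 else None

def loo_rmse(samples, values, radius):
    """Leave-one-out RMSE for a given search radius.
    Returns (rmse, n_predicted); points with no neighbors in range are skipped."""
    sq_errors = []
    for i in range(len(samples)):
        rest_s = samples[:i] + samples[i + 1:]
        rest_v = values[:i] + values[i + 1:]
        pred = idw_predict(rest_s, rest_v, samples[i], radius)
        if pred is not None:
            sq_errors.append((pred - values[i]) ** 2)
    if not sq_errors:
        return float("nan"), 0
    return math.sqrt(sum(sq_errors) / len(sq_errors)), len(sq_errors)

# Hypothetical data: a smooth surface plus noise, sampled at 200 random points.
random.seed(1)
samples = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(200)]
values = [math.sin(x / 15.0) + math.cos(y / 20.0) + random.gauss(0, 0.1)
          for x, y in samples]

for radius in (5, 10, 20, 40, 80):
    rmse, n = loo_rmse(samples, values, radius)
    print(f"radius={radius:3d}  predicted={n:3d}/{len(samples)}  RMSE={rmse:.3f}")
```

With a very small radius some points cannot be predicted at all, which is why the sketch reports how many predictions went into each RMSE. For real work, the cross-validation statistics reported by the Geostatistical Wizard for each radius are the ones to compare.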
I appreciate the additional information!
I have now taken my knowledge of the sedimentary conditions and their spatial distribution into account and ended up with a search radius that is about half the average range and incorporates between 25 and 50 sample points, depending on the location of prediction. So I guess I was on the right track.
One last question if you don't mind: Is it safe to assume similar numbers (~32 neighbors) to be sufficient and robust for a standard circular search neighborhood as well, or are there any major differences?
Generally, yes, though a standard search neighborhood has more parameters. First, it can use sectors, which force particular numbers of neighbors to come from different directions. Second, it takes the minimum and maximum number of neighbors (per sector) as parameters.
The way the algorithm works is:
So, for a standard search neighborhood, you can get similar behavior by setting the minimum number of neighbors equal to the maximum number of neighbors (no matter what your search radius is). Remember to take the number of sectors into account. For example, if you have four sectors, and you set min=max=8, then each location will use exactly 32 neighbors in the calculation (8 from each of the four sectors). That is, assuming there actually are at least 8 neighbors in each of the sectors.
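The four-sector example can be illustrated with a short Python sketch that takes the nearest 8 points from each sector around a prediction location. This is only an illustration of the min=max idea, not the tool's actual implementation: the axis-aligned quadrant layout, the function names, and the sample points are assumptions, and the real neighborhood may orient its sectors differently.

```python
import math

def sector_index(origin, point, n_sectors=4):
    """Index (0..n_sectors-1) of the angular sector containing `point`,
    measured counter-clockwise from the +x axis around `origin`."""
    angle = math.atan2(point[1] - origin[1], point[0] - origin[0]) % (2 * math.pi)
    # Clamp guards against floating-point rounding right at 2*pi.
    return min(int(angle / (2 * math.pi / n_sectors)), n_sectors - 1)

def select_neighbors(samples, target, n_sectors=4, per_sector=8):
    """Pick the `per_sector` nearest samples in each sector (min = max).
    A sector with fewer samples simply contributes what it has."""
    buckets = [[] for _ in range(n_sectors)]
    for s in samples:
        buckets[sector_index(target, s, n_sectors)].append((math.dist(s, target), s))
    chosen = []
    for bucket in buckets:
        bucket.sort()
        chosen.extend(point for _, point in bucket[:per_sector])
    return chosen

# Hypothetical samples: 10 points along the diagonal of each quadrant.
samples = []
for k in range(1, 11):
    samples += [(k, k), (-k, k), (-k, -k), (k, -k)]

neighbors = select_neighbors(samples, (0.0, 0.0), n_sectors=4, per_sector=8)
print(len(neighbors))  # 32: 8 from each of the four sectors
```

If one sector holds fewer than 8 points, the total drops below 32, which is the caveat mentioned above.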