Comments on Neighborhood Size

05-01-2020 03:20 AM
BankimYadav
New Contributor III

From what I have read in several Geostatistical Analyst threads, there are no definitive rules for choosing the right neighborhood size, but some choices are considered bad, such as a search radius larger than half the range of the semivariogram.

I have settled on the parameters for the first part, i.e. semivariogram modelling:

  1. Lag size = SET (using the 'Avg Neighbourhood Distance' tool from Spatial Analyst, and following the rule that lag size * number of lags = 0.5 * (maximum distance between input points))
  2. Number of lags = SET (as in point 1)
  3. Anisotropy = ON (as I can see that the sill is approached more rapidly in one direction than in the others)
  4. Model = 'Stable' (subject to change)
  5. Nugget = 'OFF' (due to poor cross-validation stats with Nugget = 'ON')
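The lag rule in point 1 can be checked with a few lines of arithmetic. This is only a rough sketch: the function names, the `n_lags` value, and the four sample points are made up for illustration, and nothing here calls the Geostatistical Analyst tools themselves.

```python
import numpy as np

def pairwise_distances(xy):
    """Full matrix of Euclidean distances between all input points."""
    return np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)

def lag_size_from_extent(xy, n_lags):
    """Lag size such that lag_size * n_lags = 0.5 * max point-to-point distance."""
    return 0.5 * pairwise_distances(xy).max() / n_lags

def mean_nearest_neighbor_distance(xy):
    """Average distance from each point to its closest other point."""
    d = pairwise_distances(xy)
    np.fill_diagonal(d, np.inf)  # ignore each point's zero distance to itself
    return d.min(axis=1).mean()

# Hypothetical sample: corners of a 3 x 4 rectangle (diagonal = 5).
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
lag = lag_size_from_extent(points, n_lags=10)
mean_nn = mean_nearest_neighbor_distance(points)
print("lag size:", lag)
print("mean nearest-neighbour distance:", mean_nn)
```

Comparing the two numbers gives a quick sanity check: if the lag size implied by the rule is far from the average nearest-neighbour distance, the number of lags is the parameter to adjust.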

Now, in the second part, i.e. prediction, I have set some parameters with confidence but am unsure about others:

  1. Number of neighbors = in the range [10, 32]

The problem: if I set 'copy parameters from semivariogram' to 'TRUE', the search ellipse covers almost the entire study region, which seems odd to me. But if I set the flag to 'FALSE' and define the major and minor axes myself, would that nullify the effect of the semivariogram modelling? Either way, please give me some recommendations on the neighborhood size, with reference to the images below. Eric Krause

Image 1 - Semivariogram modelling parameters, Image 2 - the big search radius.  

6 Replies
RobertBorchert
Frequent Contributor III

There is no set rule because a Neighborhood is defined by Cultural Geography and not Physical Geography.

BankimYadav
New Contributor III

Thank you Mr. Borchert. 

EricKrause
Esri Regular Contributor

Using different parameters for the neighborhood and the semivariogram will not nullify the effect of the semivariogram.

The semivariogram has one set of parameters, and the search neighborhood has a different set.  The logic is that the semivariogram defines correlation based on distance, and it allows you to compute optimal weights for any set of neighbors.  Once you define the neighborhood, the semivariogram is applied to the neighborhood to produce the predictions. 

It usually makes most sense to match the parameters of the neighborhood to the parameters of the semivariogram, but there's no requirement to do this.  The reasoning behind keeping the parameters the same is that, for example, the Range in the semivariogram defines the maximum distance where points are still spatially correlated.  Points further apart than this distance are considered spatially uncorrelated.  When building the neighborhood, then, it makes sense to only use neighbors that are closer than the Range so that you are only including neighbors with meaningful spatial correlation in the neighborhood.
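To make the neighborhood idea concrete, here is a minimal sketch of a circular search neighborhood with a minimum and maximum neighbor count. It is a simplification for illustration only and is not how Geostatistical Analyst implements its search: it ignores anisotropy and sectors, and `select_neighbors`, `min_n`, and `max_n` are hypothetical names (the defaults 10 and 32 are taken from the question above).

```python
import numpy as np

def select_neighbors(xy, x0, radius, min_n=10, max_n=32):
    """Indices of the points used to predict at x0, nearest first.

    Points inside `radius` are preferred, capped at `max_n`. If fewer than
    `min_n` fall inside the radius, the nearest `min_n` points are used
    instead, even though some of them lie outside the radius.
    """
    d = np.linalg.norm(xy - x0, axis=1)
    order = np.argsort(d, kind="stable")   # nearest first
    inside = order[d[order] <= radius]
    if len(inside) < min_n:
        return order[:min_n]
    return inside[:max_n]

# Hypothetical data: 50 points on a line at x = 1, 2, ..., 50.
xy = np.column_stack([np.arange(1.0, 51.0), np.zeros(50)])
x0 = np.array([0.0, 0.0])

small = select_neighbors(xy, x0, radius=5.0)    # too few inside, falls back to min_n
large = select_neighbors(xy, x0, radius=40.0)   # many inside, capped at max_n
print(len(small), len(large))
```

Setting `radius` to the semivariogram Range corresponds to matching the neighborhood to the semivariogram; any other radius still uses the same semivariogram to weight whichever neighbors are selected.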

BankimYadav
New Contributor III

Thank you for the insights, Mr. Krause. So the labor spent on modelling the semivariogram is not lost if its parameters are not adopted for the neighbourhood in the next step. How the semivariogram properties still carry over to the prediction when a different radius is used is implicit in your comment. I believe the full explanation might be lengthy or beyond the scope of this discussion.

So, I set 'copy parameters' to 'FALSE' and checked the cross-validation stats with a much smaller radius. The stats were similar to the ones I was getting with the very large radius, from which I conclude that the big radius is doing no harm. Out of curiosity, are such big radii commonly used in general interpolation work?

Thank you.  

EricKrause
Esri Regular Contributor

Using a large radius will rarely make the results worse.  In fact, if you were to look at the kriging equations in a textbook, you might not see any neighborhood mentioned at all.  In theory, the semivariogram defines the weights for every feature in the dataset.  It's just that these weights tend to be very close to 0 when the points are farther away than the semivariogram range.  Including them or not including them will make almost no difference in the results.  The real purpose of neighborhoods is calculation speed, not accuracy.  By using neighborhoods, you get results in seconds, when it might take hours to calculate weights for the entire dataset, and the resulting surface would be nearly identical.
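For anyone who wants to experiment with this, below is a small textbook-style ordinary kriging sketch that computes weights once from every point and once from only the points inside the range. Everything in it is an assumption made for illustration: the spherical model with a range of 20 and no nugget, the random point layout, the synthetic values, and the function names. It is not the Geostatistical Analyst implementation.

```python
import numpy as np

def spherical(h, sill=1.0, rng_=20.0):
    """Spherical semivariogram with no nugget; flat at `sill` beyond `rng_`."""
    s = np.minimum(np.asarray(h, dtype=float) / rng_, 1.0)
    return sill * (1.5 * s - 0.5 * s**3)

def ok_weights(xy, x0, gamma):
    """Ordinary kriging weights for predicting at x0 from the points in xy."""
    n = len(xy)
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(d)          # semivariogram between data points
    A[n, n] = 0.0                 # Lagrange-multiplier corner
    b = np.ones(n + 1)
    b[:n] = gamma(np.linalg.norm(xy - x0, axis=1))
    return np.linalg.solve(A, b)[:n]   # drop the Lagrange multiplier

# Synthetic, made-up data: 200 random points with a smooth surface.
rs = np.random.default_rng(0)
xy = rs.uniform(0.0, 100.0, size=(200, 2))
z = np.sin(xy[:, 0] / 15.0) + np.cos(xy[:, 1] / 15.0)
x0 = np.array([50.0, 50.0])

# Weights from every point in the dataset (no neighborhood).
w_all = ok_weights(xy, x0, spherical)

# Weights from only the points within the semivariogram range.
near = np.linalg.norm(xy - x0, axis=1) <= 20.0
w_near = ok_weights(xy[near], x0, spherical)

print("prediction, all points:     ", w_all @ z)
print("prediction, neighbors only: ", w_near @ z[near])
print("share of |weight| beyond range:",
      np.abs(w_all[~near]).sum() / np.abs(w_all).sum())
```

The full solve works on a matrix that grows with the square of the number of points, which is where the time goes on large datasets; the neighborhood version solves a much smaller system at each prediction location.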

BankimYadav
New Contributor III

So, I believe I should use a smaller neighbourhood to speed up the process, particularly if I am getting no difference in the results. Please correct the statement if required. 

Thank you for the great response.
