Areal Interpolation and data normality.

JonathanWaddell · 12-20-2013

Hi,

My question concerns areal interpolation and data normality.

I am currently working on an M.Sc. thesis that aims to describe the agricultural soils in my region. To do so, a 20 ha grid was superimposed over the region, and for each cell a composite soil sample consisting of 20 soil cores was taken in a W pattern over the entire support. Because I took composite samples, my value for every cell (I treated each cell as a separate polygon) can be considered an average for that area.

I would like clarification on the ArcGIS help file for areal interpolation. It states that "Areal interpolation for continuous data requires that the data is Gaussian and averaged over defined polygons".

My question is: does the data need to be Gaussian within each individual polygon, or does the overall dataset for the variable need to be normally distributed?

Given the nature of my data, if normality is only required within each polygon, I'll assume it and get on with life.

Another question: how important is normality if it is required or suggested for the entire dataset of the variable? The purpose of these maps is just to create a density surface. I will not re-aggregate my data into new polygons.

I would like these maps to accompany statistical comparisons of contrasting land uses and soil types in the region. In other words, the purpose of these maps is to create an easily interpretable visual surface for each variable, so that the reader and I can better understand its spatial distribution.

Jonathan

EricKrause · 12-23-2013

The assumption is that there is an underlying smooth surface for the phenomenon. The surface is assumed to be a second-order stationary Gaussian field. The values in the polygons are assumed to be the average of the underlying Gaussian surface within that polygon. Areal interpolation is the process of estimating the underlying Gaussian field from the averages in an arbitrary set of polygons.
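To make this model concrete, here is a minimal sketch in Python/NumPy under assumptions of my own, not ArcGIS's: a 1-D transect instead of a map, an exponential covariance with a known mean and known parameters, and "polygons" made of equal runs of discretization points. It simulates a stationary Gaussian field, averages it within each polygon, and then uses simple kriging from the polygon averages back to points (often called area-to-point kriging) to estimate the underlying surface. It is not how ArcGIS implements the tool; with real data the mean and covariance would have to be estimated rather than assumed.

```python
import numpy as np

# Hypothetical setup: a 1-D transect stands in for the region. It is discretized
# into fine points, and each run of 25 consecutive points forms one "polygon".
rng = np.random.default_rng(42)
n_points, pts_per_block = 200, 25
x = np.arange(n_points) * 0.5 + 0.25            # point locations (arbitrary units)
block_id = np.arange(n_points) // pts_per_block
n_blocks = int(block_id.max()) + 1

mu, sill, corr_range = 10.0, 4.0, 15.0          # assumed known mean and covariance parameters


def cov(h):
    """Exponential covariance of a second-order stationary field (assumed model)."""
    return sill * np.exp(-np.abs(h) / corr_range)


# 1. Simulate the underlying Gaussian field at the fine points.
C_pp = cov(x[:, None] - x[None, :])
L = np.linalg.cholesky(C_pp + 1e-10 * np.eye(n_points))
z_point = mu + L @ rng.standard_normal(n_points)

# 2. Averaging operator: A[k, a] = 1/pts_per_block if point a lies in polygon k, else 0.
#    The polygon values are the averages of the field within each polygon.
A = (block_id[None, :] == np.arange(n_blocks)[:, None]) / pts_per_block
z_block = A @ z_point

# 3. Covariances involving polygons are averages of point covariances.
C_bb = A @ C_pp @ A.T        # polygon-to-polygon, shape (n_blocks, n_blocks)
C_pb = C_pp @ A.T            # point-to-polygon, shape (n_points, n_blocks)

# 4. Simple kriging (known mean) from the polygon averages back to every point.
W = np.linalg.solve(C_bb, C_pb.T).T             # row a holds the weights for point a
z_hat = mu + W @ (z_block - mu)
krig_var = sill - np.sum(W * C_pb, axis=1)      # simple kriging variance at each point

# Coherence check: averaging the predictions within each polygon returns its value.
print("max |polygon mean of predictions - polygon value|:",
      np.abs(A @ z_hat - z_block).max())
```

The final check reflects a property worth knowing: when the polygon values are treated as exact averages (no nugget or measurement error), averaging the predicted surface within each polygon reproduces that polygon's value.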

Regarding how important the Gaussian assumption is, this is difficult to answer. Kriging theory assumes a Gaussian distribution in order for the predictions to be the "best linear unbiased predictions." If the data is not Gaussian, this property no longer holds. Kriging is known to be fairly robust to non-Gaussian distributions, but none of its attractive statistical properties hold if the data is not Gaussian.
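For reference, the simple kriging predictor at a location $s_0$ and its variance are written below, assuming a known mean $\mu$ and variance $\sigma^2$. The notation is introduced here, not taken from the thread: $\mathbf{z}$ is the vector of data values, $C$ their covariance matrix, and $\mathbf{c}_0$ the vector of covariances between $Z(s_0)$ and the data. The formulas use only the mean and covariance; under a jointly Gaussian model the same quantities are also the conditional mean and conditional variance.

```latex
\hat{Z}(s_0) = \mu + \mathbf{c}_0^{\top} C^{-1} (\mathbf{z} - \mu \mathbf{1}),
\qquad
\sigma^2_{SK}(s_0) = \sigma^2 - \mathbf{c}_0^{\top} C^{-1} \mathbf{c}_0

\text{Gaussian case:}\quad
E[Z(s_0) \mid \mathbf{z}] = \hat{Z}(s_0),
\qquad
\operatorname{Var}[Z(s_0) \mid \mathbf{z}] = \sigma^2_{SK}(s_0)
```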

I'll actually change that documentation. Saying areal interpolation "requires" a Gaussian distribution is too strong a statement. Instead, it should say that the kriging equations assume a Gaussian distribution.

NicholasNagle · 07-17-2014

Kriging maintains nice properties even without normality. In particular, kriging is the Best Linear Unbiased Predictor regardless of the data distribution (where "best" is defined in the mean square error sense). If the data are Gaussian, kriging is stronger: it is the Best Unbiased Predictor. In non-Gaussian cases, there may be unbiased but nonlinear predictors that perform much better than the linear kriging estimator. In both Gaussian and non-Gaussian cases, there may be biased estimators with better mean square error than kriging (for example, Bayesian estimators with informative priors and other regularized estimators, including ridge regression, factorial kriging, and James-Stein estimators). How can a biased estimator have better mean square error? By being more stable (having less variability).
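Both points can be checked with a small Monte Carlo sketch in Python/NumPy. The setups are toy examples chosen for illustration, not anything from kriging software: (a) Y = X**2 with X standard normal, where the best linear predictor of Y from X is just the constant mean, while a nonlinear unbiased predictor is exact; (b) the James-Stein estimator mentioned above, shrinking a vector of p = 10 normal means toward zero. The seed, sample size, and true means are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rep = 20_000

# (a) Non-Gaussian case: Y = X**2 with X ~ N(0, 1).
# Cov(X, X**2) = 0, so the best linear predictor of Y from X is the constant E[Y] = 1,
# whose MSE is Var(Y) = 2. The nonlinear predictor X**2 (= E[Y | X]) is unbiased and exact.
x = rng.standard_normal(n_rep)
y = x**2
mse_linear = np.mean((y - 1.0) ** 2)
mse_nonlinear = np.mean((y - x**2) ** 2)

# (b) Biased but better: James-Stein shrinkage of p >= 3 normal means.
# Each replicate observes obs ~ N(theta, I_p); the unbiased estimator is obs itself.
p = 10
theta = rng.normal(0.0, 1.0, size=p)        # fixed (hypothetical) true means
obs = theta + rng.standard_normal((n_rep, p))
shrink = 1.0 - (p - 2) / np.sum(obs**2, axis=1, keepdims=True)
js = shrink * obs                            # pulled toward 0: biased, but less variable
mse_unbiased = np.mean(np.sum((obs - theta) ** 2, axis=1))   # theoretical value: p
mse_js = np.mean(np.sum((js - theta) ** 2, axis=1))

print(f"(a) linear MSE {mse_linear:.2f} vs nonlinear MSE {mse_nonlinear:.2f}")
print(f"(b) unbiased MSE {mse_unbiased:.2f} vs James-Stein MSE {mse_js:.2f}")
```

Part (b) shows the trade described above: the shrunken estimates are biased toward zero, but their lower variability gives a smaller total squared error than the unbiased estimator.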