Still on EBK/EBK Regression Prediction

11-12-2021 04:28 PM
Elijah
Occasional Contributor II

I am trying to understand the descriptions of EBK/EBKRP below. In particular, I need to understand what is meant by 'small datasets' here. Do they give any idea of how well EBK performs under different sampling densities?

- EBKRP manages to achieve better accuracy than other kriging techniques both for small datasets and even when data is locally moderately non-stationary (Krivoruchko, 2012).

- EBKRP have shown that the prediction intervals obtained by EBK have good coverage probabilities when the data variation is changing rapidly and dissimilarly in different parts of the data extent (Gribov and Krivoruchko 2020).

DanPatterson
MVP Esteemed Contributor
Elijah
Occasional Contributor II

Hi Dan,

Thanks for your reply. 

Yes, I mean the last two bullet points under 'Advantages of Empirical Bayesian Kriging' here: https://pro.arcgis.com/en/pro-app/latest/help/analysis/geostatistical-analyst/what-is-empirical-baye... I am particularly interested in what is meant by 'small datasets' and how EBK is more accurate than other kriging options when using them. How small is small, for example? Could it be expressed in terms of sampling density? It would be good to see case studies demonstrating EBK's superior performance with small datasets. I am just curious; I haven't been able to find such articles or demonstrations yet.

Thanks a lot

DanPatterson
MVP Esteemed Contributor

"Small" is a non-committal term, since people have different thresholds for the small/large boundary.

It would be difficult to find or derive datasets with spatial patterns that are similar across a variety of spatial extents, from which you could draw spatial samples at different extents.

You have data of unspecified condition (extent, number of rows and columns, etc.) and you have your results. I doubt you will be able to tell whether your results were superior or inferior because the data fell into the small or large category.

If you are curious, you would have to generate controlled spatial patterns from which you could vary the spatial sample size (extent, etc.). There are articles on generating spatial patterns.
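Dan's suggestion can be tried with a small experiment. The sketch below shows one common way to generate a controlled pattern; it is not a method named in this thread. It simulates a stationary Gaussian random field with an assumed exponential covariance on a 30 × 30 grid via Cholesky factorisation, then draws random samples of different sizes from it. The grid size, range, and sample sizes are arbitrary illustrative choices, and `gaussian_field` is a hypothetical helper.

```python
import numpy as np


def gaussian_field(nx, ny, corr_range, sill=1.0, seed=0):
    """Simulate a stationary Gaussian random field on an nx-by-ny grid with
    exponential covariance C(h) = sill * exp(-h / corr_range).
    Uses a dense Cholesky factorisation, so it is only practical for small grids."""
    xs, ys = np.meshgrid(np.arange(nx, dtype=float), np.arange(ny, dtype=float))
    xy = np.column_stack([xs.ravel(), ys.ravel()])
    h = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)   # pairwise distances
    cov = sill * np.exp(-h / corr_range)
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(len(xy)))       # tiny jitter for stability
    z = chol @ np.random.default_rng(seed).standard_normal(len(xy))
    return xy, z


xy, z = gaussian_field(30, 30, corr_range=5.0)
rng = np.random.default_rng(1)
for n in (20, 50, 100, 400):                        # vary the sample size on the same pattern
    idx = rng.choice(len(z), size=n, replace=False)
    sample_xy, sample_z = xy[idx], z[idx]
    # Interpolate sample_z with each method, then validate against the withheld grid values.
    print(n, "points sampled; sample variance", round(float(sample_z.var()), 2))
```

Because the full simulated surface is known, every interpolation result can be scored against ground truth, and the same pattern can be resampled at any density or extent.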


... sort of retired...
EricKrause
Esri Regular Contributor (accepted solution)

Hi @Elijah,

"Small" is of course relative, but it generally means datasets with roughly 20 to 100 input points. Most other kriging methods estimate a single semivariogram for the entire study area, and this presents two problems.

First, it is difficult to estimate a semivariogram from a small dataset, and the uncertainty in the estimated semivariogram model is not propagated to the prediction standard errors. Second, one semivariogram (even if correctly estimated) may not fit well everywhere in the study area; some areas may need different semivariograms than others.
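To make the first problem concrete: n points give n(n-1)/2 pairs, so 20 points give 190 pairs and 100 points give 4,950, and those pairs are then split across lag bins. The sketch below is a plain NumPy version of the classical (Matheron) empirical semivariogram that also reports the number of pairs in each bin. The bin count and the half-maximum-distance cutoff are common conventions assumed here, not Esri's settings, and the sample data are random placeholders.

```python
import numpy as np


def empirical_semivariogram(xy, z, n_lags=10, max_lag=None):
    """Classical (Matheron) empirical semivariogram, with the number of pairs per lag bin."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(z), k=1)            # each unordered pair of points once
    d = np.linalg.norm(xy[i] - xy[j], axis=1)      # separation distance of each pair
    half_sq = 0.5 * (z[i] - z[j]) ** 2             # semivariance contribution of each pair
    if max_lag is None:
        max_lag = d.max() / 2.0                    # rule of thumb: half the largest distance
    edges = np.linspace(0.0, max_lag, n_lags + 1)
    bins = np.digitize(d, edges) - 1               # pairs at or beyond max_lag get index n_lags
    counts = np.zeros(n_lags, dtype=int)
    gamma = np.full(n_lags, np.nan)                # NaN where a bin has no pairs
    for b in range(n_lags):
        in_bin = bins == b
        counts[b] = int(in_bin.sum())
        if counts[b] > 0:
            gamma[b] = half_sq[in_bin].mean()
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, gamma, counts


rng = np.random.default_rng(0)
for n in (20, 100):
    xy = rng.uniform(0, 100, size=(n, 2))          # hypothetical sample locations
    z = rng.normal(size=n)                         # placeholder values; only pair counts matter here
    _, _, counts = empirical_semivariogram(xy, z, n_lags=10)
    print(f"{n} points, {n * (n - 1) // 2} pairs in total; pairs per lag bin: {counts}")
```

With 20 points, the short-lag bins, which matter most for kriging weights, typically hold only a handful of pairs, so a semivariogram fitted to them can change noticeably from one sample to the next. A single fitted model hides that variability.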

By simulating data and semivariograms locally, EBK/EBKRP better captures the uncertainty in the semivariogram model, and it allows the model to change within the study area. Among other things, this gives EBK better "coverage probabilities." For example, when creating 95% confidence intervals for predictions from other kriging methods (not EBK), it is common for only 75% of validation data to fall within the intervals, because the standard errors do not accurately reflect the prediction uncertainty. You should generally expect the coverage of EBK confidence intervals to be much closer to 95%.
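Coverage can be checked directly from validation or cross-validation results: count how many observed values fall inside prediction ± 1.96 × standard error. The sketch below does that, and also shows, under an assumed normal error model, how standard errors that are about 40% too small turn a nominal 95% interval into roughly 76% coverage, close to the 75% figure above. The 0.6 ratio and the simulated data are illustrative assumptions, not values from any study.

```python
import numpy as np
from math import erf, sqrt


def coverage(observed, predicted, std_error, z_crit=1.96):
    """Share of validation values inside predicted +/- z_crit * std_error."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    std_error = np.asarray(std_error, dtype=float)
    inside = np.abs(observed - predicted) <= z_crit * std_error
    return float(inside.mean())


def normal_coverage(se_ratio, z_crit=1.96):
    """Expected coverage when errors are normal with SD 1 but the reported SE is se_ratio."""
    return erf(z_crit * se_ratio / sqrt(2.0))     # P(|Z| < z_crit * se_ratio)


rng = np.random.default_rng(42)
n = 100_000
truth = rng.normal(size=n)                        # hypothetical validation values
pred = np.zeros(n)                                # hypothetical predictions; true error SD is 1
for ratio in (1.0, 0.6):                          # 1.0 = honest SEs, 0.6 = SEs too small
    se = np.full(n, ratio)
    print(f"SE ratio {ratio}: empirical {coverage(truth, pred, se):.3f}, "
          f"theoretical {normal_coverage(ratio):.3f}")
```

The same `coverage` function can be applied to real validation output from any kriging method to compare how close each one gets to its nominal 95%.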

-Eric

Elijah
Occasional Contributor II

Dear Eric,

Again, I really appreciate your answers. I would like a little more insight, if possible.

You have dealt well with the number of samples implied by 'small'.

However, since the number of samples is not linearly related to sampling density, is there anything specifically advantageous about using EBKRP, for instance, in low-sampling-density scenarios? Interpolation accuracy is generally affected by sampling density and distribution. I would like to know whether EBKRP, by estimating the semivariogram through subsetting and repeated simulation, can achieve better results than other kriging models when sampling density is low. The whole idea is to establish whether or not EBKRP, because of its intrinsic characteristics, performs better than other kriging models in a data-scarce (sparse), low-density sampling setting (not necessarily one with a small number of samples).

 

Thanks in advance.

EricKrause
Esri Regular Contributor (accepted solution)

Hi Elijah,

I'm not aware of any research on this specifically, but I have a few thoughts about EBK with low and varying sampling density.

For the same subset size, a subset covers a larger area where sampling density is low than where it is high (i.e., 100 points are spread over a larger area at lower densities). EBK assumes stationarity within each subset, so for best results you should configure the subset size so that the resulting subsets are small enough in area for each to be assumed stationary.
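The density effect is simple arithmetic: a subset of k points at density λ covers roughly k / λ of area, so 100 points cover about 100 km² at 1 point/km² but about 1,000 km² at 0.1 point/km². The sketch below checks this by simulation, using the circle around a centre point's k-th nearest neighbour as a rough stand-in for an EBK subset (ArcGIS builds its subsets differently, so treat this as an approximation). The helpers `knn_radius` and `max_subset_size`, the 100 × 100 km study area, and the 200 km² "stationary area" are all hypothetical.

```python
import numpy as np


def knn_radius(xy, center, k):
    """Distance from center to its k-th nearest point: radius of a k-point neighbourhood."""
    d = np.linalg.norm(np.asarray(xy, dtype=float) - np.asarray(center, dtype=float), axis=1)
    return float(np.sort(d)[k - 1])


def max_subset_size(density, stationary_area):
    """Hypothetical helper: largest subset size whose expected footprint (k / density)
    still fits inside an area over which the surface is judged stationary."""
    return int(np.floor(density * stationary_area))


rng = np.random.default_rng(1)
side = 100.0                                      # hypothetical 100 x 100 km study area
center = np.array([side / 2, side / 2])
k = 100                                           # hypothetical subset size
for density in (1.0, 0.1):                        # points per km^2
    n = int(density * side * side)
    xy = rng.uniform(0, side, size=(n, 2))
    r = knn_radius(xy, center, k)
    print(f"density {density}/km^2: {k} points span ~{np.pi * r ** 2:.0f} km^2 "
          f"(expected ~{k / density:.0f} km^2)")

# If the surface looks stationary only over ~200 km^2 (an assumption), cap the subset size:
for density in (1.0, 0.1):
    print(f"density {density}/km^2 -> subset size <= {max_subset_size(density, 200.0)}")
```

In the sparse case the cap falls well below 100, which is the kind of subset-size adjustment the paragraph above recommends; in practice, judging the stationary area is the hard part.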

EBK (when a transformation is used) and EBKRP are based on Simple Kriging, so they assume a uniform density within each subset. It's fine if sample density varies globally (i.e., some areas are densely sampled and others aren't), but the density of the points should be relatively consistent within each subset. Again, this may require configuring the subset size.
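One rough way to check whether point density is "relatively consistent" inside a candidate subset is a quadrat test: divide the subset's bounding box into a small grid, count points per cell, and compute the variance-to-mean ratio of the counts. Values near 1 suggest roughly random, even coverage, and values well above 1 suggest clustering. This is a generic spatial-statistics diagnostic, not something EBK computes; the 3 × 3 grid, the helper name `dispersion_index`, and the example data are assumptions.

```python
import numpy as np


def dispersion_index(xy, cells=3):
    """Variance-to-mean ratio of quadrat counts over the points' bounding box.
    Near 1 for roughly uniform random density; well above 1 when points cluster."""
    xy = np.asarray(xy, dtype=float)
    counts, _, _ = np.histogram2d(xy[:, 0], xy[:, 1], bins=cells)
    counts = counts.ravel()
    return float(counts.var(ddof=1) / counts.mean())


rng = np.random.default_rng(7)
uniform = rng.uniform(0, 10, size=(100, 2))                 # evenly sampled subset
clustered = np.vstack([rng.uniform(0, 10, size=(50, 2)),
                       rng.uniform(0, 2, size=(50, 2))])    # half the points in one corner
print(f"uniform subset:   {dispersion_index(uniform):.2f}")
print(f"clustered subset: {dispersion_index(clustered):.2f}")
```

A subset that scores like the clustered example might be a candidate for a smaller subset size, so that each subset spans a more evenly sampled region.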

EBK Regression Prediction does, however, have a clear advantage for low-density sampling, but the advantage comes from the regression part, not the EBK part. The better the regression model predicts the dependent variable on its own, the less need there is to include autocorrelation from neighbors with EBK.
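The variance bookkeeping behind that point can be sketched with ordinary least squares: whatever share of variance the explanatory variables capture (R²) no longer has to be recovered from neighbouring points, leaving only the residual share (1 − R²) for the kriging part. The example below is a simplification: it uses plain OLS on a single hypothetical covariate, and the "spatial part" is ordinary noise standing in for an autocorrelated component, so it illustrates the arithmetic rather than EBKRP itself.

```python
import numpy as np


def ols_r2(X, y):
    """Fit ordinary least squares with an intercept; return (R^2, residuals)."""
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])  # intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(1.0 - resid.var() / y.var()), resid


rng = np.random.default_rng(3)
n = 60                                   # a sparse, hypothetical sample
x = rng.normal(size=n)                   # hypothetical explanatory raster value at each point
spatial_part = rng.normal(size=n)        # stand-in for the component kriging would model
for strength in (0.5, 3.0):              # weak vs strong explanatory variable
    y = strength * x + spatial_part
    r2, _ = ols_r2(x, y)
    print(f"covariate strength {strength}: R^2 = {r2:.2f}, "
          f"share left for the kriging part = {1 - r2:.2f}")
```

With the strong covariate, most of the variance is explained before any neighbours are consulted, which is why sparse sampling hurts the regression-based approach less.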

Hope that's all clear,

Eric

Elijah
Occasional Contributor II

Hi Eric,

Your answer is very much appreciated. Many thanks.
