How to assess the quality of kriging results?

03-12-2019 05:23 AM
OliverLeimer1
New Contributor

Hi community,

I'm currently trying to find the best and most accurate kriging interpolation for my project: interpolating a groundwater surface in an unconfined aquifer.
My project area is a city district of Vienna, Austria. The aquifer is mostly homogeneous, consisting of gravel and sand, and shows a slight trend of decreasing groundwater levels towards the ESE.
So far, so good.

I've now tried ordinary kriging with a Gaussian semivariogram model and applied different settings (number of sectors, lag size, etc.).

My problem now (please correct me if I'm wrong):

For assessing the quality of my results, I focus on the following factors:

1. Root mean square (RMS) error: should be as small as possible
2. Root mean square standardized (RMSS) error: should be as close to 1 as possible
3. Mean standardized error: should be as small as possible
4. Average standard error: should be as small as possible


Right?

But which of these factors is most important?

I'm currently facing the problem that I have cases in which RMS and mean standardized error are close to 0 and RMSS is close to 1, but the average standard error is relatively high... and cases where it is the other way round.

And another question:
Is ordinary kriging a suitable method for modeling groundwater levels at all?
I also thought about using kriging with external drift, but since I'd probably have to use the terrain surface model as the second parameter, it doesn't seem to make sense (urban flatlands with low natural differences in elevation, where human-made changes in elevation are probably larger than natural ones).


6 Replies
JimCousins
MVP Regular Contributor

Preface: I am by no means an expert in this process.

The first step is to plot your point data and evaluate whether you have constraints along the edges. If you do not, your plume will be pushed outward, larger than is realistic, because the interpolation will fall back on the mean beyond your data. Your options in this case are to use quantile kriging, logarithmic kriging (this one typically creates a plume smaller than the actual one, but you work with the data you have), or to put in constraining data points (control points) based on your knowledge of the site, transmissivity, hydraulic head, etc.

I look to see whether the resulting plume makes sense in terms of the data set used to create it. I know I have not responded to your specific question, but I feel that the parameters you are looking at are secondary to reviewing the data to see if the results make sense.

Best Regards,

Jim

OliverLeimer1
New Contributor

Hi Jim,
thanks for your advice. I thought about this problem for a long time and therefore included additional data points outside of my actual area of interest, so the edges of my prediction map are no longer relevant for my model.
Considering that only the center of my kriging prediction map is of interest to me, doesn't that make the described errors (RMS, etc.) more important?

EricKrause
Esri Regular Contributor

Hi Oliver.  I know you're going to hate this answer, but the importance of different cross-validation statistics depends a lot on your particular workflow and requirements.  For example, if you only need predicted values and don't need standard errors (measures of uncertainty of the predicted values), then the Average Standard Error and RMS Standardized both become unimportant.  The RMS (not the standardized RMS) directly measures how close the predicted values are to the measured values.  The Mean and Mean Standardized both measure model bias, i.e., whether there is a tendency to under- or over-predict the values (an unbiased model will have Mean and Mean Standardized values close to 0).  Combining these two, you have information about both model accuracy and model bias, and it is up to you to decide which of these properties is most important.
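To make that concrete, the statistics above boil down to a few lines of arithmetic on the leave-one-out cross-validation output.  Here is a minimal sketch in plain NumPy (outside ArcGIS); the array values are made-up placeholders, and the exact formulas the software uses may differ slightly in how things are aggregated, so treat it as an illustration of the definitions rather than a reimplementation of the tool:

```python
import numpy as np

# Placeholder leave-one-out cross-validation output for n points:
# measured values, kriging predictions, and kriging standard errors.
measured  = np.array([10.2, 10.5,  9.8, 10.1, 10.7])
predicted = np.array([10.1, 10.6,  9.9, 10.0, 10.5])
std_error = np.array([0.20, 0.18, 0.22, 0.19, 0.21])

errors = predicted - measured

mean_error        = errors.mean()                     # bias: close to 0 is good
rms               = np.sqrt((errors ** 2).mean())     # accuracy: small is good
mean_standardized = (errors / std_error).mean()       # bias in standard-error units
rms_standardized  = np.sqrt(((errors / std_error) ** 2).mean())  # ~1 means the standard errors are realistic
avg_standard_error = std_error.mean()                 # should be comparable to the RMS

print(mean_error, rms, mean_standardized, rms_standardized, avg_standard_error)
```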

That being said, here is the workflow that I usually follow.  I consider the RMS Standardized to be a sanity check.  If it is less than 0.8 or more than 1.2, I usually reject the model before even looking at other statistics.  I'll then look at the Root Mean Square and decide if it is acceptable.  Because it is in the units of the data, it gives an average margin of error for prediction (for example, if the RMS is 3, then on average each predicted value will be off by 3 from the true value at the location).  Whether this margin of error is acceptable is going to depend on your workflow.  If the margin of error is acceptable, I then move to the Mean and decide if the level of bias is acceptable.  Again, the Mean is in data units, so it directly measures, on average, how much the values are under- or over-predicted.  Is it acceptable if the model on average estimates values that are 0.5 higher than the measured values?  Like before, it heavily depends on your workflow and the data.
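If it helps to see that screening order written down, here is a minimal sketch; the RMS and bias tolerances are placeholder assumptions that you would replace with your own requirements (in data units, e.g. metres of groundwater head):

```python
def acceptable_model(rms_standardized, rms, mean_error,
                     max_rms=0.5, max_abs_bias=0.1):
    """Screen a cross-validated kriging model in the order described above.

    max_rms and max_abs_bias are hypothetical, workflow-specific tolerances.
    """
    # Sanity check: are the standard errors roughly realistic?
    if not 0.8 <= rms_standardized <= 1.2:
        return False
    # Accuracy: is the average margin of error acceptable?
    if rms > max_rms:
        return False
    # Bias: is the average under- or over-prediction acceptable?
    return abs(mean_error) <= max_abs_bias
```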

Additionally, do not discount common sense and expert knowledge.  If the cross validation results look good, but you know the surface is incorrect, you are completely justified in rejecting the model.  All the software knows is the locations of the points and a number attached to them, and it is doing its best to detect patterns and correlations.  But if you know that the patterns and correlations don't actually hold up out of sample, don't feel obligated to use them.

I also found this paper that discusses different approaches to groundwater interpolation.  They ultimately recommend "non-colocated cokriging."  They essentially used historical data sampled at separate locations as a cokriging variable.  This ended up outperforming every univariate interpolation method.

OliverLeimer1
New Contributor

Hi Eric,
thanks for your answer! I only need predicted values and focus on the center of my interpolated surface (so edge areas are less important).
To start with, I did a few kriging calculations (about 20 in total) with slightly different parameters (all ordinary kriging, mostly with an exponential or Gaussian function), and my errors do not differ a lot.

My RMS ranges between 0.19 and 0.20
RMSS lies between 0.999 and 1.14
ME ranges between 0.002 and 0.005

"Best" values occur when using exponential kernel function (although I've often read, gaussian is good for groundwater simulations).

Nevertheless, all results "look" fine, as they reproduce the flow direction that is commonly known for that area. Differences are small.
Considering that, I guess I should probably focus on the simulation with the lowest RMS.
I hope I didn't forget another important point. 

EricKrause
Esri Regular Contributor

If the problem is picking between various models that all look good, that's a pretty good problem to have.  As you said, probably just go with the one with the smallest RMS, but you should definitely note that different models with slightly varying parameters gave nearly identical results.  That is a very good thing, as it means that your predicted values are robust.

There actually is a tool called Semivariogram Sensitivity that does a very similar workflow; it may save you some time.  You give a kriging model as input, then give some tolerances for the semivariogram parameters, and it tries random combinations and produces predicted values and standard errors at a new set of locations.  The idea is that if the predicted values and standard errors don't change much for different parameter combinations, the somewhat arbitrary choice of parameters didn't have a huge impact.
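If you prefer to check this outside the tool, the same idea can be sketched in a few lines of plain NumPy: export a prediction grid for each perturbed parameter set (the file names below are just placeholders) and look at how much the predictions spread across the stack.

```python
import numpy as np

# Placeholder: prediction grids exported from several kriging runs whose
# semivariogram parameters (range, sill, nugget) were perturbed slightly.
prediction_files = ["run_01.npy", "run_02.npy", "run_03.npy"]
stack = np.stack([np.load(f) for f in prediction_files])  # (runs, rows, cols)

# Per-cell spread of the predicted groundwater level across parameter sets.
spread = stack.max(axis=0) - stack.min(axis=0)

# If the spread is small relative to the accuracy you need (e.g. a few
# centimetres of head), the exact semivariogram parameters don't matter much.
print("max spread:", spread.max(), "mean spread:", spread.mean())
```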

[I've edited this post.  It previously had some incorrect information.]

OliverLeimer1
New Contributor

That gives me a very positive feeling.
The next step is the interpolation of permeability values. That probably won't go so well, but let's see.

Thanks a lot for your advice!
