Kriging input with uncertainties

01-31-2014 06:15 AM
SteveJones1
New Contributor
I have a data set that I'd like to interpolate using Kriging. I know that ArcGIS can generate a variance raster alongside the interpolation result that shows the uncertainty of the interpolated predictions.

However, the z values in my input data have uncertainties of their own. Is there a way to have these uncertainties reflected in the output of the Kriging process? In other words, can the output variance raster take the uncertainty of the input data into account?

Cheers,
Steve.
11 Replies
EricKrause
Esri Regular Contributor
Do you have access to Geostatistical Analyst?  It can only be done in that extension.
CrystalMcClure
New Contributor

I have this exact same issue. How do you go about doing this in Geostatistical Analyst? (Yes I do have the extension)

EricKrause
Esri Regular Contributor

Hi Crystal,

When there are uncertainties in the input data, we call that measurement error.  Measurement error can be handled in several different ways, depending on your data.

Does each of your measured values have the same measurement error (i.e., is the uncertainty the same for every value), or does the measurement error change from point to point?

CrystalMcClure
New Contributor

My measurement error changes from point to point.

Thanks for your help!

EricKrause
Esri Regular Contributor

To incorporate heterogeneous measurement error (error that varies from point to point), you will need to use geostatistical simulations.

Here is the outline of the workflow (a scripted sketch of step 2 follows the list):

  1. Use the Geostatistical Wizard to interpolate the points using Simple Kriging (this will not work for other types of kriging).  Do not worry about measurement error for now.
  2. Use the Gaussian Geostatistical Simulations geoprocessing tool with the following parameters:
    1. Input geostatistical layer - Provide the layer created in step 1.  This will be the basis for the simulations.
    2. Number of realizations - Enter a large number, maybe 1000.  This is how many simulations will be performed.
    3. Output workspace - Provide a directory or geodatabase to store the simulations.
    4. Output simulation prefix - Enter a 3-character prefix to label the simulations.
    5. Input conditioning features - Provide the feature class that you interpolated in step 1.
    6. Conditioning field - Provide the field that you interpolated in step 1.
    7. Conditioning measurement error field - This is where you provide the measurement error values.  You will need to create a field on your conditioning features specifying the measurement error of each point.  The value of the field must correspond to one standard deviation of the measurement error.   If it isn't clear what I mean here, please ask.
    8. Output cell size - Provide the cell size of the raster that will be simulated.
    9. Raster statistics type - Check the boxes next to "Mean" and "Standard Deviation".  The Mean raster will correspond to the kriging predictions, and the Standard Deviation raster will correspond to the standard errors of the predictions.
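
If you prefer to run step 2 as a script rather than through the tool dialog, here is a minimal arcpy sketch. The paths and field names are placeholders, and the parameter names follow my reading of the tool's documentation, so please verify them against your ArcGIS version.

```python
import arcpy

# Requires the Geostatistical Analyst extension.
arcpy.CheckOutExtension("GeoStats")

# All paths and field names below are hypothetical -- substitute your own.
arcpy.GaussianGeostatisticalSimulations_ga(
    in_geostat_layer="C:/data/simple_kriging.lyr",   # Simple Kriging layer from step 1
    number_of_realizations=1000,                     # step 2.2
    output_workspace="C:/data/sims.gdb",             # step 2.3
    output_simulation_prefix="sim",                  # step 2.4
    in_conditioning_features="C:/data/points.shp",   # step 2.5
    conditioning_field="Z_VALUE",                    # step 2.6
    cell_size=250,                                   # step 2.8, in map units
    raster_stat_type="MEAN;STDDEV",                  # step 2.9
    conditioning_measurement_error_field="ERR_SD",   # step 2.7: one std. dev. per point
)

arcpy.CheckInExtension("GeoStats")
```

Step 1 itself is interactive in the Geostatistical Wizard; if you need to script that step too, the Create Geostatistical Layer tool can re-create a kriging layer from a previously saved model source.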

After running the tool, you should see that the kriging predictions (Mean raster) are very close to the kriging predictions made without specifying measurement error, while the standard errors (Standard Deviation raster) are larger than they were without measurement error.  They are larger because the uncertainty in the input is now correctly propagated to the uncertainty in the output.

Let me know if you have any other questions or need any clarifications.

CrystalMcClure
New Contributor

I am a little unsure about #7 and the meaning of "one standard deviation of measurement error". Is there a formula I could use to calculate this value from mean, standard deviation, standard error, variance, etc.?

In #8, the cell size seems to be predetermined when I start adding variables. Should I just leave the size it computes alone, or is there some way I should calculate it separately?

Thanks again for your help!

EricKrause
Esri Regular Contributor

The cell size defaults to 1/250 of the width or height of the output raster; for example, an extent 50 kilometers across gives a default cell size of 200 meters.  You can freely change this value or just use the default.  It only changes the resolution of the raster, so if you want higher resolution, make the value smaller.  It's really up to you and has no impact on the geostatistics.

About the measurement error, a little explanation is needed.  The measurement error model that we use assumes that each location has some true, underlying value.  However, when this true value is measured, there will be measurement error such that the measured value will not be identical to the true value. 

The model assumes that the measured value at a location is equal to the true value at the location, plus some random noise.  This noise is assumed to follow a normal (Gaussian) distribution, where the mean of the normal distribution is equal to the true, underlying value.  What you need to provide to the tool is the standard deviation of this normal distribution.  The larger the standard deviation, the more noisy the measurement.
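
For concreteness, here is a tiny toy illustration of that model (plain Python with numpy, not arcpy; all of the numbers are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

true_values = np.array([10.0, 12.5, 9.8])  # unknown true values at three locations
error_sd = np.array([0.5, 2.0, 1.0])       # per-point measurement error, as one standard
                                           # deviation each -- this is the field the tool wants

# Measured value = true value + Gaussian noise with mean 0 and the given SD.
measured = true_values + rng.normal(0.0, error_sd)
print(measured)  # scatters around true_values; wider scatter where error_sd is larger
```

The larger the value in the error field at a point, the less strictly the simulations will honor that point's measured value.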

Unfortunately, there is no way to calculate this standard deviation from the data itself; you have to know it from an outside source.  It can sometimes be found in the manufacturer's documentation of whatever device you used to take the measurements, and sometimes it is known from past research or from physical reasoning.

CrystalMcClure
New Contributor

This actually works really well for me. My data points are trend values calculated from daily site data over 20 years. When I calculate the trend, I also calculate the standard error of the trend estimate, which is exactly what you're asking for (let me know if I'm getting that wrong, though). All data at each site are taken by the same instruments and already corrected, so I'm not worried about instrument-to-instrument error, just the error in my trend estimates.

One last thing: my simulation (using 1000 for Number of Realizations) has been running for about two hours now. Is that usual, or should I lower that number?

Thanks for all the detail you've provided. It's been invaluable.

EricKrause
Esri Regular Contributor

Yes, that sounds like an ideal setup.  In fact, that is exactly why measurement errors are passed as standard deviations.  The most common source of measurement error is when the measured values aren't "measured" at all; instead, they are outputs or aggregations of some other model (like a long-term trend, in your case).  These other models often calculate standard errors or standard deviations, and these can be propagated directly as measurement errors.
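
For anyone following along, here is a hedged sketch of that kind of setup using scipy's linregress, whose stderr attribute is the standard error of the fitted slope. The site names and values below are invented for illustration.

```python
import numpy as np
from scipy.stats import linregress

# Invented daily series per site; real data would span ~20 years of days.
daily = {
    "site_A": np.array([3.1, 3.3, 3.0, 3.6, 3.8]),
    "site_B": np.array([5.0, 4.7, 4.9, 4.4, 4.2]),
}

for site, values in daily.items():
    t = np.arange(len(values))   # time index (e.g., days)
    fit = linregress(t, values)
    # fit.slope  -> the trend: the value to interpolate (conditioning field)
    # fit.stderr -> standard error of the slope: the measurement error field
    print(site, round(fit.slope, 3), round(fit.stderr, 3))
```

The slope goes into the conditioning field and the standard error into the conditioning measurement error field from step 2.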

As for the computation time, I assume the process has either completed or you have canceled it by now.  The biggest contributors to computation time are the number of points, the output cell size, and the number of realizations.  You can't really change the number of points, so to speed things up you can either reduce the number of realizations or increase the cell size.  Increasing the cell size will make the output more pixelated, and reducing the number of realizations will reduce the precision of the calculated predictions and standard deviations.  I can't give more specific recommendations than that without looking at your data.
