Universal Kriging with External Drift

Tae-Jung_JonathanKwon · ‎12-23-2012

Hi

I am looking for a way to use UK with external drift and the external drift. Or in others words, is there any way to perform a regression kriging on ArcGIS? (i.e., kriging with underlying trend..).
Thanks for your input in advance.

EricKrause · ‎12-26-2012

Universal kriging is available in the Geostatistical Wizard. It's one of the six kriging types. The trend is calculated from polynomials of the (x,y) coordinates; it does not support covariates other than the spatial location. However, covariates can be used as cokriging variables.

Tae-Jung_JonathanKwon · ‎12-26-2012

Thank you very much for your answer.
Could you please be little more specific as to what I should do?
You are right on that I do not need to use the trend derived from polynomials of the x and y coordinates.
This is being taken into account when I run the multiple linear regression where I consider 1. lat, 2 Long. 3. Dist to water. 4 Altitude, and 5. relative topography (differences in elevations between the target cell and nearby cells (within 1km). Here the dependent variable is mean air temperature. So I calibrated the model and and by using map algebra, I used the coefficients of the significant parameters (in my case, they were lat, alt, distance to water) to calculate the result of my regression for the whole area (thus producing a continuous surface).

Referring back to what you said, how could I use the continuous map generated from the regression as cokringing variables? It will be greatly appreciated if you could further explain or elaborate..

One more thing, is there any tool that allows me to calculate the correlation coefficient between the target cell and the nearby cells? For example, from MLR or Kriging, I can generate the interpolated map in which each cell contains a unique value (in my case it would be mean air temperature). I want to know if there is any tool available to see how much the value is each cell is related with averaged values in other neighbouring cells.

Thank you very much for your answers in advance and happy new year!

Universal kriging is available in the Geostatistical Wizard. It's one of the six kriging types. The trend is calculated from polynomials of the (x,y) coordinates; it does not support covariates other than the spatial location. However, covariates can be used as cokriging variables.

EricKrause · ‎12-26-2012

If you have a good MLR model already, I wouldn't try to use the covariates as cokriging variables. If you want to try it anyway, in the Geostatistical Wizard, when you choose kriging on the first page, you can enter up to four datasets. The first one you enter is the variable you will interpolate, and the three additional datasets will be used as cokriging variables.

"Kriging" is often called "residual kriging," and there's a reason for this: you always perform kriging on the residuals of some model. This model can be almost anything, but "regression kriging," "kriging with external drift," "universal kriging," and "linear mixed model kriging" all generally refer to the simultaneous estimation of the covariate coefficients and the kriging parameters. However, you may find success with doing a sequential estimation: first calculate the coefficients using your MLR model. Then calculate the residuals, and perform Simple kriging on these residuals (you should use Simple kriging instead of Ordinary kriging because you know that the mean of the residuals is 0). Then add these interpolated residuals back into the MLR predictions. You'll lose some power because you are sequentially calculating parameters (rather than simultaneously estimating them), but you should still get defensible results.

As for comparing the values in one point to the average of the neighboring points, the Semivariogram/Covariance cloud is probably the best way to visualize this, but the result is a graph rather than a single correlation coefficient. If you really need to calculate the correlation coefficient (and you're ok with ignoring spatial correlation in the analysis), we have a tool called Neighborhood Selection that selects the neighbors of an input (x,y) location (use the same neighborhood parameters that you used in kriging). It will probably take a lot of work, but I'm sure you can write a Python script that will do what you're trying to do. I've never personally done this, so I don't want to try to outline an algorithm, but I'm sure all the tools are there to accomplish this task.

EricKrause · ‎12-26-2012

Also, if you're going to do kriging on the residuals of your MLR model, recalculate the model without using Lat as a covariate. Otherwise you'll be "double-counting" (for lack of a better phrase) the spatial location.

EricKrause · ‎12-26-2012

Also, we heavily suggest using a projected coordinate system rather than a Lat-Long GCS. Distance calculations get badly distorted when using Lat-Long, and this distortion will get propagated through all the kriging calculations.

Tae-Jung_JonathanKwon · ‎12-28-2012

[ATTACH=CONFIG]20260[/ATTACH]
Dear Eric6346,

Thanks for your kind and detailed explanations:D

I would just like to confirm with you the method that you suggested; a sequential estimation by performing simple-kriging residuals
and combining this with MLR predictions. Here I have listed a few technical issues that I am facing;
1. I have created a grid surface using fishnet with 1km by1km covering my entire study area (500 cells by 500 cells)
2. I have 100 weather stations from which I calculated the mean air temperature (using 5 years of temperature data) for each station
3. I then performed MLR in SPSS; dependent variable being mean air temperature and independent variables being lat, altitude (using DEM layer), distance to water (using "near" function; each station to the nearest large water body).
4. From MLR, I now have an equation and with this, I calculate the residuals. The residuals were calculated by taking the difference between the observed and estimated (using the MLR equation). So I have 100 residuals.
5. With the MLR equation, I also generate an interpolated gridded surface; For each and all cells (1km^2) created in step1, and I extracted lat, alt, and distance to water thereby producing mean surface temperature estimated map on entire 500 cells by 500 cells. (please see the attached sample showing in a bigger grid size)
6. Now with the 100 residuals, I perform Simple Kriging thereby generating the continuous residual estimated map.
7. Then I assign the continuous surface found in step 6 onto each and every cell (i.e., 500 cells by 500 cells). This can be done by taking the average of the estimated residuals that fall into each cell.
8. Now simply add the products calculated in Steps 5 and 7.

Do these steps seem make sense to you?

Later on, I intend to perform a sensitivity analysis by using different grid size (5km by 5km, 10km by 10km, and 20km by 20km).
For this reason, it would be better if I could generate a MLR continuous surface (same resolution as continuous surface generated by Kriging) so that I can simply add both maps and assign them to whichever gird size that I want to test on. But I do not know the way to calculate the continuous MLR map (probably it is because I performed MLR outside of ArcGIS but in other software like SPSS).

As for the second question I had about calculating the correlation coefficient. Let me explain what I intend to do with this.
With the map generated in above steps (i.e., estimated mean air temperature map), I intend to analyse the current weather station setting and to recommend other potential sites. Let's assume that I want to install a weather station in "cold spots" meaning that a cell with the coldest estimated air temperature gets priority for weather station installation (this is why I intend to generate the interpolated air temperature map in the first place). In other words, I can have a rank for all cells by assigning the coldest cell with 1 and the warmest cell with say 100 (thus the rank ranges 1 to 100). But from the interpolated map, it is possible that a group of highly ranked cells are nearby each other. In this case, I then have to make a decision as to which cell gets more weight/priority; a cell with a high correlation coefficient gets priority as it would provide more "benefit" (since it is more representative) than a cell with a low correlation coefficient.
I will go ahead and explore the feasibility of what you suggested. But please suggest if there is any other way in ArcGIS to accomplish this task..

I am sorry for this long question, but your assistance would be greatly appreciated. Thanks again.

If you have a good MLR model already, I wouldn't try to use the covariates as cokriging variables. If you want to try it anyway, in the Geostatistical Wizard, when you choose kriging on the first page, you can enter up to four datasets. The first one you enter is the variable you will interpolate, and the three additional datasets will be used as cokriging variables.

"Kriging" is often called "residual kriging," and there's a reason for this: you always perform kriging on the residuals of some model. This model can be almost anything, but "regression kriging," "kriging with external drift," "universal kriging," and "linear mixed model kriging" all generally refer to the simultaneous estimation of the covariate coefficients and the kriging parameters. However, you may find success with doing a sequential estimation: first calculate the coefficients using your MLR model. Then calculate the residuals, and perform Simple kriging on these residuals (you should use Simple kriging instead of Ordinary kriging because you know that the mean of the residuals is 0). Then add these interpolated residuals back into the MLR predictions. You'll lose some power because you are sequentially calculating parameters (rather than simultaneously estimating them), but you should still get defensible results.

As for comparing the values in one point to the average of the neighboring points, the Semivariogram/Covariance cloud is probably the best way to visualize this, but the result is a graph rather than a single correlation coefficient. If you really need to calculate the correlation coefficient (and you're ok with ignoring spatial correlation in the analysis), we have a tool called Neighborhood Selection that selects the neighbors of an input (x,y) location (use the same neighborhood parameters that you used in kriging). It will probably take a lot of work, but I'm sure you can write a Python script that will do what you're trying to do. I've never personally done this, so I don't want to try to outline an algorithm, but I'm sure all the tools are there to accomplish this task.

EricKrause · ‎12-28-2012

The steps you've outlined are correct.

Unfortunately, there's no way to make a continuous OLS surface. You'll need to make separate rasters for each resolution you want to test, but you only need to krig on the residuals once. You can export the interpolated residual surface to any cell size and extent that you want.

I now understand what you're trying to do with the correlation coefficient. However, I don't think a correlation coefficient will work here because if you want to correlate a single point to the mean of its neighbors, you'll only be able to calculate a single coefficient for the entire surface (since you need repeated samples), so it won't help you in deciding which particular locations should be given preference.

The first thing that comes to mind is the Voronoi Map tool. It's an interactive graphical tool, and if you use Standard Deviation, Entropy, or Interquartile Range, you'll get an estimate of the local variability. A small local variability indicates that the predictions are more constant in that area, so they might be good candidates for new sites because the area can be better represented by a single value. Note that you'll need to convert your rasters to points to run the tool.

Tae-Jung_JonathanKwon · ‎12-29-2012

Dear Eric6346,

Thanks for all your explanations.

I at first intended to calculate correlation coefficient by including its direct neighbours (just like how Entropy is calculated) for each cell and for entire map.
Yes, your suggestion of using the Voronoi Map does really do what I wanted to do! 😄

So I generated the Voronoi Map which also contained Standard Deviation, Entropy, and etc. I looked up the descriptions for each stat generated from the Voronoi Map, and like you well mentioned, standard deviation and/or entropy would best describe the local variability. Since both measures explain the variability of the target cell and its neighbours, I would expect both measures be strongly correlated. However their degree of correlation wasn't so strong. (please see the attached diagram)[ATTACH=CONFIG]20319[/ATTACH]. I do understand that they are calculated in a different manner but shouldn't both measures have a strong positive correlation?

This leads me to wonder how both measures are calculated and which measure I should use for deciding the next potential weather station site in my study. I attached a sample Excel sheet in which I listed values for the target cell (i.e., center) and its neighbours. I also included the calculated values for Standard deviation and Entropy for the target cell. Would you be able to explain what was done in ArcGIS to calculate these two values? I looked up the description for Entropy, but it seemed like ArcGIS uses "Smart Quantiles" to categorize the values into five different classes. Unless I know the exact boundary of these five classes, it is difficult to apply the equation listed in this page..

Again, thank you very much for your continued support and helpful comments!

Jonathan

The steps you've outlined are correct.

Unfortunately, there's no way to make a continuous OLS surface. You'll need to make separate rasters for each resolution you want to test, but you only need to krig on the residuals once. You can export the interpolated residual surface to any cell size and extent that you want.

I now understand what you're trying to do with the correlation coefficient. However, I don't think a correlation coefficient will work here because if you want to correlate a single point to the mean of its neighbors, you'll only be able to calculate a single coefficient for the entire surface (since you need repeated samples), so it won't help you in deciding which particular locations should be given preference.

The first thing that comes to mind is the Voronoi Map tool. It's an interactive graphical tool, and if you use Standard Deviation, Entropy, or Interquartile Range, you'll get an estimate of the local variability. A small local variability indicates that the predictions are more constant in that area, so they might be good candidates for new sites because the area can be better represented by a single value. Note that you'll need to convert your rasters to points to run the tool.

Tae-Jung_JonathanKwon · ‎12-30-2012

I figured out how the software would calculate the statistics (at least in my case).. Taking an example of numbers in the attached Excel file (or below);

1       2        3
4       0        5
6       7        8

And the values assigned to each number are
0:      1.610996 (target cell)
1: 2.082779
2: 1.677525
3: 1.553863
4: 1.876214
5: 1.437138
6: 1.81845
7: 1.548435
8: 1.425129

If I were to calculate mean, it should be sum(0:8)/9= 1.670059. But the tool (Voronoi) ALWAYS ignores the values in 3 and 6 (top right corner and bottom left corner) when calculating mean thereby giving 1.665459 instead of 1.670059... This affects calculations for all other stats as well (e.g., stdev, entropy, and etc).. Any idea as to why this is happening and/or suggestion to fix this problem?
Many thanks!

Jonathan