Hi

I am looking for a way to use universal kriging (UK) with an external drift. In other words, is there any way to perform regression kriging in ArcGIS (i.e., kriging with an underlying trend)?

Thanks for your input in advance.

Could you please be a little more specific as to what I should do?

You are right that I do not need to use the trend derived from polynomials of the x and y coordinates.

This is being taken into account when I run the multiple linear regression, where I consider (1) latitude, (2) longitude, (3) distance to water, (4) altitude, and (5) relative topography (differences in elevation between the target cell and nearby cells within 1 km). Here the dependent variable is mean air temperature. So I calibrated the model, and by using map algebra, I used the coefficients of the significant parameters (in my case latitude, altitude, and distance to water) to calculate the result of my regression for the whole area, thus producing a continuous surface.
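For readers reproducing this outside ArcGIS, the map-algebra step is just evaluating the regression equation over the covariate grids. A minimal sketch: the coefficient values and the tiny covariate grids below are made up for illustration (the real coefficients come from the SPSS output, and the grids would be rasters combined in the Raster Calculator):

```python
import numpy as np

# Hypothetical coefficients from the MLR fit (intercept, latitude, altitude,
# distance to water) -- made-up values for illustration only.
B0, B_LAT, B_ALT, B_DIST = 25.0, -0.30, -0.0065, -0.002

# Tiny stand-in covariate grids, one value per cell.
lat = np.array([[45.0, 45.0], [45.1, 45.1]])
alt = np.array([[200.0, 350.0], [120.0, 500.0]])
dist_to_water = np.array([[1000.0, 5000.0], [300.0, 8000.0]])

# Map algebra: evaluate the regression equation cell by cell.
predicted_temp = B0 + B_LAT * lat + B_ALT * alt + B_DIST * dist_to_water
print(predicted_temp.round(3))
```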

Referring back to what you said, how could I use the continuous map generated from the regression as a cokriging variable? It would be greatly appreciated if you could further explain or elaborate.

One more thing: is there any tool that allows me to calculate the correlation coefficient between the target cell and the nearby cells? For example, from MLR or kriging, I can generate an interpolated map in which each cell contains a unique value (in my case, mean air temperature). I want to know if there is any tool available to see how much the value in each cell is related to the averaged values of the neighbouring cells.

Thank you very much for your answers in advance and happy new year!

"Kriging" is often called "residual kriging," and there's a reason for this: you always perform kriging on the residuals of some model. This model can be almost anything, but "regression kriging," "kriging with external drift," "universal kriging," and "linear mixed model kriging" all generally refer to the simultaneous estimation of the covariate coefficients and the kriging parameters. However, you may find success with doing a sequential estimation: first calculate the coefficients using your MLR model. Then calculate the residuals, and perform Simple kriging on these residuals (you should use Simple kriging instead of Ordinary kriging because you know that the mean of the residuals is 0). Then add these interpolated residuals back into the MLR predictions. You'll lose some power because you are sequentially calculating parameters (rather than simultaneously estimating them), but you should still get defensible results.
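The sequential workflow described above (fit the trend, krige the residuals with simple kriging, add the two back together) can be sketched outside ArcGIS as well. Everything below is an assumption made for illustration: the synthetic station data, the exponential covariance model, and its hand-picked range; in practice the sill and range come from fitting the semivariogram.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins: 50 station locations, one covariate (say altitude),
# and an observed variable with a linear trend plus noise.
n = 50
xy = rng.uniform(0, 100, size=(n, 2))
altitude = rng.uniform(0, 500, size=n)
z = 20.0 - 0.01 * altitude + rng.normal(0, 0.5, size=n)

# Step 1: estimate the trend coefficients by ordinary least squares.
X = np.column_stack([np.ones(n), altitude])
beta, *_ = np.linalg.lstsq(X, z, rcond=None)
residuals = z - X @ beta                       # mean ~ 0 by construction

# Step 2: simple kriging of the residuals with the mean fixed at 0.
sill, corr_range = residuals.var(), 30.0       # assumed covariance parameters
cov = lambda h: sill * np.exp(-h / corr_range)
d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
C = cov(d) + 1e-6 * np.eye(n)                  # tiny nugget for stability

def rk_predict(pt, alt_at_pt):
    """Step 3: kriged residual added back onto the OLS trend."""
    w = np.linalg.solve(C, cov(np.linalg.norm(xy - pt, axis=1)))
    return beta[0] + beta[1] * alt_at_pt + w @ residuals

p = rk_predict(np.array([50.0, 50.0]), 250.0)
print(round(p, 3))
```

The simple-kriging weights solve the system directly (no unbiasedness constraint is needed because the residual mean is known to be zero).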

As for comparing the values in one point to the average of the neighboring points, the Semivariogram/Covariance cloud is probably the best way to visualize this, but the result is a graph rather than a single correlation coefficient. If you really need to calculate the correlation coefficient (and you're ok with ignoring spatial correlation in the analysis), we have a tool called Neighborhood Selection that selects the neighbors of an input (x,y) location (use the same neighborhood parameters that you used in kriging). It will probably take a lot of work, but I'm sure you can write a Python script that will do what you're trying to do. I've never personally done this, so I don't want to try to outline an algorithm, but I'm sure all the tools are there to accomplish this task.

Dear Eric6346,

Thanks for your kind and detailed explanations:D

I would just like to confirm the method that you suggested: a sequential estimation, performing simple kriging on the residuals and combining this with the MLR predictions. Here I have listed a few technical issues that I am facing:

1. I have created a grid surface using fishnet with 1 km by 1 km cells covering my entire study area (500 cells by 500 cells)

2. I have 100 weather stations from which I calculated the mean air temperature (using 5 years of temperature data) for each station

3. I then performed MLR in SPSS; dependent variable being mean air temperature and independent variables being lat, altitude (using DEM layer), distance to water (using "near" function; each station to the nearest large water body).

4. From MLR, I now have an equation and with this, I calculate the residuals. The residuals were calculated by taking the difference between the observed and estimated (using the MLR equation). So I have 100 residuals.

5. With the MLR equation, I also generated an interpolated gridded surface: for each 1 km² cell created in step 1, I extracted latitude, altitude, and distance to water, thereby producing an estimated mean air temperature map over the entire 500 by 500 cells (please see the attached sample, shown with a bigger grid size)

6. Now, with the 100 residuals, I perform Simple Kriging, thereby generating a continuous estimated-residual map.

7. Then I assign the continuous surface found in step 6 onto each and every cell (i.e., 500 cells by 500 cells). This can be done by taking the average of the estimated residuals that fall into each cell.

8. Now simply add the products calculated in Steps 5 and 7.

Do these steps make sense to you?

Later on, I intend to perform a sensitivity analysis by using different grid sizes (5 km by 5 km, 10 km by 10 km, and 20 km by 20 km).

For this reason, it would be better if I could generate an MLR continuous surface (the same resolution as the continuous surface generated by kriging) so that I can simply add both maps and assign them to whichever grid size I want to test. But I do not know how to calculate the continuous MLR map (probably because I performed the MLR outside of ArcGIS, in other software like SPSS).
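As a side note on the sensitivity analysis: aggregating a fine surface to coarser grid sizes is just block averaging, which can be sketched in a few lines (assuming the grid dimensions divide evenly by the aggregation factor):

```python
import numpy as np

def block_average(grid, factor):
    """Aggregate a fine raster to a coarser resolution by averaging each
    factor-by-factor block (grid dimensions must divide evenly by factor)."""
    r, c = grid.shape
    return grid.reshape(r // factor, factor, c // factor, factor).mean(axis=(1, 3))

fine = np.arange(16, dtype=float).reshape(4, 4)  # stand-in for a 1 km surface
coarse = block_average(fine, 2)                  # 2x-coarser grid
print(coarse)                                    # [[2.5, 4.5], [10.5, 12.5]]
```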

As for the second question I had about calculating the correlation coefficient, let me explain what I intend to do with it.

With the map generated in the above steps (i.e., the estimated mean air temperature map), I intend to analyse the current weather station configuration and to recommend other potential sites. Let's assume that I want to install a weather station in "cold spots", meaning that the cell with the coldest estimated air temperature gets priority for weather station installation (this is why I intend to generate the interpolated air temperature map in the first place). In other words, I can rank all cells, assigning the coldest cell 1 and the warmest cell, say, 100 (so the ranks range from 1 to 100). But from the interpolated map, it is possible that a group of highly ranked cells lie near each other. In this case, I then have to decide which cell gets more weight/priority: a cell with a high correlation coefficient gets priority, as it would provide more "benefit" (being more representative) than a cell with a low correlation coefficient.

I will go ahead and explore the feasibility of what you suggested, but please let me know if there is any other way in ArcGIS to accomplish this task.

I am sorry for this long question, but your assistance would be greatly appreciated. Thanks again.

## Attachments

Unfortunately, there's no way to make a continuous OLS surface. You'll need to make separate rasters for each resolution you want to test, but you only need to krige the residuals once. You can export the interpolated residual surface to any cell size and extent that you want.

I now understand what you're trying to do with the correlation coefficient. However, I don't think a correlation coefficient will work here because if you want to correlate a single point to the mean of its neighbors, you'll only be able to calculate a single coefficient for the entire surface (since you need repeated samples), so it won't help you in deciding which particular locations should be given preference.

The first thing that comes to mind is the Voronoi Map tool. It's an interactive graphical tool, and if you use Standard Deviation, Entropy, or Interquartile Range, you'll get an estimate of the local variability. A small local variability indicates that the predictions are more constant in that area, so they might be good candidates for new sites because the area can be better represented by a single value. Note that you'll need to convert your rasters to points to run the tool.

Thanks for all your explanations.

I at first intended to calculate a correlation coefficient for each cell, across the entire map, by including its direct neighbours (just as Entropy is calculated).

Yes, your suggestion of using the Voronoi Map really does do what I wanted! :D

So I generated the Voronoi Map, which also provides Standard Deviation, Entropy, etc. I looked up the descriptions for each statistic generated by the Voronoi Map, and as you mentioned, standard deviation and/or entropy would best describe the local variability. Since both measures describe the variability between the target cell and its neighbours, I would expect them to be strongly correlated. However, their degree of correlation wasn't that strong (please see the attached diagram). I do understand that they are calculated in different ways, but shouldn't both measures still have a strong positive correlation?

This leads me to wonder how both measures are calculated and which measure I should use for deciding the next potential weather station site in my study. I have attached a sample Excel sheet listing the values for the target cell (i.e., the center) and its neighbours, along with the Standard Deviation and Entropy calculated for the target cell. Would you be able to explain what ArcGIS does to calculate these two values? I looked up the description for Entropy, and it seems ArcGIS uses "Smart Quantiles" to categorize the values into five classes. Unless I know the exact boundaries of these five classes, it is difficult to apply the equation listed on that page.
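For what it's worth, once class breaks are fixed, the entropy itself is straightforward to compute by hand. The sketch below uses Shannon entropy with log base 2 (a different base only rescales the result); since the exact "Smart Quantile" break values are not published, both the break values and the sample values here are illustrative assumptions:

```python
import math
from collections import Counter

def entropy(values, breaks):
    """Shannon entropy of values classified into bins bounded by `breaks`.
    The real tool classifies with 'Smart Quantiles'; since those break
    values are not exposed, the caller supplies illustrative breaks."""
    classify = lambda v: sum(v > b for b in breaks)
    counts = Counter(classify(v) for v in values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Illustrative values (e.g. a cell and its neighbours), with made-up breaks.
vals = [1.610996, 2.082779, 1.677525, 1.876214, 1.437138, 1.548435, 1.425129]
print(round(entropy(vals, breaks=[1.5, 1.7, 1.9]), 4))
```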

Again, thank you very much for your continued support and helpful comments!

Jonathan

## Attachments

1 2 3

4 0 5

6 7 8

And the values assigned to each number are

0: 1.610996 (target cell)

1: 2.082779

2: 1.677525

3: 1.553863

4: 1.876214

5: 1.437138

6: 1.81845

7: 1.548435

8: 1.425129

If I were to calculate the mean, it should be sum(0:8)/9 = 1.670059. But the Voronoi tool ALWAYS ignores the values in cells 3 and 6 (the top-right and bottom-left corners) when calculating the mean, giving 1.665459 instead of 1.670059. This affects the calculations for all the other statistics as well (e.g., standard deviation, entropy, etc.). Any idea why this is happening, and/or suggestions for fixing the problem?
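For reference, both means quoted above can be reproduced directly, confirming that exactly cells 3 and 6 are excluded. One plausible (but unconfirmed) explanation: the Delaunay triangulation underlying the Thiessen polygons keeps only one diagonal at each four-corner junction of a square grid, so only one diagonal pair of corners counts as Voronoi neighbours.

```python
# The nine window values, keyed by the position numbers in the diagram.
cells = {0: 1.610996, 1: 2.082779, 2: 1.677525, 3: 1.553863, 4: 1.876214,
         5: 1.437138, 6: 1.81845, 7: 1.548435, 8: 1.425129}

mean_all = sum(cells.values()) / 9
mean_without_3_and_6 = sum(v for k, v in cells.items() if k not in (3, 6)) / 7

print(round(mean_all, 6))              # 1.670059
print(round(mean_without_3_and_6, 6))  # 1.665459
```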

Many thanks!

Jonathan

And I'm not surprised at the fairly weak R^2 between standard deviation and entropy. Because entropy works with classified values (rather than raw values), the entropy map tends to be smoother. Entropy also has a maximum, but standard deviation has no maximum. So, two cells can have the same entropy value but still have very different standard deviations. If you look at your scatterplot, you can even see this; there are clear vertical columns that all have the same entropy, but the standard deviations vary a lot. This variance is what is pulling down the R^2.

Yes, Focal Statistics would do the job, since it gives the standard deviation (which really does indicate the local variability relative to the neighbours). I will go ahead and try it out!
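Focal Statistics computes exactly this kind of moving-window statistic. A rough plain-Python equivalent, for reference (edge cells are simply skipped here, whereas the ArcGIS tool evaluates partial windows at the edges):

```python
import numpy as np

def focal_std(grid, radius=1):
    """Moving-window standard deviation, akin to Focal Statistics with a
    square (2*radius+1)^2 neighbourhood. Edge cells are left as NaN."""
    rows, cols = grid.shape
    out = np.full((rows, cols), np.nan)
    for i in range(radius, rows - radius):
        for j in range(radius, cols - radius):
            window = grid[i - radius:i + radius + 1, j - radius:j + radius + 1]
            out[i, j] = window.std()     # population std (ddof=0)
    return out

temp = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
print(round(float(focal_std(temp)[1, 1]), 4))   # 2.582
```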

Again, thanks a lot for all your support! Your support truly helped very much!!!

I will come back with more question if I have any :D

Have a good weekend.

Jonathan

Hi Eric,

One quick question: following the steps mentioned above, I have kriged the residuals using 40 points (shown as black pins). But as you can see in the attached image, the kriged region does not cover the entire study area. I know that it only interpolates out to the extent of the known points. Is there any way to extend this to cover the entire study area? So far I have not found any option to do so. Thanks!

Jonathan

## Attachments

Thank you!

You could, however, take the layer that you created, run it through the GALayerToGrid tool, and specify an output extent in the environment settings; your output raster will then have this new extent.

Steve

Thanks for providing the additional info. Yes, it is definitely worthwhile using it, since I have equal-sized grids (the MLR surface); I will assign a single value to each grid cell by taking the average of the converted raster data to get the final "regression kriging" output.

Have a good weekend.

Jonathan

I would like to know how exactly we can do kriging with external drift using ArcGIS. I could not figure it out (I got confused) from the above discussion.

Also, can anyone tell me whether it is possible to do this for a time series of data? I have hourly precipitation data for some stations, and I would like to interpolate it over the entire area.

Thanks in advance! :)

So I carried out the analysis using a variable called mean surface temperature from 40 stations.

Following the steps that I outlined, I ran a linear regression using some auxiliary information (e.g., easting, northing, altitude, relative topography, and distance to water), and I obtained good results (R^2 = 89%).

I calculated the residuals and tried to krige them using simple kriging in ArcGIS 10.1 (since I know that the sum of all residuals equals zero)... but the kriging result was poor (please see the attached).

How would you interpret this? Your thoughts would be appreciated.

Thanks,

## Attachments

Second, the residuals of your OLS model don't appear to display spatial autocorrelation. Looking at the crossvalidation statistics and the empirical semivariances (the blue crosses in the semivariogram screen), your residuals seem to be independent and normally distributed. Sampling more data may reveal more spatial patterns that can be used in kriging, but I don't see anything in those diagnostics that indicates the residuals of your OLS model need to be kriged at all.
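One way to quantify "no spatial autocorrelation in the residuals" is global Moran's I; ArcGIS has a Spatial Autocorrelation (Global Moran's I) tool for this in the Spatial Statistics toolbox. A minimal sketch of the statistic itself, assuming inverse-distance weights and synthetic independent residuals:

```python
import numpy as np

def morans_i(xy, values):
    """Global Moran's I with inverse-distance weights: values near zero
    (more precisely, near -1/(n-1)) indicate no spatial autocorrelation."""
    n = len(values)
    z = values - values.mean()
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)
    w = np.where(d > 0, 1.0 / np.maximum(d, 1e-12), 0.0)  # zero on diagonal
    return (n / w.sum()) * (z @ w @ z) / (z @ z)

rng = np.random.default_rng(1)
xy = rng.uniform(0, 100, size=(60, 2))
independent_residuals = rng.normal(size=60)

# For independent residuals this should land near -1/(n-1), i.e. about -0.017.
print(round(morans_i(xy, independent_residuals), 3))
```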

Okay, I understand. The example I showed you does not require any further analysis or need to be kriged at all, since the residuals are independent and normally distributed (and since the regression result is very good).

You mentioned spatial autocorrelation, so I have attached another example. For this, I took the standard deviation of surface temperature as a means of describing the variability of surface temperature (VST); the variable that I regressed earlier was mean surface temperature (MST). The regression model using the same auxiliary variables revealed that VST was highly correlated with northing, altitude, and distance to water (which makes intuitive sense).

Then I ran simple kriging on the VST residuals (see the attached files), and the result seemed okay: there was spatial dependence among these 40 locations to a certain extent, and the residuals were not normally distributed. I had also expected this to be "not good", since the location information (i.e., northing) had already been taken into account when calibrating the model using linear regression. Since the models for both VST and MST were calibrated using similar locational attributes/auxiliary variables, I am having difficulty understanding why simple kriging would give such different results on the residuals for VST and MST. Any thoughts or suggestions? My speculation would be the difference in the overall quality of the regression models: MST (R^2 = 89%) versus VST (R^2 = 63%).

Thanks!

## Attachments

I have a question regarding the difference of the underlying theories between simple kriging (SK) and ordinary kriging (OK). Literature well describes the nature of their difference; SK assumes that the mean is known and constant over the entire domain whereas OK assumes that the mean is unknown.

We also know that SK requires knowledge of the mean to solve the problem of finding weights that minimize the variance of the estimation error, but OK elegantly discards this requirement by filtering out the mean by performing a constrained optimization such that sum of OK weights equals 1.
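The contrast between the two systems is easy to see numerically: simple kriging solves C w = c0 directly, while ordinary kriging augments the system with a Lagrange multiplier so the weights are forced to sum to 1. A toy example with assumed covariance values:

```python
import numpy as np

# Toy covariances (assumed values): C among three samples, c0 to the target.
C = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])
c0 = np.array([0.7, 0.8, 0.4])

# Simple kriging: solve C w = c0 (the known mean handles unbiasedness).
w_sk = np.linalg.solve(C, c0)

# Ordinary kriging: append a Lagrange-multiplier row/column so that the
# constraint sum(w) = 1 is enforced while minimizing the error variance.
A = np.block([[C, np.ones((3, 1))],
              [np.ones((1, 3)), np.zeros((1, 1))]])
w_ok = np.linalg.solve(A, np.append(c0, 1.0))[:3]

print(round(w_sk.sum(), 4), round(w_ok.sum(), 4))   # 0.9375 1.0
```

The simple-kriging weights need not sum to 1 (here they sum to 0.9375) because the known mean absorbs the difference; the ordinary-kriging weights sum to 1 by construction.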

My confusion comes from one piece of literature stating that SK not only assumes the expectation of the random field to be known (i.e., a known mean), but also relies on the covariance function being known beforehand (see the attached excerpt).

My understanding is that regardless of which method is used (SK or OK), the covariance must be determined from semivariogram analysis when interpolating the values (kriging estimate and error variance). However, this contradicts the statement above (i.e., that the covariance function needs to be known beforehand).

Could you explain if there is any part that I misunderstood or if the statement is not true? Your answer will be greatly appreciated.

Thank you.

## Attachments

This is one of several problems that we addressed with Empirical Bayesian Kriging in ArcGIS 10.1. Instead of estimating a single semivariogram and assuming it is correct, EBK simulates many semivariograms, so you end up with an entire spectrum that we weight by likelihood. By accounting for some uncertainty in the estimation of the semivariogram, this weighted spectrum does a much better job of estimating the covariance structure than relying on a single semivariogram. If you work with some data using ordinary, simple, and empirical Bayesian kriging, you'll notice that EBK usually gives larger standard errors than the others. This might seem like a disadvantage, but the larger standard errors will usually be more accurate. For example, if you use simple or ordinary kriging and make 90% confidence intervals, it will probably only capture 75% of the data, whereas 90% confidence intervals from EBK should capture closer to 90% of the data.
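The coverage claim can be checked empirically from cross-validation output: count how often the observed value falls inside the nominal interval. A sketch with simulated numbers standing in for real cross-validation results (the true spread is 1.0, so a standard error of 0.7 represents an overconfident model):

```python
import numpy as np

def interval_coverage(observed, predicted, stderr, z90=1.645):
    """Share of observations inside the nominal 90% prediction intervals."""
    return float(np.mean(np.abs(observed - predicted) <= z90 * stderr))

# Simulated stand-in for cross-validation output.
rng = np.random.default_rng(2)
obs = rng.normal(0.0, 1.0, size=1000)
pred = np.zeros(1000)

# Understated standard errors (0.7) -> intervals capture far less than 90%.
print(round(interval_coverage(obs, pred, np.full(1000, 0.7)), 3))
# Well-calibrated standard errors (1.0) -> coverage close to 90%.
print(round(interval_coverage(obs, pred, np.full(1000, 1.0)), 3))
```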

I am now trying to use the semivariogram to analyse/characterize the spatial patterns of friction measurements collected at equal intervals (every minute) by a mobile unit. The measurements are GPS-tagged, so they can be visualized on the map attached below (the lines represent a road network).

I am trying to build a semivariogram model for each run (collected on different days) and compare their similarities/differences by examining the model parameters (i.e., sill, nugget, range).
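The empirical semivariogram behind such a model is simple to compute. A sketch of the classical Matheron estimator, under the assumption (mine, not from the thread) that along-route travel distance between the minute-spaced measurements is used as the separation:

```python
import numpy as np

def empirical_semivariogram(pos, z, bin_edges):
    """Matheron estimator: for each lag bin, gamma(h) is half the mean
    squared difference over all point pairs whose separation falls in the bin."""
    pos, z = np.asarray(pos, float), np.asarray(z, float)
    d = np.abs(pos[:, None] - pos[None, :])          # 1-D separations
    halfsq = 0.5 * (z[:, None] - z[None, :]) ** 2
    gamma = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (d > lo) & (d <= hi)
        gamma.append(halfsq[mask].mean() if mask.any() else np.nan)
    return np.array(gamma)

# Toy run: equally spaced positions along the route and friction readings.
pos = np.arange(10.0)
fric = np.array([0.50, 0.52, 0.55, 0.53, 0.48, 0.46, 0.47, 0.51, 0.54, 0.52])
print(empirical_semivariogram(pos, fric, bin_edges=[0, 1, 2, 3]).round(5))
```

Fitting a model (spherical, exponential, Gaussian, ...) to these binned values is then what yields the sill, nugget, and range being compared.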

Here I have attached diagrams of the datasets collected on two different days (an Excel file is also attached).

As you can see, the friction measurements collected on these two days are similar to each other, and thus I would expect their spatial patterns to be similar as well... But their semivariograms show that they are quite different from each other.

The semivariograms for Day 1 and Day 2 are in the attachments below.

Would you be able to explain why this is the case? I would expect, for instance, their ranges to be similar, but they aren't. I understand that the range (and the other parameters) can vary depending on how the empirical semivariances are fitted with different models (Gaussian, exponential, etc.), but there is still a large difference in their spatial characteristics.

Any comments/suggestions would be appreciated...

## Attachments