Co-kriging

09-06-2012 01:36 PM
DeanMorgan
New Contributor
I've found from numerous cross-validations that when incorporating elevation data from DEMs into the co-kriging interpolation of rainfall data, the resolution of the auxiliary data (i.e., the DEMs) has some bearing on the resultant RMSE. In fact, some of the coarser DEMs actually produced lower RMSEs. I wondered if you had any thoughts on these observations? Perhaps the link between elevation and precipitation is only significant in these models at larger scales?
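To make the comparison concrete, a stripped-down version of that cross-validation experiment might look like the following sketch. Everything here is synthetic and hypothetical: a plain linear regression on elevation stands in for the full co-kriging run, the "fine DEM" is modelled as broad-scale terrain plus high-frequency detail that the rainfall does not respond to, and the "coarse DEM" as the broad-scale terrain alone.

```python
# Hypothetical sketch: leave-one-out RMSE of a simple elevation-aided rainfall
# predictor, with the elevation covariate taken from DEMs of two resolutions.
# A plain regression stands in for co-kriging; all numbers are synthetic.
import numpy as np

rng = np.random.default_rng(0)

n = 60
x = rng.uniform(0, 100, n)                       # gauge locations (1-D for brevity)
broad_elev = 500 + 3 * np.sin(x / 15)            # large-scale terrain signal
fine_noise = rng.normal(0, 4, n)                 # high-frequency terrain detail
rain = 0.5 * broad_elev + rng.normal(0, 1, n)    # rainfall tracks broad terrain only

def loo_rmse(covariate, target):
    """Leave-one-out RMSE of a linear fit target ~ covariate."""
    errs = []
    for i in range(len(target)):
        mask = np.arange(len(target)) != i
        slope, intercept = np.polyfit(covariate[mask], target[mask], 1)
        errs.append(target[i] - (intercept + slope * covariate[i]))
    return float(np.sqrt(np.mean(np.square(errs))))

fine_dem = broad_elev + fine_noise               # "1 km" DEM: broad + fine detail
coarse_dem = broad_elev                          # "10 km" DEM: detail smoothed away

print("fine-DEM RMSE:  ", loo_rmse(fine_dem, rain))
print("coarse-DEM RMSE:", loo_rmse(coarse_dem, rain))
```

Under these assumptions the coarse DEM yields the lower cross-validation RMSE, because the fine-scale terrain variance is pure noise with respect to the rainfall. Whether that mechanism is what is happening in the real data is exactly the open question.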

Many Thanks.
6 Replies
SteveLynch
Esri Regular Contributor
Dean
Firstly, what type of precip data is it: hourly, daily, monthly, annual, mean annual, etc.?

Secondly, higher elevation does not always mean more rain; there are rain shadows, for example. So you may need additional info. In studies that I did many years ago I found that distance from the sea and surface roughness also played an important role, as did the rain-bearing wind direction.

An extensive literature search will help 🙂

Regards

Steve
DeanMorgan
New Contributor
Hi Steve,

These are cumulative rainfall totals from tropical cyclone events, which makes the task of linking elevation to precipitation a little more difficult once you factor in rain bands, the eyewall, etc. But for my study I am concentrating more on the interpolation methods, and I just wondered why a coarser resolution DEM (10 km rather than 1 km) might yield a slightly lower RMSE. Any thoughts?

Thanks
EricKrause
Esri Regular Contributor
I talked this over with a couple of people, and we're not completely sure why this is happening. It is probably a data-specific phenomenon where adding more points does not actually add more statistical information. Because kriging works with search neighborhoods, you may be filling up the neighborhood with duplicate information and losing relevant information beyond the search window.

In general, more data produces better predictions, but you seem to have hit a rare exception to that rule.
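A toy illustration of the neighborhood effect described above (this is not ArcGIS's actual search logic, just the geometry of a fixed-size k-nearest neighborhood): densifying the data shrinks the spatial extent the neighborhood can "see", so if nearby points carry near-duplicate values, distant but informative samples get pushed out.

```python
# With a fixed k-nearest search, a denser sample grid means the neighborhood
# around a prediction point covers a much smaller spatial window.
import numpy as np

def neighborhood_radius(sample_x, target_x, k):
    """Distance from the target to its k-th nearest sample point."""
    d = np.sort(np.abs(sample_x - target_x))
    return d[k - 1]

target = 50.0
coarse = np.arange(0.0, 101.0, 10.0)   # a sample every 10 km
fine = np.arange(0.0, 101.0, 1.0)      # a sample every 1 km

r_coarse = neighborhood_radius(coarse, target, k=5)
r_fine = neighborhood_radius(fine, target, k=5)
print(r_coarse, r_fine)   # prints 20.0 2.0
```

With five neighbors, the coarse grid's neighborhood spans 20 km of terrain while the fine grid's spans only 2 km; if elevation values within 2 km are nearly duplicates, the fine-resolution run is mostly re-reading the same information.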
YacoubRaheem
New Contributor
Several authors have found a phenomenon similar to what you describe, namely that rainfall from rain gauges correlates more strongly with coarse-resolution elevation than with high-resolution elevation. I am looking at rainfall patterns in South Africa and Mozambique and found that 20 km DEMs actually correlated more strongly with average annual rainfall than 1 km, 5 km, 10 km, or 15 km DEMs did.
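A rough way to reproduce this aggregation effect numerically (purely synthetic numbers, not my study data): if rainfall responds to broad-scale terrain, block-averaging the DEM strips out irrelevant fine-scale variance and the correlation with gauge rainfall rises.

```python
# Synthetic demonstration: correlation between gauge rainfall and a DEM
# sampled at progressively coarser (block-averaged) resolutions.
import numpy as np

rng = np.random.default_rng(1)

cells = 1000                                            # fine "1 km" DEM cells
broad = 1000 + 400 * np.sin(np.arange(cells) / 80.0)    # regional terrain
fine = broad + rng.normal(0, 300, cells)                # fine DEM = broad + local relief

def block_average(z, block):
    """Aggregate a 1-D DEM to a coarser grid, then broadcast back to fine cells."""
    trimmed = z[: len(z) // block * block].reshape(-1, block)
    return np.repeat(trimmed.mean(axis=1), block)

gauges = np.arange(0, 1000, 25)                         # a gauge every 25 cells
rain = 0.002 * broad[gauges] + rng.normal(0, 0.2, len(gauges))

for block in (1, 5, 20):                                # ~1, 5, 20 km resolutions
    dem = block_average(fine, block)
    r = np.corrcoef(dem[gauges], rain)[0, 1]
    print(f"block {block:2d}: r = {r:.2f}")
```

Under these assumptions the correlation climbs with block size, because averaging suppresses the local relief (which the rainfall ignores) while barely touching the slowly varying regional signal.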

I suggest a couple papers (you will probably need access to the journals in order for the weblinks to work):

Prudhomme, C., and D. W. Reed (1998). Relationships between extreme daily precipitation and topography in a mountainous region: A case study in Scotland. International Journal of Climatology, 18, 1439-1453. http://onlinelibrary.wiley.com/doi/10.1002/(SICI)1097-0088(19981115)18:13%3C1439::AID-JOC320%3E3.0.C...

Ekström, M., P. C. Kyriakidis, A. Chappell, P. D. Jones (2007). Spatiotemporal Stochastic Simulation of Monthly Rainfall Patterns in the United Kingdom (1980–87). Journal of Climate, 20, 4194–4210. http://journals.ametsoc.org/doi/pdf/10.1175/JCLI4233.1


I have a few questions of my own... and I apologize in advance for so many of them. They sort of build upon one another, so I thought it would be easier to post them all together:

I am looking at how ancillary terrain data (e.g., elevation, distance from large water body, slope, aspect) improves spatial interpolation of rainfall data from rainfall gauges. I have daily rainfall data but my dependent variable will be annual (or perhaps average annual or seasonal) rainfall.

1. Should I just be using ordinary cokriging (instead of simple or universal)?

2. There is a trend in the rainfall data. However, this trend is mainly due to elevation (and likely/hopefully some of these other ancillary terrain data). Thus, should I NOT detrend the data if I am going to cokrige with elevation and other data?

3. Following the geostatistical wizard, when performing semivariogram modeling, should I always click the Optimize Model button? When/why would I not want to optimize the model?

4. In general, optimizing produces improved results (in terms of prediction errors); however, there are a few cases I have found where it doesn't. Do you know why that is?

5. Can I always use the prediction errors table as a guide for whether a model is improved or not? I realize the various prediction errors provide different pieces of information (e.g., the mean indicates the level of bias; the RMS indicates accuracy); however, if one piece of ancillary data improves the mean, for example, and cokriging with another piece of ancillary data improves the RMS, is there a way to say that one piece of ancillary data is "better" than the other?

6. Is there some pre-processing/analysis I should do before I get into cokriging with multiple pieces of data? I have developed cross-correlation tables with all my ancillary data, so in non-spatial terms I know which data are most highly correlated with rainfall. Are these definitely going to be the ones that also help to improve the model with cokriging or is there value in experimenting with ancillary data that aren't as highly correlated with rainfall?

7. Are there any in-built mechanisms for model selection criteria (e.g., AIC, BIC), in order to help determine which of my, say, 10 pieces of ancillary data are most appropriate for cokriging?

8. The Help section on cokriging says that "Theoretically, you can do no worse than kriging because if there is no cross-correlation, you can fall back on autocorrelation for Z1." However, there are times when some if not all of the prediction errors decrease when cokriging... and times when using two or three pieces of ancillary data for cokriging decreases the prediction errors more than just using one... Does this make sense? (Sometimes there is no change at all in the prediction errors when adding a second or third piece of ancillary data, and this makes sense, I suppose. I wonder if this could be determined a priori by just looking at the cross-correlation table.)

9. Is there a way to use the Geostatistical Wizard with model builder, or just some way to perform batch processing when I want to perform kriging with various combinations of ancillary data and compare prediction errors?

10. After I figure out which set of ancillary data best improves the model, I want to perform Gaussian Geostatistical Simulations in order to develop equally likely realizations of rainfall (for use in stochastic rainfall-runoff modeling)... I would like to use the ancillary data to help explain the trend as much as possible; next I want to perform simulations on the rainfall residuals; and finally I will add these residuals back to the trend. I haven't gotten to the simulations part yet, but I just noticed that it must be based only on a simple kriging model... does this mean I should be using simple kriging instead of ordinary kriging? Or perhaps I'm comparing apples and oranges here... Perhaps my question should simply be: how do I combine cokriging with simulations? Perhaps the step I mentioned above about first figuring out the trend and then performing simulations on the residuals is not the way to think about it...
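The trend-plus-residual workflow in question 10 can be sketched outside the wizard. This is a heavily simplified, hypothetical version: a plain linear regression on one ancillary variable stands in for the trend model, an exponential covariance with a hand-picked range stands in for the fitted semivariogram, and the simulation is unconditional (ArcGIS's Gaussian Geostatistical Simulations are conditioned on the data). It does show why the simple kriging requirement is natural: once the trend is removed, the residuals have a known mean of zero, which is exactly what simple kriging/simulation assumes.

```python
# Sketch of: fit trend on ancillary data -> simulate residual fields with a
# chosen covariance model -> add the trend back to get realizations.
import numpy as np

rng = np.random.default_rng(2)

# Gauges with one ancillary variable (elevation) and observed rainfall.
x = rng.uniform(0, 100, 40)                      # gauge coordinates (1-D)
elev = 500 + 5 * x + rng.normal(0, 20, 40)
rain = 0.01 * elev + rng.normal(0, 1.0, 40)

# 1) Trend: regress rainfall on the ancillary data.
slope, intercept = np.polyfit(elev, rain, 1)
trend = intercept + slope * elev
resid = rain - trend                             # mean ~ 0 by construction

# 2) Simulate residual fields from an exponential covariance model
#    (sill from the residual variance, range chosen by hand here).
sill, length = np.var(resid), 30.0
d = np.abs(x[:, None] - x[None, :])
cov = sill * np.exp(-d / length)
L = np.linalg.cholesky(cov + 1e-9 * np.eye(len(x)))

# 3) Each realization = trend + one simulated residual field.
realizations = trend + (L @ rng.normal(size=(len(x), 100))).T
print(realizations.shape)                        # 100 realizations at 40 gauges
```

The realizations average back to the trend surface, and their spread reflects the residual covariance model; a conditional simulation would additionally honor the observed values at the gauges.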

THANKS!!!... and sorry so many questions!... (I'll owe you a beer at the UC this year!)

Cheers,
Yacoub
SteveLynch
Esri Regular Contributor
JeffreyEvans
Occasional Contributor III
Yacoub,
I would highly recommend rethinking using kriging for climate interpolation. Even a data set such as PRISM, which uses a hybrid kriging approach, has to apply many post hoc corrections and still does not get the lapse rates correct. In my experience, spline models yield the best results. I commonly use ANUSPLIN, which consistently gets the lapse rates right. Here is a link to a paper where we produced climate surfaces for the western US using ANUSPLIN. http://www.treesearch.fs.fed.us/pubs/21485

I have been able to get similar results by fitting a thin plate spline in R.
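For anyone who wants to try the same thing in Python, SciPy's `RBFInterpolator` supports a thin-plate-spline kernel. This toy example interpolates synthetic gauge values over a grid from x, y coordinates only; ANUSPLIN-style models additionally use elevation, which you could approximate here by appending a (suitably scaled) elevation column to the coordinate array.

```python
# Thin plate spline interpolation of synthetic gauge rainfall with SciPy.
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(3)

pts = rng.uniform(0, 100, (50, 2))               # gauge x, y coordinates
rain = np.sin(pts[:, 0] / 20) + 0.01 * pts[:, 1] + rng.normal(0, 0.05, 50)

# smoothing > 0 relaxes exact interpolation, damping gauge noise.
tps = RBFInterpolator(pts, rain, kernel="thin_plate_spline", smoothing=1e-3)

gx, gy = np.meshgrid(np.linspace(0, 100, 25), np.linspace(0, 100, 25))
grid = np.column_stack([gx.ravel(), gy.ravel()])
surface = tps(grid).reshape(25, 25)
print(surface.shape)                             # interpolated 25 x 25 surface
```

The `smoothing` value here is arbitrary; in practice you would choose it by cross-validation, which is also how packages like `fields` in R tune their thin plate splines.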