I am disappointed by the lack of documentation walking users step-by-step through Geostatistical Wizard. Is this the correct sequence to follow?
1. Select model(s) for best fit of semivariogram of Var1 - Var1 and then optimize that model(s)
2. Select model(s) for best fit of covariance graph of Var1 - Var2 and then optimize this model, or does this somehow overwrite the model you selected and optimized for Var1 - Var1?
3. Select model(s) for best fit of semivariogram of Var2 - Var2 and then optimize that model(s), or does this somehow overwrite the models you selected and optimized for the other variable combinations above?
4. Proceed to searching neighborhood and cross validation steps
Can you reoptimize for a different model if the semivariogram looks horrible after your first optimization, or did the first optimization already distort the data such that you should close out of Geostatistical Wizard and start again from scratch?
How can you fit a model to a negative correlation between the two variables? All the model options create curves of positive correlations only, unless there is a setting for this that I am missing?
Eric Krause or anyone else knowledgeable about Geostatistical Wizard, could you help me? Thanks in advance for any help that you can provide.
In Cokriging, if you press the optimize model button, the software will simultaneously optimize the semivariogram for the primary dataset (Var1 - Var1), the semivariogram for the cokriging dataset (Var2 - Var2), and the crosscovariance between them (Var1 - Var2). If you do not want to rely on the optimize model button, you can manually specify the parameters for all three covariance models. What the optimize model button is actually doing is finding the set of parameters (for all three covariance models) that minimizes the root-mean-square (RMS) crossvalidation error. However, the RMS is just one of many diagnostics, and it isn't uncommon for the model with the lowest RMS to fail at other diagnostics.
Negative cross-covariances cannot be modeled in ArcGIS. All covariance models supported in ArcGIS assume positive spatial correlation that diminishes over distance.
Thanks for explaining, Eric, but I'm afraid that I am still confused.
Are your results from hitting the Optimize button independent of the model selected, such that it's not necessary for the user to determine the best model? That's the only way I can think of that it could optimize three different relationships (Var1 - Var1, Var1 - Var2, Var2 - Var2) that likely have three different best models.
If optimize is dependent on the model selected, after I hit it and then want to try an alternative model, do I need to close the wizard to avoid multiple optimizations compounding changes from the true data?
You say that if you don't want to rely on the optimize model button, you can manually specify the parameters for all three covariance models. Does that mean I can select a different model for each relationship by displaying them one at a time and choosing a model that will be applied only to that relationship? Or does the wizard only accommodate a single suite of Model #1 + Model #2, etc. for all relationships?
Figure 11 of the paper "Multivariate oil and gas data interpolation: data exploration and modeling" shows a negative covariance model in Geostatistical Wizard, so I know it is possible somehow.
Thanks for helping me figure this out!
Have a look at Semivariogram and covariance functions
Thanks for the link on general concepts of semivariogram and covariance functions, Steve. My questions above are focused on the specific mechanics of how Geostatistical Wizard implements these concepts to know step-by-step how to optimize all three relationships (Var1 - Var1, Var1 - Var2, Var2 - Var2) correctly.
The optimize button does not do a complete optimization. For example, the choice of semivariogram model is not optimized. The default model is "Stable," and the optimize model button will find the range, nugget, and sill that minimize the RMS for the Stable model. If you change, for example, to K-Bessel, pressing the optimize model button will find different optimal parameters for the K-Bessel model, which may or may not be better than the optimal Stable parameters. But again, the model with the lowest RMS is not necessarily the best model; there are lots of other diagnostics that you should pay attention to. If those other diagnostics do not look good, the model may need some manual changes.
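The per-model nature of the optimization can be illustrated with a toy sketch. This is not the Wizard's algorithm: the Wizard minimizes crossvalidation RMS, whereas this example minimizes squared misfit to a hypothetical empirical semivariogram, and it uses spherical and exponential shapes as stand-ins for Stable and K-Bessel. Every function name and number below is made up for illustration. The only idea carried over is that each model family gets its own best nugget, partial sill, and range, and the best fit for one family may be better or worse than the best fit for another.

```python
import math

def spherical(h, nugget, psill, rng):
    """Spherical semivariogram: reaches nugget + psill exactly at h = rng."""
    if h <= 0:
        return 0.0
    if h >= rng:
        return nugget + psill
    r = h / rng
    return nugget + psill * (1.5 * r - 0.5 * r ** 3)

def exponential(h, nugget, psill, rng):
    """Exponential semivariogram; rng is a practical range (factor 3 convention assumed)."""
    if h <= 0:
        return 0.0
    return nugget + psill * (1.0 - math.exp(-3.0 * h / rng))

def fit(model, lags, gammas, nuggets, psills, ranges):
    """Brute-force grid search: return (sse, nugget, psill, range) with the smallest squared misfit."""
    best = None
    for n in nuggets:
        for s in psills:
            for r in ranges:
                sse = sum((model(h, n, s, r) - g) ** 2 for h, g in zip(lags, gammas))
                if best is None or sse < best[0]:
                    best = (sse, n, s, r)
    return best

# Hypothetical empirical semivariogram, generated from a spherical model
# with nugget 0.2, partial sill 1.0, and range 50 (illustrative values only).
lags = [5, 10, 20, 30, 40, 50, 60, 80]
gammas = [spherical(h, 0.2, 1.0, 50.0) for h in lags]

# Candidate parameter values shared by both model families.
nuggets = [0.0, 0.1, 0.2, 0.3]
psills = [0.8, 0.9, 1.0, 1.1]
ranges = [30.0, 40.0, 50.0, 60.0, 70.0]

fit_sph = fit(spherical, lags, gammas, nuggets, psills, ranges)
fit_exp = fit(exponential, lags, gammas, nuggets, psills, ranges)

print("spherical   (sse, nugget, psill, range):", fit_sph)
print("exponential (sse, nugget, psill, range):", fit_exp)
```

Re-running `fit` with a different model function starts from the same data each time, which mirrors the point that pressing optimize again after switching models does not compound earlier changes.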
If you want to manually specify the parameters for the three models, you change them directly with the parameters on the right side of the wizard. Changing Var1 - Var1 to Var2 - Var2, for example, just changes which graph you are looking at, but you can control the parameters of all three models no matter which one you are currently looking at.
You don't need to restart the Wizard after each change. Hitting the optimize model button will set all parameters to their optimal values, no matter what their current values happen to be. If you want to get back to the original default values (before you pressed optimize), you either need to restart or click Back a couple times, then Next a couple times, and they will revert to their non-optimal defaults.
With regard to Figure 11 in your link, you're looking at a covariance view of the semivariogram. You can get your graph to look like this by changing the "Variable" setting on the top-right of the wizard to "Covariance". This is just a different view of the same thing. Instead of making a graph of squared differences (the semivariogram), it makes a graph of covariances, and the idea is that points that are close together are more correlated (i.e., have a larger covariance) than points that are farther apart. After a particular distance (the range), the covariance becomes 0, which means that points that are farther apart than the range are considered spatially independent. You'll also notice that the blue covariance curve never goes below zero; dipping below zero is what a true "negative covariance model" would look like.
The general idea is that the semivariogram plots how different points are, and the covariance view plots how similar they are. So, points that are farther away will have larger semivariances, but they will have lower covariances.
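For reference, the link between the two views can be written out using the standard textbook relationship for a second-order stationary process. The symbols Z, s, and h are notation introduced here, not taken from the Wizard.

```latex
\begin{align*}
\gamma(h) &= \tfrac{1}{2}\,\mathbb{E}\!\left[\big(Z(s) - Z(s+h)\big)^2\right] \\
C(h)      &= \operatorname{Cov}\!\big(Z(s),\, Z(s+h)\big) \\
\gamma(h) &= C(0) - C(h)
\end{align*}
```

Here C(0) is the sill. At distances at or beyond the range, the semivariogram is at (or effectively at) the sill, so the covariance is at (or effectively at) zero, which is why the two graphs look like mirror images of each other.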
Thanks so much Eric! You helped clear up a lot of my confusion. It sounds like it is not possible to select different models for different relationships, and that a single suite of one or more models is supposed to apply to all three relationships, with or without the optimize button. Is that right?
Which diagnostics would you recommend beyond RMS for selecting the best model?
Good point about the example I pointed to being above zero. In my own data unfortunately I have a truly negative correlation--see attached. What would you recommend I do for this situation if Geostatistical Analyst only accommodates positive relationships?
Correction to something I said earlier. We apparently do support negative cross-covariances (I actually did not know this). The covariances of Var1 - Var1 and Var2 - Var2 do have to be positive, but the cross-covariance between them (Var1 - Var2) can indeed be negative. The idea is that each dataset has to be positively spatially correlated with itself, but the cross-correlation between them can be negative. This will be useful in cases where both the primary and cokriging dataset are spatially correlated (again, with themselves), but the two variables have an inverse relationship, i.e., when one is large, the other tends to be small.
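A toy illustration of that situation, assuming a made-up one-dimensional transect rather than real data: each variable is positively correlated with itself at short lags, while the cross-covariance between them is negative. The function name and the sine-based values are hypothetical, and this computes plain empirical covariances, not anything the Wizard fits.

```python
import math

def cross_covariance(a, b, lag):
    """Empirical cross-covariance of two equally spaced series at an integer lag."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    pairs = len(a) - lag
    return sum((a[i] - mean_a) * (b[i + lag] - mean_b) for i in range(pairs)) / pairs

# Hypothetical transect: var1 varies smoothly, var2 is large where var1 is small.
n = 200
var1 = [math.sin(2 * math.pi * i / 40) for i in range(n)]
var2 = [10.0 - 2.0 * v for v in var1]

for lag in (0, 1, 2, 3):
    print(
        lag,
        round(cross_covariance(var1, var1, lag), 3),  # Var1 - Var1: positive at short lags
        round(cross_covariance(var2, var2, lag), 3),  # Var2 - Var2: positive at short lags
        round(cross_covariance(var1, var2, lag), 3),  # Var1 - Var2: negative at short lags
    )
```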
As for other diagnostics, the RMS is just one crossvalidation statistic. The others are described in this help document:
Cross Validation—Help | ArcGIS for Desktop
You should also look at the crossvalidation summary statistics, the locations of largest/smallest errors, and the graphs on the final page of the Wizard.
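As a rough sketch of what those summary statistics measure, the function below computes them from exported crossvalidation results using their usual definitions. The exact formulas the Wizard uses should be checked against the help page; the function name, dictionary keys, and sample numbers here are assumptions for illustration.

```python
import math

def cv_summary(measured, predicted, std_errors):
    """Summarize crossvalidation results with commonly used diagnostics."""
    errors = [p - m for p, m in zip(predicted, measured)]
    standardized = [e / s for e, s in zip(errors, std_errors)]
    n = len(errors)
    return {
        # Near 0 suggests the predictions are unbiased.
        "mean_error": sum(errors) / n,
        # Smaller means predictions are closer to the measured values.
        "rms_error": math.sqrt(sum(e * e for e in errors) / n),
        # Near 0 suggests unbiasedness after scaling by the standard errors.
        "mean_standardized_error": sum(standardized) / n,
        # Near 1 suggests the standard errors are about the right size.
        "rms_standardized_error": math.sqrt(sum(z * z for z in standardized) / n),
        # Should be close to rms_error if the variability is assessed well.
        "average_standard_error": math.sqrt(sum(s * s for s in std_errors) / n),
    }

# Hypothetical crossvalidation table: measured value, predicted value, standard error.
measured = [10.0, 12.0, 9.0, 11.0]
predicted = [11.0, 11.0, 10.0, 10.0]
std_errors = [1.0, 1.0, 1.0, 1.0]

stats = cv_summary(measured, predicted, std_errors)
for name, value in stats.items():
    print(name, value)
```

A model with the lowest RMS error but an RMS standardized error far from 1 is an example of the lowest-RMS model failing another diagnostic.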
Wonderful, thanks Eric!
I am trying to find the model that's the least bad for the other two relationships when selected for a given relationship. It seems especially hard to find any model to fit the covariance graph, so I am prioritizing fitting Var1 - Var2, then Var2 - Var2, then Var1 - Var1.
The primary dataset almost always has much more impact than the cokriging dataset, so if you're going to prioritize any of them, Var1 - Var1 should be the priority.