POST
It may not work with your data, but the idea I had was to use Empirical Bayesian Kriging with a Probability map output. If you had a continuous variable with a threshold indicating the cutoff between sand and clay, you could just interpolate the continuous variable with EBK and calculate the probability that each new location exceeds that threshold using the predicted value and the standard error. The Probability output type does this calculation automatically.

The biggest problem with Indicator Kriging is that when you reclassify from a continuous variable to a binary variable, you lose a lot of information. This is why Indicator Kriging usually doesn't perform well. But if you only have indicator variables to start out and didn't create them with a reclassification, Indicator Kriging is about the best you can do.

And just to add to the complexity, there's another type of kriging called Probability Kriging. It takes a continuous variable and a threshold, reclassifies the values to 0 and 1, and performs Indicator Kriging. However, it also uses the raw values of the continuous variable as a cokriging variable, which essentially tries to bring back some of the information that was lost during the reclassification. In my experience, just using EBK with the Probability output type almost always yields the best results (again, if you have a continuous variable), but there are a bunch of options available.
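The probability calculation itself is simple once you have a prediction and standard error at a location: assuming a normally distributed prediction, the chance that the true value exceeds the threshold is one minus the normal CDF of the standardized threshold. A minimal sketch of that calculation (the prediction, standard error, and cutoff values below are made up for illustration):

```python
import math

def exceedance_probability(predicted, std_error, threshold):
    """Probability that the true value exceeds `threshold`, assuming the
    kriging prediction is normally distributed with the given standard
    error. This mirrors what a Probability output computes per location."""
    z = (threshold - predicted) / std_error
    # Normal CDF via the error function: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    return 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical location: predicted density 2.0, standard error 0.5,
# sand/clay cutoff at 1.5
prob = exceedance_probability(2.0, 0.5, 1.5)
print(prob)  # ~0.841: one standard error above the cutoff
```

The same arithmetic underlies the Probability output type, just evaluated at every cell of the output surface.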
Posted 09-22-2020, 07:37 AM

POST
Hi Suzanne, I can try to help you with the problems you're having in EBK3D. First, a quick clarification for something earlier: EBK3D uses the geometry of the features for the calculations, so it is not affected by visualization settings in the scene like vertical exaggeration.

As for why you are seeing the error about not having enough data, I can see in your picture that you have plenty of points (only 10 valid values are required). Only two things come to mind that could cause this error:
1. Accidentally leaving on a selection or definition query.
2. Providing an input field or elevation field where almost all values are null (both fields must have non-null values for a point to be included in the model). Note that the elevation field is used to specify the z-coordinates of the features, and it defaults to Shape.Z if the points are z-enabled.

Please let me know if neither of these resolves the problem. A screenshot of the attribute table of the points and a screenshot of the first page of the Geostatistical Wizard with parameters filled out might help me identify the problem. Thanks, Eric
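The second cause is easy to check ahead of time: count how many records have non-null values in both the input field and the elevation field, and confirm the count is at least 10. A quick illustration with hypothetical records (the values are made up; None stands for a null):

```python
# Each record is (value_field, elevation_field); None represents a null.
records = [
    (2.1, 10.0), (1.8, None), (None, 12.5), (2.4, 11.0),
    (1.9, 9.5), (2.2, 10.5), (2.0, 11.5), (1.7, 12.0),
    (2.3, 9.0), (2.5, 10.8), (1.6, 11.2), (2.6, 11.8),
]

# A point enters the model only if BOTH fields are non-null.
valid = [r for r in records if r[0] is not None and r[1] is not None]
print(len(valid))  # must be at least 10 for EBK3D to run
```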
Posted 09-22-2020, 07:19 AM

POST
Also, please let me know if you determined whether a point is sand/clay by using a threshold for a continuous variable. For example, using a density cutoff to classify between sand and clay. If that's how you created your binary variable, there are a few other methodologies that will likely give more reliable results.
Posted 09-18-2020, 10:08 AM

POST
Hi Suzanne, In Ordinary (not Indicator) Kriging, the semivariogram is interpreted as the average squared difference in the values of two points, given their distance apart. Indicator Kriging, however, is slightly different: the semivariogram in this case is the average squared difference in the indicator values (0 or 1) of two points, given how far apart they are. When the semivariogram value is smaller, the two points are more likely to have the same indicator value. The output is a surface reflecting the probability that a location has an indicator value of 1 (in your case, the probability that the location is clay).

Your second screenshot shows the cross-validation page for Indicator Kriging. For each input point, the indicator value is hidden, and the probability that the indicator value of the location is 1 is calculated based on neighbors. The graph then shows the true indicator value (Measured) plotted against the predicted probability that the point has an indicator value equal to 1. Ideally, points with 0 for the measured value should have low predicted probabilities, and points with 1 for the measured value should have high probabilities.

Unfortunately, that's not quite what I'm seeing in your particular graph. Points whose true value was 0 (i.e., sand points) still have reasonably high predicted probabilities of being clay. For a few of the points, the Indicator Kriging model actually thought they were more likely to be clay than sand (these are the red points on the left and above y=0.5). For the points that actually were clay, the model predicted that they were more likely to be sand than clay (notice that the points on the right all fall below y=0.5).

This can all be a bit confusing, but Indicator Kriging isn't really fundamentally different from the other types of kriging; it just transforms the data into indicator variables first and then proceeds identically. -Eric
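The indicator transform and the indicator semivariogram described above can be sketched in a few lines. The coordinates, values, and threshold below are made up for illustration:

```python
import itertools
import math

# Hypothetical points: (x, y, continuous value such as density)
points = [(0.0, 0.0, 2.4), (1.0, 0.0, 2.6), (0.0, 1.0, 1.2),
          (1.0, 1.0, 1.1), (2.0, 0.0, 2.8), (2.0, 1.0, 1.3)]
threshold = 1.5  # e.g. a density cutoff separating sand from clay

# Indicator transform: 1 if the value exceeds the threshold, else 0
indicators = [(x, y, 1 if v > threshold else 0) for x, y, v in points]

def semivariance(data, lag, tol=0.25):
    """Half the average squared difference for pairs whose separation is
    within `tol` of `lag` -- the empirical semivariogram at that lag."""
    diffs = [(a[2] - b[2]) ** 2
             for a, b in itertools.combinations(data, 2)
             if abs(math.dist(a[:2], b[:2]) - lag) <= tol]
    return 0.5 * sum(diffs) / len(diffs) if diffs else None

gamma_1 = semivariance(indicators, 1.0)
```

A smaller value of `gamma_1` would mean that points one unit apart tend to share the same indicator value; kriging then fits a model to these empirical values across all lags.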
Posted 09-18-2020, 10:05 AM

POST
Whenever kriging "predicts" at a new location, it calculates both a kriging prediction and a kriging variance (the standard error is the square root of the variance). The formulas for these values are complicated, but they are based on the semivariogram and the locations/values of neighboring points. You can see these formulas in the pdf, but they are not simple to explain and require an understanding of linear algebra and Lagrange multipliers.

When a point is being cross-validated, its value is hidden, and the kriging prediction and kriging variance at the location of the hidden point are calculated based on every other point. We call these the cross-validated prediction and cross-validated variance of the point. The "Predicted" field of the table represents the cross-validated prediction, and the "Standard Error" field represents the square root of the cross-validated variance. The "Measured" field is the true value of the hidden point, and the "Error" field is the difference between the Predicted and Measured.

Once these statistics are calculated for every point by sequentially hiding each of them, the summary statistics like Average Standard Error can be calculated. The Mean Error, for example, is the simple average of the Error column. The Average Standard Error is the square root of the average of the cross-validated kriging variances: to calculate it by hand, square all of the values of the Standard Error field, take the average of the squared values, and take the square root of the result.

You can see the formulas for all of these summary statistics in my first post. In the formulas, z(s) refers to the measured value at a location, z^(s) refers to the cross-validated prediction (the "hat" ^ appears above z(s)), and σ-hat refers to the cross-validated standard error (the square root of the variance). The actual computation of these numbers is less important than what they're measuring.

The RMSE and Average Standard Error are important because they both estimate the uncertainty of predicted values in different ways, and if the uncertainty is being estimated consistently, both numbers should be approximately equal. If they differ significantly, this is a sign that uncertainties are not being modeled effectively. Similarly, the standardized RMS directly measures whether standard errors are being under- or over-estimated.
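The hand calculations described above can be sketched directly from the three cross-validation columns. This is an illustration, not the internal implementation; the example values are made up:

```python
import math

def cv_summary(measured, predicted, std_error):
    """Summary statistics from per-point cross-validation results.
    Inputs are parallel lists: the Measured, Predicted, and Standard
    Error columns of the cross-validation table."""
    n = len(measured)
    errors = [p - m for p, m in zip(predicted, measured)]
    return {
        "mean_error": sum(errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        # square the standard errors, average them, take the square root
        "avg_std_error": math.sqrt(sum(s * s for s in std_error) / n),
        "std_rmse": math.sqrt(sum((e / s) ** 2
                                  for e, s in zip(errors, std_error)) / n),
    }

# Tiny made-up example: three cross-validated points
stats = cv_summary([10.0, 12.0, 11.0], [10.5, 11.5, 11.0], [0.5, 0.5, 0.5])
```

Note that `rmse` (about 0.41 here) and `avg_std_error` (0.5 here) being close is exactly the consistency check described above, and `std_rmse` near 1 indicates standard errors are neither under- nor over-estimated.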
Posted 09-15-2020, 01:16 PM

POST
To reproduce the cross-validation statistics for a single point:
1. Create a Kriging geostatistical layer in the Geostatistical Wizard using all points (you probably have already done this).
2. On the points that were used to create the geostatistical layer, deselect the point that you want to cross-validate. In other words, every other point in the dataset should be selected.
3. Use the Create Geostatistical Layer geoprocessing tool. Provide the geostatistical layer and the points with the selection to the tool. The output will be a new geostatistical layer that uses the kriging parameters from the first layer and applies them to the dataset with the selection.
4. Invert the selection so that only the point you want to cross-validate is selected.
5. Use the GA Layer To Points geoprocessing tool. Use the geostatistical layer created in step 3, and predict to the single selected point. Provide the field containing the measured value. The output will contain all of the cross-validation statistics for the single feature.

To calculate the summary statistics like RMSE, you'll need to do this for every feature in the dataset, then plug the values into the formulas in that PDF. I highly suggest that you don't try to do this manually. The Cross Validation geoprocessing tool was created exactly for this purpose.
Posted 09-15-2020, 08:59 AM

POST
Hello, Below are the exact formulas for the cross-validation statistics, taken from page 279 of this pdf: https://dusk.geo.orst.edu/gis/geostat_analyst.pdf

When cross-validating a point, the remaining points produce a cross-validation prediction and standard error. All cross-validation statistics and summary statistics are based on these two numbers, along with the measured value at the point. Keep in mind that the cross-validated prediction and standard error of a point are not the same as the final interpolated prediction and standard error at that point. The former is calculated by removing the point, and the latter is calculated by including it. This is likely why your calculations don't match the numbers in the Geostatistical Wizard.

Trying to reproduce the cross-validation results on your own is not simple. If you would like, I can explain how to do it for a single point. -Eric

Note: There is actually a typo in the Average kriging standard error formula below. The sigma-hat inside the sum should be squared. Though it's called "Average kriging standard error," it would probably be more technically correct to call it the "Root mean cross-validation variance."
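For reference, the summary statistics described above can be written out as follows (with the squared sigma-hat correction from the note already applied to the Average standard error). These are the standard forms; consult the linked pdf for the original notation:

```latex
% n points; z(s_i) is the measured value, \hat{z}(s_i) the cross-validated
% prediction, and \hat{\sigma}(s_i) the cross-validated standard error.
\begin{align}
\text{Mean error} &= \frac{1}{n}\sum_{i=1}^{n}\left[\hat{z}(s_i) - z(s_i)\right] \\
\text{Root-mean-square error} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[\hat{z}(s_i) - z(s_i)\right]^2} \\
\text{Average standard error} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\hat{\sigma}^2(s_i)} \\
\text{Mean standardized error} &= \frac{1}{n}\sum_{i=1}^{n}\frac{\hat{z}(s_i) - z(s_i)}{\hat{\sigma}(s_i)} \\
\text{RMS standardized error} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[\frac{\hat{z}(s_i) - z(s_i)}{\hat{\sigma}(s_i)}\right]^2}
\end{align}
```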
Posted 09-15-2020, 07:44 AM

POST
Hi Nazia, In Geographically Weighted Regression, the explanatory variables can be categorical. As long as your dependent variable is continuous, it is fine to use categorical variables as explanatory variables. However, to use a categorical variable appropriately, you can't just assign values 1 through 8 to it. As you found, GWR will try to use these actual numbers, and you will get very different results depending on which levels you label as 1 through 8.

For GWR to work properly with your categorical variable, you need to convert it to several indicator variables (variables that have the value 0 or 1) and then use these indicator variables as explanatory variables in GWR. The process of converting categorical variables to indicator variables is called "dummy encoding." Here is a good article about how to perform dummy encoding: Dummy variable (statistics) - Wikiversity

In your case, your categorical variable has 8 levels, so you will need to make 7 indicator variables to represent the different levels (you always use one less indicator variable than the number of levels of the category). You'll need to make 7 new fields on your feature class. For the first field, each feature that is in the first level of the category gets the value 1, and features in any other level get the value 0 (we say that the 1 "indicates" that the feature is in that level). Similarly, in the second field, the features of the second level get a 1, and all other features get a 0. Same for levels 3 through 7. For level 8, the value 0 should go in all 7 fields.

When you encode this way, it does not matter which level of the category is called the first, second, etc. level. Changing the order will produce the same results in GWR. Please let me know if you have any other questions or have any problems encoding your variable. -Eric Krause
Posted 07-28-2020, 08:21 AM

POST
Hi Lucas, That particular error is encountered if all of the data in the field have the same value (or very, very close to the same value). Is it possible that one of the fields had all the same value? Regardless, there was a known problem with using batch geoprocessing on the field parameter of Empirical Bayesian Kriging, so I would not recommend trying to automate it that way. This has been fixed for the upcoming ArcGIS Pro 2.6. In the meantime, I would suggest writing a Python script to automate the interpolation, if you're comfortable using arcpy. -Eric Krause
Posted 06-30-2020, 07:27 AM

POST
Hi Lucas, The easiest way to do this is the Cross Validation geoprocessing tool. It takes a geostatistical layer as input, and it creates a derived output with all of the summary statistics as properties. Quick Python code example:

outCV = arcpy.ga.CrossValidation('myGALayer')
outCV.rootMeanSquare   # '294.63006397102004'
Posted 06-15-2020, 01:40 PM

POST
Hi Bankim, Please see my response here: https://community.esri.com/message/925010-re-diffusion-interpolation-with-barriers-extent Using the Create Geostatistical Layer tool will allow you to change the Extent of any geostatistical layer. That post was about Diffusion Interpolation With Barriers, but the same thing works with every type of Kriging. -Eric
Posted 05-01-2020, 02:47 PM

POST
Using a large radius will rarely make the results worse. In fact, if you were to look at the kriging equations in a textbook, you might not see any neighborhood mentioned at all. In theory, the semivariogram defines the weights for every feature in the dataset. It's just that these weights tend to be very close to 0 when the points are further apart than the semivariogram range, so including them or not including them makes almost no difference in the results. Instead of accuracy, the real purpose of neighborhoods is calculation speed. By using neighborhoods, you get results in seconds, when it might take hours to calculate weights for the entire dataset, and the resulting surface would be nearly identical.
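To see why distant points contribute almost nothing: for an exponential semivariogram, the implied covariance decays like exp(-3h/range), so the contribution of a point beyond the range is a few percent of the sill at most. A small sketch (the sill and range values are made up):

```python
import math

def exponential_covariance(h, sill=1.0, range_=100.0):
    """Covariance implied by an exponential semivariogram model:
    C(h) = sill * exp(-3h / range). At h = range only ~5% of the sill
    remains, which is why the range is called the 'practical range'."""
    return sill * math.exp(-3.0 * h / range_)

near = exponential_covariance(10.0)     # well inside the range
at_range = exponential_covariance(100.0)
far = exponential_covariance(200.0)     # twice the range: essentially zero
```

Since kriging weights are driven by these covariances, neighbors beyond the range receive weights near zero whether or not they are in the search neighborhood.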
Posted 05-01-2020, 02:39 PM

POST
Using different parameters for the neighborhood and the semivariogram will not nullify the effect of the semivariogram. The semivariogram has one set of parameters, and the search neighborhood has a different set. The logic is that the semivariogram defines correlation based on distance, and it allows you to compute optimal weights for any set of neighbors. Once you define the neighborhood, the semivariogram is applied to the neighborhood to produce the predictions. It usually makes the most sense to match the parameters of the neighborhood to the parameters of the semivariogram, but there's no requirement to do this.

The reasoning behind keeping the parameters the same is that, for example, the Range in the semivariogram defines the maximum distance at which points are still spatially correlated; points further apart than this distance are considered spatially uncorrelated. When building the neighborhood, then, it makes sense to only use neighbors that are closer than the Range so that you are only including neighbors with meaningful spatial correlation in the neighborhood.
Posted 05-01-2020, 07:08 AM

POST
The regression line in cross validation excludes some of the extreme values when fitting the line. This is why the line differs from Excel. Sorry that I had forgotten to mention this earlier. From the help: "This procedure first fits a standard linear regression line to the scatterplot. Next, any points that are more than two standard deviations above or below the regression line are removed, and a new regression equation is calculated. This procedure ensures that a few outliers will not corrupt the entire regression equation."

Regarding whether to do validation or cross validation, you have a few choices. Validation is the most statistically defensible methodology (because it validates against data that was completely withheld), but it requires not using some of your data. Cross validation, on the other hand, uses all data to build the model, but it then validates against the same data used to build the model, so there is a bit of data double-dipping. The double-dipping isn't usually a problem because the influence of any individual point should not be too extreme.

The third option is to use a validation workflow to decide the parameters of the model and then apply that model to the entire dataset. To do this, perform the entire validation workflow, then use the Create Geostatistical Layer tool and provide the geostatistical layer used for validation along with the entire dataset. This will apply the parameters of the validation model to all data.

If, as you say, you're going to choose your model by cross validation statistics, then I would probably just do cross validation and not do a full validation workflow. But it's up to you.
Posted 04-29-2020, 06:33 AM

POST
This is unfortunately the only edition of the book.
Posted 04-29-2020, 06:25 AM