POST
Hi again, I have a lot to say but unfortunately not a lot of time right now, so I'm going to try to cover the most important things; sorry for jumping quickly between topics.

The GA Layer 3D To NetCDF tool is used to make the source file for the voxel layer. This tool and the voxel layer itself are both new in ArcGIS Pro 2.6, which has only been available for a few months. If you aren't seeing the option to add a voxel layer under Add Data in a scene view, you probably don't have the most recent version. If you have ArcGIS Pro 2.5, you can use the GA Layer 3D To Multidimensional Raster tool to export the EBK3D layers directly to a multidimensional CRF raster.

I see in your image of cross-validation that many of the points have identical (or very nearly identical) values; these are the horizontal lines of red points in the graph. Repeated values can be problematic for the Empirical transformation, especially with large gaps between the repeated values (this can be seen in the histogram). The Empirical transformation essentially tries to fit a smooth curve to the histogram, then uses this curve to map the data to a normal distribution. However, if the histogram isn't smooth, the curve won't fit it well, and the transformation will likely give strange results in the gaps between the peaks of the histogram.

Regarding multivariate normality for the quantile output, this is very hard to explain without giving a two-hour statistics lecture, so I'll try to keep it short. Kriging of all types is designed to directly predict the mean and standard error of the true value at a new location. However, you need a full distribution to estimate quantiles, and the mean and standard error are not enough to fully reconstruct a distribution: I could show you many different datasets that all have the same mean and standard error but very different quantile values.

So, to calculate quantiles, you have to make an assumption about the predictive distribution, and that is almost always a normality assumption. The reason is that if your input data are normally distributed (or transformed to be normal), then the kriging predictions will also be normally distributed. This is why you check for normality in your input data first: so that you don't have to worry about it later. In practice, you do the best you can to transform the data to be closer to normal, but that is going to be difficult for a histogram with repeated values and gaps.

If it were me, I would try different EBK3D parameters like subset size and overlap factor and hope that you land on a particular subsetting where the model is most stable. I would also experiment with not using a transformation at all. To judge how well a particular model is working, I would focus on cross-validation, specifically the RMSE and the Inside 90/95 Percent Interval statistics. The RMSE is approximately the average magnitude of the difference between the true and predicted values, so it's a quick test of whether a model is too inaccurate to be viable. The Inside 90/95 Percent Interval statistics build confidence intervals using the normality assumption; if the values aren't too far off from 90 and 95, it may be safe to assume normality to estimate quantiles. -Eric
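To make the Inside 90/95 Percent Interval check concrete, here is a minimal sketch of how those statistics can be reproduced from cross-validation output, assuming normal predictive distributions; the data values below are invented, not from any real dataset:

```python
import numpy as np

# Invented cross-validation output: measured values, cross-validated
# predictions, and cross-validated standard errors for each point.
measured = np.array([10.2, 8.7, 13.3, 9.5, 11.3])
predicted = np.array([9.8, 9.1, 11.5, 10.0, 11.0])
std_error = np.array([0.9, 1.1, 1.0, 0.8, 1.2])

# Under a normality assumption, a 90% interval is prediction +/- 1.645 * SE
# and a 95% interval is prediction +/- 1.960 * SE.
def pct_inside(z):
    inside = np.abs(measured - predicted) <= z * std_error
    return 100.0 * inside.mean()

inside_90 = pct_inside(1.645)
inside_95 = pct_inside(1.960)

# If the normality assumption holds, these should land near 90 and 95.
print(inside_90, inside_95)
```

If the two numbers come out far below 90 and 95, the standard errors are too small (or the predictive distributions are not normal); far above, and the standard errors are too large.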
Posted 09-23-2020 08:17 AM

POST
Hi Blair, A barrier in geostatistics usually means a line across which the values of a variable change abruptly. Fault lines, cliffs, and shorelines are common barriers. Unfortunately, Empirical Bayesian Kriging 3D does not allow barriers of this kind; the model assumes that values change gradually, without any discontinuities. Two interpolation methods, Kernel Interpolation With Barriers and Diffusion Interpolation With Barriers, do allow barriers, but they are both 2D interpolation methods. -Eric
Posted 09-22-2020 12:16 PM

POST
It may not work with your data, but the idea I had was to use Empirical Bayesian Kriging with a Probability map output. If you had a continuous variable with a threshold marking the cutoff between sand and clay, you could interpolate the continuous variable with EBK and calculate the probability that each new location exceeds that threshold using the predicted value and the standard error. The Probability output type does this calculation automatically.

The biggest problem with Indicator Kriging is that when you reclassify a continuous variable into a binary variable, you lose a lot of information. This is why Indicator Kriging usually doesn't perform well. But if you only have indicator variables to start with and didn't create them by reclassification, Indicator Kriging is about the best you can do.

And just to add to the complexity, there's another type of kriging called Probability Kriging. It takes a continuous variable and a threshold, reclassifies the values to 0 and 1, and performs Indicator Kriging; however, it also uses the raw values of the continuous variable as a cokriging variable. This essentially tries to bring back some of the information that was lost during the reclassification. In my experience, just using EBK with the Probability output type almost always yields the best results (again, if you have a continuous variable), but there are a bunch of options available.
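To illustrate the kind of calculation the Probability output type performs, here is a small sketch assuming a normal predictive distribution; the density values and cutoff below are made up for the example:

```python
import math

def exceedance_probability(predicted, std_error, threshold):
    """P(true value > threshold), assuming the predictive distribution
    is normal with the given kriging prediction and standard error."""
    z = (threshold - predicted) / std_error
    phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # normal CDF at z
    return 1.0 - phi

# Hypothetical example: predicted density 2.0, standard error 0.5, and a
# sand/clay cutoff at 2.0 -- the location sits right at the threshold,
# so the exceedance probability is 0.5.
print(exceedance_probability(2.0, 0.5, 2.0))  # 0.5
```

A location predicted well above the cutoff relative to its standard error gets a probability near 1, and one well below gets a probability near 0.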
Posted 09-22-2020 07:37 AM

POST
Hi Suzanne, I can try to help you with the problems you're having in EBK3D. First, a quick clarification of something earlier: EBK3D uses the geometry of the features for its calculations, so it is not affected by visualization settings in the scene like vertical exaggeration.

As for why you are seeing the error about not having enough data, I can see in your picture that you have plenty of points (only 10 valid values are required). Only two things come to mind that could cause this error:
1. Accidentally leaving on a selection or definition query.
2. Providing an input field or elevation field where almost all values are null (both fields must have non-null values for a point to be included in the model). Note that the elevation field is used to specify the z-coordinates of the features, and it defaults to Shape.Z if the points are z-enabled.

Please let me know if neither of these resolves the problem. A screenshot of the attribute table of the points and a screenshot of the first page of the Geostatistical Wizard with parameters filled out might help me identify the problem. Thanks, Eric
Posted 09-22-2020 07:19 AM

POST
Also, please let me know if you determined whether a point is sand/clay by using a threshold for a continuous variable. For example, using a density cutoff to classify between sand and clay. If that's how you created your binary variable, there are a few other methodologies that will likely give more reliable results.
Posted 09-18-2020 10:08 AM

POST
Hi Suzanne, In Ordinary (not Indicator) Kriging, the semivariogram is interpreted as the average squared difference in the values of two points, given their distance apart. Indicator Kriging is slightly different: the semivariogram in this case is the average squared difference in the indicator values (0 or 1) of two points, given how far apart they are. When the semivariogram value is smaller, the two points are more likely to have the same indicator value. The output is a surface reflecting the probability that a location has an indicator value of 1 (in your case, the probability that the location is clay).

Your second screenshot shows the cross-validation page for Indicator Kriging. For each input point, the indicator value is hidden, and the probability that the indicator value of the location is 1 is calculated from its neighbors. The graph then shows the true indicator value (Measured) plotted against the predicted probability that the point has an indicator value of 1. Ideally, points with a measured value of 0 should have low predicted probabilities, and points with a measured value of 1 should have high predicted probabilities.

Unfortunately, that's not quite what I'm seeing in your particular graph. Points whose true value was 0 (i.e., sand points) still have fairly high predicted probabilities of being clay; for a few of them, the Indicator Kriging model actually thought they were more likely to be clay than sand (these are the red points on the left that sit above y=0.5). For the points that actually were clay, the model predicted that they were more likely to be sand than clay (notice that the points on the right all fall below y=0.5).

This can all be a bit confusing, but Indicator Kriging isn't fundamentally different from the other types of kriging; it just transforms the data into indicator variables first and then proceeds identically. -Eric
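To make the indicator semivariogram interpretation concrete, here is a small sketch that computes an empirical semivariogram value from indicator data; the 1D coordinates and indicator values below are invented:

```python
import numpy as np

# Invented 1D example: point coordinates and their indicator values
# (1 = clay, 0 = sand).
coords = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
indicator = np.array([1, 1, 1, 0, 0, 0])

def empirical_semivariogram(coords, values, lag, tol):
    """Average half squared difference over all pairs whose separation
    distance falls within lag +/- tol."""
    diffs = []
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            h = abs(coords[i] - coords[j])
            if abs(h - lag) <= tol:
                diffs.append(0.5 * (values[i] - values[j]) ** 2)
    return float(np.mean(diffs))

# At lag 1, most neighboring pairs share the same indicator value, so the
# semivariogram value is small; at lag 5, the only pair differs, so it is larger.
print(empirical_semivariogram(coords, indicator, lag=1.0, tol=0.1))
print(empirical_semivariogram(coords, indicator, lag=5.0, tol=0.1))
```

The small value at short lags and larger value at long lags is exactly the "closer points are more likely to share an indicator value" pattern described above.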
Posted 09-18-2020 10:05 AM

POST
Whenever kriging "predicts" at a new location, it calculates both a kriging prediction and a kriging variance (the standard error is the square root of the variance). The formulas for these values are complicated, but they are based on the semivariogram and the locations/values of neighboring points. You can see these formulas in the pdf, but they are not simple to explain and require an understanding of linear algebra and Lagrange multipliers.

When a point is being cross-validated, its value is hidden, and the kriging prediction and kriging variance at the location of the hidden point are calculated from every other point. We call these the cross-validated prediction and cross-validated variance of the point. The "Predicted" field of the table is the cross-validated prediction, and the "Standard Error" field is the square root of the cross-validated variance. The "Measured" field is the true value of the hidden point, and the "Error" field is the difference between Predicted and Measured.

Once these statistics are calculated for every point by sequentially hiding each of them, the summary statistics like the Average Standard Error can be calculated. The Mean Error, for example, is the simple average of the Error column. The Average Standard Error is the square root of the average of the cross-validated kriging variances: to calculate it by hand, square all of the values of the Standard Error field, take the average of the squared values, and take the square root of the result.

You can see the formulas for all of these summary statistics in my first post. In the formulas, z(s) refers to the measured value at a location, z^(s) refers to the cross-validated prediction (the "hat" ^ appears above z(s)), and sigma-hat refers to the cross-validated standard error (the square root of the variance).

The actual computation of these numbers is less important than what they're measuring. The RMSE and Average Standard Error both estimate the uncertainty of the predicted values in different ways, and if the uncertainty is being estimated consistently, the two numbers should be approximately equal. If they differ significantly, that is a sign that the uncertainties are not being modeled effectively. Similarly, the Standardized RMS directly measures whether the standard errors are being under- or over-estimated.
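The by-hand recipes above can be written out in a few lines; the arrays below are invented stand-ins for the Measured, Predicted, and Standard Error columns of the cross-validation table:

```python
import numpy as np

# Invented stand-ins for the cross-validation table columns.
measured = np.array([10.0, 12.0, 9.0, 11.0])
predicted = np.array([10.5, 11.0, 9.5, 11.5])
std_error = np.array([1.0, 1.2, 0.8, 1.1])

# The Error column: Predicted minus Measured.
error = predicted - measured

# Mean Error: simple average of the Error column.
mean_error = error.mean()
# RMSE: square root of the mean squared error.
rmse = np.sqrt(np.mean(error ** 2))
# Average Standard Error: square the standard errors, average, square root.
avg_std_error = np.sqrt(np.mean(std_error ** 2))
# Standardized RMS: RMS of each error divided by its standard error.
std_rms = np.sqrt(np.mean((error / std_error) ** 2))

print(mean_error, rmse, avg_std_error, std_rms)
```

Here RMSE and Average Standard Error can be compared directly, and a Standardized RMS well below 1 would indicate over-estimated standard errors (well above 1, under-estimated).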
Posted 09-15-2020 01:16 PM

POST
To reproduce the cross-validation statistics for a single point:
1. Create a kriging geostatistical layer in the Geostatistical Wizard using all points (you have probably already done this).
2. In the points that were used to create the geostatistical layer, deselect the point that you want to cross-validate. In other words, every other point in the dataset should be selected.
3. Use the Create Geostatistical Layer geoprocessing tool. Provide the geostatistical layer and the points with the selection to the tool. The output will be a new geostatistical layer that takes the kriging parameters from the first layer and applies them to the dataset with the selection.
4. Invert the selection so that only the point you want to cross-validate is selected.
5. Use the GA Layer To Points geoprocessing tool. Use the geostatistical layer created in step 3, predict to the single selected point, and provide the field containing the measured value. The output will contain all of the cross-validation statistics for that single feature.
To calculate the summary statistics like RMSE, you'll need to do this for every feature in the dataset, then plug the values into the formulas in that PDF. I highly suggest that you don't try to do this manually; the Cross Validation geoprocessing tool was created exactly for this purpose.
Posted 09-15-2020 08:59 AM

POST
Hello, Below are the exact formulas for the cross-validation statistics, taken from page 279 of this pdf: https://dusk.geo.orst.edu/gis/geostat_analyst.pdf

When cross-validating a point, the remaining points produce a cross-validated prediction and standard error. All cross-validation statistics and summary statistics are based on these two numbers, along with the measured value at the point. Keep in mind that the cross-validated prediction and standard error of a point are not the same as the final interpolated prediction and standard error at that point: the former is calculated by removing the point, and the latter by including it. This is likely why your calculations don't match the numbers in the Geostatistical Wizard. Trying to reproduce the cross-validation results on your own is not simple; if you would like, I can explain how to do it for a single point. -Eric

Note: There is actually a typo in the Average kriging standard error formula below. The sigma-hat inside the sum should be squared. Though it's called the "Average kriging standard error," it would probably be more technically correct to call it the "Root mean cross-validation variance."
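With that typo corrected, the Average kriging standard error formula would read as follows, where sigma-hat squared of s_i is the cross-validated kriging variance at the i-th point:

```latex
% "Average kriging standard error" (p. 279), with the standard error
% inside the sum squared, as noted above:
\mathrm{ASE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\hat{\sigma}^{2}(\mathbf{s}_{i})}
```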
Posted 09-15-2020 07:44 AM

POST
Hi Nazia, In Geographically Weighted Regression, the explanatory variables can be categorical. As long as your dependent variable is continuous, it is fine to use categorical variables as explanatory variables. However, to use a categorical variable appropriately, you can't just assign it the values 1 through 8. As you found, GWR will try to use these actual numbers, and you will get very different results depending on which levels you label 1 through 8.

For GWR to work properly with your categorical variable, you need to convert it to several indicator variables (variables that take the value 0 or 1) and then use these indicator variables as explanatory variables in GWR. The process of converting categorical variables to indicator variables is called "dummy encoding." Here is a good article about how to perform dummy encoding: Dummy variable (statistics) - Wikiversity

In your case, your categorical variable has 8 levels, so you will need to make 7 indicator variables to represent the different levels (you always use one fewer indicator variable than the number of levels of the category). You'll need to make 7 new fields on your feature class. In the first field, each feature that is in the first level of the category gets the value 1, and features in any other level get the value 0 (we say that the 1 "indicates" that the feature is in that level). Similarly, in the second field, the features of the second level get a 1, and all other features get a 0. Same for levels 3 through 7. For level 8, the value 0 should go in all 7 fields.

When you encode this way, it does not matter which level of the category is called the first, second, etc.; changing the order will produce the same results in GWR. Please let me know if you have any other questions or any problems encoding your variable. -Eric Krause
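The encoding described above can be sketched in plain Python; the level names and feature values below are made up for the example:

```python
# Hypothetical 8-level categorical variable; "L8" is the reference level.
levels = ["L1", "L2", "L3", "L4", "L5", "L6", "L7", "L8"]
# Hypothetical category value for each of five features.
features = ["L1", "L3", "L8", "L2", "L8"]

# Build 7 indicator fields: one per level except the last (reference) level.
# A feature in level k gets a 1 in field k and 0 elsewhere; features in the
# reference level L8 get 0 in all seven fields.
encoded = {
    level: [1 if f == level else 0 for f in features]
    for level in levels[:-1]
}

print(encoded["L1"])  # [1, 0, 0, 0, 0]
print(encoded["L3"])  # [0, 1, 0, 0, 0]
```

Each key of `encoded` corresponds to one of the 7 new fields you would add to the feature class, and each list holds that field's values down the attribute table.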
Posted 07-28-2020 08:21 AM

POST
Hi Lucas, That particular error is encountered when all of the data in the field have the same value (or very, very close to the same value). Is it possible that one of the fields had all the same value? Regardless, there was a known problem with using batch geoprocessing on the field parameter of Empirical Bayesian Kriging, so I would not recommend trying to automate that way; this has been fixed for the upcoming ArcGIS Pro 2.6. In the meantime, I would suggest writing a Python script to automate the interpolation, if you're comfortable using arcpy. -Eric Krause
Posted 06-30-2020 07:27 AM

POST
Hi Lucas, The easiest way to do this is the Cross Validation geoprocessing tool. It takes a geostatistical layer as input, and it creates a derived output with all of the summary statistics as properties. Quick Python code example:

outCV = arcpy.ga.CrossValidation("myGALayer")
print(outCV.rootMeanSquare)  # '294.63006397102004'
Posted 06-15-2020 01:40 PM

POST
Hi Bankim, Please see my response here: https://community.esri.com/message/925010-re-diffusion-interpolation-with-barriers-extent Using the Create Geostatistical Layer tool will allow you to change the Extent of any geostatistical layer. That post was about Diffusion Interpolation With Barriers, but the same thing works with every type of Kriging. -Eric
Posted 05-01-2020 02:47 PM

POST
Using a large radius will rarely make the results worse. In fact, if you were to look at the kriging equations in a textbook, you might not see any neighborhood mentioned at all. In theory, the semivariogram defines the weights for every feature in the dataset; it's just that these weights tend to be very close to 0 when the points are farther apart than the semivariogram range, so including them or not makes almost no difference in the results. Rather than accuracy, the real purpose of neighborhoods is calculation speed: by using neighborhoods, you get results in seconds, when it might take hours to calculate weights for the entire dataset, and the resulting surface would be nearly identical.
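The claim that weights shrink toward zero beyond the range can be illustrated with a tiny simple-kriging calculation; the exponential covariance model, range, and coordinates below are assumptions chosen just for the sketch:

```python
import numpy as np

# Exponential covariance model with range 10: correlation has decayed to
# about 5% of the sill at the range and keeps shrinking beyond it.
def cov(h, sill=1.0, rng=10.0):
    return sill * np.exp(-3.0 * h / rng)

# 1D data locations; the third point lies far outside the range from both
# the other points and the prediction location.
x = np.array([1.0, 2.0, 50.0])
x0 = 0.0  # prediction location

# Simple-kriging weights: solve the linear system C w = c0, where C holds
# covariances between data points and c0 covariances to the prediction point.
C = cov(np.abs(x[:, None] - x[None, :]))
c0 = cov(np.abs(x - x0))
w = np.linalg.solve(C, c0)

print(w)  # the weight for the far point at x=50 is essentially zero
```

Dropping the far point from the neighborhood would therefore change the prediction by a negligible amount, which is why neighborhood search mainly buys speed rather than accuracy.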
Posted 05-01-2020 02:39 PM

POST
Using different parameters for the neighborhood and the semivariogram will not nullify the effect of the semivariogram. The semivariogram has one set of parameters, and the search neighborhood has a different set. The logic is that the semivariogram defines correlation as a function of distance, which lets you compute optimal weights for any set of neighbors; once you define the neighborhood, the semivariogram is applied to that neighborhood to produce the predictions. It usually makes the most sense to match the parameters of the neighborhood to the parameters of the semivariogram, but there is no requirement to do so. The reasoning for keeping them the same is that, for example, the Range in the semivariogram defines the maximum distance at which points are still spatially correlated; points farther apart than this distance are considered spatially uncorrelated. When building the neighborhood, then, it makes sense to use only neighbors that are closer than the Range, so that every neighbor you include has meaningful spatial correlation with the prediction location.
Posted 05-01-2020 07:08 AM