POST | 03-08-2022
@Elijah Each horizontal line of points in the graph appears to have the same Measured value on the y-axis (or very nearly the same), while each point in the line has a different Predicted value on the x-axis. Many repeated values in the field you interpolated would produce this kind of graph. This isn't necessarily a problem, but you should look into it and confirm that the repeated values are expected in your data.
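If you want to confirm this outside the Wizard, a quick script is enough. A minimal sketch, assuming the cross validation table has been exported to a CSV with a Measured field (the file and field names are placeholders):

```python
import pandas as pd

# Placeholder file name: export your cross validation table first.
cv = pd.read_csv("cross_validation.csv")

# Round slightly so "very close to equal" values group together.
counts = cv["Measured"].round(6).value_counts()
repeats = counts[counts > 1]

print(f"{len(repeats)} measured values appear more than once")
print(repeats.head(10))
```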
BLOG | 01-26-2022
@ttgrdias Unless you have a physical reason to think the EIF should be some specific value (say, a value estimated from ocean or wind currents), I would personally let the software estimate the inflation factor. Regarding configuring parameters to improve cross validation results, any parameter (search neighborhood or otherwise) can potentially improve the results. In my experience (and this is just a general statement), as long as you use at least 10-15 neighbors total (EBK3D uses between 12 and 24 neighbors by default), you won't see much improvement from including more neighbors in the search neighborhood. Again in my experience, spending extra time configuring the Subset size, Order of trend removal, and Elevation inflation factor parameters often provides the best model improvement for 3D interpolation.
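To make the tuning suggestion concrete, here is a rough sketch of a grid search over those three parameters. The `fit_ebk3d_rmse` helper is hypothetical; it stands in for running EBK3D with the given parameter values and returning the cross validation RMSE:

```python
import itertools

def fit_ebk3d_rmse(subset_size, trend_order, eif):
    """Hypothetical helper: run EBK3D with these parameter values and
    return the cross validation RMSE. Replace with real tool calls."""
    raise NotImplementedError

candidates = itertools.product(
    [50, 100, 200],        # Subset size
    ["NONE", "FIRST"],     # Order of trend removal
    [1, 10, 50, 100],      # Elevation inflation factor
)
best = min(candidates, key=lambda p: fit_ebk3d_rmse(*p))
print("Best (subset size, trend order, EIF):", best)
```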
BLOG | 01-26-2022
@ttgrdias When you optimize the EIF, the software uses the value that minimizes the Root-Mean-Square cross validation error (RMSE), keeping all other parameters fixed. In other words, this is the inflation factor that allows the model to most accurately predict back to the input point locations. Since the optimization is data-driven and minimizes only a single number (the RMSE), it can be sensitive to things like outliers, value distributions, and the spatial configuration of the points. If these properties are not consistent across all of your datasets, you should generally expect the optimization to estimate a different EIF value for each dataset.
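In symbols, the optimization picks the EIF minimizing the leave-one-out cross validation RMSE (the standard definition, stated here for reference):

```latex
\mathrm{RMSE}(\mathrm{EIF}) =
  \sqrt{\frac{1}{n} \sum_{i=1}^{n}
  \bigl( \hat{z}_{-i}(s_i) - z(s_i) \bigr)^2 }
```

where $z(s_i)$ is the measured value at point $s_i$ and $\hat{z}_{-i}(s_i)$ is the prediction at $s_i$ from a model fit without that point.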
POST | 01-06-2022
Hi @ArnieWaddell1 The graph shows the distribution of each field in the cross validation table (change the field with the Field pulldown) using a kernel density estimation. Think of it as a smoothed histogram of the Measured values, Predicted values, or Errors. Overlaying the Measured and Predicted fields is the most important use of the Distribution tab: it lets you compare the distribution of the true values with that of the cross validated predictions. Ideally, the two should have very similar distributions, and big deviations may indicate problems in the interpolation model. -Eric
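For intuition, the same overlay can be re-created outside the Wizard with a standard kernel density estimate. A sketch assuming the cross validation table was exported to CSV (file and field names are placeholders):

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt

cv = pd.read_csv("cross_validation.csv")   # placeholder export
values = cv[["Measured", "Predicted"]].values
grid = np.linspace(values.min(), values.max(), 200)

# Each curve is a smoothed histogram of the corresponding field.
plt.plot(grid, gaussian_kde(cv["Measured"])(grid), label="Measured")
plt.plot(grid, gaussian_kde(cv["Predicted"])(grid), label="Predicted")
plt.legend()
plt.show()   # large gaps between the two curves suggest model problems
```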
POST | 12-08-2021
Hi @Alexandra_Br, please try the links again. They were using "http" addresses, and I have updated them to "https". Please let me know if you still have any issues accessing them.
POST | 11-29-2021
This seems like a pretty reasonable study outline to me. If I had to nitpick, I would suggest simulating data from the interpolation model rather than extracting the results of a single interpolation. This allows you to do multiple repetitions for each gridding, where the data for each repetition come from the same model and differ only in their random components. That being said, in Geostatistical Analyst you can only simulate from univariate Simple Kriging models, so that might not be an option. I'm definitely interested in the results; I've done some ad hoc experimentation along these lines but not enough to draw any solid conclusions.
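To show the shape of the design I'm suggesting, here is a sketch with two hypothetical helpers: `simulate_data` stands in for drawing one realization from the fitted model (e.g., conditional simulation from a Simple Kriging model), and `error_metric` for whatever accuracy measure you're comparing:

```python
import numpy as np

def simulate_data(gridding, seed):
    """Hypothetical: one simulated dataset for this gridding."""
    raise NotImplementedError

def error_metric(dataset):
    """Hypothetical: accuracy of the gridding on this dataset."""
    raise NotImplementedError

results = {}
for gridding in ("coarse", "medium", "fine"):   # placeholder names
    errors = [error_metric(simulate_data(gridding, seed))
              for seed in range(100)]           # 100 repetitions each
    results[gridding] = (np.mean(errors), np.std(errors))
```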
POST | 11-17-2021
Hi Elijah, I'm not aware of anyone who has specifically researched this, but I have a few thoughts about EBK with low and varying sampling density.

For the same subset size, the area of a subset will be larger in a low-density area than in a high-density area (i.e., 100 points would be spread over a larger area at lower densities). EBK assumes stationarity within each subset, so for best results you should configure the subset size so that the resulting subsets are small enough in area for each to be considered stationary.

EBK (when a transformation is used) and EBKRP are based on Simple Kriging, so they assume a uniform density within each subset. It's OK if the density of the samples varies globally (i.e., some areas are densely sampled and others aren't), but it's important that the density of the points be relatively consistent within each subset. Again, this may require configuring the subset size.

EBK Regression Prediction, however, has a clear advantage for low-density sampling, but it comes from the regression part, not the EBK part: the better the regression model predicts the dependent variable on its own, the less need there is to include autocorrelation from neighbors with EBK. Hope that's all clear, Eric
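A back-of-envelope way to see the first point (my own arithmetic, not an Esri tool): with a roughly uniform density, each subset of a given size covers about subset_size / density in area, so lower densities mean larger subset areas for the same subset size:

```python
n_points   = 2000
study_area = 50_000.0                # e.g., square kilometers (placeholder)
density    = n_points / study_area   # points per km^2

for subset_size in (50, 100, 200):
    area = subset_size / density     # approximate area per subset
    print(f"subset_size={subset_size}: ~{area:,.0f} km^2 per subset")
```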
POST | 11-15-2021
Hi @Elijah, "Small" is of course relative, but this generally means datasets between ~20 and 100 input points. Most other kriging methods estimate a single semivariogram for the entire study area, and this presents two problems. First, it is difficult to estimate semivariogram for small datasets, and the uncertainty in the semivariogram model will not be propagated to the prediction standard errors. Second, one semivariogram (even if correctly estimated) may not fit well everywhere in the study area; some areas may need different semivariograms than other areas. By simulating data and semivariograms locally, EBK/EBKRP better captures uncertainty in the semivariogram model, and it allows the models to change within the study area. This results in, among other things, better "coverage probabilities" for EBK. For example, when creating 95% confidence intervals for kriging (not EBK) predictions, it is common for only 75% of validation data to fall within the confidence interval because the standard errors do not accurately reflect the prediction uncertainties. But you should generally expect confidence intervals from EBK to be much closer to 95%. -Eric
POST | 11-01-2021
Hi @Elijah, The tool estimates the effect of each explanatory variable with a regression-kriging equation. Each explanatory variable comes with a coefficient that indicates the expected change in the dependent variable for a one-unit increase in the explanatory variable. For distance to roads, it is trying to estimate the effect on CO2 of moving one distance unit (say, 1 meter) farther from a road.

This becomes a problem when all of your data are sampled on or near a road: all of the distances used to estimate the coefficient come from a narrow range, likely all under 10 meters, but you are predicting to areas relatively far from any road. The patterns the tool detected in that narrow set of distances likely will not hold up at larger distances.

Generally speaking, you should be cautious when extrapolating outside the range of the explanatory variables of the input points. Specifically, if all of your points are near a road, I would not use distance to roads as an explanatory variable, because there isn't enough variation to reliably estimate and extrapolate the effect. To do this, you would need points sampled at varying distances from roads. Hope that helps, Eric

PS: I'm also concerned that the distances to roads you're providing aren't accurate. If all points are truly on the road, that distance should always be 0. I wonder whether the distance-to-roads values of the input points are just artifacts of the Euclidean Distance raster cells not landing exactly on the road. If so, the "distance" of each point will be somewhat random and will not correspond to any real correlation between roads and CO2.
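A simple guard against this kind of extrapolation is to compare the covariate's range at the input points with its range at the prediction locations. A sketch with placeholder file and field names:

```python
import pandas as pd

inputs  = pd.read_csv("input_points.csv")        # placeholder
targets = pd.read_csv("prediction_points.csv")   # placeholder

lo, hi = inputs["DistToRoad"].min(), inputs["DistToRoad"].max()
outside = ~targets["DistToRoad"].between(lo, hi)
print(f"{outside.mean():.1%} of prediction locations fall outside "
      f"the sampled distance range [{lo:.1f}, {hi:.1f}]")
```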
POST | 10-21-2021
@Bill_Thayer No, I'm sorry, but this information cannot be extracted. One thing to know, however, is that the semivariogram is estimated using no more than 1000 input features. If the input features have more than 1000 valid records, a random sample of 1000 is taken, and only these features are binned for semivariogram estimation*. So even if you could extract the actual number of binned pairs, it would not match the number you expect unless you have fewer than 1000 points. If this information would help you in your work or research, your best bet is to submit the idea through ArcGIS Ideas. *After estimating the semivariogram from the random sample, all input points are used when making predictions, not just the 1000-point sample.
POST | 09-27-2021
Hi Christina, There is a lot going on in that equation, probably too much to summarize in a post. The idea is that the red dots and blue crosses are estimated directly from the data, and the smooth blue semivariogram model is fit to them (similar to fitting a regression line to a scatterplot). The equation under the curve is the equation for the smooth blue line.

Some of the formulas used in Geostatistical Analyst are slightly different from those used in 3D Analyst and Spatial Analyst (the help page you linked to). You can find the exact formulas used in Geostatistical Analyst on page 263 of this document: https://dusk.geo.orst.edu/gis/geostat_analyst.pdf

However, I would strongly suggest that you project your data to a projected coordinate system and perform kriging on the projected data. In general, you cannot convert semivariograms calculated in one coordinate system to another coordinate system. This problem is especially bad for data with latitude-longitude coordinates because there is no standard conversion between degrees and linear units like kilometers. Hope that helps, -Eric
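For orientation, the binned points are empirical semivariance values; the standard binned estimator has the form below (the linked document gives the exact variants Geostatistical Analyst uses):

```latex
\hat{\gamma}(h) =
  \frac{1}{2\,|N(h)|} \sum_{(i,j) \in N(h)}
  \bigl( z(s_i) - z(s_j) \bigr)^2
```

where $N(h)$ is the set of point pairs whose separation distance falls in the bin around $h$. The smooth blue curve is a parametric model fit to these values.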
BLOG | 09-24-2021
@Matteroffact Yes, that is what you should expect to see with the geostatistical layer. However, think of the geostatistical layer more as a model source and visualization. It draws itself within the rectangular extent of the points (oriented by the coordinate system), but the results of the model are defined everywhere, and they can be exported within and/or outside of the drawing rectangle.
BLOG | 09-24-2021
@Matteroffact I may not be understanding correctly, but I don't think it is possible to do this with the 3D geostatistical layer. The layer will always be a full cube oriented by the coordinate system. However, there are many more options when exporting the results: the GA Layer 3D To NetCDF tool (used to prepare a voxel layer) has an "Input study area polygons" parameter that lets you define a non-rectangular study area for the export, and the GA Layer To Multidimensional Raster tool supports the Mask environment, which can achieve a similar result.
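A hedged sketch of the two export routes (the arcpy function and parameter names below are my assumptions based on the tool names in the post; check the tool reference pages before running):

```python
import arcpy

# Route 1: GA Layer 3D To NetCDF with study-area polygons
# (the "Input study area polygons" parameter restricts the export
# to a non-rectangular study area; parameter name assumed).
arcpy.ga.GALayer3DToNetCDF(
    "EBK3D_layer",              # input 3D geostatistical layer
    r"C:\data\ebk3d.nc",        # output netCDF for a voxel layer
    in_study_area="StudyAreaPolygons",
)

# Route 2: raster export honoring the Mask environment.
arcpy.env.mask = "StudyAreaPolygons"
arcpy.ga.GALayerToRasters("EBK3D_layer", r"C:\data\ebk3d_raster.crf")
```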
POST | 09-21-2021
Hi Cristina, Thank you for your question! Indeed, the displayed units of the x- and y-axes of the Geostatistical Wizard changed between ArcMap and ArcGIS Pro (specifically, all exponents changed sign). However, both are displaying the same information, just with different syntax.

In ArcGIS Pro, the value in parentheses is the unit of the axis. For example, in your semivariogram, a value of 1 on the x-axis means a distance of 0.1 (10^-1) decimal degrees. On the y-axis, a value of 1 means a semivariance of 100000 (10^5).

In ArcMap, the axes are labeled as "Distance (Unit), h * 10^x". You should read this as, "The true distance h, when multiplied by 10^x, gives the displayed value on the axis." For example, on your x-axis, if the true distance were 0.1 decimal degrees, you would multiply it by 10 to get a displayed value of 1. This is equivalent to saying, "Multiply the displayed value by 10^-x to get the true value."

While the axis syntax used in ArcMap was common in geostatistical research papers, many users (myself included) found it very confusing and would often interpret the magnitude backwards. We decided to change this in ArcGIS Pro to display a more traditional unit. -Eric
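Restating the two conventions side by side (my notation: $d$ is the displayed axis value, $t$ the true value, and $u$, $x$ the exponents shown on the axis):

```latex
\text{ArcGIS Pro:}\quad t = d \times 10^{u}
\qquad\qquad
\text{ArcMap:}\quad d = t \times 10^{x}
\;\Longleftrightarrow\;
t = d \times 10^{-x}
```

So in the example above, Pro labels the x-axis with $(10^{-1})$, giving $t = 1 \times 10^{-1} = 0.1$ decimal degrees, while ArcMap labels it $h \times 10^{1}$, giving the same $t = 1 \times 10^{-1}$.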
BLOG | 09-08-2021
@David_Brooks Yes, it is possible to configure EBK3D to act as a nearest-neighbor classifier, where every prediction simply takes the value of the closest input point. Please see the blog post A Workflow for Creating Discrete Voxels for instructions.