|
POST
|
Hi William, Is there any way you can provide your data? I know data is often confidential, but the only way this will get resolved is if we can run it in a debugger to get better error information. If possible, attaching a zip file to this thread will be easiest. If that is not acceptable, please let me know, and we can work out a confidential way to share it.
07-05-2018
03:07 PM
|
0
|
5
|
5493
|
|
BLOG
|
It's almost time for the 2018 Esri User Conference (July 9-13). We here on the Geostatistical Analyst team are busy practicing and polishing our presentations, and we are looking forward to showing you what we’ve been working on since last summer. If you do any kind of sample data prediction for spatial data analysis and decision making, please look at the guide below to help you plan out your time with us. See the full agenda for more details on these sessions, and more. Be sure to download the Esri Events Mobile App too! Please come by the Island in the Showcase during the week (Tuesday and Wednesday from 9 – 6, and Thursday from 9 – 1:30). We’d love to hear about the work you do, any difficulties you encounter, and any ideas you might have to make our software suit your needs even better. Finally, remember to check out the Spatial Analysis: The Road Ahead session for a peek into the future.
06-29-2018
12:16 PM
|
0
|
0
|
798
|
|
POST
|
Hi Kaoutar, I see in the image that you only have 11 input points. I would not recommend that you attempt to use such a complicated model on such a small number of points. None of the graphs are particularly meaningful when your kriging model has nearly as many parameters as input points. Sorry for the bad news, but if you absolutely need to interpolate these points, you should use something very simple, such as Global Polynomial Interpolation or IDW. -Eric
06-19-2018
07:40 AM
|
2
|
1
|
1791
|
|
POST
|
I took a quick look at your images. For most of them, the crosscovariances are near 0 (or even negative) for short distances. This implies that there is little correlation between the variables. The idea is that for short distances, the variables should have high crosscovariance (indicating that they are correlated). This covariance should decrease as the distance increases until they level out near zero (which indicates that they are no longer correlated). You need to try to identify the distance where these covariances generally become zero. However, it isn't clear to me from those pictures if there is any crosscovariance between the variables at all. You may be able to better identify the distance you need in the Geostatistical Wizard. Perform cokriging, and look at the crosscovariance view on the semivariogram page. This graph won't have as many points as the cloud, and it will try to fit a covariance model automatically. Whatever Major Range is estimated should be a good estimate of the range of crosscovariance between the variables.
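To make the "high at short distances, levelling out near zero" pattern concrete, here is a minimal sketch of a binned empirical cross-covariance. It is not the computation the Geostatistical Wizard performs; the function name, the use of global means, and the binning scheme are all assumptions made for illustration.

```python
import numpy as np

def empirical_cross_covariance(coords, z1, z2, bin_edges):
    """Average (z1_i - mean1) * (z2_j - mean2) over point pairs, per distance bin.

    Hypothetical helper for illustration only. Pairs are taken in both
    directions (i, j) and (j, i); a point is never paired with itself.
    """
    coords = np.asarray(coords, dtype=float)
    d1 = np.asarray(z1, dtype=float) - np.mean(z1)
    d2 = np.asarray(z2, dtype=float) - np.mean(z2)

    # Pairwise distances between all points.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # Products of deviations for every ordered pair.
    prod = np.outer(d1, d2)
    off_diag = ~np.eye(len(d1), dtype=bool)

    result = np.full(len(bin_edges) - 1, np.nan)
    for k in range(len(bin_edges) - 1):
        in_bin = off_diag & (dist >= bin_edges[k]) & (dist < bin_edges[k + 1])
        if in_bin.any():
            result[k] = prod[in_bin].mean()
    return result

# Three points on a line, two variables measured at each.
coords = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
cross_cov = empirical_cross_covariance(coords, [1, 2, 3], [2, 4, 6], [0.5, 1.5, 2.5])
```

With real data you would look for the distance bin beyond which the values stay near zero; that distance plays the role of the range discussed above.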
04-02-2018
12:14 PM
|
1
|
1
|
1410
|
|
POST
|
I was recently reminded that I never followed up on this thread. We were able to figure out what happened, and I actually learned something I didn't know about our tools. GA Layer to Points operates slightly differently depending on whether or not you provide a validation field. If you do not provide one, it is simply extracting the value of the geostatistical layer. If you provide a validation field, things get a little complicated. In geostatistics, there is a distinction between the true process and the measured process. We assume that each location does have a true value, but when you attempt to measure it, you introduce measurement error. We say that the measured value is the sum of the true value plus random noise. When you provide a validation field, that field is assumed to contain measured values, not true values. To validate against them correctly, you have to use the standard error of the measured process, which is always larger than the standard error of the true process. Things get extra complicated if you apply a transformation because then the predicted value actually depends on the standard error. In that case, both the predictions and the standard errors will be different depending on whether you provide a validation field. The issue is discussed thoroughly in this paper: Krivoruchko, K., A. Gribov, and J. M. Ver Hoef, 2006, "A new method for handling the nugget effect in kriging," T. C. Coburn, J. M. Yarus, and R. L. Chambers, Eds., Stochastic modeling and geostatistics: Principles, methods, and case studies, volume II: AAPG Computer Applications and Geology 5, p. 81–89.
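As a rough illustration of why the measured process always has the larger standard error: if the noise has mean zero and is independent of the true process (the textbook assumption, not necessarily the exact internals of GA Layer To Points), the variances add. The function name below is hypothetical.

```python
import math

def measured_standard_error(true_standard_error, measurement_error_sd):
    """Standard error of a prediction of a new *measured* value.

    Assumes measured = true + noise, with zero-mean noise that is independent
    of the true process, so the two variances simply add.
    """
    return math.sqrt(true_standard_error ** 2 + measurement_error_sd ** 2)

# A true-process standard error of 3 and a measurement error of 4 (same units)
# combine to a measured-process standard error of 5.
combined = measured_standard_error(3.0, 4.0)
```

This only covers the untransformed case; as noted above, with a transformation the predictions themselves also change.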
03-23-2018
07:57 AM
|
0
|
0
|
2219
|
|
POST
|
Hi Dana, Your "outLayer" is a geostatistical layer. These are in_memory layers, and you will need to persist them somehow. If you ultimately want rasters, you should probably just bypass the geostatistical layer entirely. Instead of passing an empty string for "outRaster", supply the file path, name, and file format where you want the raster, and pass an empty string for "outLayer". This will create all of the physical rasters on disk. You can then import them as layers and apply any symbology that you want. -Eric Edit: If you really do want the geostatistical layers, you should add a step to save the geostatistical layer to a layer file with the "Save To Layer File" geoprocessing tool.
03-16-2018
08:26 AM
|
1
|
0
|
3630
|
|
POST
|
Yes, that sounds like an ideal setup. In fact, that is exactly why measurement errors are passed as standard deviations. The most common source of measurement error is when the measured values aren't "measured" at all; instead, they are outputs or aggregations of some other model (like a long-term trend, in your case). These other models often calculate standard errors or standard deviations, and these can be propagated directly as measurement errors. As for the time of calculation, I assume the process has either completed or you canceled it already. The biggest contributors to the computation time are the number of points, the output cell size, and the number of simulations. You can't really change the number of points, so you can either reduce the number of realizations or increase the cell size to speed up the process. Increasing the cell size will make the output more pixelated, and reducing the number of simulations will reduce precision in the calculated predictions and standard deviations. I can't really give more specific recommendations than that without looking at your data.
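The trade-off can be sketched with some back-of-the-envelope arithmetic. The helpers below are hypothetical and are not a model of the tool's actual runtime; they only count output cells for a given cell size and apply the standard Monte Carlo result that the error of an average of N independent realizations shrinks like 1/sqrt(N).

```python
import math

def grid_cell_count(width, height, cell_size):
    """Number of output cells needed to cover an extent at a given cell size."""
    return math.ceil(width / cell_size) * math.ceil(height / cell_size)

def monte_carlo_standard_error(cell_sd, n_realizations):
    """Standard error of the per-cell mean over independent realizations."""
    return cell_sd / math.sqrt(n_realizations)

# Doubling the cell size cuts the number of cells to simulate by a factor of 4.
fine_cells = grid_cell_count(1000, 500, 10)
coarse_cells = grid_cell_count(1000, 500, 20)

# Cutting realizations from 1000 to 250 doubles the error of the mean.
se_1000 = monte_carlo_standard_error(2.0, 1000)
se_250 = monte_carlo_standard_error(2.0, 250)
```

In other words, halving the resolution saves roughly as much work as cutting the number of realizations by four, so which one to give up depends on whether you care more about a smooth picture or precise statistics.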
03-02-2018
07:12 AM
|
0
|
1
|
5692
|
|
POST
|
The cell size will default to 1/250 of the width or height of the raster. You can freely change this value or just use the default. This just changes the resolution of the raster, so if you want high resolution in the pixels, you can make the value smaller. It's really up to you and doesn't have any impact on the geostatistics. About the measurement error, a little explanation is needed. The measurement error model that we use assumes that each location has some true, underlying value. However, when this true value is measured, there will be measurement error such that the measured value will not be identical to the true value. The model assumes that the measured value at a location is equal to the true value at the location, plus some random noise. This noise is assumed to follow a normal (Gaussian) distribution with a mean of zero, so the measured value follows a normal distribution whose mean is equal to the true, underlying value. What you need to provide to the tool is the standard deviation of this normal distribution. The larger the standard deviation, the noisier the measurement. Unfortunately, there is no way to really calculate this standard deviation for each location. You have to just know it somehow. This information can sometimes be found in the manufacturer's documentation for whatever device you used to take the measurements, and sometimes it is known from past research or for physical reasons.
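If it helps to see the model in action, here is a small simulation of it. The function name, the true value of 25, and the two standard deviations are made up for illustration; only the model itself (measured = true + zero-mean Gaussian noise) comes from the explanation above.

```python
import numpy as np

def simulate_measurements(true_values, measurement_sd, seed=0):
    """Simulate measured = true + noise, with noise ~ Normal(0, measurement_sd)."""
    rng = np.random.default_rng(seed)
    true_values = np.asarray(true_values, dtype=float)
    noise = rng.normal(loc=0.0, scale=measurement_sd, size=true_values.shape)
    return true_values + noise

# The same true value measured many times with a precise and a noisy device.
true_values = np.full(10_000, 25.0)
precise = simulate_measurements(true_values, measurement_sd=0.1)
noisy = simulate_measurements(true_values, measurement_sd=5.0)

# Both sets of measurements are centered on the true value, but the spread of
# each set matches the standard deviation that was supplied.
print(precise.mean(), precise.std())
print(noisy.mean(), noisy.std())
```

The value you enter in the measurement error field is the `measurement_sd` of this sketch, in the same units as the measured variable.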
03-01-2018
01:19 PM
|
0
|
3
|
5692
|
|
POST
|
To incorporate heterogeneous measurement error (error that varies from point to point), you will need to use geostatistical simulations. Here is the outline of the workflow:

1. Use the Geostatistical Wizard to interpolate the points using Simple Kriging (this will not work for other types of kriging). Do not worry about measurement error for now.
2. Use the Gaussian Geostatistical Simulations geoprocessing tool with the following parameters:
   - Input geostatistical layer - Provide the layer created in step 1. This will be the basis for the simulations.
   - Number of realizations - Enter a large number, maybe 1000. This is how many simulations will be performed.
   - Output workspace - Provide a directory or geodatabase to store the simulations.
   - Output simulation prefix - Enter a 3-character prefix to label the simulations.
   - Input conditioning features - Provide the feature class that you interpolated in step 1.
   - Conditioning field - Provide the field that you interpolated in step 1.
   - Conditioning measurement error field - This is where you provide the measurement error values. You will need to create a field on your conditioning features specifying the measurement error of each point. The value of the field must correspond to one standard deviation of the measurement error. If it isn't clear what I mean here, please ask.
   - Output cell size - Provide the cell size of the raster that will be simulated.
   - Raster statistics type - Check the boxes next to "Mean" and "Standard Deviation". The Mean raster will correspond to the kriging predictions, and the Standard Deviation raster will correspond to the standard errors of the predictions.

What you should expect to see after running the tool is that the kriging predictions (Mean raster) will be very close to the kriging predictions without specifying measurement error. The standard errors (Standard Deviation raster) will be larger than they were without specifying measurement error. They will be larger because the uncertainty in the input is propagated correctly to the uncertainty in the output. Let me know if you have any other questions or need any clarifications.
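To show what the "Mean" and "Standard Deviation" statistics represent, here is a sketch that summarizes a stack of realizations cell by cell. The random arrays are only a stand-in for simulated rasters (they are not output from the tool), the function name is hypothetical, and whether the tool uses the population or sample standard deviation is an assumption exposed through `ddof`.

```python
import numpy as np

def summarize_realizations(stack, ddof=0):
    """Per-cell mean and standard deviation across a stack of realizations.

    stack has shape (n_realizations, n_rows, n_cols). ddof=0 gives the
    population standard deviation; which convention the geoprocessing tool
    uses is not assumed to be known here.
    """
    stack = np.asarray(stack, dtype=float)
    return stack.mean(axis=0), stack.std(axis=0, ddof=ddof)

# Stand-in for 1000 simulated rasters of 4 x 5 cells.
rng = np.random.default_rng(42)
stack = rng.normal(loc=10.0, scale=2.0, size=(1000, 4, 5))
mean_raster, sd_raster = summarize_realizations(stack)
```

The mean raster plays the role of the kriging prediction surface, and the standard deviation raster plays the role of the standard error surface.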
03-01-2018
12:19 PM
|
1
|
5
|
5692
|
|
POST
|
Hi Crystal, When there are uncertainties in the input data, we call that measurement error. Measurement error can be handled in several different ways, depending on your data. Does each of your measured values have the same measurement error (i.e., is the uncertainty the same for every value), or does the measurement error change from point to point?
03-01-2018
09:39 AM
|
0
|
7
|
5692
|
|
BLOG
|
A new subsetting algorithm, Generate Subset Polygons, has been developed as a geoprocessing tool in Geostatistical Analyst for ArcGIS Pro 2.1. The purpose of this tool is to break down the spatial data into small, nonoverlapping subsets. The new tool is intended to be used to create Subset Polygons in EBK Regression Prediction and any future tools that allow you to define subsets using polygons.

Why do we need a new subsetting algorithm?

The current subsetting algorithm implemented in Empirical Bayesian Kriging (EBK) and EBK Regression Prediction often encounters problems with clustered data. For example, in the figure below, the current subsetting algorithm often combines data far away from each other into the same subset. This is not desirable because the subsets should be as compact as possible.

Figure 1: Overlapping subsetting polygon using EBK Regression Prediction, where the blue dots are the point locations of rainfall stations and the selected polygon (cyan) shows the overlapping nature. The data for this analysis have been taken from [1].

The new algorithm

In the new algorithm, for each subset $S_i$, the number of points $n_i$ satisfies the constraint $\mathrm{min} \le n_i \le \mathrm{max}$, and the subsets are chosen to minimize the sum of the pairwise squared deviations within the subsets,

$$\sum_i \sum_{x \in S_i} \sum_{y \in S_i} \|x - y\|^2 .$$

This can also be reorganized as the sum of weighted variances within the subsets,

$$2 \sum_i n_i \sum_{x \in S_i} \|x - c_i\|^2 ,$$

where $c_i$ is the center of the subset $S_i$. Note that this algorithm has a harsher penalty on the number of points in each subset than K-means clustering.

The new algorithm performs the following three steps:

Step 1: Connects all points and forms a closed curve.
Step 2: Cuts the curve into subsets.
Step 3: Finds all overlaps and resolves them.

Step 1

The first step connects all points and forms a closed curve. To achieve this, we form a grid in the work space and put all points that are regarded as clusters into corresponding grid cells.
Figure 2: Example of how the points are connected to form a closed curve.

If two clusters are in the same cell, we merge them into one. A hash table is used to locate the clusters in the grid for efficient searching. We merge nearby clusters to form the curve. Initially, the cell size is extremely fine to ensure that points are connected locally. As the size of the cluster increases, the cell size also increases. The time complexity of this step is $O(n \log n)$, where $n$ is the total number of points.

Step 2

The second step cuts the curve into subsets. While cutting the curve, we satisfy the constraint on the number of points in each subset and minimize the sum of the pairwise squared deviations within the subsets. A dynamic programming algorithm is applied to solve this task. The complexity of the algorithm is $O((\mathrm{max} - \mathrm{min}) \cdot n)$.

Figure 3: The curve is cut into subsets, where the number of points in each subset satisfies the minimum and maximum requirement, and the sum of the pairwise squared deviations within the subsets is minimized.

Step 3

The third step finds all overlaps and resolves them while further minimizing the sum of the pairwise squared deviations within the subsets. At present, the third step uses a brute-force search to find overlapping subset pairs; this search has complexity $O(n^2)$ and accounts for most of the total execution time. Each overlap is resolved by finding a partitioning between the two subsets that minimizes the sum of the pairwise squared deviations within each subset while maintaining the required minimum and maximum number of points in each subset. This is performed by projecting the points onto a set of directions that cover the space evenly. For each projection, the optimal division can be found by a dynamic programming approach. Among all projections, the one with the minimum penalty is chosen. The algorithm iterates until no overlaps are detected or the number of iterations reaches the upper bound.
The figure below shows a single overlap being removed.

Figure 4: Finding and resolving overlaps within the subsets.

The new subsetting algorithm classifies each point into a subset (Figure 5) and creates polygons that wrap around each individual subset (Figure 6). Thus, for each polygon, all points inside the polygon belong to the same subset.

Figure 5: Non-overlapping subsets.

Figure 6: The non-overlapping subsets produce the final tool output, which is represented by non-overlapping polygons.

Performance analysis of the new algorithm

Overall, the time complexity of the whole algorithm is $O(n^2)$, because of the high computational cost of searching for overlapping subsets. The figure below shows the time complexity of the three steps as well as each function plotted on the same graph. In the current implementation, the third step takes the vast majority of the computation time for large numbers of points.

Figure 7: Time complexity of the algorithm in the three stages.

The space complexity of the algorithm is $O(n)$, which means that the required memory is proportional to the number of points. The following image shows that the memory allocation is linear in the number of input points.

Figure 8: Space complexity of the algorithm.

The following graphs show the analysis of the algorithm in the third step (resolution of overlapping subsets). When two subsets overlap, resolving their overlap reduces the total sum of the pairwise squared deviations. The graph on the left shows the decrease in the total sum after each resolution. The graph on the right shows the number of resolutions in each iteration. For this data, the algorithm converges after about 14 iterations.

Figure 9: Analysis of the algorithm in the third step where overlapping subsets are resolved.

Application

This new subsetting algorithm can work efficiently with clustered datasets with more than a billion points.
The quality of the subsets constructed by the new subsetting algorithm is often better than that of the subsets produced by the algorithm currently used in EBK. Though it does not yet support 3D points, the methodology can be easily extended to multiple dimensions.

Part of this new subsetting algorithm and blog content was created by Zeren Shui, advised by Alexander Gribov, during his internship with the Geostatistical Analyst Team in summer 2017. Zeren is currently pursuing his Master’s degree in Data Science at the College of Science and Engineering, University of Minnesota – Twin Cities. His research interests are Bayesian Statistics, Machine Learning, and Data Mining. For additional questions, you can comment here or contact him at shuix007@umn.edu.

References

[1] S. D. Lynch, Development of a raster database of annual, monthly and daily rainfall for southern Africa, WRC Report (1156/1/04) (2004) 78.
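The cutting step (Step 2) can be illustrated with a small dynamic program. This is a minimal sketch under simplifying assumptions, not the tool's implementation: the points are assumed to be already ordered along the curve, the curve is treated as open rather than closed, and all function names are hypothetical. The segment cost uses the weighted-variance form of the objective, $2 n \sum \|x - c\|^2 = 2\,(n \sum \|x\|^2 - \|\sum x\|^2)$, so each candidate cut is evaluated in constant time from prefix sums.

```python
import numpy as np

def pairwise_cost(points):
    """Brute-force sum of ||x - y||^2 over all ordered pairs (for checking)."""
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    return float((diff ** 2).sum())

def cut_curve(points, min_size, max_size):
    """Cut an ordered sequence of points into contiguous subsets.

    Each subset holds between min_size and max_size points, and the total
    pairwise squared deviation within subsets is minimized.
    Returns (list of (start, stop) index ranges, total cost).
    """
    points = np.asarray(points, dtype=float)
    n, dim = points.shape

    # Prefix sums of squared norms and of coordinates.
    sq_prefix = np.concatenate([[0.0], np.cumsum((points ** 2).sum(axis=1))])
    vec_prefix = np.vstack([np.zeros((1, dim)), np.cumsum(points, axis=0)])

    def segment_cost(start, stop):
        size = stop - start
        sum_sq = sq_prefix[stop] - sq_prefix[start]
        vec = vec_prefix[stop] - vec_prefix[start]
        return 2.0 * (size * sum_sq - float(vec @ vec))

    best = np.full(n + 1, np.inf)  # best[j]: minimal cost of cutting points[:j]
    best[0] = 0.0
    prev = np.full(n + 1, -1, dtype=int)
    for stop in range(1, n + 1):
        for size in range(min_size, min(max_size, stop) + 1):
            start = stop - size
            if np.isfinite(best[start]):
                cost = best[start] + segment_cost(start, stop)
                if cost < best[stop]:
                    best[stop], prev[stop] = cost, start
    if not np.isfinite(best[n]):
        raise ValueError("no cut satisfies the size constraints")

    # Walk the back-pointers to recover the subsets.
    cuts, stop = [], n
    while stop > 0:
        start = int(prev[stop])
        cuts.append((start, stop))
        stop = start
    return cuts[::-1], float(best[n])

# Two well-separated groups of three points along a line.
points = [[0, 0], [1, 0], [2, 0], [10, 0], [11, 0], [12, 0]]
cuts, total = cut_curve(points, min_size=2, max_size=4)
```

The inner loop tries at most max - min + 1 subset sizes for each end position, which is where a cost proportional to (max - min) · n comes from.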
01-25-2018
12:47 PM
|
1
|
0
|
1148
|
|
POST
|
I don't want to give specific recommendations of what software and packages to use, but you should do some research into "empirical semivariograms." The algorithm to compute them is not terribly complicated, and I'm sure you'll be able to find it implemented from a reliable source.
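To give a sense of how little is involved, here is a minimal sketch of the classical (Matheron) empirical semivariogram: for every pair of points in a distance bin, average the squared difference of the values and divide by two. The function name and the binning choices are mine, and this is not the binning used by the Geostatistical Wizard.

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """Binned empirical semivariogram: half the mean squared difference
    of values over all point pairs whose separation falls in each bin.

    Returns (gamma, counts); gamma is NaN for bins that contain no pairs.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)

    # Every unordered pair of distinct points.
    i, j = np.triu_indices(len(values), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2

    n_bins = len(bin_edges) - 1
    gamma = np.full(n_bins, np.nan)
    counts = np.zeros(n_bins, dtype=int)
    for k in range(n_bins):
        in_bin = (dist >= bin_edges[k]) & (dist < bin_edges[k + 1])
        counts[k] = in_bin.sum()
        if counts[k] > 0:
            gamma[k] = 0.5 * sq_diff[in_bin].mean()
    return gamma, counts

# Three points on a line with values 1, 2, 4.
coords = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
gamma, counts = empirical_semivariogram(coords, [1.0, 2.0, 4.0], [0.5, 1.5, 2.5])
```

For large datasets the all-pairs approach gets expensive, which is one reason to prefer an established implementation over rolling your own.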
12-15-2017
10:25 AM
|
0
|
1
|
1901
|
|
POST
|
Sorry to say this after you've done so much coding already, but the binned and averaged semivariances shown in the Geostatistical Wizard can only be exported through the user interface of the wizard. They can't be exported with Python. If doing this manually is a possibility, you can add a step to save the geostatistical layer as a layer file using the Save To Layer File geoprocessing tool. You can then add the layers to ArcMap and right-click -> Method Properties. This will open the Geostatistical Wizard for that layer, and you can export the semivariogram table.
12-15-2017
08:04 AM
|
1
|
3
|
1901
|
|
POST
|
In order for the geostatistical layer to pass through the input points perfectly, you can use IDW or Radial Basis Functions. They are exact interpolators, and they will always pass through the input points. If you are using kriging (other than Empirical Bayesian Kriging), you can force it to be an exact interpolator by turning off the nugget effect. In the Geostatistical Wizard on the semivariogram page, look for the "Model Nugget" section and disable the nugget effect. However, you should know that forcing an interpolation method to be exact can sometimes result in strange artifacts in the surface. Make sure to look at your surface carefully.
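To see why IDW is exact, here is a bare-bones sketch. It uses every input point with no search neighborhood, so it is a simplification rather than the Geostatistical Analyst implementation, and the function name is hypothetical. As the prediction location approaches an input point, that point's weight dominates, and at the point itself the input value is returned directly.

```python
import numpy as np

def idw_predict(coords, values, target, power=2.0):
    """Inverse distance weighted prediction at a single target location."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    dist = np.linalg.norm(coords - np.asarray(target, dtype=float), axis=1)

    # At an input location the weight would be infinite, so return the
    # measured value itself: this is what makes the interpolator exact.
    exact = dist == 0
    if exact.any():
        return float(values[exact].mean())

    weights = 1.0 / dist ** power
    return float(np.sum(weights * values) / np.sum(weights))

coords = [[0.0, 0.0], [10.0, 0.0]]
values = [0.0, 10.0]
at_input = idw_predict(coords, values, [10.0, 0.0])   # reproduces the input value
at_midpoint = idw_predict(coords, values, [5.0, 0.0])  # equal weights on both points
```

Kriging with a nugget effect has no such guarantee, which is why the nugget has to be disabled to force the surface through the points.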
12-12-2017
07:59 AM
|
0
|
0
|
6878
|
|
POST
|
Hi Daniel, Geostatistical layers have very limited symbology options, as they are not really intended for display purposes. They are designed for quick analysis, visualization, and validation. But in a geostatistical workflow, you should nearly always export these layers to raster or feature for actual display purposes. The raster can symbolize at a fine scale (rather than the coarse grid that is used for geostatistical layers), and it can build its symbology based on its histogram (the geostatistical layer, on the other hand, cannot calculate a histogram for itself). We very strongly recommend applying symbology to a raster rather than to a geostatistical layer. Also, the GA Layer to Contour tool can export your geostatistical layer to a polygon feature class that will look visually identical to the geostatistical layer. This tool has a classification value table where you can easily specify manual breaks. It can also be automated with Python.
12-07-2017
12:13 PM
|
0
|
0
|
1910
|