IDEA
This has been included in the product plan for ArcGIS Pro 3.4. The reimplementation will be a geoprocessing tool that creates a customized scatter plot chart on a feature layer, displaying the projected scatter plot and trend line of the XZ plane. In ArcGIS Pro 3.3, you can create this scatter plot manually with customized Arcade code using these steps:
1. On a feature layer, create a scatter plot chart by right-clicking the layer -> Create Chart -> Scatter Plot.
2. In the Chart Properties pane, for the "Y-axis Number", provide the analysis field.
3. For the "X-axis Number", click the "Set an expression" button to the right of the pulldown menu.
4. Paste the Arcade code at the end of this post into the "Expression" code block (make sure that the "Language" at the top is set to "Arcade").
5. Change the second line of the code to any desired direction. The direction is provided as degrees clockwise from north: for example, 0 is north, 90 is east, 180 is south, and 270 is west.
6. Click OK. The directional trend scatter plot will be displayed in the Chart pane. You can click the "Set an expression" button again and change the direction, and the scatter plot will update to show the trend in the new direction.
7. To show the polynomial trend line in the scatter plot, check the "Show trend line" checkbox in the Chart Properties pane, choose "Polynomial" from the dropdown, and provide a desired "Trend Order".

// Input direction as clockwise degrees from north
var angleFromNorth = 0;
// Convert direction to counterclockwise radians from east
var adjustedAngleDegrees = 90 - angleFromNorth;
adjustedAngleDegrees = adjustedAngleDegrees % 360;
var angleInRadians = adjustedAngleDegrees * PI / 180;
// Return x-coordinate of rotated coordinate system
return Centroid($feature).X * Cos(angleInRadians) + Centroid($feature).Y * Sin(angleInRadians)
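
If you want to verify what the expression computes, here is a minimal Python sanity check of the same projection (the coordinates are made up, and this is separate from the Arcade expression itself). For a direction of 90 (east) the projected value is simply the x-coordinate, and for 0 (north) it is the y-coordinate:

```python
# Stand-alone check of the rotated-axis projection (hypothetical coordinates).
import math

def projected_x(x, y, angle_from_north):
    """Project (x, y) onto the axis pointing angle_from_north degrees clockwise from north."""
    adjusted_deg = (90 - angle_from_north) % 360
    theta = math.radians(adjusted_deg)
    return x * math.cos(theta) + y * math.sin(theta)

x, y = 500000.0, 4200000.0        # made-up centroid coordinates
print(projected_x(x, y, 90))      # direction 90 (east)  -> returns x
print(projected_x(x, y, 0))       # direction 0 (north)  -> returns y
```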
Posted 05-23-2024 07:02 PM | 0 | 0 | 2591

POST
Hi @NakkyEkeanyanwu, I think the major confusion is that Dimension Reduction is not selecting a subset of the variables that you provide. Instead, it uses all variables to construct new "components", and each component is a weighted sum of all the variables. As a very simple example, let's say you have four variables (A, B, C, and D) and you want to create one component (reducing the dimension from four to one). The component might look something like this (I am making up these coefficients): Component = 0.7*A + 0.2*B + 0.6*C - 0.1*D. In essence, the component uses all variables, and the weights (the coefficients) indicate how "important" each variable is in the component. These coefficients are the eigenvector of the component, and the associated eigenvalue indicates how much of the total variability of the four variables is captured in the component.

Frequently, a large percent of the total variability of all variables can be captured in just a few components, and this is what drives things like the Broken Stick and Bartlett's Test methods. They try to find a compromise between minimizing the number of components and maximizing the amount of variability that is captured by the components. Determining how many components to create is the most difficult part of Principal Component Analysis, so various methodologies are used to help you decide. In an ideal case, you see some components account for a large percent of the variance (the PCTVAR field), then a sudden drop in the percent. However, for your data, I don't really see this; the variability captured by each component seems to drop steadily, and I think this is why Bartlett's method is recommending a large number of components. However, using 7 components certainly seems justifiable here as well. Really, you could justify any number between 3 and 28.

Regarding only 28 components explaining 100% of the variance: this means that two of the variables you provided are redundant, in that their information is fully accounted for by other variables. If I'm reading your screenshots correctly, you use total population as a variable, and you also use the populations of particular subgroups. If the populations of the subgroups add up to the total population (or very close to it), then there is redundancy, since the total population is captured by the sum of the populations of the subgroups. I suspect this is happening for two variables, resulting in 28 components that account for all variability. Please let me know if you have any other questions.
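
If it helps to see this concretely, here is a small NumPy sketch (synthetic data and made-up variable names, not the Dimension Reduction tool itself) showing how each component is a weighted sum of every variable and how a redundant variable produces a component with essentially zero percent variance:

```python
# Minimal PCA sketch with NumPy. Variable D is the sum of A, B, and C, so one
# principal component captures essentially zero variance.
import numpy as np

rng = np.random.default_rng(0)
A, B, C = rng.normal(size=(3, 500))
D = A + B + C                       # redundant variable, like a total population
X = np.column_stack([A, B, C, D])

# Eigen-decompose the correlation matrix of the variables
corr = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # sort descending

pct_var = 100 * eigvals / eigvals.sum()
print("Percent of variance per component (like PCTVAR):", np.round(pct_var, 2))
print("Weights of component 1 (its eigenvector):", np.round(eigvecs[:, 0], 2))
# The last component's percent is ~0 because D is redundant with A, B, and C.
```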
Posted 04-30-2024 08:56 AM | 0 | 0 | 1846

POST
Thank you for the recommendation. We will add this to the documentation. The reason the tool does not refer to them as "fixed" and "adaptive" is that these are both general paradigms rather than specific neighborhood types. Using a number of neighbors is just one kind of adaptive neighborhood, and a fixed distance is one kind of fixed neighborhood. If the tool just said "Adaptive", you would then need to ask what kind of adaptive neighborhood it is, and it is specifically a number-of-neighbors neighborhood. Similarly for fixed distance bands.
Posted 04-10-2024 11:01 AM | 1 | 0 | 1132

POST
GWR is a relatively recent tool (there is also an older version that is now deprecated), so it creates a Source ID field on the output features rather than requiring an input Unique ID field.
Posted 04-10-2024 10:56 AM | 1 | 0 | 1484

POST
Hi @geolane93_KU, Without seeing the data and having a better understanding of the purpose, it's difficult to give concrete recommendations. However, I do have a few thoughts that might help.

First, if you have ArcGIS Pro 3.0 or later, look into the Compare Geostatistical Layers tool. You can create various different EBK3D outputs and compare their cross validation statistics to see which are more accurate than others. That can help with choosing a subset size, transformations, and semivariogram models.

Second, a subset size of 20 sounds quite small to me, particularly for the K-Bessel semivariogram. My experience is that you should use at least 50 points in each subset for a semivariogram model with so many parameters (and, usually, more than 100 is better).

Third, I would consider removing some of the surface points that may be playing too dominant of a role in the model. The problem is alleviated somewhat by using sectored neighborhoods, but the comparatively dense sampling at the surface is likely still negatively impacting subsurface predictions. In particular, I suspect that the estimated Elevation Inflation Factor (EIF) is being most affected here, and the EIF is an extremely important parameter for accurate results.

Fourth, if the jagged edges and artifacts are far away from the input points (like in the top or bottom corner of the 3D extent), then I would not worry too much about them. EBK (2D and 3D) often produces these kinds of artifacts when you extrapolate (predicting outside the input points), but it tends to be very stable when interpolating (predicting between the input points).
Posted 04-10-2024 10:52 AM | 0 | 1 | 1024

POST
Hi @JamalNUMAN, "Number of Neighbors" is an adaptive bandwidth because the distance used at a location depends on the distance to the last neighbor, so it will vary ("adapt") depending on the location. I believe you are looking at the documentation for an older and deprecated version of GWR. Please find the documentation for the new version here: https://pro.arcgis.com/en/pro-app/latest/tool-reference/spatial-statistics/geographicallyweightedregression.htm
Posted 04-02-2024 06:46 AM | 1 | 0 | 1175

POST
Hi @JamalNUMAN, Requiring a Unique ID field is an older design pattern that is not used in more recent tools. In fact, the Generalized Linear Regression tool (with Gaussian model type) does the same thing as the OLS tool, and it does not require a Unique ID field. The idea behind the Unique ID field is that it gets copied to the output features, so you can join the output results back to the input (or vice versa). For example, if you have a selection, the output features will not have the same Object IDs as the input, so some other field needs to be used to match input/output. In more recent tools, each Object ID from the input is copied to a "Source ID" field of the output features. This serves the same purpose (being able to match output to input) but does not require that you provide a field.
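
For illustration, here is a hedged arcpy sketch of joining tool results back to the input features via the copied ID field. The paths, the "SOURCE_ID" field name, and the result field names are placeholders; check the actual field names on your output before running anything like this:

```python
# Join regression output back to the original features by matching the input
# OBJECTID to the ID field carried over to the output (field names assumed).
import arcpy

in_features = r"C:\data\analysis.gdb\parcels"       # hypothetical input
tool_output = r"C:\data\analysis.gdb\parcels_gwr"   # hypothetical GWR/GLR output

arcpy.management.JoinField(
    in_data=in_features,
    in_field="OBJECTID",
    join_table=tool_output,
    join_field="SOURCE_ID",          # assumed name of the copied ID field
    fields=["PREDICTED", "STDRESID"],  # placeholder result field names
)
```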
Posted 04-01-2024 10:57 AM | 1 | 0 | 1549

POST
Hi @DOEEYANG, Can you clarify how you are performing kriging? I'm guessing the Kriging tool in the Spatial Analyst toolbox, but there are a few different versions. Without looking at your data, my guess is that these areas with no predictions are outside the neighborhood of your input points. Assuming you're using the tool above, check the "Search radius" parameter. If you are using a "Variable" neighborhood, check whether there is a "Maximum distance" value. If using a "Fixed" neighborhood, check the "Distance" value. If your cells with no predictions are further than this distance from any input point, the value cannot be interpolated. Using a sufficiently large distance should allow you to interpolate everywhere in your study area. Please let me know if this does not resolve the problem or if you're using any of the kriging methods in Geostatistical Analyst.
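
As a hedged example of what I mean, here is a Spatial Analyst Kriging sketch in Python (paths, field name, cell size, and distances are placeholders). The point is the variable search radius: a larger maximum distance lets cells far from any input point still receive a value:

```python
# Ordinary kriging with a variable search radius (12 neighbors, searched up to
# 10,000 map units), so distant cells can still be interpolated.
import arcpy
from arcpy.sa import Kriging, KrigingModelOrdinary, RadiusVariable

arcpy.CheckOutExtension("Spatial")

in_points = r"C:\data\analysis.gdb\samples"   # hypothetical input points
z_field = "VALUE"                             # hypothetical value field

model = KrigingModelOrdinary("SPHERICAL")
search = RadiusVariable(12, 10000)

out_raster = Kriging(in_points, z_field, model, cell_size=100, search_radius=search)
out_raster.save(r"C:\data\analysis.gdb\kriging_surface")
```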
Posted 02-14-2024 07:01 AM | 0 | 0 | 949

POST
GWR will not use the z-coordinate in any capacity. So if you have multiple points at the same (x, y) but different z, GWR will treat them as being at the same location. Splitting your dataset by floor and independently performing GWR is the only solution that immediately comes to mind. The problem of constant values of the explanatory/dependent variable is more difficult, as GWR will return an error if any neighborhood contains a constant value of any explanatory variable or the dependent variable. To calculate GWR results, you'll need to use neighborhoods large enough to ensure this never happens. However, if the neighborhoods are very large, GWR effectively turns into OLS. Hopefully there is some range of neighborhood that can estimate local effects but still never encounter neighborhoods with constant values.
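
Here is a rough sketch of the split-by-floor idea in Python (the paths and the "FLOOR" field are placeholders). Each per-floor subset would then be run through GWR separately; the GWR call itself is omitted because its parameters depend on your model setup:

```python
# Split a 3D point dataset into one feature class per floor, then model each
# subset independently (paths and field name are assumptions).
import arcpy

in_features = r"C:\data\analysis.gdb\sensors"   # hypothetical 3D point data
gdb = r"C:\data\analysis.gdb"

# Collect the distinct floor values
floors = sorted({row[0] for row in arcpy.da.SearchCursor(in_features, ["FLOOR"])})

for floor in floors:
    subset = f"{gdb}\\sensors_floor_{floor}"
    arcpy.analysis.Select(in_features, subset, f"FLOOR = {floor}")
    # Run the GWR tool on `subset` here, one model per floor.
```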
Posted 02-14-2024 06:34 AM | 0 | 0 | 710

POST
Hi @JamalNUMAN, I don't think it does any data splitting for the statistics in your images. Data splitting is not required in order to compute them, and in my experience, OLS, GWR, and other variants of the general linear model do not perform data exclusion to calculate them. In recent years, I've seen GWR used with data splitting (to make it more in line with machine learning workflows), but I do not think the GWR tool does this. Also, I'd suggest that you ask your GWR questions (and any other questions about the Spatial Statistics toolbox) in the Spatial Statistics Place. I know a lot about GWR as a theory, but I'm less knowledgeable about the specifics of the implementation of the GWR tool. For example, I do not know why those three statistics are calculated, but others (like MAPE) are not.
Posted 02-14-2024 06:22 AM | 1 | 0 | 905

POST
Posted 02-11-2024 01:12 PM | 0 | 0 | 503

POST
Hi @JamalNUMAN, While the error only talks about correlations between explanatory variables (which obviously will not be a problem for a single explanatory variable), a couple of other things can also cause this error. GWR builds regression models using neighborhoods around each feature, and if any of these neighborhoods have a constant value for the dependent variable or any of the explanatory variables, you will also encounter this error. You should try using different neighborhood settings (generally using larger neighborhoods), or attempt to locate the areas of constant value. The Neighborhood Summary Statistics tool can be used to find local standard deviations, which can help you identify areas with constant values of the variables. I hope this helps, and please let me know if you have any other questions.
Posted 02-08-2024 04:07 PM | 2 | 0 | 2473

POST
Hi @Jill_Clogston, The message from the tool is an informational warning rather than an error. It does not mean that your analysis is invalid or that there is a problem with your data. CF Conventions are a set of standards for how to store and label data in a netCDF (NC) file. NC files are generic data containers and do not have to abide by these standards; however, some non-Esri software will only work correctly with CF-compliant netCDF files. If you intend to perform your analysis entirely within ArcGIS, this is not a problem, and you can ignore the warning. While I do not know which projection you are using, the warning indicates that it is not one that is part of the CF Conventions. You can likely resolve the warning (which, again, may not be required at all) by projecting your original points to a more common coordinate system.
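
If you do want to resolve the warning, a minimal reprojection sketch looks like this (the paths are placeholders, and WGS 1984 is just one example of a widely supported coordinate system, not necessarily the right one for your data):

```python
# Reproject the input points to a common coordinate system before creating
# the netCDF file.
import arcpy

in_points = r"C:\data\analysis.gdb\observations"        # hypothetical input
out_points = r"C:\data\analysis.gdb\observations_wgs84"

arcpy.management.Project(in_points, out_points, arcpy.SpatialReference(4326))
```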
Posted 02-01-2024 02:20 PM | 1 | 1 | 1318

POST
I should have been more clear about this, but the GWR model as a whole does not have a condition number. However, every local regression has one. It could be the case that some locations have large condition numbers (meaning that the coefficients in that area are unstable and unreliable) while other locations have low condition numbers (meaning that the coefficients there are more reliable and precise). I'm also not completely clear on what you mean by rerunning GWR multiple times. If you rerun it with the same data, you should get the same coefficients each time. The condition number is more related to whether you should trust the values of the coefficients.
Posted 12-01-2023 11:27 AM | 0 | 1 | 1899

POST
I've heard variations of that phrasing many times, and I don't think it's wrong, but I'd argue there are better ways to conceptualize the condition number. It's more about the stability of the estimated coefficients for a given set of explanatory variable values. The coefficients are estimated by inverting a matrix of data values, and the condition number measures how sensitive the coefficients are to small changes in the data values. For low condition numbers, you can alter/remove some of the data, and the coefficients should not drastically change (in other words, the estimated coefficients are stable). But for matrices with very large condition numbers, even small changes to the data values can wildly change the estimated coefficients (meaning that the estimated coefficients are not stable/reliable).

This is a bit easier to understand using simple numbers rather than matrices. Inverting a matrix with a large condition number is equivalent to finding the inverse of a number that is very close to 0. For example, the inverse of 0.001 is 1,000, and the inverse of 0.0001 is 10,000. Even though 0.001 and 0.0001 are very close in absolute value (they're both close to 0), their inverses are very different (1,000 vs 10,000). To put it another way, for values very close to 0, the inverse is very sensitive to small changes in the number. This stability of the inverse is what condition numbers measure for matrices rather than single numbers.

I hope that helps, and let me know if any of that was not clear. There are also many resources available to learn about condition numbers, as they are usually taught in Linear Algebra courses rather than geography or statistics.
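
To make that concrete, here is a tiny NumPy demonstration (made-up 2x2 matrices, not GWR output) of how the inverse of a well-conditioned matrix barely changes under a small perturbation, while the inverse of an ill-conditioned matrix changes drastically:

```python
# Condition numbers and stability of the matrix inverse.
import numpy as np

well = np.array([[2.0, 0.0], [0.0, 1.0]])
ill = np.array([[1.0, 1.0], [1.0, 1.0001]])   # rows nearly identical

print(np.linalg.cond(well))   # ~2
print(np.linalg.cond(ill))    # ~40,000

bump = np.array([[0.0, 0.0], [0.0, 0.0001]])  # tiny change to one entry
print(np.linalg.inv(well + bump) - np.linalg.inv(well))  # near zero
print(np.linalg.inv(ill + bump) - np.linalg.inv(ill))    # very large differences
```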
Posted 11-30-2023 01:55 PM | 0 | 3 | 1912
| Title | Kudos | Posted |
|---|---|---|
|  | 2 | 01-16-2025 04:52 AM |
|  | 1 | 10-02-2024 06:45 AM |
|  | 2 | 08-23-2024 09:18 AM |
|  | 1 | 07-19-2024 07:09 AM |
|  | 1 | 08-21-2012 09:47 AM |