POST

Hmmm... not sure what could be causing this. When you hover your cursor over the red X, what error does it report?

Please determine if script tools outside of the Spatial Statistics toolbox have the same issue (the Multiple Ring Buffer tool, for example). If all script tools are having problems, my guess is that you have a software installation problem.

If, by chance, you have access to another computer with the same software, please try to determine if the problem is machine specific. If you only see the problem on one specific computer, this is more evidence that the problem relates to a software installation issue (and the solution would be to reinstall the software ... but Esri Tech Support might have other suggestions).

If you are seeing the problem across more than one computer, the problem might be data specific:

1) Try those tools with different input data to see if the problem is consistent. If you continue to get the problem with different data sets, we're back to thinking it's a software installation problem.
2) If the tools work with other data sets, however, the problem could be a data corruption issue. Try right clicking on the feature class in the Table of Contents and selecting Export. Export the data to a shapefile. See if the spatial statistics tools work with the new shapefile.

So very sorry you're having problems.

Lauren
Esri Geoprocessing, Spatial Statistics
12-12-2011
11:52 AM


POST

Great questions! Using Global Moran's I to find the peak z-score in order to help you decide the appropriate scale of your analysis (fixed distance band) is an excellent strategy. [Just by the way, if you are using ArcGIS 10.0, we have a sample script tool called Incremental Spatial Autocorrelation that automates this process and can save you some time. If you are interested in getting this tool, please go to our resources page, www.esriurl.com/spatialstats , and look for "Supplementary Spatial Statistics"].

The message about some features not having any neighbors is an indication that your distances are not large enough to ensure that every feature has at least one neighbor. To get a quick description of neighbor distances, you can run the Calculate Distance Band From Neighbor Count tool (in the Spatial Statistics toolbox, Utilities toolset). It returns the minimum nearest neighbor distance (this is the distance between the two features that are closest together in your dataset), the average nearest neighbor distance (this is the average distance each feature is away from its nearest neighbor), and the maximum nearest neighbor distance (this is the smallest distance that will ensure that EVERY feature in your dataset has at least one neighbor).

Sometimes when you have a couple of outlier features, the distance needed to ensure every feature has at least one neighbor gets rather large... possibly larger than is effective for your analysis. You can check this by using the measurement tool to "draw" the maximum nearest neighbor distance. When outliers are forcing you to use distances that are too large (too large to effectively capture the spatial processes you believe are at work in your data), we recommend:

1) Select all but the outlier features.
2) Run Incremental Spatial Autocorrelation (or Global Moran's I for increasing distances) on the selection set.
[If you don't enter a beginning distance, btw, the tool will use the distance that ensures every feature has at least one neighbor]. This analysis will give you the peak distance for the majority of your features (unbiased by the handful of outliers). You still want to include the outliers in your final analysis, however, so:

3) Run the Generate Spatial Weights Matrix tool, selecting Fixed Distance, and using the distance you found in (2). Also, check ON Row Standardization (more about that below), and put 2 for the Number of Neighbors parameter. The 2 will force each feature (even the outliers) to have at least 2 neighbors, and you won't get the message about invalid results. With this parameter, the tool will look farther than the distance provided, only if required and only for the outliers, in order to ensure every feature gets at least 2 neighbors.
4) Run your Hot Spot Analysis and/or your Cluster and Outlier Analysis using the .swm file created in (3). You do this by selecting "Get Weights From File" for the Conceptualization of Spatial Relationships parameter and then specifying the path to the .swm file for the Spatial Weights Matrix File parameter.

About your concerns with regard to the different sampling densities... On the one hand, any bias in your sampling scheme will be reflected in your results. What that means:

* The Hot Spot Analysis (Gi* statistic) works by looking at each feature (each sample) within the context of neighboring features. It compares the local mean of the total positives to the global mean of the total positives and decides if the difference is statistically significant or not.
* If you were to ONLY sample in the high total positive areas (for whatever reason, as one example of bias), the global mean would be higher than if you had a truly random sample of the entire study area. Because the global mean is higher, you will see fewer hot spots than you might with a truly random sample.
On the other hand, if your samples are truly representative of the broader population (what you would see if you could sample all across Canada using a random sampling design), then the fact that you have more samples in some areas and fewer in others is not so much of a concern. The reason is that the Gi* statistic is conceptually looking at the local mean in relation to the global mean, so the number of features isn't so important. The number of features is still considered in determining the z-score, but only in the sense that more features means more information. So when a feature has very few neighbors, Gi* still does the very best it can, but it has less information to come up with a result. When a feature has LOTs of neighbors, the Gi* statistic has more information to compute a result. There is not an overcount or undercount bias... instead, there are just differences in how confident you can be about the results in places with few samples. Does that make sense? If not, please ask and I can try again 🙂

Regarding the extreme z-scores for Global Moran's I, 2 things:

1) When you run Global Moran's I with increasing distances, or when you run the Incremental Spatial Autocorrelation sample script tool, be sure to check ON for Row Standardization (this doesn't make a difference for Hot Spot Analysis, but it is important for the Global tools). If your points were very reflective of the spatial distribution of what you are sampling, you would not check ON for Row Standardization (example: when we have point data reflecting ALL crimes, then we see lots of points where there are LOTs of crimes and few points where there are few crimes, and that difference in point densities is reflective of crime patterns in our study area).
In your case, you have dense samples in some areas and less dense samples in others, and it probably has more to do with where you decided to sample than with the underlying distribution of the total positives data (I hope that makes sense). To compensate for any bias in your sampling scheme (the idea that some places happened to get lots of samples and others happened to get very few samples), check ON for Row Standardization.

2) If the Total Positives data is skewed (my guess is that it may be; you can check this by creating a histogram of the Total Positives data values... if it deviates from a bell curve, your data is skewed), you want to make sure that on average each feature has 8ish or more neighbors. When you use the Generate Spatial Weights Matrix tool to create a file that represents the Conceptualization of Spatial Relationships among your features (as recommended above), you automatically get a summary of the number of neighbors. [For ArcGIS 10.0: You can access this information from the Results window... or you will automatically see this information if you disable Background processing. If you need additional information about this, please don't hesitate to ask :)]. Because you put 2 for the Number of Neighbors parameter, the minimum number of neighbors a feature will have is going to be 2. You want the average to be 8 or more (but more than about 100 is starting to get silly).

The Gi* statistic is asymptotically normal: as long as you ensure every feature has at least a few neighbors and none of the features have everyone as a neighbor, you can trust your z-score results. With skewed data and either too few neighbors (zero neighbors) or too many neighbors (everyone is a neighbor), the skewness in the data analyzed spills over into the z-score results and you have less confidence in those values.

Okay, lots of information. I hope this is helpful!

Best wishes,
Lauren
Lauren M Scott, PhD
Esri Geoprocessing, Spatial Statistics
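For anyone who wants to check neighbor distances and neighbor counts outside of ArcGIS, here is a minimal sketch of the kind of summary described in this post. It is not the Calculate Distance Band From Neighbor Count tool itself; the function names are made up for illustration, and it assumes planar (projected) coordinates and plain Euclidean distance.

```python
import numpy as np


def pairwise_distances(points):
    """Euclidean distance between every pair of points (n x n matrix)."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


def nearest_neighbor_summary(points):
    """Return (minimum, average, maximum) nearest neighbor distance.

    The maximum is the smallest distance band that gives EVERY
    feature at least one neighbor.
    """
    d = pairwise_distances(points)
    np.fill_diagonal(d, np.inf)  # a feature is not its own neighbor here
    nearest = d.min(axis=1)
    return nearest.min(), nearest.mean(), nearest.max()


def neighbor_counts(points, distance_band):
    """Number of neighbors each feature has within a fixed distance band."""
    d = pairwise_distances(points)
    np.fill_diagonal(d, np.inf)
    return (d <= distance_band).sum(axis=1)


# Hypothetical sample locations: three close together plus one outlier.
samples = [(0, 0), (1, 0), (2, 0), (10, 0)]
nn_min, nn_avg, nn_max = nearest_neighbor_summary(samples)
counts = neighbor_counts(samples, nn_max)
print(nn_min, nn_avg, nn_max, counts.mean())
```

In this made-up example the single outlier pushes the maximum nearest neighbor distance far above the average, which is the situation where selecting all but the outliers before looking for the peak distance is worth considering.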
12-05-2011
02:21 PM


POST

Hi Mareike, Good questions! First, I would recommend the following steps:

1) Overlay your island with the fishnet grid.
2) Do the spatial join.
3) Remove any grid cells that fall off the island, or where it would be impossible to have points. You remove these because the Gi* statistic conceptually compares the local mean to the global mean, then decides if the difference is significant... when you have cells that fall outside the study area (so that lots of cells have zero values), it brings the global mean down ... With a "sea of zeros" in your dataset, you tend to see anything that is nonzero appearing as a hot spot.
4) If you have a good distance (scale of analysis) for the Distance Band or Threshold Distance parameter in mind (based on your knowledge of what you are studying), great!!! If not, then if you are using ArcGIS 10.0 you will find the Incremental Spatial Autocorrelation sample script ( www.esriurl.com/spatialstats ... find "Supplementary Spatial Statistics" for the download) helpful. If you don't have ArcGIS 10.0, you can get the same results by running the Spatial Autocorrelation tool for increasing distances and looking for a peak value (I can give you more information about that if you need me to).
5) Run Hot Spot Analysis on your remaining fishnet grid cells, using the distance you discovered in (4).

Next, to answer your questions about the fishnet grid area (how much of the cell falls into the ocean vs how much falls on the island... I think that's what you are asking): for any of the Distance Conceptualizations (Fixed Distance, for example), the Hot Spot Analysis tool "sees" and treats polygons as point centroids.

Does this help? Please let me know if I didn't answer your question and I am happy to try again 🙂

Lauren
Lauren M Scott, PhD
Esri Geoprocessing, Spatial Statistics
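To make the "sea of zeros" point in step 3 concrete, here is a rough sketch of a Gi*-style z-score using binary fixed-distance weights (each feature counts as its own neighbor). This is an illustration under simplifying assumptions, not the Hot Spot Analysis tool's code: the function name `gi_star`, the toy one-row "fishnet", and the cell counts are all made up, and the function assumes the values are not all identical.

```python
import numpy as np


def gi_star(points, values, distance_band):
    """Gi*-style z-score for each feature using binary fixed-distance weights.

    Compares the local sum of values around each feature with what the
    global mean would predict. Assumes values are not all identical
    (otherwise the standard deviation is zero).
    """
    pts = np.asarray(points, dtype=float)
    x = np.asarray(values, dtype=float)
    n = len(x)

    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    w = (dist <= distance_band).astype(float)  # includes the feature itself

    global_mean = x.mean()
    s = np.sqrt((x ** 2).mean() - global_mean ** 2)

    sum_w = w.sum(axis=1)
    sum_w2 = (w ** 2).sum(axis=1)
    numerator = w @ x - global_mean * sum_w
    denominator = s * np.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
    return numerator / denominator


# Toy example: 8 island cells with similar counts, then 30 ocean cells of zeros.
island_xy = [(i, 0) for i in range(8)]
island_counts = [5, 6, 5, 6, 5, 6, 5, 6]
ocean_xy = [(i, 0) for i in range(8, 38)]

z_island_only = gi_star(island_xy, island_counts, 1.5)
z_with_ocean = gi_star(island_xy + ocean_xy, island_counts + [0] * 30, 1.5)
print(z_island_only[3], z_with_ocean[3])
```

With only the island cells, the cell at index 3 is unremarkable. Once the ocean zeros drag the global mean down, the same cell gets a large positive z-score, even though nothing about the island data changed.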
12-05-2011
10:17 AM


POST

Hi Thom, Good questions 🙂 Let's think about this with points first... two examples involving variations in my point densities: 1) Suppose we have ALL crime incidents. In some parts of our study area there are lots of points because those are places with lots of crime. In other parts, there are few points, because those are low crime areas. The density of the points is a very good reflection (is representative) of what I'm trying to understand: crime spatial patterns. 2) Suppose I've taken soil samples. For some reason (the weather was nice or I happened to be in a location where I didn't have to climb fences, swim through swamps, hike to the top of a mountain, etc.), I have lots of samples in some parts of my study area, but fewer in others. In other words, the density of my points is not strictly the result of a carefully planned random sample; some of my own biases may have been introduced. Further, where I have more points is not necessarily a reflection of the underlying spatial distribution of the data I'm analyzing. For case 1, whether a feature has more neighbors or not is a reflection of actual crime densities. While it is fine to row standardize, in this case I'd rather have the density of my points play a role in my analysis, because they are reflective of what I'm studying. For case 2, I want to minimize any bias that may have been introduced during sampling. When you row standardize, the fact that one feature has 2 neighbors and another has 18 doesn't have a big impact on the results; all the weights sum to 1. Make sense? Okay, polygons... Whenever we aggregate our data we are imposing a structure on it. If that structure is a good reflection of the data I'm studying, I might decide not to row standardize ... but to be honest, I can't think of a good example off the top of my head where that would be the case. 
Some might argue that census polygons (like tracts) are designed around population, so if the data I'm analyzing has to do with people, I might not want to row standardize... but the way tracts appear in the census represents just one of many, many, many ways they could have been drawn. So with polygon data, I always apply row standardization.

I hope this helps!

Lauren
Lauren M. Scott, PhD
Esri Geoprocessing and Analysis, Spatial Statistics
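For readers who want to see what row standardization does to the weights, here is a small sketch. It assumes a binary neighbor matrix (1 = neighbor, 0 = not a neighbor); the function name and the example matrix are made up for illustration and are not taken from any Esri tool.

```python
import numpy as np


def row_standardize(weights):
    """Divide each row of a spatial weights matrix by its row sum.

    After standardization every feature's weights sum to 1, whether it
    has 2 neighbors or 18. Features with no neighbors keep all-zero rows.
    """
    w = np.asarray(weights, dtype=float)
    row_sums = w.sum(axis=1, keepdims=True)
    return np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums != 0)


# Hypothetical binary weights: feature 0 has 1 neighbor, feature 1 has 3,
# features 2 and 3 have 2 each, and feature 4 has none.
binary_weights = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
print(row_standardize(binary_weights))
```

Each neighbor of feature 1 ends up with a weight of 1/3, while the single neighbor of feature 0 gets a weight of 1, so the total influence of the neighborhood is the same for both features.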
11-22-2011
11:59 AM


POST

Hi Ellen, I will look at the data you sent to me. Thank you.

With regard to the Expected K values being exactly equal to the Distance values, that is what you will always get. The reason is that we are using a transformation that converts the Expected K value to be equal to distance. For more information on this, please see: Getis, A. Interactive Modeling Using Second-Order Analysis. Environment and Planning A, 16: 173-183. 1984. The actual formula for the L(d) transformation is given in: http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/How_Multi_Distance_Spatial_Cluster_Analysis_Ripley_s_K_function_works/005p0000000s000000/

I do have a bug in for myself to improve the K Function documentation. Sorry for the confusion! (I can't believe I include the L(d) formula and then don't actually tell you what it does... my very bad! So sorry!).

Thank you for your post and for sending me the data. More soon.

Lauren
Lauren M. Scott, PhD
Esri Geoprocessing, Spatial Statistics
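As a quick illustration of why the expected value equals the distance, here is a sketch that assumes the standard result that, for a completely random pattern, the untransformed K value at distance d is pi * d^2, and that the L(d) transformation amounts to taking the square root of K divided by pi. The function names are made up for this example.

```python
import math


def expected_k(d):
    """Untransformed expected K at distance d for a random pattern (assumed pi * d^2)."""
    return math.pi * d ** 2


def l_transform(k):
    """L(d)-style transformation: square root of K over pi."""
    return math.sqrt(k / math.pi)


for d in (1.0, 2.5, 10.0):
    # The transformed expected value comes back as the distance itself.
    print(d, l_transform(expected_k(d)))
```

That is why the expected line on the K function graph is a straight diagonal: after the transformation, expected K at each distance is simply that distance.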
10-26-2011
09:28 AM


POST

Hi Mareike, Sorry you're having trouble with this!

The K Function works by simply counting feature pairs: the tool "visits" each feature in the dataset, selects all features within a specified distance of the target feature, and counts the number of feature pairs among the selected features... feature pair counts are accumulated as the tool visits every feature. The distance is then increased and the counting repeated... and this process continues however many times you've specified for the Number of Distance Bands parameter. These accumulated counts (one for each distance) are converted to an index and plotted on a line graph. When your points tend to be clustered, the accumulated counts are higher, and the index falls above the blue diagonal expected line. When the points tend to be dispersed, counts are lower and the index falls below the expected line.

To decide if the clustering or dispersion is significantly different from what you would get if the points were randomly distributed in your study area, the tool uses simulation. The tool randomly pitches your points into your study area 9, 99, or 999 times and, for each simulation, it performs the whole distance/counting thing. From all the simulations, it remembers (for each distance) the most clustered index obtained from the random process of pitching your points into the study area, and it remembers the most dispersed index obtained. These extreme values form the confidence envelope, and they show you (given X number of points and the peculiarities of your study area) the range of possible indices you can obtain from a random process.

For a weighted K function, the confidence envelope follows the observed line and the simulation process is a bit different than I described above. From the graphic you sent, my guess is you are using the unweighted K Function, but please let me know if I've guessed incorrectly.
For the unweighted K function, if the study area has a very simple shape (circle, rectangle) the confidence envelope will enclose the expected line. When the study area isn't simple (there are peninsulas, or you are working with an "L" shape, for example) then the study area itself can force randomly placed features to be far away from each other, so the confidence envelope appears below the expected line (more dispersed).

Okay, so why might someone run the unweighted K function? The K function provides a kind of spatial "fingerprint" of how spatial clustering among your point features changes across multiple scales (across increasing distances). Why is this interesting? Whenever we see clustering in the landscape, we are seeing evidence of underlying spatial processes at work. Statistically significant peaks or dips of the observed index are evidence that spatial processes are operating at the associated spatial scale. Sometimes knowing something about these statistically significant spatial scales provides clues about the underlying processes at work. Comparing the spatial "fingerprints" for two different point datasets within the exact same study area can tell you if their spatial patterns are being influenced by the same or different spatial processes.

Some questions for you: You indicated you are analyzing birds on an island. Are you providing a study area polygon when you run the K function? If so, might that polygon be forcing a structure on the simulations that would explain why the confidence envelope falls below the expected line? Do the points you have reflect a sample of bird sightings, or do they represent ALL possible data (like ALL bird nests on the island)? Sampled data, especially when the samples might be biased by observer behavior or the sampling scheme, are not good candidates for the K Function... there is the risk that you will model observer behavior rather than bird behavior.
You mentioned a projection/transformation warning message or error... that sounds like a problem. If possible, I'm hoping you can send me your data so that we can figure out exactly why you are getting the unexpected results. Please contact me directly at LScott@Esri.com if that might be possible.

Again, I'm sorry you are having problems with the K Function. I hope this information is helpful to you. If anything is unclear, please contact me or reply here and I will do my very best to clarify.

Lauren
Lauren M Scott, PhD
Esri Geoprocessing, Spatial Statistics
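The counting-and-simulation idea described in this post can be sketched roughly as follows. This is a simplified illustration, not the tool's implementation: it assumes an unweighted analysis, a rectangular study area, no boundary (edge) correction, and an index of the form sqrt(A * pairs / (pi * n * (n - 1))). The function names `l_function` and `simulation_envelope` are made up for this example.

```python
import numpy as np


def l_function(points, distances, area):
    """Observed index at each distance, from counts of point pairs.

    For each distance, count ordered pairs (i, j), i != j, that are within
    that distance, then convert the count to an index. No edge correction.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    pair_dist = dist[np.triu_indices(n, k=1)]  # each unordered pair once

    index = []
    for d in distances:
        ordered_pairs = 2 * np.count_nonzero(pair_dist <= d)
        index.append(np.sqrt(area * ordered_pairs / (np.pi * n * (n - 1))))
    return np.array(index)


def simulation_envelope(n, distances, width, height, n_sims=99, seed=0):
    """Lowest and highest index per distance over random point patterns.

    Each simulation pitches n points uniformly into a width x height
    rectangle (an assumed, very simple study area).
    """
    rng = np.random.default_rng(seed)
    sims = np.array([
        l_function(rng.uniform([0.0, 0.0], [width, height], size=(n, 2)),
                   distances, width * height)
        for _ in range(n_sims)
    ])
    return sims.min(axis=0), sims.max(axis=0)


# Made-up example: 30 tightly clustered points in a 1 x 1 study area.
rng = np.random.default_rng(1)
clustered = rng.normal(0.5, 0.01, size=(30, 2))
bands = [0.1, 0.2, 0.3]
observed = l_function(clustered, bands, 1.0)
low, high = simulation_envelope(30, bands, 1.0, 1.0)
print(observed, low, high)
```

When the observed index sits above the upper envelope at a given distance, the clustering at that scale is stronger than anything the random simulations produced. An irregular study area would need the random points confined to its actual shape, which this rectangle-only sketch does not attempt.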
10-21-2011
04:00 PM


POST

Hi Zahi, If you are using ArcGIS 10.0, you should be able to see the numerical output by:

1) Disabling background processing (click the Geoprocessing Menu, then Geoprocessing Options... UNcheck "Enable" for background processing). OR
2) Opening the Results window (Geoprocessing Menu, then Results). You will see an entry for your model... open that, right click on Messages and select View.

The "*" to determine statistical significance isn't written to the coefficient or diagnostic tables (as you noticed), but you can easily interpret p-value significance as follows:

* P < 0.10 means statistically significant at the 90% confidence level (less conservative)
* P < 0.05 means statistically significant at the 95% confidence level
* P < 0.01 means statistically significant at the 99% confidence level (more conservative).

To learn more about interpreting z-scores and p-values, please see: http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/What_is_a_z_score_What_is_a_p_value/005p00000006000000/

The VIF values don't reflect results that are either significant or not significant... rather the rule of thumb is that if a VIF value is larger than about 7.5 there are issues with variable redundancy (multicollinearity) that could potentially lead to model instability. For more information about interpreting OLS diagnostics, please see: http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#/Interpreting_OLS_results/005p00000030000000/

With regard to removing dummy variables when you move from OLS to GWR: this strong recommendation applies to spatial regime dummy variables where you have a bunch of 1's spatially clustered and/or a bunch of 0's spatially clustered. The reason this is a problem for GWR is what's called "local multicollinearity". GWR creates a separate equation for each feature and calibrates it (i.e., computes the coefficients) using nearby features (rather than using ALL features).
When the values for a variable cluster spatially (e.g., all 1's) there is the potential that the nearby features used for calibrating an equation will all have the same value, and this would result in perfect multicollinearity with the Intercept ... and GWR cannot solve (calibrate) in that situation. Even if there isn't perfect local multicollinearity, when there is very little variation in a variable's values, results can be unstable. You can tell if you are having this problem by looking at your output feature class from GWR. When the condition number for a feature is larger than about 30, there are issues with local multicollinearity and you have less confidence in the results associated with those features. I hope this is clear... if not, please let me know and I can try again 🙂

You also asked about the validity of using GWR for linear data. This is fine as long as you recognize that network relationships are not used to define which features are "nearby" or how much weighting a nearby feature has with regard to calibration. Remember, calibration of the equation associated with a particular feature is based on features that are "nearby"... the idea is that nearby features better reflect relationships between your dependent variable and the explanatory variables than features that are far away. "Nearby" is a function of your answers for the Kernel Type and Bandwidth Method parameters and we can talk more about that if you want, but the point is that distances to determine which features are nearby are computed using plain ol' Euclidean straight-line, as-the-crow-flies distance. With a road network, we might expect two points that are on the same street to be more alike (to deserve a larger weight/influence) than two points that are the same distance apart but on parallel streets... these types of network relationships won't be considered.
So if you use GWR, you should ask yourself if including network relationships in the calibrations for each feature equation is important to the question you want to answer. For other tools in the Spatial Statistics toolbox you can create spatial relationships based on a road network (Generate Network Spatial Weights tool), but unfortunately our GWR tool currently cannot take advantage of this option.

I hope this answers your questions!

Best wishes,
Lauren
Lauren M. Scott, PhD
Esri Geoprocessing, Spatial Statistics
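If it helps to work with the numbers outside of ArcGIS, the p-value thresholds and the VIF rule of thumb from this post can be sketched as below. The function names are made up, and the VIF calculation is the common textbook approach (the diagonal of the inverse of the correlation matrix of the explanatory variables), which may not match the OLS tool's internals step for step.

```python
import numpy as np


def significance_label(p_value):
    """Translate a p-value into the confidence levels listed in this post."""
    if p_value < 0.01:
        return "significant at the 99% confidence level"
    if p_value < 0.05:
        return "significant at the 95% confidence level"
    if p_value < 0.10:
        return "significant at the 90% confidence level"
    return "not significant at the 90% confidence level"


def vif(explanatory):
    """Variance inflation factor for each explanatory variable (column).

    VIF_j = 1 / (1 - R^2_j), computed here as the diagonal of the inverse
    of the correlation matrix. Fails if variables are perfectly redundant.
    """
    corr = np.corrcoef(np.asarray(explanatory, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(corr))


# Made-up data: the first two columns are nearly identical (redundant).
X = np.array([
    [1, 1.1, 3], [2, 1.9, 1], [3, 3.1, 4], [4, 3.9, 1],
    [5, 5.1, 5], [6, 5.9, 9], [7, 7.1, 2], [8, 7.9, 6],
])
values = vif(X)
redundant = values > 7.5  # rule-of-thumb threshold from the post
print(values, redundant, significance_label(0.03))
```

In this example the two nearly identical columns both get very large VIF values, while the third column stays well below 7.5. Dropping one of the redundant variables and rerunning is the usual next step.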
06-21-2011
03:06 PM


POST

Hi Bilal, You can add a new field to your table. Right click on that field and select Field Calculator. You can use the calculator to populate the new field with the log of the original values or the original values raised to whatever exponent you think works best. The Add Field and Calculate Field tools can also be used. I hope this helps, Lauren Scott Esri
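The arithmetic behind these transformations is simple; here is a plain Python sketch of it, independent of the Field Calculator. The function names are made up, the log is the natural log, and returning None for zero or negative values is an assumption for this example (the log is undefined there), not something the Field Calculator does for you.

```python
import math


def log_transform(values):
    """Natural log of each value; None where the log is undefined (value <= 0)."""
    return [math.log(v) if v > 0 else None for v in values]


def power_transform(values, exponent):
    """Each value raised to the chosen exponent (e.g. 0.5 for a square root)."""
    return [v ** exponent for v in values]


original = [1, 10, 100, 0]
print(log_transform(original))
print(power_transform([4, 9, 16], 0.5))
```

Whichever transformation you choose, a histogram of the new field is a quick way to check whether it made the distribution look more symmetric.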
05-11-2011
03:30 PM


POST

Hi Mike, Yes, it's Standardized Residual. Thanks for pointing out that this isn't documented! We will get this corrected. Best wishes, Lauren Scott Esri Geoprocessing, Spatial Statistics
05-11-2011
03:24 PM


POST

Hi Carrie, A great place to start is our resources page: www.esriurl.com/spatialstats You'll find short videos, a free 1 hour training seminar on spatial pattern analysis, and a couple hot spot analysis tutorials (in fact, one of the tutorials actually uses 911 emergency call data). Best wishes! Lauren Lauren M Scott, PhD Esri Geoprocessing, Spatial Statistics
04-27-2011
09:49 AM


