POST
Hi Cheryl, Hmmm... small sample size, no obvious clustering when the data is mapped... There may be better solutions, but here is one idea: if you have samples from at least 30 households in a village and have results (positive or negative) for all occupants within each household surveyed, you can calculate a ratio for each household: the number of positive cases divided by the total number of cases (positive + negative) to get the percent positive in each household.

If Euclidean distance (as the crow flies) is a reasonable way to think about the relationships among the households in the village (not reasonable if there are barriers like rivers), you can run Optimized Hot Spot Analysis on the household points using the ratio as your Analysis Field and see if there is any clustering (that tool only requires 30 points if you have an Analysis Field).

If a network, however, is a better representation of the relationships among households (and this could also work if there are bridges from one side of the river to the other connecting households in a village), and you have data for the village transportation network, you can use the Generate Network Spatial Weights tool to create a network representation of the spatial relationships among the households. You would then use the Hot Spot Analysis tool (rather than Optimized Hot Spot Analysis) and the network spatial weights file (set the Conceptualization of Spatial Relationships parameter to Get Weights From File) to test for clustering.

I will also ask a colleague if he has other ideas or suggestions, and I will get back to you if he does (or ask him to please reply directly). Very best wishes, Cheryl! Lauren Scott Esri
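The per-household ratio described above is simple enough to sketch in plain Python (illustrative only, not arcpy; the record layout is a hypothetical example):

```python
def household_positive_ratios(records):
    """records: iterable of (household_id, result) pairs, where result is
    'positive' or 'negative' for one surveyed occupant.
    Returns {household_id: fraction positive}, i.e. positives / (positives + negatives)."""
    totals, positives = {}, {}
    for hid, result in records:
        totals[hid] = totals.get(hid, 0) + 1
        if result == "positive":
            positives[hid] = positives.get(hid, 0) + 1
    return {hid: positives.get(hid, 0) / n for hid, n in totals.items()}
```

The resulting values would then serve as the Analysis Field when running Optimized Hot Spot Analysis on the household points.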
Posted 02-27-2015, 10:23 AM

POST
Hi Cheryl, If my study area wasn't well defined, I probably wouldn't use Average Nearest Neighbor unless I just wanted to casually compare the average nearest neighbor distances (and wasn't interested in determining statistical significance). And that may be exactly what you want to do! I say this because, unfortunately, the Average Nearest Neighbor z-score and p-value calculation is very sensitive to study area size.

Average Nearest Neighbor is a global statistic that tells you about overall clustering. Getis-Ord Gi* is a local statistic that shows you where clusters are. It would not be unusual for a global statistic to say there is no clustering but for a local statistic to find statistically significant local clusters. So even if a global statistic tells you there is no clustering, that doesn't mean you shouldn't bother with a local statistic.

But I'm not sure Getis-Ord Gi* will actually be very helpful to you. You indicate the positive cases are very rare... if you map the positive cases, can you tell just by looking at the map whether there is clustering? I ask because for Getis-Ord Gi* you really want a good range of values... if almost all of your households have zero cases and a couple have one or two cases, that really isn't enough variation in the analysis value to be appropriate for this tool. If there are at least 30 neighborhoods in the village, aggregating the counts and creating ratios (positive to total cases) for each of those neighborhoods might provide enough variation. Another issue that will be problematic, though, is appropriately modeling spatial interaction among the households. You will want a model that reflects that there is no (or less) interaction across the river than on the same side of the river.

Finally, keep in mind what this analysis is saying... the expectation (null hypothesis) is that every positive case could be dropped hither and thither onto the households in a random manner. Consequently, finding statistically significant hot or cold spots suggests other processes (beyond random chance) may be at work... Still, all we can really do in this case is reject that null hypothesis. Would rejecting the null hypothesis be useful (would someone get excited to learn that who gets the disease is not completely random)? If we believe getting the disease is entirely the result of bad luck (no contagious property, no spreading vector, no genetic component), then yes, finding statistically significant clustering might be interesting. Similarly, if you have no idea what the factors promoting the disease are, then WHERE the clusters occur might show you where to start looking for answers.

But keep in mind that Getis-Ord Gi* works by comparing the local mean (average positive cases for a household and its neighbors) to the global mean (average cases for all households). The tool then determines whether the local mean is significantly different from the global mean. If you have a sea of households with zeros, any deviation from zero will be statistically significant, and just mapping the positive cases will probably show you where to look for factors that may be promoting the disease.

I guess this is what I would do:
1) If mapping the positive cases shows clear clustering, I would go with my map. Done.
2) If I had a good range of values for the household ratio of people with and without the disease, and if I had more than 30 households on each side of the river with positive cases, and if it was tricky to see whether the positive cases were clustered or not... I would run hot spot analysis on the ratios for each side of the river separately, using the exact same distance band for both analyses... Then I could make comments about the clustering overall and also compare the clustering on each side of the river.

I hope this helps a little bit! Very best wishes, Lauren Scott Esri
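To make the "local mean versus global mean" idea concrete, here is a minimal pure-Python illustration. This is only the comparison at the heart of Gi*; the actual statistic standardizes the difference with a variance term to produce a z-score:

```python
def local_vs_global_mean(values, neighbors, i):
    """values: list of analysis values, one per feature.
    neighbors: dict mapping a feature index to the indices of its neighbors.
    Returns (local_mean, global_mean) for feature i: the local mean covers
    the feature and its neighbors; the global mean covers all features."""
    local = [values[i]] + [values[j] for j in neighbors[i]]
    local_mean = sum(local) / len(local)
    global_mean = sum(values) / len(values)
    return local_mean, global_mean
```

In a "sea of zeros," any feature whose neighborhood contains a positive case has a local mean well above the global mean, which is why mapping the cases directly may tell you as much as the statistic.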
Posted 02-09-2015, 04:52 PM

POST
Hi Andrew, You can definitely have some fun analyzing your crime data! You will have to convert the categorical data to counts or proportions, but you can still learn a lot about how the different types of crimes relate to each other. Here are some ideas:

1) Use Optimized Hot Spot Analysis. It will overlay your study area with a fishnet grid, count the number of crimes (of all types) that fall within each grid cell, then perform hot spot analysis to show you the hot and cold spot areas. This answers the question: where are the hot and cold spots of (all) crime? However, you can also run Optimized Hot Spot Analysis on different types of crime, then visually compare the hot spot maps.

2) You can use the fishnet grid cells from (1), or else use census tracts, and count the number of unique crime types within each polygon to see where you have the highest crime diversity (you might find that some places only have robberies, for example, while other places experience a wide variety of different crime types)... you can run hot spot analysis on the diversity counts.

3) For fishnet grid cells (output from Optimized Hot Spot Analysis, then use Spatial Join) or census tracts, count the number of assaults, the number of robberies, the number of auto thefts, etc. within each polygon, then convert those counts to a percentage of all crime. You can then run grouping analysis to find polygons with similar challenges... one group might be high for assault and narcotics, for example, but low for robbery... knowing the profiles -- the specific challenges -- of each group can help you identify effective prevention strategies.

4) If you have distinct clusters of particular crimes, you can create standard deviational ellipses around each cluster, then overlay the ellipses for two different types of crimes to see how spatially integrated they are. (I'm doing an analysis right now that looks at violent crime in relation to alcohol establishments... to see how integrated those two "activity" spaces are.)

5) Using the output from (1), you can find spatial outliers for all crime or for specific crime types: a high crime count area surrounded by low crime count areas, or a low crime count area surrounded by high crime count areas. These anomalies are often very interesting (what is that one neighborhood doing right... it has no problem at all with narcotics while surrounding areas are high for drug-related crimes? ... or why is this one neighborhood so high in relation to surrounding neighborhoods?)

6) Be sure to check out the new space time pattern mining tools in the 10.3 release as well: An overview of the Space Time Pattern Mining toolbox—ArcGIS Help | ArcGIS for Professionals

I'm sure others will have ideas as well. I hope this is helpful! Best wishes, Lauren Scott Esri
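Ideas (2) and (3) both boil down to summarizing the crimes that fall inside each polygon. A minimal pure-Python sketch (illustrative only, not arcpy; in practice the counting happens via Spatial Join or the fishnet aggregation):

```python
from collections import Counter

def crime_profile(crimes):
    """crimes: list of crime-type strings that fell inside one polygon.
    Returns (diversity, percentages): the number of unique crime types
    (idea 2) and each type's share of all crime in the polygon (idea 3)."""
    counts = Counter(crimes)
    total = sum(counts.values())
    percentages = {ctype: 100.0 * n / total for ctype, n in counts.items()}
    return len(counts), percentages
```

The diversity counts could feed a hot spot analysis, and the percentage profiles could feed grouping analysis, as described above.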
Posted 02-09-2015, 02:12 PM

POST
Hi Alex, Sorry you are having problems with the tool! What version of the software are you using (ArcGIS 10.2 or ?)? You should not see that error unless you truly have all the same values, so that's odd. Is there any way you can send me your data so that I can try to reproduce the problem (you can remove all fields except the one that is giving problems)? If so, please email me directly at LScott@esri.com. Again, sorry this isn't working for you! Lauren
Posted 08-25-2014, 12:58 PM

POST
Please see if this works for you: ArcGIS Help: Space Time Cluster Analysis. Best wishes, Lauren
Posted 08-19-2014, 03:36 PM

POST
Our new tools for 10.3 are Create Space Time Cube and Emerging Hot Spot Analysis... they are in beta in ArcGIS Pro, and we are going to do our very, very best to get them into the 10.3 ArcMap release as well. For future releases we are planning to develop additional tools (probably Outlier Analysis next) to work on the cube (netCDF) data structure. Note: work on the additional tools has not yet started, so it technically does not fall into the "concrete plans" category... but that's what we are thinking. Also definitely not yet in the "concrete plans" category: we are very interested in predictive event analysis and are starting to investigate. Thanks for your interest in our work! Lauren
Posted 08-14-2014, 10:49 AM

POST
Great to know that the sample script is working on 10.2.2! It looks like your zip file only has the toolbox though (instead of both the script and the toolbox). If Ori is using your zip file, that is definitely the reason he is getting an error about not finding the script. If Ori is using the zip file I attached, then Phillip's suggestion to make sure the script path is correct is a good one! Here's how: once you navigate to the Temporal Collect Events toolbox, right click on the tool and select "Properties"... then on the "Source" tab, check to make sure the path to the temporalcollectevent.py script file is correct... if it isn't, browse to the .py file to set the correct path. Ori: if you still have problems, please send the screen shot to my email, LScott@esri.com, and I will see if I can figure out what's what. Thanks! Lauren
Posted 08-14-2014, 10:31 AM

POST
Just curious... what version of Desktop ArcGIS are you using? We will be slammed until the 10.3 release is done, but if we can, we'll try to get the temporal collect events sample script working for 10.2.2. Thanks! Lauren
Posted 08-13-2014, 11:45 AM

POST
Well, I am still getting familiar with GeoNet too, but this is what I just tried (and it worked for me): 1) Click on the attachment. 2) A pop up is displayed and one of the options is to save. 3) You should see a little blue down arrow when you say okay. If you click on it, you can get the zip file. If that doesn't work for you, I will find someone who can tell us a better solution.

Here is some more information about the Emerging Hot Spot Analysis categories:

*************** Category Definitions ***************

Hot spots that are statistically significant for the last time step interval:
- New: only the most recent time step interval is hot
- Persistent: at least 90% of the time step intervals are hot, with no trend up or down
- Intensifying: at least 90% of the time step intervals are hot, and becoming hotter over time
- Diminishing: at least 90% of the time step intervals are hot, and becoming less hot over time
- Consecutive: an uninterrupted run of hot time step intervals, comprising less than 90% of all intervals
- Sporadic: some of the time step intervals are hot
- Oscillating: some of the time step intervals are hot, some are cold

Hot spots that are not statistically significant for the last time step interval:
- Historic: at least 90% of the time step intervals are hot, but the most recent time step interval is not

Cold spots that are statistically significant for the last time step interval:
- New: only the most recent time step interval is cold
- Persistent: at least 90% of the time step intervals are cold, with no trend up or down
- Intensifying: at least 90% of the time step intervals are cold, and becoming colder over time
- Diminishing: at least 90% of the time step intervals are cold, and becoming less cold over time
- Consecutive: an uninterrupted run of cold time step intervals, comprising less than 90% of all intervals
- Sporadic: some of the time step intervals are cold
- Oscillating: some of the time step intervals are cold, some are hot

Cold spots that are not statistically significant for the last time step interval:
- Historic: at least 90% of the time step intervals are cold, but the most recent time step interval is not

We won't be able to get it done for the first release, unfortunately, but in a future release you will be able to select the categories you are interested in and also modify how each category is defined (i.e., right now a persistent hot spot is one where 90% of the time step intervals are statistically significant hot spots and there are no statistically significant cold spots... you might want to change it to 80%, for example). Beta 5 will have better cell size and time interval defaults when you don't provide anything for those parameters, and messages defining the categories. I hope this helps. Best wishes, Lauren
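As a rough illustration, several of the hot-spot categories above can be approximated from a per-bin sequence of time-step labels. This simplified pure-Python sketch covers only a subset of the categories (it skips Consecutive and the cold-spot side) and omits the trend test that separates Persistent from Intensifying and Diminishing; it is not Esri's implementation:

```python
def classify_hot_pattern(steps):
    """steps: list of per-time-step labels, each 'hot', 'cold', or 'none'.
    Returns a simplified category name based on the definitions above."""
    hot_share = steps.count("hot") / len(steps)
    last_hot = steps[-1] == "hot"
    if last_hot and steps.count("hot") == 1:
        return "new"              # only the most recent interval is hot
    if hot_share >= 0.9:
        # >= 90% hot: persistent if still hot at the end, historic if not
        return "persistent" if last_hot else "historic"
    if last_hot and "cold" in steps:
        return "oscillating"      # some intervals hot, some cold
    if last_hot:
        return "sporadic"         # some intervals hot, none cold
    return "none of these"
```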
Posted 08-13-2014, 09:04 AM

POST
Hi Philip, I'm attaching a zip file with the sample tool... it is pretty rough and has only been tested with 10.1 SP1. This question comes up often, so please allow me to answer it more broadly here: you want to run a space time hot spot analysis on your event data (crime, disease, traffic accidents) where you don't have an attribute/field to analyze. In other words, you just want to know where in space and time you have a statistically large number of events.

Some people have guessed that they need to use the Collect Events tool. Unfortunately, when Collect Events combines points that are near in space, you lose the temporal component of your data (it will combine two points that are near each other even if they have date stamps that are very far apart). What we need here is a tool that aggregates based on space AND time. You want to be able to set a distance threshold (500 meters, for example) and a time threshold (something like 5 days) and have the tool aggregate only those points that are within both thresholds. The attached sample script will do that.

Alternative approaches if the attached script doesn't work for you:
1) Create a model tool that iterates through time and selects only those features/events that meet your time requirement... run Collect Events on the selection set... use the Add Field tool to give the result a DATE field and calculate the value to be a date within the time period selected (if you want to combine events within two days, for example, you would calculate the new date field to be either the first or second day's date for each record output from Collect Events). Then merge all the results into a single file... then create the spatial weights matrix for the merged data using Generate Spatial Weights Matrix, then run hot spot analysis.
2) If you know Python, you can try to debug the attached script to make it work for whatever version of ArcGIS you are using.
3) You can sign up for the ArcGIS 10.3 beta program and use the space-time pattern mining tools in ArcGIS 10.3. The new Emerging Hot Spot Analysis tool allows you to run space-time hot spot analysis on event data.

I hope this helps! Best wishes, Lauren
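The space-AND-time aggregation described above can be sketched in plain Python. This is a naive, order-dependent illustration of the idea (merge an event into a group only when it is within both thresholds of the group's seed), not the attached script's actual logic:

```python
import math

def space_time_groups(events, dist_threshold, time_threshold):
    """events: list of (x, y, t) tuples, t numeric (e.g. days since a start date).
    Returns a list of (seed_event, count) pairs, where each event was merged
    into the first group whose seed is within BOTH thresholds."""
    groups = []  # each entry: [seed_event, count]
    for x, y, t in events:
        for g in groups:
            sx, sy, st = g[0]
            if math.hypot(x - sx, y - sy) <= dist_threshold and abs(t - st) <= time_threshold:
                g[1] += 1
                break
        else:
            groups.append([(x, y, t), 1])
    return [(seed, n) for seed, n in groups]
```

Note how two events at the same location but far apart in time stay in separate groups, which is exactly what plain Collect Events cannot do.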
Posted 08-12-2014, 04:38 PM

POST
At present the z-scores are computed using the mathematics that we've documented (http://resources.arcgis.com/en/help/main/10.2/index.html#/How_Hot_Spot_Analysis_Getis_Ord_Gi_works/005p00000011000000/, http://resources.arcgis.com/en/help/main/10.2/index.html#/How_Cluster_and_Outlier_Analysis_Anselin_Local_Moran_s_I_works/005p00000012000000/). These formulas were obtained from the seminal articles about the methods (listed below). We are not using Monte Carlo methods and, at present, are not computing z-scores using permutation (conditional randomization). Our z-score calculations are based on the randomization null hypothesis (theoretical distribution)... they are not based on simulation or permutation. P-values have a one-to-one correspondence with z-scores (i.e., a z-score of +/- 1.96 will always equate to a p-value of 0.05). Our tools calculate z-scores and then translate those z-scores to p-values; they report both results.

Our empirical tests support the seminal work on Gi* by Getis and Ord who, in their 1992 paper, show that the statistic is asymptotically normal. Z-scores do have a normal distribution, so people often ask us whether it is valid to run Hot Spot Analysis (Gi*) on data that is skewed. The answer is yes, as long as the threshold distance you use is not too small or too large. How do we know? We start with very skewed data sets (like crime counts) and then compare the calculated p-values, based on the asymptotic z-scores, to the pseudo p-values obtained from permutations (conditional randomization). We found that with as few as 16 neighbors, the asymptotic results provided the same significance as the permutations did over 99.9% of the time. We tested this on over 10 different skewed data sets, including mixed discrete/continuous models.
In Anselin's article (citation below, page 99), the mathematics for calculating z-scores based on the randomization null hypothesis is given (equations 13, 14, and appendix A). The author indicates that a test for significant local spatial association may be based on these equations, but notes that the exact distribution is unknown. He suggests a conditional randomization alternative. Our empirical testing confirms that the permutation approach will be more accurate for this statistic when data is skewed; the Local Moran's I statistic does not appear to be asymptotically normal. We have already begun the development work to compute z-scores using permutation and will put this functionality into the next release of ArcGIS.

With the 10.1.2 release of ArcGIS we added a False Discovery Rate (FDR) p-value correction. We still report the uncorrected z-scores and p-values but use the correction to account for multiple testing and spatial dependency. For more about the FDR correction, please see: http://resources.arcgis.com/en/help/main/10.2/#/What_is_a_z_score_What_is_a_p_value/005p00000006000000/

Here are some additional resources:
- 1992 Getis and Ord paper: http://onlinelibrary.wiley.com/doi/10.1111/j.1538-4632.1992.tb00261.x/abstract
- 1995 Ord and Getis paper (this is the version of Gi* we implement): http://onlinelibrary.wiley.com/doi/10.1111/j.1538-4632.1995.tb00912.x/abstract
- Seminal Anselin paper used as a basis for our Cluster and Outlier Analysis tool: Anselin, Luc. "Local Indicators of Spatial Association - LISA." Geographical Analysis 27, no. 2 (April 1995): 93-115.
- Very good article about FDR: Caldas de Castro, Marcia, and Burton H. Singer. "Controlling the False Discovery Rate: A New Application to Account for Multiple and Dependent Tests in Local Statistics of Spatial Association." Geographical Analysis 38 (2006): 180-208.

Please let me know if I have not answered your question.
Best wishes, Lauren
Lauren M Scott, PhD
Esri Geoprocessing, Spatial Statistics
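Two of the computations described above are easy to illustrate in plain Python: the one-to-one translation from a z-score to a two-tailed p-value under the standard normal distribution, and a Benjamini-Hochberg style FDR cutoff (a sketch of the general method, not Esri's implementation):

```python
import math

def p_from_z(z):
    """Two-tailed p-value for a z-score under the standard normal distribution.
    Phi(z) is computed from the error function; p = 2 * (1 - Phi(|z|))."""
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

def fdr_significant(p_values, alpha=0.05):
    """Benjamini-Hochberg procedure: sort p-values, compare the k-th smallest
    to alpha * k / m, and accept everything up to the largest passing rank.
    Returns the indices of the p-values judged significant."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])
```

For example, p_from_z(1.96) comes out very close to 0.05, matching the one-to-one correspondence stated above.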
Posted 05-29-2014, 02:42 PM

POST
Hi Susanne, Yes, you are correct... if you want to compare crime from one time period to another for the same location, it is important to use the same distance band. Please keep in mind that results from Hot Spot Analysis are correct for whatever distance band you use... When you don't have any criteria to help you select a particular distance band, you can use Incremental Spatial Autocorrelation, Calculate Distance Band from Neighbor Count, and/or Optimized Hot Spot Analysis to find an appropriate distance band for your analysis. These are some of the strategies I would try if I had several years of crime data and wanted to compare the hot spot (and Global Moran's I) results:

1) I would use Optimized Hot Spot Analysis (OHSA) to find the optimal distance for each year and write down the result. (On 10.1, I would use Incremental Spatial Autocorrelation (ISA) instead.) Suppose I see the following:

Year | "Optimal" Distance from OHSA or ISA
-----|------------------------------------
2004 | 2301.345
2005 | 4043.223
2006 | 2290.456
2007 | 2310.987
2008 | 2301.842

Because most years have distances around 2300, if that distance seems to fit the scale of the question I'm asking, I would use it as the distance band for every year (even for 2005).

2) If the distances above are all over the place, I would create a single feature class with crimes from all years, run OHSA (or ISA) on all crimes together, and use whatever distance it returns consistently when I do my year-by-year analyses.

3) The best solution (not always possible) is to have a reason for selecting your distance band... if remediation/crime prevention will be neighborhood by neighborhood, for example, I might try to come up with a distance that best reflects neighborhood structure in my study area... or perhaps I could try to find theory or evidence to tell me the distances over which related crimes occur.
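Strategy (1), picking a common distance band when most of the yearly optima agree, can be sketched in plain Python. The 10% tolerance around the median is an arbitrary choice for illustration, not a rule from any Esri tool:

```python
def common_distance_band(optimal_by_year, tolerance=0.1):
    """optimal_by_year: {year: optimal distance reported by OHSA or ISA}.
    If a majority of years fall within `tolerance` of the median distance,
    suggest the median as the common band; otherwise return None, signalling
    that the distances are all over the place and a pooled all-years run
    (strategy 2) is the better route."""
    values = sorted(optimal_by_year.values())
    median = values[len(values) // 2]
    agreeing = [v for v in values if abs(v - median) <= tolerance * median]
    if len(agreeing) > len(values) / 2:
        return median
    return None
```

With the example years above, four of the five distances sit near 2300, so the median of those values would be suggested as the shared band.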
Sensitivity Analysis: Your goal is to make sure your model isn't overfit and that it predicts well across data samples. When a model is overfit, you will get a very different result by removing just a few observations. Here is one strategy (there certainly are others) to help you feel confident that you've found a trustworthy model:
1) Find a model for your full data set.
2) Randomly sample 50% of the data and make sure that when you apply the model in (1) to each 50% sample, you still have a properly specified model (a properly specified model is one that meets all of the assumptions of the OLS method).
I hope this helps! Best wishes, Lauren
Lauren M Scott, PhD
Esri Geoprocessing, Spatial Statistics
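The random 50/50 split used in that sensitivity check can be sketched in plain Python (the model diagnostics themselves would still happen in your OLS tooling; the fixed seed is just for reproducibility):

```python
import random

def split_half(records, seed=0):
    """Randomly split records into two halves. Fit the model on the full data,
    then confirm it remains properly specified on each half."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]
```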
Posted 01-28-2014, 08:59 AM

POST
If you have ArcGIS 10.2, you can try the new Optimized Hot Spot Analysis tool. To constrain your analysis to the road network you can:
1) Create a buffer around the road network (you might have to experiment a bit with different buffer widths).
2) Run Optimized Hot Spot Analysis on your accident point data and use the buffer polygon for the parameter called Bounding Polygons Where the Incidents Are Possible. This will aggregate the events into fishnet polygons along the road network.
The tool will automatically identify an appropriate scale of analysis and will apply an FDR correction to the results. Keep in mind that this answers the question: where are there lots of accidents? It does not take into account differences in the density of the road network or traffic volumes. There are no doubt many ways to analyze your data; I hope this suggestion is helpful. Best wishes! Lauren
Lauren M. Scott, PhD
Esri Geoprocessing, Spatial Statistics
Posted 01-23-2014, 03:37 PM

POST
Polermo, Another strategy 🙂 1) Make sure your polygon features have an ID field (so every polygon has a unique ID). Then do a Spatial Join on the points (Target = points, Join Features = polygons) to add that ID value to each point. Now each point "knows" which polygon it is in. 2) Run the Mean Center tool on the points. Use the polygon ID associated with each point as the Case Field. This will create a centroid for each polygon weighted by the points inside. I hope I have correctly understood your objectives. Best wishes! Lauren Lauren M Scott, PhD Esri Geoprocessing and Spatial Statistics
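The result of step (2) is a point-weighted centroid per polygon. A plain-Python sketch of what the Mean Center tool computes with the polygon ID as the Case Field (illustrative only, not arcpy):

```python
def mean_centers(points):
    """points: list of (polygon_id, x, y) for each point after the spatial
    join described above.
    Returns {polygon_id: (mean_x, mean_y)}, one centroid per polygon,
    weighted by the points that fall inside it."""
    sums = {}
    for pid, x, y in points:
        sx, sy, n = sums.get(pid, (0.0, 0.0, 0))
        sums[pid] = (sx + x, sy + y, n + 1)
    return {pid: (sx / n, sy / n) for pid, (sx, sy, n) in sums.items()}
```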
Posted 01-23-2014, 01:47 PM

POST
Hi Susanne, Both the Spatial Autocorrelation (Global Moran's I) and the Hot Spot Analysis (Getis-Ord Gi*) tools are asymptotically normal, so you do not need to transform your variables as long as you select a distance band that ensures every feature has at least a few neighbors and no feature has all other features as neighbors. If you have ArcGIS 10.2 or later, you can let the Optimized Hot Spot Analysis (OHSA) tool find an optimal distance value for you. Run OHSA on your polygons using your crime counts or ratios as your Analysis Field. Lots of information is written to the Results window, including what the tool identified as an optimal distance band (please see the second paragraph in this tool doc for more information: http://resources.arcgis.com/en/help/main/10.2/#/How_Optimized_Hot_Spot_Analysis_Works/005p00000057000000/ ). Use the same distance OHSA finds to be optimal when you run Spatial Autocorrelation. If you have an earlier version of ArcGIS, please let me know and I will send instructions for finding an appropriate distance band.

For Exploratory Regression, I usually only transform variables if I'm seeing curvilinear relationships... but it sometimes also helps if I'm having trouble finding an unbiased model. OLS regression does not require normally distributed dependent or explanatory variables. It DOES require normally distributed, unbiased model residuals. If Exploratory Regression finds passing models, you can be confident you have found a model that meets all of the requirements of the OLS method. Whenever I use Exploratory Regression to find my properly specified model, however, I will want to:
- Make sure all my candidate explanatory variables are supported by theory, or at least make sense or are supported by experts in the field.
- Run a sensitivity analysis to make sure my model is not overfit. There are a number of ways to do this. One way is to randomly divide your data into two parts.
Find your model using half the data, and then make sure the model is still valid for the other half of the data (valid meaning that it meets all the requirements of the OLS method). Here are some resources that may be helpful: http://resources.arcgis.com/en/help/main/10.2/#/Regression_analysis_basics/005p00000023000000/ (especially the section called Regression Analysis Issues) http://resources.arcgis.com/en/help/main/10.2/index.html#//005p00000053000000 I hope this helps! Very best wishes with your research, Lauren Lauren M Scott, PhD Esri Geoprocessing, spatial analysis, spatial statistics
Posted 01-15-2014, 01:29 PM