I analyze the spatial distribution of crime and the connections between crime and social factors. I work with a database of 610 polygons. Now I would like to apply spatial statistics, but I have a somewhat complex problem, which might stem from my limited experience in this field...

1. I'm going to run spatial autocorrelation (Global Moran's I) on my crime data. My crime data (expressed as a rate per 100,000 inhabitants) are strongly positively skewed. I've read that this is a problem for Moran's I, so I transformed my data with log10 (transformed crime data = log10(crime data + 1)). Is the result actually more reliable this way, or is the transformation unnecessary?

2. I'd like to apply hot spot analysis as well. My second question: is this tool also sensitive to whether the data are normally distributed? If the answer is yes, I'm going to use my transformed crime data.

But here comes the main problem: I would like to analyze which social factors drive the distribution of crime, so I'd like to analyze the connection between social factors and crime with Exploratory Regression. When I began my analysis I drew scatterplots (of the untransformed crime data against the social factors) and saw linear relationships, but when I use the transformed data, these relationships disappear. So if I used my original variables in Exploratory Regression (and then in GWR), would the results "express" or "mirror" the distribution of the transformed crime data (the results of the hot spot analysis)? If the answer is no, what steps would you recommend?
I'd like to say thank you very much for your help! Susanne
Hi Susanne, Both the Spatial Autocorrelation (Global Moran's I) and the Hot Spot Analysis (Getis-Ord Gi*) tools are asymptotically normal, so you do not need to transform your variables as long as you select a distance band that ensures every feature has at least a few neighbors and no feature has all other features as neighbors.
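The neighbor condition above can be checked outside ArcGIS with a few lines of Python. This is only a rough sketch, not what the tools do internally: it assumes polygon centroids as (x, y) pairs in projected units, and the function names and the threshold of 3 for "a few neighbors" are illustrative choices.

```python
import math

def neighbor_counts(centroids, distance_band):
    """For each feature, count the other features within distance_band."""
    counts = []
    for i, (xi, yi) in enumerate(centroids):
        n = 0
        for j, (xj, yj) in enumerate(centroids):
            if i != j and math.hypot(xi - xj, yi - yj) <= distance_band:
                n += 1
        counts.append(n)
    return counts

def band_is_reasonable(centroids, distance_band, min_neighbors=3):
    """True if every feature has at least min_neighbors neighbors
    and no feature has all other features as neighbors.
    min_neighbors=3 is an assumed stand-in for "a few"."""
    counts = neighbor_counts(centroids, distance_band)
    everyone_else = len(centroids) - 1
    return min(counts) >= min_neighbors and max(counts) < everyone_else

# Hypothetical centroids on a 5 x 5 grid with unit spacing.
grid = [(float(x), float(y)) for x in range(5) for y in range(5)]

print(band_is_reasonable(grid, 1.5))   # True: 3 to 8 neighbors per cell
print(band_is_reasonable(grid, 0.5))   # False: nobody has a neighbor
print(band_is_reasonable(grid, 10.0))  # False: everyone neighbors everyone
```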
If you have ArcGIS 10.2 or later, you can let the Optimized Hot Spot Analysis (OHSA) tool find an optimal distance value for you. Run OHSA on your polygons using your crime counts or ratios as your Analysis Field. Lots of information is written to the Results Window including what the tool identified as an optimal distance band (please see the second paragraph in this tool doc for more information: http://resources.arcgis.com/en/help/main/10.2/#/How_Optimized_Hot_Spot_Analysis_Works/005p0000005700... ). Use the same distance OHSA finds to be optimal when you run Spatial Autocorrelation.
If you have an earlier version of ArcGIS, please let me know and I will send the instructions for finding an appropriate distance band.
For Exploratory Regression I usually only transform variables if I'm seeing curvilinear relationships... but it sometimes also helps if I'm having trouble finding an unbiased model. OLS regression does not require you to have normally distributed dependent or explanatory variables. It DOES require you to have normally distributed unbiased model residuals. If Exploratory Regression finds passing models, you can be confident you have found a model that meets all of the requirements of the OLS method.
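To illustrate the point about residuals, here is a small sketch with made-up data: a strongly skewed explanatory variable produces a strongly skewed dependent variable, yet the residuals of the fitted line are close to normal. The Jarque-Bera statistic is used as the normality measure; the data, the seed and the helper names are assumptions for illustration only, and a single explanatory variable is used to keep it short.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def jarque_bera(values):
    """Jarque-Bera statistic: n/6 * (skew^2 + (kurtosis - 3)^2 / 4).
    Large values indicate departure from normality."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)

# Made-up data: skewed x, linear relationship, normal noise.
rng = random.Random(0)
x = [rng.expovariate(1.0) for _ in range(500)]
y = [2.0 + 3.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

intercept, slope = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

jb_y = jarque_bera(y)                 # large: y itself is skewed
jb_residuals = jarque_bera(residuals) # much smaller: residuals near normal
print(jb_y, jb_residuals)
```

Under the null hypothesis of normality the statistic is approximately chi-square distributed with 2 degrees of freedom, so values above about 5.99 would be flagged at the 5% level.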
Whenever I use Exploratory Regression to find my properly specified model, however, I will want to:
• Make sure all my candidate explanatory variables are supported by theory, or at least make sense or are supported by experts in the field.
• Run a sensitivity analysis to make sure my model is not overfit. There are a number of ways to do this. One way is to randomly divide your data into two parts. Find your model using half the data, and then make sure the model is still valid for the other half of the data (valid meaning that it meets all the requirements of the OLS method).
Thank you very much for your help. Two further questions have emerged relating to your answer.
1. You wrote that I have to find the optimal distance band. (In the earlier version of ArcGIS (10.1) I used the Incremental Spatial Autocorrelation and Calculate Distance Band from Neighbor Count tools to calculate this distance.) But what happens if I want to compare crime data from different years? (I would like to compare the Moran's I values and hot spot maps of different years.) If I understand correctly, the optimal distance depends on my crime data, so there will be a different optimal distance band for each year. Can I still compare the years? Doesn't a comparison require the same parameters?
2. Sensitivity analysis: if I understand correctly, one way of performing it is to reduce my polygons (610 -> 305) and run the calculations again? If a model is appropriate for half of the sample area, is it appropriate for the whole?
Hi Susanne, Yes, you are correct... if you want to compare crime from one time period to another for the same location, it is important to use the same distance band. Please keep in mind that results from Hot Spot Analysis are correct for whatever distance band you use... When you don't have any criteria to help you select any particular distance band, you can use Incremental Spatial Autocorrelation, Calculate Distance Band from Neighbor Count and/or Optimized Hot Spot Analysis to find an appropriate distance band for your analysis.

These are some of the strategies I would try if I had several years of crime data and wanted to compare the hot spot (and Global Moran's I) results:

1) I would use Optimized Hot Spot Analysis (OHSA) to find the optimal distance for each year and I would write down the result. Or for 10.1, I would use Incremental Spatial Autocorrelation (ISA). Suppose I see the following:
Year    "Optimal" Distance from OHSA or ISA
-----   ------------------------------------------
2004    2301.345
2005    4043.223
2006    2290.456
2007    2310.987
2008    2301.842
Because most years have distances around 2300, if that distance seems to fit the scale of the question I'm asking, I would use that as the distance band every year (even for 2005).
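One simple way to put a number on "around 2300" is the median of the yearly distances, which is not pulled upward by the 2005 value. This is just one possible summary, not something the tools prescribe, and the 25% tolerance used to flag unusual years is an arbitrary illustrative choice.

```python
from statistics import median

# "Optimal" distances per year, copied from the table above.
optimal_distance = {
    2004: 2301.345,
    2005: 4043.223,
    2006: 2290.456,
    2007: 2310.987,
    2008: 2301.842,
}

# The median is robust to a single outlying year.
common_band = median(optimal_distance.values())
print(common_band)  # 2301.842

# Flag years whose own distance differs from the common band by
# more than an assumed tolerance of 25%.
tolerance = 0.25
unusual_years = [
    year
    for year, dist in optimal_distance.items()
    if abs(dist - common_band) / common_band > tolerance
]
print(unusual_years)  # [2005]
```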
2) If the distances above are all over the place, I would create a single feature class with crimes from all years and run OHSA (or ISA) on all crimes and use whatever distance it returns consistently when I do my year by year analyses.
3) The best solution (not always possible) is to have a reason for selecting your distance band... if remediation/crime prevention will be neighborhood by neighborhood, for example, I might try to come up with a distance that best reflects neighborhood structure in my study area... or perhaps I could try to find theory or evidence to tell me the distances over which related crimes occur.
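Whichever strategy you use to fix the distance, the year-to-year comparison then applies that one band to every year. The sketch below computes Global Moran's I for two hypothetical "years" on the same features with the same band. It is a simplified illustration: it assumes binary fixed-distance weights on centroids with no row standardization, and it does not compute the z-score or p-value that the ArcGIS tool reports.

```python
import math

def morans_i(points, values, distance_band):
    """Global Moran's I with binary weights: w_ij = 1 if features i and j
    are within distance_band of each other (i != j), else 0."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross_products = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.hypot(points[i][0] - points[j][0],
                           points[i][1] - points[j][1])
            if d <= distance_band:
                cross_products += dev[i] * dev[j]
                weight_sum += 1.0
    variance_sum = sum(d * d for d in dev)
    return (n / weight_sum) * (cross_products / variance_sum)

# Hypothetical 4 x 4 grid of centroids with unit spacing.
points = [(float(x), float(y)) for x in range(4) for y in range(4)]

# Two made-up "years" of rates on the same features.
year_a = [1.0 if x < 2 else 0.0 for x, y in points]         # clustered
year_b = [1.0 if (x + y) % 2 == 0 else 0.0 for x, y in points]  # checkerboard

band = 1.0  # the same distance band for both years
print(morans_i(points, year_a, band))  # positive: similar values cluster
print(morans_i(points, year_b, band))  # about -1: neighbors always differ
```

Because the band is identical, any difference between the two index values reflects the data rather than the neighborhood definition.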
Sensitivity Analysis: Your goal is to make sure your model isn't overfit and that it predicts well across data samples. When a model is overfit, you will get a very different result by removing just a few observations. Here is one strategy (there certainly are others) to help you feel confident that you've found a trustworthy model:
1) Find a model for your full data set.
2) Randomly split the data into two 50% samples and make sure that when you apply the model from (1) to each sample, you still have a properly specified model (a properly specified model is one that meets all of the assumptions of the OLS method).
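The two steps above could be sketched as follows for 610 features. The data here are made up, a single explanatory variable stands in for the full model, and comparing the refitted coefficients is only one part of the check; in practice each half would also be run through the full set of OLS diagnostics. All names are illustrative.

```python
import random

def random_halves(n, seed=0):
    """Shuffle the indices 0..n-1 and split them into two halves."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    half = n // 2
    return indices[:half], indices[half:]

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Made-up data for 610 features: one explanatory variable, linear signal.
rng = random.Random(1)
x = [rng.uniform(0.0, 10.0) for _ in range(610)]
y = [2.0 + 3.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

# Step 1: the model for the full data set.
full_fit = fit_line(x, y)

# Step 2: the same model specification on each random 50% sample.
first, second = random_halves(len(x))
half_fits = [
    fit_line([x[i] for i in part], [y[i] for i in part])
    for part in (first, second)
]

print("full:", full_fit)
for name, fit in zip(("half 1", "half 2"), half_fits):
    print(name + ":", fit)
```

If the coefficients from the two halves differ sharply from each other or from the full fit, that is a hint the model may be overfit.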
I hope this helps! Best wishes, Lauren
Lauren M Scott, PhD Esri Geoprocessing, Spatial Statistics