What statistical tools would be good for visual representation of non-numeric data? For example, I have data on different crimes committed throughout the City of Boston. What would be good ways to represent the different types of crimes committed in each neighborhood with spatial statistics, spatial analyst or geostatistics tools?
I've done a density analysis, showing crimes per 1,000 residents and crimes per square mile. I've also done a point density raster, and have shown the mean center of all crimes. These aren't quite satisfying, though. I'm looking for something more.
A big problem is that a lot of the statistical tools want to use attribute data that is numerical. I could theoretically apply a number to each type of crime, say 1 for burglaries, 2 for arson, etc., but this is all made up. There is no inherent numerical value to any given type of crime.
So I appreciate any thoughts you all have.
Don't be tempted by conversion... a numeric category code is no different from a text category. You have nominal data with a spatial location, which leaves you with counts and counts per unit area (i.e., by neighborhood). Some of the areal statistics tests could assess whether there is any clustering (but be aware of the statistical assumptions of those tests, many of which are meant for interval/ratio data). Join count statistics (not implemented directly in ArcMap) are appropriate for such a task, and some may argue for Moran's I.
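As a rough illustration of the join count idea, here is a minimal Python sketch. It assumes a hypothetical small grid of cells, each flagged 1 if a given crime type occurred there and 0 otherwise, and it tallies the joins between edge-adjacent cells as BB (1-1), WW (0-0) or BW (mixed). The grid values and the function name are made up for illustration; a real analysis would go on to compare these counts with what a random arrangement would produce.

```python
def join_counts(grid):
    """Count edge-adjacency joins in a binary grid (list of equal-length rows).

    Returns (bb, ww, bw): joins between two 1-cells, two 0-cells, and mixed pairs.
    """
    rows, cols = len(grid), len(grid[0])
    bb = ww = bw = 0
    for r in range(rows):
        for c in range(cols):
            # Look only right and down so each shared edge is counted once.
            for dr, dc in ((0, 1), (1, 0)):
                nr, nc = r + dr, c + dc
                if nr < rows and nc < cols:
                    a, b = grid[r][c], grid[nr][nc]
                    if a == 1 and b == 1:
                        bb += 1
                    elif a == 0 and b == 0:
                        ww += 1
                    else:
                        bw += 1
    return bb, ww, bw


# Hypothetical presence/absence of one crime type in a 3x3 block of cells.
clustered = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 0],
]
checkerboard = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]
print(join_counts(clustered))     # (4, 4, 4)
print(join_counts(checkerboard))  # (0, 0, 12)
```

Many like-with-like joins relative to mixed joins point toward clustering; the checkerboard, with only mixed joins, is the opposite extreme.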
With the point density tool, I presume you don't have the actual location of the crimes so measures of mean center should be considered with caution. You could produce ranks of some of your density data and map it.
Avoid the temptation to apply the "latest" in spatial mapping, and make sure that the data you have meet the assumptions of the statistical methods you employ. Just because something looks significant doesn't mean it is.
Thanks for the reply, Dan. I'm only in my 2nd GIS course, so I am still just learning a lot of these tools. I actually do have the exact locations of the incidents, as they fortunately come with latitude and longitude information. Would that make a raster kind of silly?
I will run Moran's, although I'm curious about what distances would be logical. I've also used the Average Nearest Neighbor tool, but I am only guessing at what number of neighbors would be appropriate. I used 10 as a starting point, but I've got almost 6,000 data points spread out over about 50 square miles.
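For a sense of scale, here is a back-of-the-envelope sketch in Python. It uses the standard expectation for a completely random pattern, where the mean nearest-neighbor distance is 0.5 / sqrt(n / area), and plugs in the rough figures above (6,000 points, 50 square miles). The function names are made up for illustration, and edge effects are ignored, so treat the result as an order-of-magnitude guide only.

```python
import math


def expected_nn_distance(n_points, area):
    """Expected mean nearest-neighbor distance for a spatially random pattern."""
    density = n_points / area
    return 0.5 / math.sqrt(density)


def nearest_neighbor_ratio(points, area):
    """Observed mean nearest-neighbor distance divided by the random expectation.

    Brute force, so only suitable for small point sets.
    """
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        total += min(
            math.hypot(x1 - x2, y1 - y2)
            for j, (x2, y2) in enumerate(points)
            if j != i
        )
    observed = total / len(points)
    return observed / expected_nn_distance(len(points), area)


# Rough figures from this post: about 6,000 incidents over about 50 square miles.
expected_miles = expected_nn_distance(6000, 50.0)
print(round(expected_miles * 5280), "feet")  # 241 feet
```

A ratio below 1 from `nearest_neighbor_ratio` would suggest clustering and a ratio above 1 would suggest dispersion, so distance bands much smaller than a couple of hundred feet would probably capture very few neighbors here.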
Any suggestions for number of neighbors or clustering band distances to use?
I appreciate the help.
Even though you have the positional coordinates of the incidents, that doesn't mean they could have occurred at random anywhere in space, which is often an underlying assumption of many spatial statistics tests.
For Moran's, Global Moran's I might be a consideration. In its classical sense, it looks at the adjacency of polygonal data weighted by the attribute, not at the actual distances between point locations and/or centroids of polygons (i.e., contiguity edges only, and variants). This one test alone could be applied to your data to assess the effect of the model choice on the outcomes.
I am not recommending one particular test or another, but you should be aware that the nature of the data (i.e., whether occurrence is spatially random, the normality of the attributes, etc.), the techniques chosen, and the options within them can be used to guide you in your work. They can also be used in the classical sense of "how to lie with statistics" and "how to lie with maps". So focus less on the outcomes and more on how the methodology affects the outcomes.
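To make the contiguity point concrete, here is a minimal Python sketch of Global Moran's I using binary edge-contiguity weights on a hypothetical grid of counts per cell. The grid values and the function name are invented for illustration, and the sketch assumes the values are not all identical (otherwise the variance term is zero). Swapping in a different weights definition would change the result, which is exactly the sensitivity worth examining.

```python
def morans_i(grid):
    """Global Moran's I for a grid of values with binary edge-contiguity weights."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    # Sum of squared deviations from the mean (must be non-zero).
    variance_sum = sum((v - mean) ** 2 for v in values)

    cross = 0.0  # sum of w_ij * (x_i - mean) * (x_j - mean)
    s0 = 0       # sum of all weights w_ij
    for r in range(rows):
        for c in range(cols):
            # Visit all four edge neighbors, so each pair is counted in both directions.
            for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    cross += (grid[r][c] - mean) * (grid[nr][nc] - mean)
                    s0 += 1
    return (n / s0) * (cross / variance_sum)


# Hypothetical crime counts per cell: high values grouped on the left.
clustered = [[5, 5, 0, 0] for _ in range(4)]
# Hypothetical alternating pattern: every neighbor differs.
alternating = [[1, 0], [0, 1]]

print(round(morans_i(clustered), 3))  # 0.667
print(morans_i(alternating))          # -1.0
```

Values near +1 indicate that similar counts sit next to each other, values near -1 indicate a checkerboard-like pattern, and values near zero are what a random arrangement tends to give.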
I will let those more inclined to recommend tools weigh in, in light of the suggestions I have provided.
You can definitely have some fun analyzing your crime data. You will have to convert the categorical data to counts or proportions, but you can still learn a lot about how the different types of crimes relate to each other. Here are some ideas:
1) Use Optimized Hot Spot Analysis. It will overlay your study area with a fishnet grid and count the number of crimes (of all types) that fall within each grid cell, then it will perform hot spot analysis to show you the hot and cold spot areas. This answers the question: where are the hot and cold spots of (all) crime? However, you can also run Optimized Hot Spot Analysis on different types of crime, then visually compare the hot spot maps.
2) You can use the fishnet grid cells from (1) or else use census tracts and count the number of unique crime types within each polygon to see where you have the highest crime diversity (you might find that some places only have robberies, for example, while other places experience a wide variety of different crime types)... you can run hot spot analysis on the diversity counts.
3) For fishnet grid cells (output from Optimized Hot Spot Analysis, then use Spatial Join) or census tracts, count the number of assaults, the number of robberies, the number of auto thefts, etc. within each polygon, then convert those counts to a percentage of all crime. You can then run grouping analysis to find polygons with similar challenges ... one group might be high for assault and narcotics, for example, but low for robbery... knowing the profiles -- the specific challenges -- of each group can help you identify effective prevention strategies.
4) If you have distinct clusters of particular crimes, you can create standard deviational ellipses around each cluster then overlay the ellipses for two different types of crimes to see how spatially integrated they are. (I'm doing an analysis right now that looks at violent crime in relation to alcohol establishments... to see how integrated those two "activity" spaces are).
5) Using the output from (1) you can find spatial outliers for all crime or for specific crime types: a high crime count area surrounded by low crime count areas, or a low crime count area surrounded by high crime count areas. These anomalies are often very interesting (what is that one neighborhood doing right... it has no problem at all with narcotics while surrounding areas are high for drug-related crimes? ... or why is this one neighborhood so high in relation to surrounding neighborhoods?)
6) Be sure to check out the new space time pattern mining tools in the 10.3 release as well; see "An overview of the Space Time Pattern Mining toolbox" in the ArcGIS Help.
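If it helps to see the bookkeeping behind ideas (2) and (3) outside of ArcGIS, here is a minimal Python sketch. It assumes incidents are available as (x, y, crime type) tuples in projected coordinates (feet or meters, not raw latitude/longitude) and bins them into square cells, a simplified stand-in for a fishnet grid plus Spatial Join. The sample incidents, the cell size and the function name are all hypothetical.

```python
from collections import Counter, defaultdict


def summarize_cells(incidents, cell_size):
    """Bin (x, y, crime_type) incidents into square cells and summarize each cell.

    Returns {cell: {"total": int, "diversity": int, "percent": {type: float}}},
    where cell is the (column, row) index of the square containing the incident.
    """
    counts = defaultdict(Counter)
    for x, y, crime_type in incidents:
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell][crime_type] += 1

    summary = {}
    for cell, by_type in counts.items():
        total = sum(by_type.values())
        summary[cell] = {
            "total": total,                # all crimes in the cell
            "diversity": len(by_type),     # number of distinct crime types
            "percent": {t: 100.0 * k / total for t, k in by_type.items()},
        }
    return summary


# Hypothetical incidents in projected units.
incidents = [
    (120.0, 80.0, "robbery"),
    (150.0, 40.0, "robbery"),
    (130.0, 90.0, "assault"),
    (180.0, 10.0, "narcotics"),
    (620.0, 700.0, "robbery"),
    (640.0, 710.0, "robbery"),
]

for cell, stats in sorted(summarize_cells(incidents, 500.0).items()):
    print(cell, stats)
```

The "diversity" values are the kind of counts you could feed into hot spot analysis, and the "percent" profiles are the kind of fields you could use for grouping analysis.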
I'm sure others will have ideas as well. I hope this is helpful!