How do you find trends between attributes in multiple features?

08-24-2018 04:49 PM
AndySiegel
Regular Contributor

Hello,

I am working on an analysis of American Community Survey (ACS) data for a local city government. One portion of the analysis involves exploring a potential childhood hunger issue. I have 3 layers:

  • Households receiving food stamps/SNAP (Census Block Group)
  • Median Household Income (Census Block Group)
  • Number of children under 18 years (Census Block Group)

Using these 3 layers together, what tools can I use to determine which Census Block Groups are of the highest priority concerning childhood hunger? I figure this is some kind of overlay/statistical analysis, but I'm not sure which tools will work for my needs. Thank you for any help you can provide.

Andy


Accepted Solutions
DanPatterson_Retired
MVP Emeritus

Lots of suggestions... but a caution, 

Although tools will run on data, there are underlying assumptions that the tools don't check.

For instance, correlation and regression...

  • you shouldn't use them unless you know the underlying requirements before you apply the tool to the data.

What if you find out what the assumption is, and then perform an analysis of the distribution of the data (descriptive stats and tests of the distribution)?

  • What if the data aren't applicable to that 'test'? 
  • Do you abandon all hope and move on?
  • or do you move on to another test which doesn't have the same assumption? 
  • Do you transform your data until the distribution conforms (maybe it is the 5/4th root of income...)?

This tome is just a cautionary tale about being swayed by the beauty and speed of a 'tool' or 'method of analysis' without accepting the fact that the results may be completely spurious because you didn't do your 'homework'

I have seen too many term projects that might even have been captured in books like this....

Spurious Correlations 

Sadly some of the examples ring too true.... proceed with caution in your analysis.... the 'pile' is deep enough
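
As a rough illustration of the "homework" described above, here is a minimal sketch of checking a variable's distribution before and after a transform. It uses only the Python standard library, the income values are invented for illustration, and the log transform is just one assumed candidate, not a recommendation for any particular dataset.

```python
import math
import statistics

def describe(values):
    """Return basic descriptive statistics, including sample skewness."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    # Adjusted Fisher-Pearson skewness; values near 0 suggest a symmetric distribution.
    skew = (n / ((n - 1) * (n - 2))) * sum(((v - mean) / sd) ** 3 for v in values)
    return {
        "n": n,
        "mean": mean,
        "median": statistics.median(values),
        "stdev": sd,
        "skew": skew,
    }

# Hypothetical median household incomes (USD), one value per block group.
incomes = [21000, 24000, 26000, 28000, 31000, 35000, 42000, 58000, 96000, 185000]

raw = describe(incomes)
logged = describe([math.log(v) for v in incomes])

print(f"raw:    mean={raw['mean']:.0f} median={raw['median']:.0f} skew={raw['skew']:.2f}")
print(f"logged: skew={logged['skew']:.2f}")
```

A mean well above the median and a large positive skew hint that a test assuming normality may not apply to the raw values. Whether a transformed variable is acceptable is still a judgment call that a formal test of the distribution should back up.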

7 Replies
DanPatterson_Retired
MVP Emeritus

For vector data look at intersect and union in 

An overview of the Overlay toolset—Help | ArcGIS Desktop 

depending on the attributes that you want in the tables.

The tools exist in ArcMap and ArcGIS Pro

AndySiegel
Regular Contributor

Thanks for the response, Dan! To clarify, my issue is not getting all the data to appear in the same census block groups (as I can do this via a join), but that I want to find a trend among the census block groups that have:

  • high number of households receiving food stamps
  • low median household income
  • high number of children under 18

I figure I could do this by manually choosing definition queries for each layer and running intersect to see where they are coincident. However, I think I'm looking for more of a statistical approach that evaluates all three variables to show me where potential childhood hunger exists. Does that make sense from a spatial analysis perspective? Thanks for the help!
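
The "definition query plus intersect" idea can be sketched on the joined table without any GIS tooling. This is only a minimal sketch: the GEOIDs, field names, and values are hypothetical, and the quartile cutoffs are an assumed threshold that would need to be justified for a real analysis.

```python
import statistics

# Hypothetical joined table: one record per Census Block Group.
# GEOIDs, field names, and values are invented for illustration only.
block_groups = [
    {"geoid": "BG-01", "snap_households": 140, "median_income": 22000, "children": 520},
    {"geoid": "BG-02", "snap_households": 130, "median_income": 25000, "children": 210},
    {"geoid": "BG-03", "snap_households": 95, "median_income": 31000, "children": 480},
    {"geoid": "BG-04", "snap_households": 80, "median_income": 38000, "children": 300},
    {"geoid": "BG-05", "snap_households": 60, "median_income": 45000, "children": 350},
    {"geoid": "BG-06", "snap_households": 40, "median_income": 52000, "children": 150},
    {"geoid": "BG-07", "snap_households": 25, "median_income": 67000, "children": 260},
    {"geoid": "BG-08", "snap_households": 10, "median_income": 90000, "children": 180},
]

def quartiles(records, field):
    """Return the three quartile cut points (Q1, Q2, Q3) for one field."""
    return statistics.quantiles([r[field] for r in records], n=4)

def flag_priority(records):
    """Return GEOIDs where all three conditions coincide.

    Assumed cutoffs: top quartile for SNAP households and children,
    bottom quartile for median household income.
    """
    snap_q3 = quartiles(records, "snap_households")[2]
    income_q1 = quartiles(records, "median_income")[0]
    children_q3 = quartiles(records, "children")[2]
    return [
        r["geoid"]
        for r in records
        if r["snap_households"] >= snap_q3
        and r["median_income"] <= income_q1
        and r["children"] >= children_q3
    ]

priority = flag_priority(block_groups)
print(priority)
```

This is the tabular equivalent of three definition queries followed by an intersect: it shows where the conditions coincide, but it says nothing about whether the variables are statistically related.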

DanPatterson_Retired
MVP Emeritus

Andy, perform the basic intersects to get the data into a tabular form that you can work with. Some of the variables may be self-evident or non-causal.

I would drop the word "trend" since you are implying correlation. If you have a look at the spatial intersections, it might give you a start for where to gather real information, such as schools in those areas that have breakfast programs, in-community support centers, food banks, church food programs, etc. A higher concentration of these might be an indicator of where action is being taken to stave off inadequate nutrition. Perhaps these are NOT the areas to examine because action is already being taken... perhaps it is in the peripheral areas (maybe physically, or economically) that the issue needs to be addressed.

MervynLotter
Frequent Contributor

Building on what Dan suggested, what you have provided are three independent or explanatory variables (1 - high number of households receiving food stamps, 2 - low median household income, 3 - high number of children under 18). If you want to start exploring statistical relationships, then you need a feature layer with a dependent variable that varies in abundance.

If you do have access to a layer with a dependent variable, then Ordinary Least Squares (OLS), Geographically Weighted Regression (GWR) or Forest-based Classification and Regression may be useful tools in exploring your data.

Do take a look at Regression analysis basics—ArcGIS Pro | ArcGIS Desktop for more information. 

AlfonsoYañez_Morillo
Regular Contributor

Andy,

Take a look at Hot Spot Analysis. I think it is exactly what you are looking for.

Hot Spot Analysis (Getis-Ord Gi*)

AndySiegel
Regular Contributor

Thank you all for the replies. I now have a better understanding of how to proceed with analyzing and visualizing the data. As Dan and Mervyn mentioned, it's very important to consider the underlying assumptions. For the purpose of my project, which is mainly exploratory, I'm just going to use simple maps showing the layers with graduated symbols/colors. I don't think statistical analysis is necessary or appropriate for this portion of the project. Happy mapping everyone!