Week 3 of Civic Analytics with Hub (correlation and clustering)

07-28-2020 08:20 AM
ManushiMajumdar
Esri Contributor

This week we focus on the attributes of a dataset: how relationships between attributes can be detected and interpreted. We then extend that understanding to spot hidden patterns in our data.

In the first example we fetch neighborhood boundaries for Washington, DC to observe correlation among socioeconomic factors. We enrich the neighborhoods layer with socioeconomic variables such as Population, Median Household Income, and Households below poverty level. We then display the data as a scatter matrix, a collection of scatter plots that pairs each numerical variable with every other, to see whether changes in one variable are reflected in another. Having obtained a visual understanding of the correlated variable pairs, we use statistical tests from the scipy (Scientific Python) library to compute the correlation numerically for a few of them.
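
As a rough illustration of that workflow, here is a minimal sketch that uses synthetic data in place of the enriched neighborhoods layer. The column names, the generated values, and the helper `show_scatter_matrix` are assumptions made for this example. Pearson's r is only one of the tests scipy offers, and the notebook may use a different one.

```python
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix
from scipy import stats

# Synthetic stand-in for the enriched neighborhoods table.
# Column names and values are hypothetical, not taken from the notebook.
rng = np.random.default_rng(0)
income = rng.normal(80_000, 20_000, size=50)
df = pd.DataFrame({
    "population": rng.integers(2_000, 30_000, size=50),
    "median_income": income,
    # Built to fall as income rises, so the pair is negatively correlated.
    "poverty_households": 3_000 - 0.03 * income + rng.normal(0, 100, size=50),
})


def show_scatter_matrix(frame):
    """Draw one scatter plot per pair of numeric columns (needs matplotlib)."""
    return scatter_matrix(frame, figsize=(8, 8), diagonal="hist")


# Numeric check for one variable pair: Pearson's r and its p-value.
r, p_value = stats.pearsonr(df["median_income"], df["poverty_households"])
print(f"Pearson r = {r:.2f}, p-value = {p_value:.3g}")
```

Calling `show_scatter_matrix(df)` in a notebook draws the grid of plots. The numeric test then confirms the direction and strength of a relationship that the plot only suggests.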

The second notebook demonstrates two techniques for detecting clusters or patterns in data. We begin by fetching rodent inspection and treatment sites in Washington, DC for the last 30 days to detect point clusters, if any, which helps inform strategies for follow-up treatments and inspections. The second example checks whether neighborhoods within the city of Tucson can be grouped based on similarities in income variables. We read in the data and extract the variables of interest into a separate dataframe, which is the input to the KMeans unsupervised learning method from the scikit-learn library. This helps us detect clusters of neighborhoods that are similar in the chosen variables.
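
The KMeans step described above can be sketched as follows. This is a minimal sketch on synthetic data, not the notebook's code: the two income columns, the choice of two clusters, and the scaling step are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the neighborhood income variables (hypothetical columns):
# 20 lower-income and 20 higher-income neighborhoods.
rng = np.random.default_rng(42)
low = rng.normal([35_000, 20_000], 2_000, size=(20, 2))
high = rng.normal([95_000, 55_000], 2_000, size=(20, 2))
income_df = pd.DataFrame(
    np.vstack([low, high]),
    columns=["median_income", "per_capita_income"],
)

# Put the variables on a common scale so neither dominates the distance measure.
scaled = StandardScaler().fit_transform(income_df)

# Fit KMeans and attach the cluster label to each neighborhood.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
income_df["cluster"] = model.fit_predict(scaled)

# Average income per cluster, as a first look at what separates the groups.
print(income_df.groupby("cluster").mean())
```

With real data the number of clusters is rarely known in advance, so `n_clusters` is usually chosen by comparing several values.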

3 Replies
FATEYEB
New Contributor

AO_100270 Error when using EnrichLayer

I ran this code to enrich a layer and got the error messages below. How may I reset the max row size for a feature service layer?


{"messageCode": "AO_100270", "message": "The size and number of the variables that you selected exceeds the maximum row size for feature service layer. Please reduce the number of selected variables."}

ProcessFeatureOutput failed. Error: {"code" : 0, "messageCode":"GPEXT_017","message": "Service Enriched neighborhoods of DC 2019 already exists.", "params": {"name" : "Enriched neighborhoods of DC 2019"}}

{"messageCode": "AO_100020", "message": "EnrichLayer failed."} Failed to execute (EnrichLayer). Failed

ManushiMajumdar
Esri Contributor

Thanks for reaching out with your question. The second error message indicates that the output layer already exists. If a Feature Layer or Feature Service with the same name already exists in your account, the tool returns an error, and you may have to rename the output layer in the script to recreate it. To see other input parameters for this method, here are the ArcGIS API for Python method details.

Also, are you running this notebook script as is? Or have you used another input layer and other enrich variables?
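
For the "already exists" error, one way to avoid a name clash on repeated runs is to make the output name unique. The sketch below is only an illustration: `unique_output_name` is a hypothetical helper, not part of the ArcGIS API for Python, and the commented call mirrors the `enrich_layer` call shown in this thread.

```python
from datetime import datetime


def unique_output_name(base, now=None):
    """Append a timestamp so repeated runs do not reuse a service name."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return f"{base}_{stamp}"


name = unique_output_name("Enriched neighborhoods of DC 2019")
print(name)

# Hypothetical use; `layer` and `variables` stand for your own inputs:
# enriched = enrich_layer(layer, analysis_variables=variables, output_name=name)
```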

FATEYEB
New Contributor

Manushi-

Thank you! I changed the name and it worked. 

I initially ran the notebook script as is. Then I used other variables for another locale (Michigan). I uploaded polygon data for census tracts based on 2010 Census Shapefiles. This, however, returned other errors. The code is below.

My follow-up question is: should I change the polygon data if Census tracts are not supported by "variables from global data collections"?

I would be grateful for any insight (or suggested Esri coursework that can enable me to learn even more).

Thanks

Babasola

from arcgis.features.enrich_data import enrich_layer

population_2019 = ['TOTPOP_CY', 'POPDENS_CY']

enriched = enrich_layer(MItracts_layer, analysis_variables=population_2019, output_name='Population density of Census Tracts in Michigan')

enriched

{"messageCode": "AO_100047", "message": "Enrichment may not be available for some features."}

{"messageCode": "AO_100000", "message": "Country aggregation mode supports only variables from global data collections. Following variables couldn't be processed: 'POPDENS_CY', 'TOTPOP_CY'."}
