Week 4 of Civic Analytics with Hub (text analysis)

ManushiMajumdar · 08-04-2020

In this notebook we work with the Vision Zero street safety survey from Washington, DC. The data is collected through a web application in which members of the public select a street segment and describe their concerns about its safety. We read the data in using the ArcGIS API for Python, then use four Python libraries for text processing and analysis: WordCloud, nltk, textblob and spacy. Using WordCloud, we first create a word cloud of the most common words in the survey. We then import nltk to identify the words that are both frequent and relevant to the survey, and regenerate the word cloud. This gives a quick visual snapshot of what people are talking about most. From there, we extract the most frequently mentioned words, which suggest the topics of importance in the survey.
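The frequency step can be sketched with the standard library alone. This is a minimal illustration, not the notebook's code: the sample comments, the small `STOP_WORDS` set, and the `top_words` helper are all hypothetical, and the stop word set only stands in for the much larger English list that nltk provides.

```python
from collections import Counter
import re

# Hypothetical sample comments; the real survey text is not reproduced here.
comments = [
    "Cars speed on this street and the crosswalk is not safe.",
    "The crosswalk signal is too short for people to cross the street.",
    "Drivers speed through the stop sign near the school.",
]

# A tiny illustrative stop word set (assumption); nltk's English list is far larger.
STOP_WORDS = {
    "a", "an", "and", "the", "is", "on", "this", "to",
    "for", "too", "not", "near", "through",
}


def top_words(texts, stop_words, n=3):
    """Return the n most frequent non-stop words across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep alphabetic tokens (apostrophes allowed).
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in stop_words)
    return counts.most_common(n)


print(top_words(comments, STOP_WORDS))
```

Word counts in this shape, once converted to a dict, could be handed to the wordcloud package's `WordCloud().generate_from_frequencies(...)` to draw the filtered cloud; whether the notebook does it this way is an assumption.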

Next, we use textblob to calculate a sentiment score for each comment, ranging from -1 (negative sentiment) to +1 (positive sentiment), and visualize the distribution of the scores. This gives a general sense of how citizens feel about the safety of the streets. We also extract the top 10 positive and negative comments by sentiment score to see which comments express the strongest opinions. We conclude with a technique that uses spacy to identify the named entities (proper nouns) mentioned in the comments and classify each as a person, place, organization, and so on. This is a useful way to quickly extract the subject and focus of a comment.
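The ranking of comments by sentiment score can be sketched as follows. To keep the example dependency-free, it swaps in a toy lexicon scorer (`toy_polarity`, an assumption made purely for illustration) where the notebook uses textblob; with textblob installed, a scorer such as `lambda text: TextBlob(text).sentiment.polarity` could be passed as `score` instead. The sample comments and the `rank_comments` helper are likewise hypothetical.

```python
import re

# Toy polarity lexicon (assumption, for illustration only); not textblob's model.
TOY_LEXICON = {
    "safe": 1.0, "great": 1.0, "good": 0.5,
    "dangerous": -1.0, "unsafe": -1.0, "bad": -0.5,
}


def toy_polarity(text):
    """Average the lexicon scores of matched words; 0.0 when none match."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def rank_comments(comments, score=toy_polarity, n=10):
    """Return (most_positive, most_negative) lists of (score, comment) pairs."""
    # Sort ascending by score so the most negative comments come first.
    scored = sorted(((score(c), c) for c in comments), key=lambda pair: pair[0])
    most_negative = scored[:n]
    most_positive = scored[::-1][:n]
    return most_positive, most_negative


# Hypothetical sample comments.
sample = [
    "The new bike lane feels safe and great.",
    "This intersection is dangerous at night.",
    "Lighting is good but the crossing is unsafe.",
    "There is a bus stop here.",
]

positive, negative = rank_comments(sample, n=2)
print("Most positive:", positive)
print("Most negative:", negative)
```

Passing the scorer in as a parameter keeps the ranking logic independent of the sentiment model, so the same function works whether the scores come from a toy lexicon or from textblob.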

Link to notebook - Exploratory text analysis of comments from surveys

Accompanying blog post