Tweets in ArcGIS Pro

11-19-2017 01:14 PM
RiccardoKlinger2

The Twitter API gives access not only to user timelines and to creating and publishing tweets, but also to collecting tweets from a certain area or matching defined search terms.

In this post I would like to show you how to collect tweets in real time as well as "historic" tweets. "Historic" needs a caveat: by default the search API only reaches back a short time (about one week with the standard search API), so older tweets cannot be accessed this way.

The Prerequisites

First of all, we will fetch tweets using Python and the Tweepy library. To use Tweepy inside ArcGIS Pro we will wrap the code in a Python toolbox. Additionally you will need a Twitter account and an app with keys.

Installing Tweepy

Unfortunately Tweepy is not listed in the ArcGIS Pro Python Package Manager. But the Tweepy library is hosted on GitHub and can be installed either from source or by using pip/easy_install. For the latter you simply type

pip install tweepy

For the former, clone the repo and install it:

git clone https://github.com/tweepy/tweepy.git
cd tweepy
python setup.py install

I would recommend the pip install, as pip also lets you install other libraries with this beautiful single line and takes care of all dependencies a library needs.

Getting the Keys

To get the needed keys to authenticate your application you will need to create an app at the developer section of Twitter.

This short video explains how:

Creating the Toolbox

Creating a Python toolbox is quite simple: click on the "Toolbox" icon within the "Insert" ribbon and select "New Python Toolbox".

Once you've done this, you can customize the toolbox to the needs of the workflow.

The Python Logic

First, let us create the inputs for our tweet-collecting nightmare. As the code needs keywords, the API keys and some more inputs, we should decide which parameters can be customized in the frontend and which cannot. The list of parameters is as follows:

  • keywords
  • extent
  • switch for historic/live tweets
  • location type (physical location or "place")
  • output feature class name
  • API keys from the created Twitter app

All the parameters could be part of the GUI, but in fact the toolbox will be tied to a certain app ID of the Twitter account and the keys will not change on a regular basis. Therefore we keep the keys in the code and design the parameter section (function getParameterInfo) of our Python toolbox as follows:

def getParameterInfo(self):
    '''Define parameter definitions'''
    hashtags = arcpy.Parameter(
        displayName='Search String',
        name='hashtags',
        datatype='GPString',
        parameterType='Optional',
        direction='Input')
    out_feature = arcpy.Parameter(
        displayName='Output Point Feature Class Name',
        name='out_feature',
        datatype='GPString',
        parameterType='Required',
        direction='Output')
    Extent = arcpy.Parameter(
        displayName='Extent',
        name='Lat',
        datatype='GPExtent',
        parameterType='Optional',
        direction='Input')
    locationType = arcpy.Parameter(
        displayName='Location Type',
        name='locType',
        datatype='GPString',
        parameterType='Required',
        direction='Input')
    locationType.filter.type = 'ValueList'
    locationType.filter.list = ['user location', 'place location']
    locationType.value = locationType.filter.list[0]
    collType= arcpy.Parameter(
        displayName='Collection Type',
        name='colType',
        datatype='GPString',
        parameterType='Required',
        direction='Input')
    collType.filter.type = 'ValueList'
    collType.filter.list = ['historic', 'real time']
    collType.value = collType.filter.list[0]
    numberOfTweets = arcpy.Parameter(
        displayName='Number of Tweets',
        name='numberOfTweets',
        datatype='GPLong',
        parameterType='Required',
        direction='Input')
    numberOfTweets.value = 100
    timeForTweets = arcpy.Parameter(
        displayName='max. duration of realtime stream',
        name='Duration',
        datatype='GPLong',
        parameterType='Required',
        direction='Input')
    timeForTweets.value = 60 #the time to wait for new tweets (in seconds)
    params = [hashtags, out_feature, Extent, locationType, collType, numberOfTweets, timeForTweets]
    return params

Unfortunately we need to keep in mind that the streaming of tweets is only possible either by keywords or by location. Therefore we will validate our parameters:

def updateParameters(self, parameters):
    """Modify the values and properties of parameters before internal
    validation is performed. This method is called whenever a parameter
    has been changed."""
    if parameters[0].valueAsText and parameters[4].value=="real time":
        if parameters[2].value: #extent was set!
            parameters[0].value="" #use no keywords!
    return
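
The either/or rule behind this validation can also be written as a small function that does not need arcpy. This is only a sketch: the function name choose_stream_filter is made up for illustration, and splitting the search string on commas is my assumption, not something the toolbox does.

```python
def choose_stream_filter(keywords, extent):
    """Return the keyword arguments for a streaming filter.

    keywords: search string or None
    extent:   (xmin, ymin, xmax, ymax) in degrees, or None
    An extent wins over keywords, mirroring the validation above.
    """
    if extent is not None:
        xmin, ymin, xmax, ymax = extent
        return {"locations": [xmin, ymin, xmax, ymax]}
    # Assumption: several search terms are separated by commas.
    terms = [term.strip() for term in (keywords or "").split(",") if term.strip()]
    if not terms:
        raise ValueError("need either an extent or at least one keyword")
    return {"track": terms}


if __name__ == "__main__":
    print(choose_stream_filter("berlin, gis", None))
    print(choose_stream_filter("berlin", (13.0, 52.3, 13.8, 52.7)))
```

The returned dictionary has the same keys as the two stream.filter variants used later in this post (locations and track).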

Now that we have the needed inputs, we can have a look at the Tweepy magic.

Tweepy Authentication

As the toolbox needs to work with Tweepy, we will first check whether the library is installed:

def execute(self, parameters, messages):
    """The source code of the tool."""
    try:
        import tweepy
    except ImportError:
        arcpy.AddError("Tweepy was not found!")
        return  # stop here, nothing works without Tweepy
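
If you would like to test for the library without importing it, the standard library can do the lookup. A minimal sketch; the helper name module_available is made up for this example:

```python
import importlib.util


def module_available(module_name):
    """Return True if the named top-level module can be found."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    for candidate in ("json", "tweepy"):
        state = "found" if module_available(candidate) else "missing"
        print(candidate + ": " + state)
```

Run this with the Python interpreter of ArcGIS Pro; otherwise you are checking a different environment than the one the toolbox uses.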

After the check is successful we can go ahead and authenticate ourselves against Twitter:

#setting the authentication:
consumerKey = "set your key here"
consumerSecret = "set your key here"
accessToken = "set your key here"
accessTokenSecret = "set your key here"
key = tweepy.OAuthHandler(consumerKey, consumerSecret)
key.set_access_token(accessToken, accessTokenSecret)
api = tweepy.API(key, wait_on_rate_limit=True, wait_on_rate_limit_notify=True) #access the API
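
If you plan to share the toolbox, you may not want your keys inside the .pyt file. One possible alternative, sketched here under my own assumptions, is to read them from environment variables. The variable names and the function load_twitter_keys are hypothetical; nothing in Tweepy or ArcGIS Pro requires them.

```python
import os

# Hypothetical variable names; pick whatever fits your setup.
KEY_NAMES = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_twitter_keys(environ=None):
    """Return the four keys as a tuple or raise KeyError naming the missing ones."""
    environ = os.environ if environ is None else environ
    missing = [name for name in KEY_NAMES if not environ.get(name)]
    if missing:
        raise KeyError("missing Twitter credentials: " + ", ".join(missing))
    return tuple(environ[name] for name in KEY_NAMES)
```

In the tool you could then write consumerKey, consumerSecret, accessToken, accessTokenSecret = load_twitter_keys() and hand the values to the OAuthHandler as before.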

Now we can access the Twitter API.

Some Arcpy Helper Functions

But as we are working with spatial data, we need some helper functions before getting the tweets. As we will store the data in a feature class, we need at least a feature class creator and a function that adds a feature to the feature class. We will define them ourselves:

def createFC(name):
      sr = arcpy.SpatialReference(4326)
      arcpy.CreateFeatureclass_management(arcpy.env.workspace, name, 'POINT',"", "", "", sr)
      arcpy.AddField_management(name, "username", "TEXT", "", "", 255, "username", "NON_NULLABLE", "REQUIRED")
      arcpy.AddField_management(name, "tweet", "TEXT", "", "", 255, "tweet", "NON_NULLABLE", "REQUIRED")
      arcpy.AddField_management(name, "time", "DATE", "", "", "", "time", "NON_NULLABLE", "REQUIRED")
      arcpy.AddField_management(name, "place", "TEXT", "", "", 255, "place_name", "NULLABLE", "NON_REQUIRED")
      arcpy.AddField_management(name, "id", "TEXT", "", "", 255, "id", "NON_NULLABLE", "REQUIRED") #unfortunately ids of tweets are veryyy long integers
      return

Now the add feature function:

def insertRecord(row, name):
    # row: (username, tweet, time, place, id, (x, y))
    import os
    fc = os.path.join(arcpy.env.workspace, name)
    cursor = arcpy.da.InsertCursor(fc, ['username', 'tweet', 'time', 'place', 'id', 'SHAPE@XY'])
    try:
        cursor.insertRow(row)
    except Exception as e:
        arcpy.AddError(str(e))
    finally:
        del cursor  # release the lock on the feature class
    return
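
The text fields created in createFC are 255 characters long, so a longer value may be rejected on insert. If you want to be defensive, you can shorten the values before building the row. A small sketch; fit_text and build_row are names I made up, and whether you need this at all depends on the data you receive:

```python
MAX_TEXT_LENGTH = 255  # length of the TEXT fields created in createFC


def fit_text(value, limit=MAX_TEXT_LENGTH):
    """Return value as a string that is at most `limit` characters long."""
    text = "" if value is None else str(value)
    if len(text) <= limit:
        return text
    return text[:limit - 3] + "..."


def build_row(username, text, timestamp, place, tweet_id, xy):
    """Build a row in the field order used by the insert cursor above."""
    return (
        fit_text(username),
        fit_text(text),
        timestamp,
        fit_text(place),
        str(tweet_id),
        (float(xy[0]), float(xy[1])),
    )
```

The result of build_row can be passed to insertRecord as its first argument.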

Now let's go back to the main function: we will create a new feature class with a distinct name by using the time stamp:

#create a featureClass:
import time
name = parameters[1].valueAsText + str(int(time.time())) # we will only use the whole seconds since 01.01.1970
createFC(name)
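
The naming idea can be tried on its own without arcpy. In this sketch the function unique_fc_name is hypothetical, and the clean-up of the base name is my assumption based on the usual geodatabase rules (no spaces, no leading digit):

```python
import re
import time


def unique_fc_name(base_name, now=None):
    """Append the whole seconds since 01.01.1970 to a cleaned base name."""
    seconds = int(time.time() if now is None else now)
    # Assumption: replace everything that is not a letter, digit or underscore.
    safe_base = re.sub(r"\W", "_", base_name)
    if not safe_base or safe_base[0].isdigit():
        safe_base = "fc_" + safe_base
    return safe_base + "_" + str(seconds)


if __name__ == "__main__":
    print(unique_fc_name("my tweets"))
```

Two runs within the same second would still produce the same name, so this is only as unique as the clock allows.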

As the feature class is ready to receive the data, let's apply the Tweepy logic.

Reading Historic Tweets

The main difference in our toolbox is whether we collect historic or real time tweets. We will first look at the historic part:

if parameters[4].value == "historic":
    arcpy.AddMessage("start: collecting historic tweets")
    tweetsPerQry = 100 # that is the maximum possible per request
    tweetCount = 0 # number of tweets stored in the feature class so far
    max_id = 0 # id of the oldest tweet received so far (0 = first request)
    while tweetCount < parameters[5].value:
        tweetInResponse = 0
        try:
            if (max_id <= 0):
                new_tweets = api.search(q=str(parameters[0].value), count=tweetsPerQry, geocode=geo)
            else:
                # only ask for tweets older than the ones we already have
                new_tweets = api.search(q=str(parameters[0].value), count=tweetsPerQry, geocode=geo, max_id=str(max_id - 1))
        except tweepy.TweepError as e:
            arcpy.AddError("search failed: " + str(e))
            break
        if not new_tweets:
            arcpy.AddMessage("no other tweets found!")
            break
        max_id = new_tweets[-1].id # results come newest first, so the last one is the oldest
        for tweet in new_tweets:
            tweetInResponse += accessTweet(tweet, parameters[3].value, tweetCount, name)
        tweetCount += tweetInResponse
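
The paging with max_id is easier to follow without the Twitter API in the way. The following sketch replaces the search call with a fake function over plain numbers. Both function names are made up; the only property taken from the real API is that results arrive newest first.

```python
def collect_with_max_id(search_page, wanted, page_size=100):
    """Page backwards through results until `wanted` items are collected.

    search_page(count, max_id) must return a list of dicts with an 'id' key,
    newest first, containing only ids <= max_id (or all ids if max_id is None).
    """
    collected = []
    max_id = None
    while len(collected) < wanted:
        page = search_page(count=page_size, max_id=max_id)
        if not page:
            break  # nothing older is left
        collected.extend(page)
        max_id = page[-1]["id"] - 1  # continue below the oldest id seen
    return collected[:wanted]


def make_fake_search(all_ids):
    """Build a stand-in for the search call that serves ids newest first."""
    ordered = sorted(all_ids, reverse=True)

    def search_page(count, max_id=None):
        matching = [i for i in ordered if max_id is None or i <= max_id]
        return [{"id": i} for i in matching[:count]]

    return search_page


if __name__ == "__main__":
    fake_search = make_fake_search(range(1, 251))
    tweets = collect_with_max_id(fake_search, wanted=220)
    print(len(tweets), tweets[0]["id"], tweets[-1]["id"])
```

Subtracting 1 from the oldest id is what prevents the last tweet of one page from showing up again as the first tweet of the next page.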

The geocode argument in the two api.search calls is where the provided extent comes in. If an extent is set, it is delivered with XMin, XMax, YMin and YMax values. The search API, however, uses a "lat,lon,radius" approach, so we need to convert the input before the search loop starts:

if parameters[2].value:
    ext = parameters[2].value
    rectangle = [[ext.XMin, ext.YMin], [ext.XMin, ext.YMax], [ext.XMax, ext.YMax], [ext.XMax, ext.YMin]]
    extent = arcpy.Polygon(arcpy.Array([arcpy.Point(*coords) for coords in rectangle])) #create a polygon from the extent
    arcpy.AddMessage("search in a region!")
    LL = arcpy.PointGeometry(arcpy.Point(ext.XMin, ext.YMin), arcpy.SpatialReference(4326))
    UR = arcpy.PointGeometry(arcpy.Point(ext.XMax, ext.YMax), arcpy.SpatialReference(4326))
    # half of the geodesic diagonal from LL to UR, converted from meters to kilometers
    radius = UR.angleAndDistanceTo(LL, method="GEODESIC")[1] / 2000
    geo = str(extent.centroid.Y) + "," + str(extent.centroid.X) + "," + str(radius) + "km"
else:
    arcpy.AddMessage("worldwide search!")
    geo = ""
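
To check the numbers outside of ArcGIS Pro, the same conversion can be approximated in plain Python. Note the assumption: this sketch uses the haversine formula on a sphere instead of the geodesic distance arcpy computes, so the radius will differ slightly. The function names are made up for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def extent_to_geocode(xmin, ymin, xmax, ymax):
    """Turn a lon/lat extent into a 'lat,lon,radiuskm' string."""
    center_lat = (ymin + ymax) / 2.0
    center_lon = (xmin + xmax) / 2.0
    # The circle through both corners has half the diagonal as its radius.
    radius_km = haversine_km(ymin, xmin, ymax, xmax) / 2.0
    return "{:.5f},{:.5f},{:.3f}km".format(center_lat, center_lon, radius_km)


if __name__ == "__main__":
    print(extent_to_geocode(13.0, 52.3, 13.8, 52.7))
```

Keep in mind that the circle is larger than the rectangle it encloses, so the search will also return tweets from outside the extent.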

As you can see in the for loop of the "historic" logic, we call a special function, accessTweet, for every tweet, as all tweets we are gathering contain some sort of spatial information:

  • the location of the device
  • the place the user tagged when creating the tweet
  • the place a user defined in his profile

Most of the tweets contain only the profile location, which is not very helpful in our use case. So we will loop through the received results (approx. 100 per request) and insert each tweet into our feature class if it has place or device location information:

def accessTweet(inTweet, locationType, resultingNumbers, name):
    #tweets have three types of location: device coordinates, place, profile. we are just interested in the first two.
    numberIncreaser = 0
    created = inTweet.created_at.strftime('%Y-%m-%d %H:%M')
    if locationType == "place location":
        if inTweet.place is not None:
            #places are delivered with bounding boxes, so we use the center of the box:
            box = inTweet.place.bounding_box.coordinates[0]
            center = ((box[2][0] + box[0][0]) / 2, (box[2][1] + box[0][1]) / 2)
            tweetTuple = (inTweet.user.name, inTweet.text, created, inTweet.place.full_name, str(inTweet.id), center)
            insertRecord(tweetTuple, name)
            numberIncreaser = 1
    if locationType == "user location":
        if inTweet.coordinates is not None:
            #device coordinates are delivered as [longitude, latitude]:
            lon, lat = inTweet.coordinates['coordinates'][0], inTweet.coordinates['coordinates'][1]
            tweetTuple = (inTweet.user.name, inTweet.text, created, "device coordinates", str(inTweet.id), (lon, lat))
            insertRecord(tweetTuple, name)
            numberIncreaser = 1
    return numberIncreaser
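
The center of a place's bounding box is the only arithmetic in this function, so it is worth a quick check on its own. The sketch below assumes the nested list layout used above (one ring of [longitude, latitude] corners). Unlike the toolbox code it looks at all corners instead of corner 0 and corner 2, and the name bbox_center is made up.

```python
def bbox_center(bounding_box_coordinates):
    """Return (lon, lat) of the center of a place bounding box.

    bounding_box_coordinates: [[[lon, lat], [lon, lat], ...]]
    """
    ring = bounding_box_coordinates[0]
    lons = [corner[0] for corner in ring]
    lats = [corner[1] for corner in ring]
    return ((min(lons) + max(lons)) / 2.0, (min(lats) + max(lats)) / 2.0)


if __name__ == "__main__":
    berlin_like_box = [[[13.0, 52.3], [13.8, 52.3], [13.8, 52.7], [13.0, 52.7]]]
    print(bbox_center(berlin_like_box))
```

Using min and max makes the result independent of the order in which the corners are listed.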

The function inserts the tweet into our feature class if it matches the requested location type, and the returned numberIncreaser (0 or 1) tells our main function whether that happened.

Real Time Tweets

We will use the same functions when searching for real time tweets. Yet the logic is a bit different:

if parameters[4].value == "real time":
    arcpy.AddMessage("start: collecting real time tweets")
    start_time = time.time() #start time
    class stream2lib(tweepy.StreamListener):
        def __init__(self, api=None):
            #api = tweepy.API(key)
            self.api = api
            self.n = 0 #number of tweets stored so far
        def on_status(self, status):
            #accessTweet returns 1 if the tweet had the requested location type and was stored
            if accessTweet(status, parameters[3].value, self.n, name):
                self.n = self.n + 1
                arcpy.AddMessage(str(self.n) + " tweets received...")
                arcpy.AddMessage(str(time.time() - start_time) + "s from " + str(parameters[6].value) + "s")
            if self.n >= parameters[5].value:
                arcpy.AddMessage("Desired number of tweets collected!")
                return False #returning False closes the stream
            if (time.time() - start_time) >= parameters[6].value:
                arcpy.AddMessage("Time limit of " + str(parameters[6].value) + "s reached!")
                return False
            return True
    stream = tweepy.streaming.Stream(key, stream2lib())
    if parameters[2].value:
        stream.filter(locations=[parameters[2].value.XMin, parameters[2].value.YMin, parameters[2].value.XMax, parameters[2].value.YMax])
    else:
        stream.filter(track=[parameters[0].value])
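
The two stop conditions (enough tweets or time is up) can be pulled out of the listener and tested without a live stream. A minimal sketch; the class StreamBudget is hypothetical and the clock is passed in so that a test can fake the time:

```python
import time


class StreamBudget(object):
    """Track how many tweets were stored and how much time has passed."""

    def __init__(self, max_tweets, max_seconds, clock=time.time):
        self.max_tweets = max_tweets
        self.max_seconds = max_seconds
        self.clock = clock
        self.start = clock()
        self.stored = 0

    def record(self, stored_now=1):
        """Count tweets that were actually written to the feature class."""
        self.stored += stored_now

    def keep_running(self):
        """Return False as soon as either limit is reached."""
        if self.stored >= self.max_tweets:
            return False
        return (self.clock() - self.start) < self.max_seconds
```

Inside on_status you could call record() after a successful insert and return keep_running(). Note that in both variants the time limit is only checked when a tweet arrives, so a quiet stream can run longer than the configured duration.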

In the end you can gather tweets for your very own purpose, but keep in mind: only around 0.1% of tweets carry proper location information, and the streaming API used by Tweepy does not provide access to the firehose but to about 1% of the full Twitter stream. Also keep in mind that you might collect bots that tweet from the same location every time (in Berlin, two radio stations are tweeting their playlists).
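
If such bots bother you, a simple heuristic is to flag accounts that tweet many times from exactly one coordinate. This is only a sketch under my own assumptions: the function name, the input format (username plus coordinate pair) and the threshold of 5 tweets are made up, and a real person tweeting from home would be flagged as well.

```python
from collections import Counter


def suspicious_users(rows, min_tweets=5):
    """Return usernames with at least min_tweets tweets, all from one coordinate.

    rows: iterable of (username, (x, y)) pairs
    """
    locations_per_user = {}
    for username, xy in rows:
        locations_per_user.setdefault(username, Counter())[tuple(xy)] += 1
    return {
        username
        for username, counts in locations_per_user.items()
        if len(counts) == 1 and sum(counts.values()) >= min_tweets
    }


if __name__ == "__main__":
    sample = [("radio", (13.4, 52.5))] * 6 + [("anna", (13.1, 52.4)), ("anna", (13.2, 52.5))]
    print(suspicious_users(sample))
```

The rows could come from a search cursor over the username and SHAPE@XY fields of the collected feature class.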

But you might get some really cool visuals:

If you want to work with the toolbox grab it from git. You can also work with the static version attached here.
