The Twitter API provides access not only to user timelines and to creating and publishing tweets, but also to collecting tweets from a certain area or matching defined search terms.
In this post I would like to show you how to collect tweets in real time as well as "historic" tweets. A note on "historic": by default it is not possible to access tweets older than 2 weeks.
First of all we will fetch tweets using Python and the Tweepy library. To use Tweepy inside ArcGIS Pro we will wrap it in a Python toolbox. Additionally you will need a Twitter account and an app with keys.
Unfortunately Tweepy is not listed in the ArcGIS Pro Python Package Manager. But the library is hosted on GitHub and can be installed either from source or by using pip/easy_install. For the latter you simply type
pip install tweepy
For the former, clone the repo and install it:
git clone https://github.com/tweepy/tweepy.git
cd tweepy
python setup.py install
I would recommend the pip install, as pip also lets you install other libraries with this beautiful single line and keeps track of all prerequisites needed for installing a library.
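Whichever way you install it, it can help to confirm that the Python environment used by ArcGIS Pro actually sees the package. The following is a minimal sketch using only the standard library; the helper name is_installed is our own and not part of Tweepy or arcpy.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the current interpreter can locate module_name."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Run this with the same Python interpreter that ArcGIS Pro uses.
    print("tweepy available:", is_installed("tweepy"))
```

If this prints False, pip most likely installed the package into a different Python environment than the one ArcGIS Pro is running.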
To get the needed keys to authenticate your application you will need to create an app at the developer section of Twitter.
This short video explains how:
Creating a Python toolbox is quite simple: click on the "Toolbox" icon within the "Insert" ribbon and select "New Python Toolbox".
Once you've done this, you're able to customize the toolbox regarding the needs of the workflow.
First, let us create the inputs for our tweet collecting nightmare. As the code needs keywords, the API keys and some more inputs, we should decide which parameters should be customizable in the frontend. The inputs are: the search string (keywords), the output feature class name, an extent, the location type, the collection type, the number of tweets, the maximum duration of the real time stream, and the API keys.
All of these could be part of the GUI, but the toolbox will be tied to one app on the Twitter account, and its keys will not change on a regular basis. Therefore the keys stay in the code and we design the parameter section (function getParameterInfo) of our Python toolbox as follows:
def getParameterInfo(self):
    '''Define parameter definitions'''
    # keywords to search for
    hashtags = arcpy.Parameter(
        displayName='Search String',
        name='hashtags',
        datatype='GPString',
        parameterType='Optional',
        direction='Input')
    # base name of the resulting point feature class
    out_feature = arcpy.Parameter(
        displayName='Output Point Feature Class Name',
        name='out_feature',
        datatype='GPString',
        parameterType='Required',
        direction='Output')
    # optional search area
    Extent = arcpy.Parameter(
        displayName='Extent',
        name='Lat',
        datatype='GPExtent',
        parameterType='Optional',
        direction='Input')
    # which kind of location information a tweet must carry
    locationType = arcpy.Parameter(
        displayName='Location Type',
        name='locType',
        datatype='GPString',
        parameterType='Required',
        direction='Input')
    locationType.filter.type = 'ValueList'
    locationType.filter.list = ['user location', 'place location']
    locationType.value = locationType.filter.list[0]
    # historic search or real time stream
    collType = arcpy.Parameter(
        displayName='Collection Type',
        name='colType',
        datatype='GPString',
        parameterType='Required',
        direction='Input')
    collType.filter.type = 'ValueList'
    collType.filter.list = ['historic', 'real time']
    collType.value = collType.filter.list[0]
    numberOfTweets = arcpy.Parameter(
        displayName='Number of Tweets',
        name='numberOfTweets',
        datatype='GPLong',
        parameterType='Required',
        direction='Input')
    numberOfTweets.value = 100
    timeForTweets = arcpy.Parameter(
        displayName='max. duration of realtime stream',
        name='Duration',
        datatype='GPLong',
        parameterType='Required',
        direction='Input')
    timeForTweets.value = 60  # the time (in seconds) to wait for new tweets
    params = [hashtags, out_feature, Extent, locationType, collType, numberOfTweets, timeForTweets]
    return params
Unfortunately we need to keep in mind that the streaming of tweets is only possible either by keywords or by location. Therefore we will validate our parameters:
def updateParameters(self, parameters):
    """Modify the values and properties of parameters before internal
    validation is performed. This method is called whenever a parameter
    has been changed."""
    if parameters[0].valueAsText and parameters[4].value == "real time":
        if parameters[2].value:  # extent was set!
            parameters[0].value = ""  # use no keywords!
    return
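The rule behind this validation can also be written down as a small stand-alone function, which makes it easy to check outside of ArcGIS Pro. This is only a sketch: resolve_stream_filter is a hypothetical helper, the extent is assumed to be a plain (xmin, ymin, xmax, ymax) tuple, and raising an error when both inputs are empty is our own addition.

```python
def resolve_stream_filter(keywords, extent):
    """Pick exactly one filter for a real time collection.

    keywords: search string, or None / "" if not set
    extent:   (xmin, ymin, xmax, ymax) tuple, or None if not set
    An extent takes precedence over keywords, as in updateParameters.
    """
    if extent:
        return {"locations": list(extent)}
    if keywords:
        return {"track": [keywords]}
    raise ValueError("Provide either keywords or an extent for streaming.")


# Example: both inputs are set, so the keywords are ignored.
print(resolve_stream_filter("#gis", (13.0, 52.3, 13.8, 52.7)))
```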
Now that we have the needed inputs we can have a look at the Tweepy magic.
As the toolbox needs to work with Tweepy, we will first check whether the library is installed and stop with an error message if it is not:
def execute(self, parameters, messages):
    """The source code of the tool."""
    try:
        import tweepy
    except ImportError:
        arcpy.AddError("Tweepy was not found!")
        return
After the check is successful we can go ahead and authenticate ourselves against Twitter:
#setting the authentication:
consumerKey = "set your key here"
consumerSecret = "set your key here"
accessToken = "set your key here"
accessTokenSecret = "set your key here"
key = tweepy.OAuthHandler(consumerKey ,consumerSecret)
key.set_access_token(accessToken, accessTokenSecret)
api = tweepy.API(key, wait_on_rate_limit=True,wait_on_rate_limit_notify=True) #access the API
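If you plan to share the toolbox, you may prefer not to keep the keys in the source file. One option is to read them from environment variables. The following is a sketch under assumptions: the variable names and the helper load_twitter_keys are made up for this example and are not defined by Twitter or Tweepy.

```python
import os

# Hypothetical variable names; choose whatever fits your setup.
KEY_NAMES = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_twitter_keys(env=None):
    """Return the four keys in KEY_NAMES order, or raise if one is missing."""
    env = os.environ if env is None else env
    missing = [n for n in KEY_NAMES if not env.get(n)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return tuple(env[n] for n in KEY_NAMES)
```

In the toolbox the four assignments would then become a single line: consumerKey, consumerSecret, accessToken, accessTokenSecret = load_twitter_keys().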
Now we can access the Twitter API.
But as we are working with spatial data, we need some helper functions before getting the tweets. As we will store the data in a feature class, we need at least a function that creates the feature class and a function that adds a feature to it. We will define them ourselves:
def createFC(name):
    sr = arcpy.SpatialReference(4326)  # WGS84, as tweets come with lon/lat
    arcpy.CreateFeatureclass_management(arcpy.env.workspace, name, 'POINT', "", "", "", sr)
    arcpy.AddField_management(name, "username", "TEXT", "", "", 255, "username", "NON_NULLABLE", "REQUIRED")
    arcpy.AddField_management(name, "tweet", "TEXT", "", "", 255, "tweet", "NON_NULLABLE", "REQUIRED")
    arcpy.AddField_management(name, "time", "DATE", "", "", "", "time", "NON_NULLABLE", "REQUIRED")
    arcpy.AddField_management(name, "place", "TEXT", "", "", 255, "place_name", "NULLABLE", "NON_REQUIRED")
    arcpy.AddField_management(name, "id", "TEXT", "", "", 255, "id", "NON_NULLABLE", "REQUIRED")  # unfortunately ids of tweets are very long integers, so we store them as text
    return
Now the add feature function:
def insertRecord(row, name):
    # row holds the values in the same order as the field list below
    import os
    cursor = arcpy.da.InsertCursor(os.path.join(arcpy.env.workspace, name), ['username', 'tweet', 'time', 'place', 'id', 'SHAPE@XY'])
    try:
        cursor.insertRow(row)
    except Exception as e:
        arcpy.AddError(str(e))
    finally:
        del cursor  # release the lock on the feature class
    return
Now let's go back to the main function: we will create a new feature class with a distinct name by using the time stamp:
#create a featureClass:
import time
name = parameters[1].value + str(time.time()).split('.')[0] # we will only use the seconds since 01.01.1970
createFC(name)
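The naming idea can be isolated into a tiny helper, which also makes it testable with a fixed time. This is a sketch; timestamped_name is a hypothetical function and not part of the toolbox.

```python
import time


def timestamped_name(base, now=None):
    """Append the whole seconds since 01.01.1970 to base."""
    now = time.time() if now is None else now
    return "{}{}".format(base, int(now))


# Example with a fixed time so the result is reproducible.
print(timestamped_name("tweets", 1500000000.73))
```

Note that two runs within the same second would still produce the same name.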
As the feature class is ready to receive the data, let's apply the Tweepy logic.
The main difference in our toolbox is whether or not we will apply a historic or real time view. Therefore we will first look at the historic part:
if parameters[4].value == "historic":
    arcpy.AddMessage("start: collecting historic tweets")
    tweetsPerQry = 100  # that is the maximum possible
    tweetCount = 0
    max_id = 0  # stores the id of the oldest tweet received so far
    while tweetCount < parameters[5].value:
        try:
            tweetInResponse = 0
            if (max_id <= 0):
                new_tweets = api.search(q=str(parameters[0].value), count=tweetsPerQry, geocode=geo)
            else:
                new_tweets = api.search(q=str(parameters[0].value), count=tweetsPerQry, geocode=geo, max_id=str(max_id - 1))
            max_id = new_tweets[-1].id  # we will update the id number with the id of the oldest tweet
            for tweet in new_tweets:
                tweetInResponse += accessTweet(tweet, parameters[3].value, tweetCount, name)
        except Exception:  # an empty response raises an IndexError at new_tweets[-1]
            arcpy.AddError("no other tweets found!")
            tweetCount += 1
            break
        tweetCount += tweetInResponse
In lines 10 and 12 of the historic block we pass the variable geo, which is derived from the extent provided. If you do not select a layer or define another input, the extent is given by its XMin, XMax, YMin and YMax values. The search API, however, uses a "Lat,Lon,radius" approach. So we need to convert the input:
if parameters[2].value:
    rectangle = [[parameters[2].value.XMin,parameters[2].value.YMin],[parameters[2].value.XMin,parameters[2].value.YMax],[parameters[2].value.XMax,parameters[2].value.YMax],[parameters[2].value.XMax,parameters[2].value.YMin]]
    extent = arcpy.Polygon(arcpy.Array([arcpy.Point(*coords) for coords in rectangle]))  # create a polygon from the extent
    arcpy.AddMessage("search in a region!")
    LL = arcpy.PointGeometry(arcpy.Point(parameters[2].value.XMin, parameters[2].value.YMin), arcpy.SpatialReference(4326))
    UR = arcpy.PointGeometry(arcpy.Point(parameters[2].value.XMax, parameters[2].value.YMax), arcpy.SpatialReference(4326))
    # geodesic distance LL to UR in meters; /2000 halves it and converts to km,
    # so the circle around the centroid passes through both corners
    radius = UR.angleAndDistanceTo(LL, method="GEODESIC")[1] / 2000
    geo = str(extent.centroid.Y) + "," + str(extent.centroid.X) + "," + str(radius) + "km"
else:
    arcpy.AddMessage("worldwide search!")
    geo = ""
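To see what the conversion produces without starting ArcGIS Pro, here is a rough stand-alone equivalent. It is a sketch under assumptions: it swaps the geodesic distance from arcpy for the haversine formula on a sphere (so the radius can differ slightly), and the function names are our own.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, spherical approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two lon/lat points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def extent_to_geocode(xmin, ymin, xmax, ymax):
    """Build a 'lat,lon,radius' string: extent center plus half the diagonal."""
    center_lat = (ymin + ymax) / 2.0
    center_lon = (xmin + xmax) / 2.0
    radius_km = haversine_km(xmin, ymin, xmax, ymax) / 2.0
    return "{:.6f},{:.6f},{:.3f}km".format(center_lat, center_lon, radius_km)


# Example: an extent roughly around Berlin.
print(extent_to_geocode(13.0, 52.3, 13.8, 52.7))
```

Keep in mind that the circle is larger than the rectangle, so a search will also return tweets from just outside the extent.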
As you can see in line 15 of the "historic" logic, we call a special function for every tweet, because not all the tweets we are gathering contain usable spatial information.
Most of the tweets contain only the profile location, which is not very helpful in our use case. So we will loop through the received results (approx. 100 per query) and insert each tweet into our feature class if it has place or coordinate information:
def accessTweet(inTweet, locationType, resultingNumbers, name):
    # tweets have three types of location: user, place, account. we are just interested in the first two.
    numberIncreaser = 0
    if locationType == "place location":
        if inTweet.place is not None:
            # places are delivered as bounding boxes, so we use the center of the box:
            box = inTweet.place.bounding_box.coordinates[0]
            center = ((box[2][0] + box[0][0]) / 2, (box[2][1] + box[0][1]) / 2)
            tweetTuple = (inTweet.user.name, inTweet.text, inTweet.created_at.strftime('%Y-%m-%d %H:%M'), inTweet.place.full_name, str(inTweet.id), center)
            insertRecord(tweetTuple, name)
            numberIncreaser = 1
    if locationType == "user location":
        if inTweet.coordinates is not None:
            # exact coordinates are delivered as a point in (lon, lat) order:
            point = (inTweet.coordinates['coordinates'][0], inTweet.coordinates['coordinates'][1])
            tweetTuple = (inTweet.user.name, inTweet.text, inTweet.created_at.strftime('%Y-%m-%d %H:%M'), "device coordinates", str(inTweet.id), point)
            insertRecord(tweetTuple, name)
            numberIncreaser = 1
    return numberIncreaser  # 1 if the tweet was inserted, otherwise 0
The numberIncreaser tells our main function whether the tweet was a desired tweet according to our search parameters. In that case accessTweet has also inserted the tweet into our feature class.
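The paging idea behind the historic loop (ask for the newest tweets, then keep asking for tweets older than the oldest one already seen) can be tried without network access. The following is a minimal sketch and not Tweepy code: fake_search is a hypothetical stand-in for the search call, and tweets are reduced to plain integer ids.

```python
def fake_search(all_ids, count, max_id=None):
    """Stand-in for a search call: newest ids first, none above max_id."""
    eligible = [i for i in sorted(all_ids, reverse=True)
                if max_id is None or i <= max_id]
    return eligible[:count]


def collect_ids(all_ids, wanted, per_query=100):
    """Page backwards through all_ids until wanted ids are collected."""
    collected = []
    max_id = None
    while len(collected) < wanted:
        page = fake_search(all_ids, per_query, max_id)
        if not page:
            break  # nothing older is available
        collected.extend(page)
        max_id = page[-1] - 1  # continue just below the oldest id seen
    return collected[:wanted]


ids = list(range(1, 251))
print(len(collect_ids(ids, 230)))
```

Subtracting 1 from the oldest id is what prevents the same tweet from showing up again on the next page.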
We will use the same functions when searching for real time tweets. Yet the logic is a bit different:
if parameters[4].value == "real time":
    arcpy.AddMessage("start: collecting real time tweets")
    start_time = time.time()  # start time
    class stream2lib(tweepy.StreamListener):
        def __init__(self, api=None):
            #api = tweepy.API(key)
            self.api = api
            self.n = 0  # number of tweets received with the desired location type
        def on_status(self, status):
            # called by Tweepy for every incoming tweet
            if status.coordinates is not None and parameters[3].value == 'user location':
                self.n = self.n + 1
                arcpy.AddMessage(str(self.n) + " tweets received...")
                arcpy.AddMessage(str(time.time() - start_time) + "s from " + str(parameters[6].value) + "s")
                accessTweet(status, parameters[3].value, self.n, name)
            if status.place is not None and parameters[3].value == 'place location':
                self.n = self.n + 1
                arcpy.AddMessage(str(self.n) + " tweets received...")
                arcpy.AddMessage(str(time.time() - start_time) + "s from " + str(parameters[6].value) + "s")
                #arcpy.AddMessage(status)
                accessTweet(status, parameters[3].value, self.n, name)
            # returning False disconnects the stream
            if self.n >= parameters[5].value:
                arcpy.AddMessage("Desired number of tweets collected!")
                return False
            if (time.time() - start_time) >= parameters[6].value:
                arcpy.AddMessage("Time limit of " + str(parameters[6].value) + "s reached!")
                return False
            if self.n < parameters[5].value:
                return True
    stream = tweepy.streaming.Stream(key, stream2lib())
    if parameters[2].value:
        # bounding box order: west, south, east, north
        stream.filter(locations=[parameters[2].value.XMin, parameters[2].value.YMin, parameters[2].value.XMax, parameters[2].value.YMax])
    else:
        stream.filter(track=[parameters[0].value])
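The two stop conditions of the listener can be pulled out into a plain function, which makes them easy to check on their own. This is only a sketch; keep_streaming is a hypothetical helper and not part of Tweepy.

```python
def keep_streaming(collected, wanted, started_at, max_seconds, now):
    """Return False once either limit is reached, True otherwise."""
    if collected >= wanted:
        return False  # desired number of tweets collected
    if now - started_at >= max_seconds:
        return False  # time limit reached
    return True


# Example: 40 of 100 tweets after 30 of 60 seconds, so we go on.
print(keep_streaming(40, 100, started_at=0.0, max_seconds=60, now=30.0))
```

As the check in the toolbox lives in on_status, it only runs when a tweet arrives. On a very quiet stream the time limit can therefore be exceeded until the next tweet comes in.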
In the end you can gather tweets for your very own purpose, but keep in mind: only around 0.1% have proper location information, and the streaming API used through Tweepy does not provide access to the firehose but only to 1% of the full Twitter stream. Also keep in mind that you might collect bots that use the same location for every tweet (two radio stations in Berlin are tweeting their playlists).
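A simple way to spot such accounts after the collection is to look for users with many tweets that all share one coordinate. The following is a rough sketch, assuming the rows were read from the feature class as (username, (x, y)) pairs; the function name and the threshold of 5 tweets are arbitrary choices for this example.

```python
from collections import defaultdict


def single_location_users(rows, min_tweets=5):
    """Return users with at least min_tweets tweets, all at one coordinate."""
    locations = defaultdict(set)
    counts = defaultdict(int)
    for username, xy in rows:
        locations[username].add(tuple(xy))
        counts[username] += 1
    return sorted(u for u in counts
                  if counts[u] >= min_tweets and len(locations[u]) == 1)


rows = [("radio", (13.4, 52.5))] * 6 + [("alice", (13.3, 52.4)), ("alice", (13.5, 52.6))]
print(single_location_users(rows))
```

Whether such an account really is a bot still needs a look at its tweets, as a person tweeting from home would be flagged as well.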
But you might get some really cool visuals:
If you want to work with the toolbox grab it from git. You can also work with the static version attached here.