Pandas: OverflowError: maximum recursion level reached

01-13-2020 10:52 AM
JamesCrandall
MVP Frequent Contributor
This may not be wholly an ArcGIS problem but I always seem to get the best answers from this group! 
I'm working with the new Esri Tracker feature service and trying to summarize the json output a bit to generate reports.  The specific task is to maintain a running total of seconds between successive point features using the "location_timestamp" column.  I won't go into the query portion where we acquire the json from the feature service (it's all basic stuff, just querying the REST interface with urllib2.Request).
I'll include two versions: a short one, and a long one with all the details (everything after the *****).

The short version -- grouping to sum an "elapsedSeconds" column:
grouped = df.groupby(['location_day','PEP_land_name','PEP_land_rate','created_user'])['elapsedSeconds'].sum().reset_index()
Revert this grouped dataframe back into json:
dfjson = grouped.to_json(orient='records')
This is where I end up with an "OverflowError: maximum recursion level reached".
************

The long version with more detail about what I'm doing.  To start, I'm just querying the REST endpoint of a hosted feature service (the Esri tracker service):
tracksReq = urllib2.Request(urlTrackerMain + '/query', tracksParams)
tracksResponse = urllib2.urlopen(tracksReq)
tracksResult = json.load(tracksResponse)
And whammo, I have the json output of the tracker service. From here I add some columns to the json for various things:
if tracksResult is not None:
    output = []
    for jj in tracksResult['features']:
        # epoch milliseconds -> UTC struct_time
        int_num = int(jj['attributes']['location_timestamp'])
        convTimeStamp = time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(int_num/1000.0))
        convDayStamp = time.strftime('%Y-%m-%d', time.gmtime(int_num/1000.0))
        jj['attributes']['location_timestamp_local'] = convTimeStamp
        jj['attributes']['location_day'] = convDayStamp

        output.append(jj['attributes'])
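The loop above can be sketched end-to-end on a mocked response (the feature structure and field names are assumed from the thread; "location_timestamp" holds epoch milliseconds):

```python
import time

# Mocked tracker response with one feature (hypothetical values)
tracksResult = {"features": [
    {"attributes": {"location_timestamp": 1578916320000, "created_user": "jc"}},
]}

output = []
if tracksResult is not None:
    for jj in tracksResult["features"]:
        # epoch milliseconds -> UTC struct_time -> formatted strings
        t = time.gmtime(jj["attributes"]["location_timestamp"] / 1000.0)
        jj["attributes"]["location_timestamp_local"] = time.strftime("%Y-%m-%d %H:%M:%S", t)
        jj["attributes"]["location_day"] = time.strftime("%Y-%m-%d", t)
        output.append(jj["attributes"])

print(output[0]["location_day"])  # 2020-01-13
```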
So now I have a new output with some extra columns (converted those epoch datetime values to something meaningful).  Now from here I am ready to do some grouping/summarizing using Pandas:
df = pd.DataFrame.from_dict(output, orient='columns')
df['location_timestamp_local']= pd.to_datetime(df['location_timestamp_local'])
df = df.sort('location_timestamp_local')
Here's where I am creating that running total column in the dataframe:
#determine duration in seconds between the previous row's datetime value. Do this for each day.
df['elapsedSeconds'] = df.sort('location_timestamp_local').groupby(['location_day','PEP_land_name'])['location_timestamp_local'].diff()/1000
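One thing worth noting here: on a datetime column, diff() returns Timedeltas rather than numbers, so dividing by 1000 yields smaller Timedeltas, not seconds. A minimal sketch with made-up rows (column names taken from the thread; sort_values is the modern spelling of the older sort):

```python
import pandas as pd

df = pd.DataFrame({
    "location_day": ["2020-01-13"] * 3,
    "PEP_land_name": ["Campus 1"] * 3,
    "location_timestamp_local": pd.to_datetime([
        "2020-01-13 10:00:00", "2020-01-13 10:00:30", "2020-01-13 10:01:15"]),
})

deltas = (df.sort_values("location_timestamp_local")
            .groupby(["location_day", "PEP_land_name"])["location_timestamp_local"]
            .diff())
# deltas is timedelta64; dt.total_seconds() converts it to plain floats
df["elapsedSeconds"] = deltas.dt.total_seconds()
print(df["elapsedSeconds"].tolist())  # [nan, 30.0, 45.0]
```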
All is well and good.  Now for some grouping to sum that "elapsedSeconds" value.
grouped = df.groupby(['location_day','PEP_land_name','PEP_land_rate','created_user'])['elapsedSeconds'].sum().reset_index()
And finally revert this grouped dataframe back into json:
dfjson = grouped.to_json(orient='records') 
This is where I end up with an "OverflowError: maximum recursion level reached".  It doesn't error when I run it in PyScripter v2.6 x86, but it fails when I hook this .py script up to a geoprocessing tool source.  The 32-bit vs. 64-bit difference is likely the issue, but I'm unsure and really just looking for known workarounds.
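For reference, on newer pandas one way to sidestep any timedelta handling inside to_json is to convert the column to plain floats first; a sketch with made-up values:

```python
import json
import pandas as pd

# Hypothetical grouped result with a timedelta64 column
grouped = pd.DataFrame({
    "PEP_land_name": ["Campus 1", "Campus 2"],
    "elapsedSeconds": pd.to_timedelta([30, 45], unit="s"),
})

# Plain floats serialize cleanly, whatever to_json does with timedeltas
grouped["elapsedSeconds"] = grouped["elapsedSeconds"].dt.total_seconds()
dfjson = grouped.to_json(orient="records")
print(json.loads(dfjson))
```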
Anyway -- thanks for looking and should be a fun one to figure out!
ArcGIS 10.4
Pandas 0.16.1

Accepted Solutions
JamesCrandall
MVP Frequent Contributor

Sort of a hacky workaround, I guess (happy to take that criticism), but if I just convert the grouped result to a CSV and then back to JSON, line by line via json.dumps(), I get the result I'm after.

I just added this simple function to write the CSV into the scratchFolder and return the JSON:

def generateOutput(df):
    outputT = 'SessionFile_{}.{}'.format(str(uuid.uuid1()), "csv")
    Output_File = os.path.join(arcpy.env.scratchFolder, outputT)

    df.to_csv(Output_File, index=False)

    # DictReader picks up the field names from the CSV header row
    with open(Output_File, 'r') as csvfile:
        outJson = json.dumps([row for row in csv.DictReader(csvfile)])

    arcpy.Delete_management(Output_File)
    return outJson


grouped = df.groupby(flds)['elapsedSeconds'].sum().reset_index()
grouped['elapsedSeconds'] = ((grouped['elapsedSeconds'] / np.timedelta64(1, 's')) *1000).astype(str)
dfjson = generateOutput(grouped)
arcpy.AddMessage(dfjson)
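If the scratch-file roundtrip ever becomes a nuisance, the same csv → DictReader → json.dumps dance can be done in memory (a sketch, not tested against the tracker data; on Python 2 this would be StringIO.StringIO rather than io.StringIO):

```python
import csv
import io
import json

import pandas as pd

# Hypothetical grouped result, already converted to strings
grouped = pd.DataFrame({
    "PEP_land_name": ["Campus 1"],
    "elapsedSeconds": ["22.0"],
})

# Same roundtrip as generateOutput(), but without touching disk
buf = io.StringIO()
grouped.to_csv(buf, index=False)
buf.seek(0)
outJson = json.dumps([row for row in csv.DictReader(buf)])
print(outJson)
```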

4 Replies
JamesCrandall
MVP Frequent Contributor
EDIT (may help to answer):
Trying a different tack, I exported the grouped result to a .csv file to see what the contents were.  It looks like that "elapsedSeconds" column is a timedelta?
I'm not sure, but perhaps this is the cause of my error; I'm still unsure what to do about it:
PEP_land_name,PEP_land_rate,elapsedSeconds
Campus 1,RP,0 days 00:00:00.022000000
Campus 2,PRE,0 days 00:00:00.009000000
Campus B2 Entrance East 1,PRE,0 days 00:00:00.013000000
Campus Parking Lot NE 1,RP,0 days 00:00:00.040000000
Campus Parking Lot NE 2,PRE,0 days 00:00:00.039000000
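Those values do parse back as pandas Timedeltas, which supports the timedelta theory; converting one of them to plain seconds looks like this:

```python
import pandas as pd

# One of the CSV values above, parsed back into a Timedelta
td = pd.to_timedelta("0 days 00:00:00.022000000")
print(td.total_seconds())  # 0.022
```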
DanPatterson_Retired
MVP Esteemed Contributor

Is the issue...

    python 3.6.9, pandas 0.25.1  (ArcGIS Pro)  vs python 2.7.x, pandas 0.16.1  (ArcMap 10.4)

Or is this only a python 2.7.x 32 bit vs 64 bit issue?

Because if it is the former, there are five years of pandas changes, not to mention numpy and Python changes.

Release Notes — pandas 0.25.3 documentation 

JamesCrandall
MVP Frequent Contributor

Thanks Dan.  That probably has something to do with it.  I just develop against the resources I'm provided, but I'll have to check with systems engineering to determine what's what, I guess.
