Pandas: OverflowError: maximum recursion level reached

01-13-2020 10:52 AM
JamesCrandall
MVP Frequent Contributor
This may not be wholly an ArcGIS problem but I always seem to get the best answers from this group! 
I'm working with the new Esri Tracker feature service and trying to summarize the json output a bit to generate reports.  The specific task is to maintain a running total of seconds between successive point features using the "location_timestamp" column.  I won't go into the query portion where we acquire the json from the feature service (it's all basic stuff, just querying the REST interface with urllib2.Request).
I'll include two versions: a short one, and a long one with all the details (everything after the *****).

The short version -- grouping to sum an "elapsedSeconds" column:
grouped = df.groupby(['location_day','PEP_land_name','PEP_land_rate','created_user'])['elapsedSeconds'].sum().reset_index()
Revert this grouped dataframe back into json:
dfjson = grouped.to_json(orient='records')
This is where I end up with an "OverflowError: maximum recursion level reached".
************

The long version with more detail about what I'm doing.  To start, I'm just querying the REST endpoint of a hosted feature service (the Esri tracker service):
tracksReq = urllib2.Request(urlTrackerMain + '/query', tracksParams)
tracksResponse = urllib2.urlopen(tracksReq)
tracksResult = json.load(tracksResponse)
And whammo, I have the json output of the tracker service. From here I add some columns to the json for various things:
if tracksResult is not None:
    output = []
    for jj in tracksResult['features']:
        # epoch milliseconds -> UTC struct_time
        int_num = int(jj['attributes']['location_timestamp'])
        convTimeStamp = time.strftime('%Y-%m-%d %H:%M:%S', time.gmtime(int_num/1000.0))
        convDayStamp = time.strftime('%Y-%m-%d', time.gmtime(int_num/1000.0))
        jj['attributes']['location_timestamp_local'] = convTimeStamp
        jj['attributes']['location_day'] = convDayStamp

        output.append(jj['attributes'])
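The loop above can be sketched end-to-end on a mocked response (the feature structure and field names are assumed from the thread; "location_timestamp" holds epoch milliseconds):

```python
import time

# Mocked tracker response with one feature (hypothetical values)
tracksResult = {"features": [
    {"attributes": {"location_timestamp": 1578916320000, "created_user": "jc"}},
]}

output = []
if tracksResult is not None:
    for jj in tracksResult["features"]:
        # epoch milliseconds -> UTC struct_time -> formatted strings
        t = time.gmtime(jj["attributes"]["location_timestamp"] / 1000.0)
        jj["attributes"]["location_timestamp_local"] = time.strftime("%Y-%m-%d %H:%M:%S", t)
        jj["attributes"]["location_day"] = time.strftime("%Y-%m-%d", t)
        output.append(jj["attributes"])

print(output[0]["location_day"])  # 2020-01-13
```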
So now I have a new output with some extra columns (converted those epoch datetime values to something meaningful).  Now from here I am ready to do some grouping/summarizing using Pandas:
df = pd.DataFrame.from_dict(output, orient='columns')
df['location_timestamp_local']= pd.to_datetime(df['location_timestamp_local'])
df = df.sort('location_timestamp_local')
Here's where I am creating that running total column in the dataframe:
#determine duration in seconds between the previous row's datetime value. Do this for each day.
df['elapsedSeconds'] = df.sort('location_timestamp_local').groupby(['location_day','PEP_land_name'])['location_timestamp_local'].diff()/1000
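One thing worth noting here: on a datetime column, diff() returns Timedeltas rather than numbers, so dividing by 1000 yields smaller Timedeltas, not seconds. A minimal sketch with made-up rows (column names taken from the thread; sort_values is the modern spelling of the older sort):

```python
import pandas as pd

df = pd.DataFrame({
    "location_day": ["2020-01-13"] * 3,
    "PEP_land_name": ["Campus 1"] * 3,
    "location_timestamp_local": pd.to_datetime([
        "2020-01-13 10:00:00", "2020-01-13 10:00:30", "2020-01-13 10:01:15"]),
})

deltas = (df.sort_values("location_timestamp_local")
            .groupby(["location_day", "PEP_land_name"])["location_timestamp_local"]
            .diff())
# deltas is timedelta64; dt.total_seconds() converts it to plain floats
df["elapsedSeconds"] = deltas.dt.total_seconds()
print(df["elapsedSeconds"].tolist())  # [nan, 30.0, 45.0]
```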
All is well and good.  Now for some grouping to sum that "elapsedSeconds" value.
grouped = df.groupby(['location_day','PEP_land_name','PEP_land_rate','created_user'])['elapsedSeconds'].sum().reset_index()
And finally revert this grouped dataframe back into json:
dfjson = grouped.to_json(orient='records') 
This is where I end up with an "OverflowError: maximum recursion level reached".  It doesn't error when I run it in PyScripter v2.6 x86, but it fails when I hook this .py script up to a geoprocessing tool source.  The 32-bit vs. 64-bit difference is likely the issue, but I'm unsure and really just looking for known workarounds.
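For reference, on newer pandas one way to sidestep any timedelta handling inside to_json is to convert the column to plain floats first; a sketch with made-up values:

```python
import json
import pandas as pd

# Hypothetical grouped result with a timedelta64 column
grouped = pd.DataFrame({
    "PEP_land_name": ["Campus 1", "Campus 2"],
    "elapsedSeconds": pd.to_timedelta([30, 45], unit="s"),
})

# Plain floats serialize cleanly, whatever to_json does with timedeltas
grouped["elapsedSeconds"] = grouped["elapsedSeconds"].dt.total_seconds()
dfjson = grouped.to_json(orient="records")
print(json.loads(dfjson))
```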
Anyway -- thanks for looking and should be a fun one to figure out!
ArcGIS 10.4
Pandas 0.16.1

Accepted Solutions
JamesCrandall
MVP Frequent Contributor

Sort of a hacky workaround, I guess (happy to take that criticism), but if I just convert the grouped result to a CSV and then back to JSON, line by line via json.dumps(), I get the result I'm after.

I just added this simple function to write the CSV into the scratchFolder and return the JSON:

def generateOutput(df):
    outputT = 'SessionFile_{}.{}'.format(str(uuid.uuid1()), "csv")
    Output_File = os.path.join(arcpy.env.scratchFolder, outputT)

    df.to_csv(Output_File, index=False)

    # DictReader picks up the field names from the CSV header row
    with open(Output_File, 'r') as csvfile:
        outJson = json.dumps([row for row in csv.DictReader(csvfile)])

    arcpy.Delete_management(Output_File)
    return outJson


grouped = df.groupby(flds)['elapsedSeconds'].sum().reset_index()
grouped['elapsedSeconds'] = ((grouped['elapsedSeconds'] / np.timedelta64(1, 's')) *1000).astype(str)
dfjson = generateOutput(grouped)
arcpy.AddMessage(dfjson)
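If the scratch-file roundtrip ever becomes a nuisance, the same csv → DictReader → json.dumps dance can be done in memory (a sketch, not tested against the tracker data; on Python 2 this would be StringIO.StringIO rather than io.StringIO):

```python
import csv
import io
import json

import pandas as pd

# Hypothetical grouped result, already converted to strings
grouped = pd.DataFrame({
    "PEP_land_name": ["Campus 1"],
    "elapsedSeconds": ["22.0"],
})

# Same roundtrip as generateOutput(), but without touching disk
buf = io.StringIO()
grouped.to_csv(buf, index=False)
buf.seek(0)
outJson = json.dumps([row for row in csv.DictReader(buf)])
print(outJson)
```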

4 Replies
JamesCrandall
MVP Frequent Contributor
EDIT (may help to answer):
Trying a different tack, I exported the grouped result to a .csv file to see what the contents were.  It looks like that "elapsedSeconds" column is a timedelta?
I'm not sure, but perhaps this is the cause of my error; I'm still unsure what to do about it:
PEP_land_name,PEP_land_rate,elapsedSeconds
Campus 1,RP,0 days 00:00:00.022000000
Campus 2,PRE,0 days 00:00:00.009000000
Campus B2 Entrance East 1,PRE,0 days 00:00:00.013000000
Campus Parking Lot NE 1,RP,0 days 00:00:00.040000000
Campus Parking Lot NE 2,PRE,0 days 00:00:00.039000000
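Those values do parse back as pandas Timedeltas, which supports the timedelta theory; converting one of them to plain seconds looks like this:

```python
import pandas as pd

# One of the CSV values above, parsed back into a Timedelta
td = pd.to_timedelta("0 days 00:00:00.022000000")
print(td.total_seconds())  # 0.022
```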
DanPatterson_Retired
MVP Esteemed Contributor

Is the issue...

    python 3.6.9, pandas 0.25.1  (ArcGIS Pro)  vs python 2.7.x, pandas 0.16.1  (ArcMap 10.4)

Or is this only a python 2.7.x 32 bit vs 64 bit issue?

Because if it is the former, there are five years of pandas changes, not to mention numpy and Python changes.

Release Notes — pandas 0.25.3 documentation 

JamesCrandall
MVP Frequent Contributor

Thanks Dan.  That probably has something to do with it.  I just develop against the resources I'm provided, but I'll have to check with systems engineering to determine what's what, I guess.
