Why would my additional MakeTableView_management() calls take drastically longer?

Karyn_Kerdolff · ‎01-25-2017

I'm working on a Python script that involves selecting parcels from a feature class and I've run into some odd behavior in my sanity checks. In the code below I loop through the list of specified tax ID numbers and use the MakeTableView trick to get a count of elements matching a specific "where" clause. If I only give it one tax ID, the script finishes in a couple of seconds. If I give it two, it gets stuck on MakeTableView for about 4 minutes, and this seems to increase by about 4 minutes for each additional tax ID. I've tried explicitly deleting the resulting table view as soon as I'm finished with it, but that made no difference.

Ultimately, I changed the script to use a search cursor and the "where" clause to give me the same functionality, but this is just bouncing around the back of my head—why would the subsequent MakeTableView calls perform so much slower than the first?

Edit for clarity: TID is a list of tax ID's obtained by splitting a multi-value GetParameterAsText().

# Add all desired parcels to selection
for tid in TID:
   if tid and tid != "#":
      # Make sure the parcel ID is formatted correctly
      if not re.match(pattern, tid):
         raise GPException("Input Parcel IDs must be in the format " +
                  "YY-YYY-YYYY, where Y is a single digit number." +
                  " For example, 06-019-0009.")
      # Make sure parcel ID is a valid parcel
      where = TID_field + " = '" + tid + "'"
      arcpy.MakeTableView_management(parcels, "in_memory/temp", where)
      if arcpy.GetCount_management("in_memory/temp") < 1:
         raise GPException("Cannot find parcel ID " + tid + " in parcel " +
                 "list.")
      arcpy.Delete_management("in_memory/temp")    
      # Finally, add it to the selection if it's passed all the checks
      arcpy.SelectLayerByAttribute_management(parcels, "ADD_TO_SELECTION", where)‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

DanPatterson_Retired · ‎01-25-2017

everything has to cycle through your first loop, hence your incremental time creep as you add more to the quest. Perhaps your indentation is completely off but I see no reason to cycle through every feature when you really only want to cycle through the features that match your query

Karyn_Kerdolff · ‎01-25-2017

Dan, TID is a list of input tax IDs that comes from a GetParamterAsText() call on a multi-value parameter. If I am not misunderstanding anything, this means there's only one loop and its number of executions is equal to the number of items in that list. I've confirmed this by debugging it and stepping through each line of code. Everything works as I'd expect, but the program stops at line 11 (MakeTableManagement) for several minutes the second (third, fourth, etc) time through the loop but not the first.

(edit: line number)

DanPatterson_Retired · ‎01-25-2017

so is line 14 indented wrong then? it seems you create the in-memory table, then try to delete the workspace there

Karyn_Kerdolff · ‎01-25-2017

My browser is showing line 14 as the continuation of the exception's message, so I assume we're talking about the Delete_management() line. I don't think it's indented wrong—I am trying to delete the table view regardless of the outcome of the "if GetCount..." as an attempt to solve the problem under the assumption that trying to create a table view with the same path as an existing table view could be causing issues. However, including or removing this line hasn't changed the behavior.

JoshuaBixby · ‎01-25-2017

I have run into issues with certain geoprocessing tools getting bogged down like what you describe here when looping. I think it is tied to some memory management or garbage collection issue at the interface between the tool and ArcPy. Try taking lines 11-13 and wrapping them in a stand-alone function, and then call that function from within your loop. See if that does away with the incremental time creep you are seeing.

KimOllivier · ‎01-27-2017

I never use an arcpy tool in a loop. They are not designed to do that.

You could assemble your list of FIDS into a list, convert to a string and use a SQL "IN" statement to select in one step.

Faster, easier to understand, easier to debug and easier to create.

Why do you need to check if the parcel id exists? If it doesn't it won't matter in an IN list.

# list of valid ids (do a regular expression in a list comprehension to validate
print(len(FIDS)) # just to see how many, limit 50,000 should be fine, if more then partition, can use 1M if integers not strings
the_list = str(tuple(FIDS)).replace('u','') # strip off any unicode tags
sql = "{} in {}".format(the_field,the_list)
print (sql[0:40], sql[-40:]) # check syntax
# make sure FID is indexed for performance
arcpy.SelectLayerByAttribute_management(parcels, "ADD_TO_SELECTION", sql)

Karyn_Kerdolff · ‎01-27-2017

That is definitely a more elegant solution than mine. I'm doing a check with an explicit error because eventually I want to expose my script to the public through a geoprocessing service and I want to give meaningful feedback on errors.

BlakeTerhune · ‎01-27-2017

I think this is the direction I would go too. It seems unnecessary to have this in a loop. Here's my take.

TID = tuple(str(i) for i in TID)
where = "{} in{}".format(TID_field, TID)
arcpy.MakeTableView_management(parcels, "parcels_tblvw", where)
arcpy.SelectLayerByAttribute_management("parcels_tblvw", "NEW_SELECTION")‍‍‍‍‍‍‍‍‍‍‍‍

Karyn_Kerdolff · ‎01-27-2017

Thanks for the suggestions, everyone. It definitely looks like it doesn't like being in a loop. Extracting it to a function and calling it in a loop results in longer run times (in this particular case), so there's something going on there that seems to confirm the warnings about loops.