Performance for Append vs Insert Cursor

5307
21
Jump to solution
08-09-2013 11:09 AM
AlanToms
Occasional Contributor
Ok I have a feature class (260k records) and 3 tables.  I need to get specific information (about 20 fields) from each into a new feature class.  I have several different ways to do this and I???m curious if anyone has done any performance testing and  knows which of my options would be ???best???.

Option 1
I could create several joins and append the data into the new feature class, with some additional field calculations for cleanup.

Option 2
Create a python script that would loop through each record, search cursor the other tables for data, then insert cursor into the new feature class.


Thank you
Alan
0 Kudos
1 Solution

Accepted Solutions
ChrisSnyder
Regular Contributor III
I'll go with the append.


Don't!

Going the Python route (reading the join tables(s) into a dictionary using a search cursor and then updating the main table via an update cursor is by far the fastest method. This is true in v10.0 and below, but is especially true in v10.1+ using ethe data access cursors. In addition to faster processing, this method is far more flexible in that allows for all sorts of error handeling and whatnot through conditional expressions.

For example, say you want to get the fields "ADDRESS" and "CITY" into the main table (key field being "NAME"):

#UNTESTED! lutTbl = r"C:\temp\test.gdb\lookuptable" mainTbl = "r"C:\temp\test.gdb\maintable" lutDict= dict([(r[0], (r[1], r[2])) for r in arcpy.da.SearchCursor(lutTbl, ["NAME","ADDRESS","CITY")]) arcpy.AddField_managment(mainTbl, "ADDRESS", "TEXT", "", "", "75") arcpy.AddField_managment(mainTbl, "CITY", "TEXT", "", "", "30") updateRows = arcpy.da.UpdateCursor(mainTbl, ["NAME","ADDRESS","CITY"]) for updateRow in updateRows:     nameValue = updateRow[0]      if nameValue in lutDict:         updateRow[1] = lutDict[nameValue][0]  #Address         updateRow[2] = lutDict[nameValue][1]  #City     else:        print "Could not locate address/city info for " + str(nameValue)      updateRows.updateRow(updateRow) del updateRow, updateRows

View solution in original post

0 Kudos
21 Replies
RichardFairhurst
MVP Honored Contributor
Ok I have a feature class (260k records) and 3 tables.  I need to get specific information (about 20 fields) from each into a new feature class.  I have several different ways to do this and I�??m curious if anyone has done any performance testing and  knows which of my options would be �??best�?�.

Option 1
I could create several joins and append the data into the new feature class, with some additional field calculations for cleanup.

Option 2
Create a python script that would loop through each record, search cursor the other tables for data, then insert cursor into the new feature class.


Thank you
Alan


Assuming the join fields are indexed on all of the tables, option 1 blew away cursors under the pre-da version cursors.  Have not tested the da version cursors.  However, even ArcObjects cursors perform slower than joins, so I doubt Option 2 would work faster even with a da cursor.  Hitting tables with repeated queries is much slower than a join, which does just one correlation of the tables, because each new query has to reset the table read, as far as I know.  Main reason to not use joins would be if there was a 1:M or M:M relationship to traverse.
0 Kudos
AlanToms
Occasional Contributor
Assuming the join fields are indexed on all of the tables, option 1 blew away cursors under the pre-da version cursors.  Have not tested the da version cursors.  However, even ArcObjects cursors perform slower than joins, so I doubt Option 2 would work faster even with a da cursor.  Hitting tables with repeated queries is much slower than a join, which does just one correlation of the tables, because each new query has to reset the table read, as far as I know.  Main reason to not use joins would be if there was a 1:M or M:M relationship to traverse.


That's kind of what I figured; just needed some one else to back me up. 😄  I'll go with the append.

Thank you
Alan
0 Kudos
RichardFairhurst
MVP Honored Contributor
If you were to try the Python route at some point, most posts in the Python forum read the related table and place its data in a dictionary to do the match up between tables.  THe in memory dictionary will perform much better than repeated queries with a cursor.  I suspect the Join must use some process through memory as well, since in a field calculation it initially hesitates, but then blasts through many join calculations, especially when the records to transfer can easily be read into memory.
0 Kudos
ChrisSnyder
Regular Contributor III
I'll go with the append.


Don't!

Going the Python route (reading the join tables(s) into a dictionary using a search cursor and then updating the main table via an update cursor is by far the fastest method. This is true in v10.0 and below, but is especially true in v10.1+ using ethe data access cursors. In addition to faster processing, this method is far more flexible in that allows for all sorts of error handeling and whatnot through conditional expressions.

For example, say you want to get the fields "ADDRESS" and "CITY" into the main table (key field being "NAME"):

#UNTESTED! lutTbl = r"C:\temp\test.gdb\lookuptable" mainTbl = "r"C:\temp\test.gdb\maintable" lutDict= dict([(r[0], (r[1], r[2])) for r in arcpy.da.SearchCursor(lutTbl, ["NAME","ADDRESS","CITY")]) arcpy.AddField_managment(mainTbl, "ADDRESS", "TEXT", "", "", "75") arcpy.AddField_managment(mainTbl, "CITY", "TEXT", "", "", "30") updateRows = arcpy.da.UpdateCursor(mainTbl, ["NAME","ADDRESS","CITY"]) for updateRow in updateRows:     nameValue = updateRow[0]      if nameValue in lutDict:         updateRow[1] = lutDict[nameValue][0]  #Address         updateRow[2] = lutDict[nameValue][1]  #City     else:        print "Could not locate address/city info for " + str(nameValue)      updateRows.updateRow(updateRow) del updateRow, updateRows

View solution in original post

0 Kudos
RichardFairhurst
MVP Honored Contributor
I don't use python if there is a geoprocessing way of doing things.  Debug time is too much with Python for me.  Keeps my life simpler and I would bet Append is faster.
0 Kudos
ChrisSnyder
Regular Contributor III
I'm a bit confused if this topic is:

A. Attempting to join fields to a main table from a look up table(s).
B. Attempting to append records to a main table from some other table(s).

I think it's A, but if it is B, I will add this: http://forums.arcgis.com/threads/66434-A-better-way-to-run-large-Append-Merge-jobs
0 Kudos
AlanToms
Occasional Contributor
I'll go with the append.


Don't!

Going the Python route (reading the join tables(s) into a dictionary using a search cursor and then updating the main table via an update cursor is by far the fastest method. This is true in v10.0 and below, but is especially true in v10.1+ using ethe data access cursors. In addition to faster processing, this method is far more flexible in that allows for all sorts of error handeling and whatnot through conditional expressions.

For example, say you want to get the fields "ADDRESS" and "CITY" into the main table (key field being "NAME"):

#UNTESTED!
lutTbl = r"C:\temp\test.gdb\lookuptable"
mainTbl = "r"C:\temp\test.gdb\maintable"
lutDict= dict([(r[0], (r[1], r[2])) for r in arcpy.da.SearchCursor(lutTbl, ["NAME","ADDRESS","CITY")])
arcpy.AddField_managment(mainTbl, "ADDRESS", "TEXT", "", "", "75")
arcpy.AddField_managment(mainTbl, "CITY", "TEXT", "", "", "30")
updateRows = arcpy.da.UpdateCursor(mainTbl, ["NAME","ADDRESS","CITY"])
for updateRow in updateRows:
    nameValue = updateRow[0]
    if nameValue in lutDict:
        updateRow[1] = lutDict[nameValue][1]  #Address
        updateRow[2] = lutDict[nameValue][2]  #City
    else:
       print "Could not locate address/city info for " + str(nameValue)
    updateRows.updateRow(updateRow)
del updateRow, updateRows


I went ahead and went down this path as it will work better for me in the long run.  I have to figure out how to "easily" include updates from additional tables. 

The only hiccup was something with the index.  I'm novice with python so it may be something I should know.
Instead of
updateRow[1] = lutDict[nameValue][1]
I had to use
row[1] = PADataDict[PKValue][0]
  It was almost like once it got into the if statement the index 0 was the first value, not the key. 

Thank you for the help.
Alan
ChrisSnyder
Regular Contributor III
Woops - good catch. My code should be:

updateRow[1] = lutDict[nameValue][0]
updateRow[2] = lutDict[nameValue][1]

I fixed it in my original above.

Anyway - dictionaries are great. I do all my fancy SQL-like stuff in them now... especially now with 64-bit arcpy. Data tables have kazillions records? No problem and crazy fast!!! No need for fancy RDBMs...
0 Kudos
CalebMackey1
Regular Contributor II

Anyway - dictionaries are great. I do all my fancy SQL-like stuff in them now... especially now with 64-bit arcpy. Data tables have kazillions records? No problem and crazy fast!!! No need for fancy RDBMs...


I agree 100%.  The combination of dictionaries, 64 bit python, and da cursors is ridiculously fast! 

I used to have to manually go into the Assessor's software to run SQL queries to export tables to dbf to tie their records to the county parcel data for our web services...I have since scratched that process and now hack in through an odbc connection, execute the SQL into a dictionary, overwrite the existing dbf table, and then fill in the fields with a da.insert cursor. Now executes as a scheduled task in seconds (used to take a few minutes manually).   No one likes manual labor anyways, right? :cool:
0 Kudos