Is the Data Access module faster at counting features than MakeTableView + GetCount?

07-13-2012 11:14 AM
MichaelMarkieta
New Contributor III
Given the functions in my Python application, I would rather use the new data access module (arcpy.da) and its search cursors to access my data and iterate through it.
Planning for the future, which method would produce quicker results when scaled upwards (more features)?

    with arcpy.da.SearchCursor("myFeature", ["OBJECTID"]) as cursor:
        rows = sorted({row[0] for row in cursor})

    count = 0
    for row in rows:
        count += 1


or

    arcpy.MakeTableView_management("myFeature", "myTableView")
    count = int(arcpy.GetCount_management("myTableView").getOutput(0))


Cheers

Accepted Solutions
MichaelMarkieta
New Contributor III
For those interested, here are my results with a feature class that contains 1 million point features generated randomly in ArcMap:

"""Method 1""" import time import arcpy  arcpy.env.workspace = "C:\CountTest.gdb"  StartTime = time.clock()  # Grab the time after importing the arcpy module (heavy), and setting workspace.  with arcpy.da.SearchCursor("RandomPoints", ["OBJECTID"]) as cursor:     rows = {row[0] for row in cursor}  count = 0 for row in rows:     count += 1  EndTime = time.clock()  print "Finished in %s seconds" % (EndTime - StartTime) print count


>>> Finished in 6.75371938368 seconds
1000000 features
>>> =============================== RESTART ===============================
>>> Finished in 6.8498145457 seconds
1000000 features
>>> =============================== RESTART ===============================
>>> Finished in 6.8776609853 seconds
1000000 features


"""Method 2""" import time import arcpy  arcpy.env.workspace = "C:\CountTest.gdb"  StartTime = time.clock() # Grab the time after importing the arcpy module (heavy), and setting workspace.  arcpy.MakeTableView_management("RandomPoints", "myTableView") count = int(arcpy.GetCount_management("myTableView").getOutput(0))  EndTime = time.clock()  print "Finished in %s seconds" % (EndTime - StartTime) print "%s features" % count


>>> Finished in 1.68345616753 seconds
1000000 features
>>> =============================== RESTART ===============================
>>> Finished in 1.64100628447 seconds
1000000 features
>>> =============================== RESTART ===============================
>>> Finished in 1.65225749949 seconds
1000000 features
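A likely reason for the roughly 4x gap: GetCount can ask the data source directly for its row count, while the cursor has to stream every row out of the geodatabase. A loose pure-Python analogy (not arcpy, and only an illustration of the principle):

```python
# Asking a structure for its size vs. walking every element to tally it.
data = list(range(1000000))

# Cursor-style count: touch every element.
count = 0
for item in data:
    count += 1

# GetCount-style count: the container already knows its length (O(1) for a list).
print(count == len(data))  # True
```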

3 Replies
ChristopherThompson
Occasional Contributor III
I would rather use the new data access module and Search Cursors to access my data and iterate through an object


Given your stated desire, I'm not sure how the information you gained from your experiment (though interesting to see) is going to help you. Since the purpose of a cursor is to move through your data a row at a time, the factors affecting processing speed are going to be more about what other actions are executed while you're pointed at each row. Adding an accumulator (count = count + 1) to a series of other actions is unlikely to significantly affect processing time. If all you need to do is count the features, then using MakeTableView and running GetCount against it is the right way to go.
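A quick pure-Python sketch of that point (no arcpy involved; the list of tuples below is only a stand-in for cursor rows): the accumulator is cheap next to the iteration it rides along with.

```python
# Compare iterating alone vs. iterating plus an accumulator.
import timeit

rows = [(i,) for i in range(100000)]  # mock rows, not a real arcpy cursor

def iterate_only():
    for row in rows:
        pass

def iterate_and_count():
    count = 0
    for row in rows:
        count += 1
    return count

print("iterate only:      %.4f s" % timeit.timeit(iterate_only, number=10))
print("iterate and count: %.4f s" % timeit.timeit(iterate_and_count, number=10))
```

The two timings come out close to each other; the per-row work dominates, not the counting.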
LorneDmitruk
New Contributor III
I would do this to avoid having to loop through the records:

with arcpy.da.SearchCursor("myFeature", ["OBJECTID"]) as cursor:
    rows = sorted({row[0] for row in cursor})

count = len(rows)


Cheers!
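Taking that one step further: if the OBJECTIDs themselves aren't needed, even the set (and the sort) can be skipped, since a generator expression streams the rows and stores nothing. A minimal sketch, with a mock iterable standing in for an arcpy.da.SearchCursor:

```python
# Mock stand-in for an arcpy.da.SearchCursor yielding (OBJECTID,) tuples;
# the real object would come from arcpy.da.SearchCursor("myFeature", ["OBJECTID"]).
fake_cursor = iter([(1,), (2,), (3,), (4,)])

# Stream the rows and count them without building a set or list in memory.
count = sum(1 for row in fake_cursor)
print(count)  # 4
```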
