counting search cursor results

MikeTischler · ‎09-03-2010

Hi,
I'm using arcpy.SearchCursor to query a standalone table. All I really need is the count of the results - not any of the actual data. Creating the SearchCursor itself finishes rather quickly, but iterating through it takes a surprisingly long time.

Here's the method I'm using. Is there a faster way to get what I need?

rows = arcpy.SearchCursor(table, searchstring)

# count the number of rows in output
count = 0

for row in rows:
    count += 1

KimOllivier · ‎09-10-2010

Surely you don't need to iterate the split list?
It is already a list after a split

print len(fidSet.split(";"))

just being economical.
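
One edge case worth checking, sketched below on the assumption that fidSet is a plain semicolon-delimited string of IDs (the helper name count_ids is made up for illustration): splitting an empty string still returns a one-item list, so an empty set would be counted as 1.

```python
def count_ids(fid_set):
    """Count the IDs in a semicolon-delimited string such as "1;2;3".

    Hypothetical helper: assumes fid_set is a plain string and that an
    empty (or whitespace-only) string means there are no IDs at all.
    """
    fid_set = fid_set.strip()
    if not fid_set:
        # "".split(";") returns [''], whose length is 1, not 0
        return 0
    return len(fid_set.split(";"))


print(count_ids("1;2;3"))
print(count_ids(""))
```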

MikeTischler · ‎09-10-2010

Chris,
Thanks for the tips.

Re: suggestion 1, I don't think I can use SelectLayerByAttribute because I have a standalone table.

Re: suggestion 2, I thought this might be the winner, since I do have a series of searches to run over the same table. However, the overhead in creating the 2.3 million key:value pairs was REALLY expensive. I let my script run for about 6-7 minutes, and the dictionary hadn't yet been created...let alone completed any searches.

I thought I might be able to leverage some functionality from numpy, since all my values are numeric. Turns out there is a pretty cool way to load an array (my_array = numpy.fromiter) but that, too, was quite expensive.

For comparison's sake, I used a smaller table with ~160K records. For each of the trials, I started a fresh ArcMap session. I'm not sure the individual methods are coded in the best way, but here are the times and the code to run 10 searches over the same table:

by cursor: 14.17s

by making table and counting: 28.4s

by dictionary: 98.5s

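import arcpy  # the functions below call arcpy.<tool>, so the module name itself must be bound
import numpy
from numpy import where  # where() is used unqualified by do_search_by_numpy
from stopwatch import clockit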

@clockit
def do_search_by_count(table):
    count = 0
    for i in range(100,1000,100):
        searchstring = "NEAR_DIST < " + str(i)
        count += make_table_and_count(table, searchstring)
    print "done by count: " + str(count)
    return count

@clockit
def do_search_by_cursor(table):
    count = 0
    for i in range(100,1000,100):
        searchstring = "NEAR_DIST < " + str(i)
        count += cursor_search_count(table,searchstring)    
    print "done by cursor: " + str(count)
    return count

@clockit
def do_search_by_dict(table):

    lookupDict = create_dict(table)    
    count = 0    
    for i in range(100,1000,100):    
        count += count_dict(lookupDict,i)
    print "done with dict: " + str(count)
    return count

@clockit
def make_table_and_count(table, searchstring):
    arcpy.MakeTableView_management(table, "mytv5k", searchstring)
    x = int(str(arcpy.GetCount_management("mytv5k")))
    return x
    
@clockit
def cursor_search_count(table,searchstring):
    rows = arcpy.SearchCursor(table,searchstring)
    count = 0
    for row in rows:
        count += 1
    return count
    
@clockit
def create_dict(table):
    lookupDict = {}
    searchRows = arcpy.SearchCursor(table)
    for myrow in searchRows:
        lookupDict[myrow.OBJECTID] = [myrow.NEAR_DIST]

    del myrow
    del searchRows
    print "created dict"
    return lookupDict

@clockit
def count_dict(lookupDict,dist):
    count = 0
    for k,v in lookupDict.iteritems():
        if v[0] < dist:
            count += 1
    return count
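
The stopwatch module behind @clockit isn't shown in the thread. For anyone wanting to rerun these trials without it, here is a minimal stand-in sketch, assuming all clockit needs to do is print the elapsed wall-clock time of each call; the real decorator may behave differently, and count_below is just an invented function to demonstrate it.

```python
import functools
import time


def clockit(func):
    """Print how long each call to func takes, then return its result.

    Stand-in sketch only; the real stopwatch.clockit may differ.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.time()
        try:
            return func(*args, **kwargs)
        finally:
            # runs whether func returned normally or raised
            elapsed = time.time() - start
            print("%s took %.2fs" % (func.__name__, elapsed))
    return wrapper


@clockit
def count_below(values, limit):
    # same kind of count as count_dict, on a plain list
    return sum(1 for v in values if v < limit)


print(count_below([50, 150, 250], 200))
```

One caveat with a decorator like this: applied to a generator function, it only times the creation of the generator, not the iteration over it.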

Thanks again for the tips.

Mike

ChrisSnyder · ‎09-10-2010

"It is already a list after a split"

Yes, typing faster than I was thinking...

MikeTischler · ‎09-10-2010

And if anyone's interested, the numpy-based method clocked in at 62.6s

@clockit
def do_search_by_numpy(table):
    my_array = numpy.fromiter(countrows(table),dtype=numpy.dtype(float))
    count = 0    
    for i in range(100,1000,100):    
        count += numpy.size(where(my_array < i))
    print "done with numpy: " + str(count)
    return count

@clockit
def countrows(table):
    searchRows = arcpy.SearchCursor(table)
    for myrow in searchRows:
        yield myrow.NEAR_DIST
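
Because each search in these trials is a "less than" test on the same field, one further option once the values are in memory is to sort them once and answer each threshold with a binary search. This was not tried in the thread; it is only a sketch using the standard-library bisect module, and the sample values are made up to stand in for NEAR_DIST values read through a cursor.

```python
import bisect

# Made-up sample values standing in for NEAR_DIST read from the table
near_dist = [950.0, 120.5, 99.9, 100.0, 430.2, 875.0, 15.3]

# Sort once up front; each later count is then a binary search
sorted_dist = sorted(near_dist)


def count_less_than(sorted_values, threshold):
    # bisect_left returns the index of the first value >= threshold,
    # which equals the number of values strictly below it
    return bisect.bisect_left(sorted_values, threshold)


total = 0
for limit in range(100, 1000, 100):
    total += count_less_than(sorted_dist, limit)
print(total)
```

The cost of reading the table through the cursor is unchanged; this would only help with the repeated counting afterwards.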

ChrisSnyder · ‎09-10-2010

"suggestion 1, I don't think I can use SelectLayerByAttribute because I have a standalone table"

In fact you can use a table with SelectLayerByAttribute, but use the MakeTableView tool 1st (in lieu of MakeFeatureLayer), and use the tableview as input.
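
A rough sketch of that sequence in arcpy, assuming a NEAR_DIST field like the one in the test script earlier in the thread. The view name "count_view" and both function names are placeholders, and arcpy is imported inside the function so the where-clause helper can be tried on a machine without ArcGIS.

```python
def build_where_clause(field, limit):
    # builds a simple "less than" where clause for a numeric field
    return "%s < %s" % (field, limit)


def count_selections(table, where_clauses, view_name="count_view"):
    """Return one row count per where clause for a standalone table.

    Placeholder names throughout; needs an ArcGIS 10 Python install.
    """
    import arcpy  # imported here so the rest of the file loads without ArcGIS

    # A table view plays the role a feature layer plays for feature classes
    arcpy.MakeTableView_management(table, view_name)
    counts = []
    for where_clause in where_clauses:
        arcpy.SelectLayerByAttribute_management(
            view_name, "NEW_SELECTION", where_clause)
        # GetCount reports only the selected rows when a selection exists
        result = arcpy.GetCount_management(view_name)
        counts.append(int(result.getOutput(0)))
    return counts


print(build_where_clause("NEAR_DIST", 500))
```

The view is created once and reselected for each where clause, rather than being rebuilt for every search.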

"Re: suggestion 2, I thought this might be the winner, since I do have a series of searches to run over the same table."

The dictionary approach probably only pays off if you have to search the table a great many times (hundreds, say). As noted in another thread, it can be a good replacement for an embedded cursor or some sort of tracing algorithm. You're right that there is a lot of overhead, but maybe something basic like this would work:

#v9.3 code BTW
lookupDict = {}  # not named dict, so the built-in dict type isn't shadowed
searchRows = gp.searchcursor(tblPath)
searchRow = searchRows.next()
while searchRow:
    # this could be a tuple instead of a list to save memory if you didn't plan on changing the values...
    lookupDict[searchRow.MY_KEY] = [searchRow.FIELD1, searchRow.FIELD2]
    searchRow = searchRows.next()
del searchRow
del searchRows

#yes, there's probably a faster/more elegant way to do this... Kim? .iteritems or .itervalues any faster?
rowCounter1 = 0
rowCounter2 = 0
for myKey in lookupDict:
    if lookupDict[myKey][0] == 100 or lookupDict[myKey][1] == "dog":
        rowCounter1 = rowCounter1 + 1
    if lookupDict[myKey][0] == 200 or lookupDict[myKey][1] == "cat":
        rowCounter2 = rowCounter2 + 1
print rowCounter1
print rowCounter2
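
On the .itervalues question, here is a sketch of the same two counts made in one pass over the values, so there is no second lookup by key. It uses .values(), which runs on both Python 2 and 3 (on Python 2, .itervalues() would avoid building an intermediate list). The sample records are invented, and whether this is measurably faster would need timing on real data.

```python
# Invented sample records: key -> (FIELD1, FIELD2), stored as tuples
records = {
    1: (100, "cat"),
    2: (200, "dog"),
    3: (300, "bird"),
    4: (100, "dog"),
}

row_counter1 = 0
row_counter2 = 0
# Iterate the values directly; the keys are not needed for counting
for field1, field2 in records.values():
    if field1 == 100 or field2 == "dog":
        row_counter1 += 1
    if field1 == 200 or field2 == "cat":
        row_counter2 += 1

print(row_counter1)
print(row_counter2)
```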

ChrisSnyder · ‎09-10-2010

Just playing around with it, but this code (1.5 million keys) took less than 1 second to execute on my machine:

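import random

lookupDict = {}  # not named dict, so the built-in dict type isn't shadowed
for item in range(1, 1500000):
    lookupDict[item] = random.uniform(1, 10)  # return a random float
count = 0
for item in lookupDict:
    if lookupDict[item] > 5:
        count = count + 1
print count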

>>> 833384

MikeTischler · ‎09-10-2010

huh

I tried your code in the last reply in the python window inside arcmap, and it took 46s and change.

From a pythonwin command line, it took just over 1s.

hmmm...

ChrisSnyder · ‎09-10-2010

Hmm indeed.

I'm pretty much PythonWin all the way... That new-fangled PythonWindow is strange and scary to me. I still can't figure out how to execute simple Spatial Analyst commands through the blasted thing - the help seems totally contradictory! Good ole' SingleOutputMapAlgebra tool.... look what they have done to you!

MikeTischler · ‎09-10-2010

Wow - so running the same scripts in a pythonwin window yielded the following results.

search by making table and getcount: 4.3s
search by iterating cursor: 6.5s
search by dictionary: 16.18s
search by numpy: 12.54s

obviously, I need to get out of the python window in ArcGIS 10.

ChrisSnyder · ‎09-10-2010

"obviously, I need to get out of the python window in ArcGIS 10"

ESRI/Anyone know why there is a performance issue here? I thought that gp-type code (like cursors, ESRI tools, etc.) was supposed to run faster in the Toolbox and PythonWindow environments compared to external IDEs like PythonWin and Wing... What happened to that "in process" thing?