This is the third part in a multi-part series on ArcPy cursors, specifically on working with ArcPy cursors as iterable objects in Python. The first part looks at some of the important components of iteration in Python. The second part looks at iterating and looping over ArcPy Data Access cursors. This third part looks at using several Python built-in and itertools functions with ArcPy Data Access cursors. The fourth part will look at using generators or generator expressions to separate selection or filtering logic for code re-use. Any parts beyond the fourth are undetermined at this point.
The first two parts in this series cover the iteration components of Python and iterating/looping in Python, both of which are crucial to understanding and working with iterables. Beyond manually iterating or looping over an iterable, there are numerous Python built-in functions that work with iterables and sequences. In addition, Python 2.3 introduced the itertools module, which contains "functions creating iterators for efficient looping."
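As a minimal sketch of what such a function looks like in use, the snippet below applies itertools.islice to a stand-in for a cursor. The fake_cursor generator is hypothetical and not part of ArcPy; it simply yields (STATE_ABBR, POP2010) tuples, with values borrowed from the output printed later in this post, so that the example runs without ArcGIS.

```python
from itertools import islice


def fake_cursor():
    """Hypothetical stand-in for a search cursor (not an ArcPy API).

    Yields (STATE_ABBR, POP2010) tuples one at a time, as a cursor would.
    """
    rows = [("CA", 37253956), ("TX", 25145561), ("NY", 19378102),
            ("DC", 601723), ("WY", 563626), ("PR", -99)]
    for row in rows:
        yield row


cursor = fake_cursor()

# islice consumes only as many rows as requested, here the first three.
first_three = list(islice(cursor, 3))

# The rows islice did not touch are still waiting in the iterator.
remaining = sum(1 for _ in cursor)

print(first_three)
print(remaining)
```

Because islice is lazy, only the requested rows are read. Assuming a real cursor behaves like any other iterator in this respect, the same call should work on it unchanged.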
The focus of this series is treating ArcPy Data Access cursors as Python iterables. Like most things in life, there is more than one way to answer a question using ArcGIS, and I want to provide some contrasting examples along with Pythonic examples. As much as I like writing idiomatic Python, there will be plenty of times when using native geoprocessing tools will outperform straight Python. Writing fast code is great, but it's a separate discussion for a different day.
There are so many built-in and itertools functions that work with iterables that I can't possibly demonstrate them all, but I will demonstrate a handful to illustrate how such functions work nicely with ArcPy Data Access cursors. For this and future parts in the series, I will leave the previous examples behind in favor of a real-world dataset that readers can download and experiment with themselves. Specifically, I will use the USA States layer package included with Esri Data & Maps and available on ArcGIS.com.
The same ArcPy Data Access SearchCursor will be used for most of the examples below:
>>> layer = r'USA States\USA States (below 1:3m)'
>>> fields = ["STATE_ABBR", "SHAPE@AREA", "SUB_REGION", "POP2010"]
>>> cursor = arcpy.da.SearchCursor(layer, fields)
>>>
One question that comes up from time to time in the forums/GeoNet is how to count the number of records in a data set or selection set using cursors:
>>> # Example 1: Get record count using ArcGIS geoprocessing tool
>>> arcpy.GetCount_management(layer)
<Result '52'>
>>>
>>> # Example 2: Get record count using variable as counter
>>> with cursor:
...     i = 0
...     for row in cursor:
...         i = i + 1 # or i += 1
...     print i
...
52
>>>
>>> # Example 3: Get record count using built-in list and len functions
>>> with cursor:
...     print len(list(cursor))
...
52
>>>
>>> # Example 4: Get record count using built-in sum function
>>> with cursor:
...     print sum(1 for row in cursor)
...
52
>>>
Looking over the record counting examples:
I included Example 1 because it is the highest performing approach to getting record counts of data sets and selection sets within the ArcGIS framework, but it doesn't deal with a cursor as a Python iterable, which is the focus of this series. Example 2 is functionally and syntactically correct, although I would argue it isn't the most Pythonic. Examples 3 and 4 both use Python built-in functions that treat a cursor as an iterable, but it is arguable which of the two is more Pythonic because each has strengths and weaknesses: Example 3 is the most compact but builds a list of every row in memory just to measure its length, whereas Example 4 counts rows one at a time without storing them.
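The difference between the last two counting approaches can be sketched without ArcGIS. The fake_cursor generator below is a hypothetical stand-in, not an ArcPy function, that yields six (STATE_ABBR, POP2010) rows taken from the output elsewhere in this post.

```python
def fake_cursor():
    """Hypothetical stand-in for a search cursor (not an ArcPy API)."""
    rows = [("CA", 37253956), ("TX", 25145561), ("NY", 19378102),
            ("DC", 601723), ("WY", 563626), ("PR", -99)]
    for row in rows:
        yield row


# Example 3 style: materialize every row in a list, then measure the list.
count_via_list = len(list(fake_cursor()))

# Example 4 style: add 1 per row; no row is kept once it has been counted.
count_via_sum = sum(1 for _ in fake_cursor())

print(count_via_list)  # 6
print(count_via_sum)   # 6
```

Both give the same answer. The list version holds all rows in memory at once, which is harmless for a table this small but worth keeping in mind for very large ones.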
Another question that comes up occasionally is how to retrieve a record from a data set based on the minimum or maximum values of one of the fields in the data set. This next set of examples will retrieve the record/row for the state with the highest population in 2010 (POP2010):
>>> # Retrieve table name for data source of layer
>>> desc = arcpy.Describe(layer)
>>> fc_name = desc.featureClass.baseName
>>>
>>> # Example 5: Get maximum population record using ArcGIS geoprocessing tools
>>> summary_table = arcpy.Statistics_analysis(layer,
...                                           "in_memory/summary_max",
...                                           "POP2010 MAX")
...
>>> arcpy.AddJoin_management(layer,
...                          "POP2010",
...                          summary_table,
...                          "MAX_POP2010",
...                          "KEEP_COMMON")
...
>>> joined_fields = [(field if "@" in field else ".".join([fc_name, field]))
...                  for field
...                  in fields]
>>> cursor_sql_join = arcpy.da.SearchCursor(layer, joined_fields)
>>> print next(cursor_sql_join)
(u'CA', 41.639274447708424, u'Pacific', 37253956)
>>> del cursor_sql_join
>>> arcpy.RemoveJoin_management(layer, "summary_max")
<Result 'USA States\\USA States (below 1:3m)'>
>>>
>>> # Example 6: Get maximum population record using SQL subquery
>>> sql = "POP2010 IN ((SELECT MAX(POP2010) FROM {}))".format(fc_name)
>>> cursor_sql_subqry = arcpy.da.SearchCursor(layer, fields, sql)
>>> print next(cursor_sql_subqry)
(u'CA', 41.639274447708424, u'Pacific', 37253956)
>>> del cursor_sql_subqry
>>>
>>> # Example 7: Get maximum population record by looping and comparing
>>> with cursor:
...     max_row = next(cursor)
...     for row in cursor:
...         if row[3] > max_row[3]:
...             max_row = row
...
>>> print max_row
(u'CA', 41.639274447708424, u'Pacific', 37253956)
>>>
>>> # Example 8: Get maximum population record using built-in max function
>>> from operator import itemgetter
>>> with cursor:
...     print max(cursor, key=itemgetter(3))
...
(u'CA', 41.639274447708424, u'Pacific', 37253956)
>>>
Looking over the maximum record examples:
I included Example 5 because it only uses ArcGIS geoprocessing tools and can be implemented in the GUI. Although it can be implemented in the GUI with no scripting skills, Example 5 is also the most cumbersome: it has the most steps, is the slowest, and creates intermediate products. With just a little bit of SQL or Python knowledge, the doors open to more elegant and higher performing approaches. Similar to Example 2 above, Example 7 is functionally and syntactically correct, although I would argue Example 8 is more Pythonic.
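To see that the loop-and-compare approach and the built-in max function arrive at the same row, here is a small sketch that runs without ArcGIS. It assumes a hypothetical fake_cursor generator (not part of ArcPy) yielding two-field (STATE_ABBR, POP2010) rows from this post's output, so the population sits at index 1 rather than index 3.

```python
from operator import itemgetter


def fake_cursor():
    """Hypothetical stand-in for a search cursor (not an ArcPy API)."""
    rows = [("DC", 601723), ("TX", 25145561), ("PR", -99),
            ("CA", 37253956), ("WY", 563626), ("NY", 19378102)]
    for row in rows:
        yield row


# Loop-and-compare, as in Example 7: seed with the first row, then scan.
rows = fake_cursor()
manual_max = next(rows)
for row in rows:
    if row[1] > manual_max[1]:
        manual_max = row

# Built-in max, as in Example 8: the key function picks the field to compare.
builtin_max = max(fake_cursor(), key=itemgetter(1))

# min accepts the same key argument.
builtin_min = min(fake_cursor(), key=itemgetter(1))

print(manual_max)
print(builtin_max)
print(builtin_min)
```

Note that min returns the PR row with its value of -99, which looks like a placeholder rather than a real population, so a minimum query on this field would likely need to exclude such values first.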
Instead of getting just the State with the largest population in 2010, let's print States and their populations by descending population:
>>> # Example 9: Print State and population by descending population
>>> # using Sort geoprocessing tool
>>> sorted_table = arcpy.Sort_management(layer,
...                                      "in_memory/sorted_pop",
...                                      "POP2010 DESCENDING")
...
>>> cursor_sorted_table = arcpy.da.SearchCursor(sorted_table, fields)
>>> with cursor_sorted_table:
...     for state, area, sub_region, pop2010 in cursor_sorted_table:
...         print "{}, {}".format(state, pop2010)
...
CA, 37253956
TX, 25145561
NY, 19378102
...
DC, 601723
WY, 563626
PR, -99
>>> del cursor_sorted_table
>>>
>>> # Example 10: Print State and population by descending population
>>> # appending SQL ORDER BY clause
>>> sql = "ORDER BY POP2010 DESC"
>>> cursor_orderby_sql = arcpy.da.SearchCursor(layer, fields, sql_clause=(None, sql))
>>> with cursor_orderby_sql:
...     for state, area, sub_region, pop2010 in cursor_orderby_sql:
...         print "{}, {}".format(state, pop2010)
...
CA, 37253956
TX, 25145561
NY, 19378102
...
DC, 601723
WY, 563626
PR, -99
>>> del cursor_orderby_sql
>>>
>>> # Example 11: Print State and population by descending population
>>> # using built-in sorted function
>>> with cursor:
...     for state, area, sub_region, pop2010 in sorted(cursor,
...                                                    key=itemgetter(3),
...                                                    reverse=True):
...         print "{}, {}".format(state, pop2010)
...
CA, 37253956
TX, 25145561
NY, 19378102
...
DC, 601723
WY, 563626
PR, -99
>>>
Looking over the descending population examples:
I included Example 9 because it uses ArcGIS geoprocessing tools and can be implemented easily in the GUI. Similar to Example 5 above, Example 9 is cumbersome if one is scripting instead of using the GUI. Example 10 uses some basic SQL for a straightforward solution, although it does require creating another cursor instead of recycling the existing one. Example 11 is idiomatic in that it uses the built-in sorted function and treats the cursor as an iterable. Because sorted returns a new list containing every row from the iterable, memory use could become an issue with extremely large data sets.
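If only the top few rows are needed, one alternative that is not used in the examples above is heapq.nlargest from the Python standard library, which accepts any iterable and returns just the n largest items in descending order. The sketch below is a suggestion rather than part of the original examples, and it again relies on a hypothetical fake_cursor generator (not an ArcPy function) yielding (STATE_ABBR, POP2010) rows so that it runs without ArcGIS.

```python
import heapq
from operator import itemgetter


def fake_cursor():
    """Hypothetical stand-in for a search cursor (not an ArcPy API)."""
    rows = [("DC", 601723), ("TX", 25145561), ("PR", -99),
            ("CA", 37253956), ("WY", 563626), ("NY", 19378102)]
    for row in rows:
        yield row


# sorted builds a list of every row, which is then sliced to the top three.
top_via_sorted = sorted(fake_cursor(), key=itemgetter(1), reverse=True)[:3]

# nlargest returns only the three largest rows, already in descending order.
top_via_heapq = heapq.nlargest(3, fake_cursor(), key=itemgetter(1))

for state, pop2010 in top_via_heapq:
    print("{}, {}".format(state, pop2010))
```

Both approaches produce the same three rows. When every row must be printed in order, as in Example 11, sorted or an ORDER BY clause remains the natural choice.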
There are numerous other examples I thought up, but this post is already longer than I expected. I believe the three sets of examples above demonstrate how Python built-in functions that operate on iterables can be used with ArcPy Data Access cursors to write idiomatic Python for ArcGIS. The next part of the series looks at using generators and generator expressions with ArcPy Data Access cursors.