The Iterable Cursor: Iterating & Looping

01-18-2016 12:19 PM
JoshuaBixby
MVP Esteemed Contributor

This is the second in a multi-part series on ArcPy cursors; particularly, working with ArcPy cursors as iterable objects in Python.  The first part in the series looks at some of the important components of iteration in Python.  The second part in the series looks at iterating and looping over ArcPy Data Access cursors.  The third part in the series looks at using several Python built-in and itertools functions with ArcPy Data Access cursors.  The fourth part in the series will look at using generators or generator expressions to separate selection or filtering logic for code re-use.  Fifth or following parts are unknown at this point.

The first part in this series is likely a bit academic for some ArcPy scripters, but I believe a little theory goes a long way toward understanding the practice of something, in this case working with ArcPy Data Access cursors in an idiomatic way.  Also, the terms and concepts laid out in the first post will come up time and again throughout the series.

Before moving on, I will put in a plug for a presentation from a few years back: Loop Like A Native.  Ned Batchelder gives a nice overview of looping in Python, especially for those with looping experience in other programming languages.  It took me several viewings for the whole presentation to sink in, but it really did change the way I view iterating and looping in Python.

Recycling the Python list and ArcPy cursor examples from the previous post, let's call iter() to return an iterator for manually stepping through each iterable.

>>> #create list, attach 2 iterators, and retrieve values by
>>> #    calling next() and object.next()
>>> l = [10, 20, 30, 40, 50]
>>> it_l = iter(l)
>>> it2_l = iter(l)
>>> print next(it_l), next(it2_l)
10 10
>>> print it_l.next(), it2_l.next()
20 20
>>> 
>>> #create search cursor, attach 2 iterators, and retrieve values by
>>> #    calling next() and object.next()
>>> cur = arcpy.da.SearchCursor(fc, ["OID@", "SHAPE@"])
>>> it_cur = iter(cur)
>>> it2_cur = iter(cur)
>>> print next(it_cur), next(it2_cur)
(1, <Polyline object at 0x1100fcf0[0x1100ff20]>) (2, <Polyline object at 0x1100fcf0[0x1100ff20]>)
>>> print it_cur.next(), it2_cur.next()
(3, <Polyline object at 0x1100fcf0[0x1100ff20]>) (4, <Polyline object at 0x1100fcf0[0x1100ff20]>)
>>> print next(cur)
(5, <Polyline object at 0x1100fcf0[0x1100ff20]>)
>>>

There is a fair amount to comment on with the code above:

  • A single iterable can have multiple iterators simultaneously accessing it.  How the iterable behaves with multiple iterators is implementation specific.
    • As the iterator definition in the Python Glossary states, a list "produces a fresh new iterator each time you pass it to the iter() function or use it in a for loop."  This explains why both print statements in the list example show the same value for both iterators (10 10, then 20 20).
    • In contrast to the Python list, the ArcPy Data Access search cursor does not produce a new iterator; each subsequent call to iter() returns the same iterator object that is already in use.  In this situation, each call to next() moves the iterator ahead one element regardless of which name makes the call.  This explains why the first print statement in the cursor example shows the first and second OIDs instead of showing the first OID twice.
  • An iterator can be moved ahead by using either the built-in next() function or the object's next method.  Starting at Python 3.0, with the adoption of PEP 3114, the method is renamed __next__() and the preferred way to iterate manually is the built-in next() function, which is also available in Python 2.6 and later.
  • Since ArcPy Data Access cursors are their own iterators, one doesn't need to call iter() to get an iterator object before calling next().  The final statement in the cursor example, print next(cur), shows the cursor object itself can be passed to next() to retrieve the next item and move the cursor ahead.
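
The difference between the list and the cursor described in the bullets above can be reproduced without ArcPy.  The following is only a sketch: SharedStateRows is a hypothetical stand-in class I made up to mimic the "is its own iterator" behavior, not how arcpy.da.SearchCursor is actually implemented.

```python
class SharedStateRows(object):
    """Hypothetical stand-in for a cursor: the object is its own iterator."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._pos = 0

    def __iter__(self):
        # Returning self means every iter() call shares a single position.
        return self

    def __next__(self):
        if self._pos >= len(self._rows):
            raise StopIteration
        row = self._rows[self._pos]
        self._pos += 1
        return row

    next = __next__  # Python 2 spelling of the same method


values = [10, 20, 30]
rows = SharedStateRows(values)

# A list hands out a fresh, independent iterator on each iter() call.
list_shares_iterator = iter(values) is iter(values)

# The stand-in hands back itself on every iter() call.
rows_shares_iterator = iter(rows) is rows

# Each next() call advances the one shared position, no matter
# which "iterator" name is used to make the call.
first = next(iter(rows))
second = next(iter(rows))
third = next(rows)

print(list_shares_iterator)
print(rows_shares_iterator)
print(first)
print(second)
print(third)
```

A quick identity check such as iter(obj) is obj is a handy way to tell whether an unfamiliar object is a re-iterable container or a one-pass iterator.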

Fortunately for us, the Python for statement does a lot of heavy lifting to streamline the steps, so we don't have to manually retrieve an iterator and call next() until the end of the iterable is reached.
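
To make the lifting concrete, here is a rough sketch of what a for statement does on our behalf.  The helper name manual_for is hypothetical and exists only for illustration; a plain list stands in for the cursor so the snippet runs without ArcPy.

```python
def manual_for(iterable, body):
    """Approximate `for item in iterable: body(item)` by hand."""
    iterator = iter(iterable)      # step 1: ask the iterable for an iterator
    while True:
        try:
            item = next(iterator)  # step 2: fetch the next item
        except StopIteration:      # step 3: stop quietly when exhausted
            break
        body(item)                 # the loop body runs once per item


# Looping by hand ...
collected_manual = []
manual_for([10, 20, 30], collected_manual.append)

# ... versus letting the for statement do the same steps.
collected_for = []
for item in [10, 20, 30]:
    collected_for.append(item)

print(collected_manual == collected_for)
```

The sketch leaves out details such as the optional else clause on a for loop, but the iter()/next()/StopIteration sequence is the core of it.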

Revisiting the SearchCursor documentation:

Summary

SearchCursor establishes read-only access to the records returned from a feature class or table.

Returns an iterator of tuples. The order of values in the tuple matches the order of fields specified by the field_names argument.

Discussion

Geometry properties can be accessed by specifying the token SHAPE@ in the list of fields.

Search cursors can be iterated using a For loop.  [Removed at 10.3.1: Search cursors also support With statements; using a With statement will guarantee close and release of database locks and reset iteration.]

As one can see, the documentation clearly states that arcpy.da.SearchCursor returns an iterator, an iterator of tuples to be specific.  It makes sense to have it return tuples rather than lists since we are using a search cursor that can't update data and tuples are immutable by design.  The second statement in the Discussion section is redundant since for in Python iterates over any iterable, SearchCursor or otherwise, but that statement wouldn't stand out as redundant if Esri hadn't removed the rest of the paragraph that used to follow.

This is a not-so quick aside on an Esri #fail, an example of how not to handle customer feedback.

As shown above, prior to ArcGIS 10.3.1, Esri included a couple statements regarding Python with statements.  The fact that ArcPy Data Access cursors support with statements is worth pointing out, even documenting one might say.  The issue with the two statements was really just an issue with the latter statement, I guarantee it!  Guarantee is a strong word, a definitive word, and the problem is not all database locks are closed and released.

A bug was submitted for the documentation to be updated, BUG-000083762: In each cursor documentation, specify the type of lock being closed and released, as a shared lock is still present in the geodatabase after the 'with' statement executes.  The issue was identified as "fixed" in ArcGIS 10.3.1.  If you want to go find that clarification on locks, I already showed it to you.  Yep, there isn't any; they simply removed the statement about locks.  To add insult to injury, they also removed a very important statement about Data Access cursors supporting the Python with statement.

Although the documentation speaks to iterators and iterating, I feel a real opportunity was lost with the code samples to demonstrate a handy Python feature.  A lot of ArcGIS users that are new to Python learn the language by emulating code examples.  In terms of showing Pythonic examples, the ArcPy Data Access cursors are a mixed bag.

Code Sample

SearchCursor example 1

Use SearchCursor to step through a feature class and print specific field values and the x,y coordinates of the point.

import arcpy

fc = 'c:/data/base.gdb/well'
fields = ['WELL_ID', 'WELL_TYPE', 'SHAPE@XY']

# For each row, print the WELL_ID and WELL_TYPE fields, and
# the feature's x,y coordinates
with arcpy.da.SearchCursor(fc, fields) as cursor:
    for row in cursor:
        print('{0}, {1}, {2}'.format(row[0], row[1], row[2]))

As pointed out in my earlier soapbox/aside, the ArcPy Data Access documentation fails to mention that cursors support the Python with statement.  That said, support is implied by the use of Python with statements in the examples.  It is worth one's time to read up on Python with statements, and I encourage their use with ArcPy Data Access cursors whenever possible.
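
For readers new to with statements, the sketch below shows the general mechanism using a hypothetical FakeCursor class of my own.  It only demonstrates that the exit step runs even when the loop body raises an error; what the real ArcPy cursors do on exit, particularly regarding locks, is not something this sketch models.

```python
class FakeCursor(object):
    """Hypothetical stand-in for a cursor; not the arcpy implementation."""

    def __init__(self, rows):
        self._rows = iter(rows)
        self.closed = False

    def __enter__(self):
        # The value returned here is bound to the name after `as`.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the with block ends, normally or through an exception.
        self.closed = True
        return False  # do not swallow exceptions

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._rows)

    next = __next__  # Python 2 spelling of the same method


# Normal path: the block finishes and the exit step runs.
ok_cursor = FakeCursor([(1, 'A'), (2, 'B')])
with ok_cursor as cur:
    rows_read = [row for row in cur]

# Error path: the exit step still runs before the exception propagates.
bad_cursor = FakeCursor([(1, 'A'), (2, 'B')])
try:
    with bad_cursor as cur:
        for row in cur:
            raise ValueError("simulated failure while reading rows")
except ValueError:
    pass

print(rows_read)
print(ok_cursor.closed)
print(bad_cursor.closed)
```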

Whereas the documentation examples demonstrate using Python with statements, even though the documentation itself doesn't state they are supported, the examples do fail to demonstrate the use of iterable or sequence unpacking.  Iterable or sequence unpacking is a great feature of Python, and it can be used to make code much more compact and readable at the same time.  Sequence unpacking is briefly mentioned in the Python documentation for Tuples and Sequences, and PEP 3132 -- Extended Iterable Unpacking discusses changes introduced in Python 3.0.
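
As a quick refresher on the mechanics, here is a minimal sketch of sequence unpacking on a plain tuple shaped like a row from the well example.  The values are made up, and the starred form is the PEP 3132 syntax, so that line assumes Python 3.

```python
# A made-up row: (WELL_ID, WELL_TYPE, SHAPE@XY)
row = (101, 'OIL', (512345.0, 4178901.5))

# Basic unpacking: one name per item, matched by position.
well_id, well_type, well_xy = row

# Extended unpacking (PEP 3132, Python 3 only): the starred name
# collects whatever items are left over into a list.
first, *rest = row

# The number of names must match the number of items, otherwise
# Python raises ValueError instead of silently dropping values.
try:
    only_a, only_b = row
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(well_id)
print(well_type)
print(well_xy)
print(rest)
print(mismatch_raised)
```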

Let's take a look at how iterable unpacking can be used with SearchCursor example 1 from above.

import arcpy

fc = 'c:/data/base.gdb/well'
fields = ['WELL_ID', 'WELL_TYPE', 'SHAPE@XY']

# For each row, print the WELL_ID and WELL_TYPE fields, and
# the feature's x,y coordinates

# Original example using sequence indexing
with arcpy.da.SearchCursor(fc, fields) as cursor:
    for row in cursor:
        print('{0}, {1}, {2}'.format(row[0], row[1], row[2]))

# Example using manual sequence unpacking
with arcpy.da.SearchCursor(fc, fields) as cursor:
    for row in cursor:
        well_id = row[0]
        well_type = row[1]
        well_xy = row[2]
        print('{0}, {1}, {2}'.format(well_id, well_type, well_xy))

# Example using built-in sequence unpacking
with arcpy.da.SearchCursor(fc, fields) as cursor:
    for well_id, well_type, well_xy in cursor:
        print('{0}, {1}, {2}'.format(well_id, well_type, well_xy))

I provided an example of manual sequence unpacking because it is fairly common to see that pattern with people coming over to Python from other languages.  Although manual unpacking is syntactically and functionally correct, it can usually be replaced by using built-in unpacking, thus saving some lines of code and being more idiomatic.  With this specific series of examples, it turns out that using built-in sequence unpacking isn't any more compact than using sequence indexing; however, I find reading and maintaining code that uses sequence unpacking is much more straightforward than having to remember which index means what in a sequence.
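
Unpacking can also be nested.  Since the SHAPE@XY token yields an (x, y) tuple, the target list of the for statement can pull the coordinates apart in the same step.  The sketch below uses a plain list of made-up rows as a stand-in for the cursor so it runs without ArcPy; with a real cursor, the for line would iterate over the cursor instead.

```python
# Made-up rows shaped like (WELL_ID, WELL_TYPE, SHAPE@XY)
rows = [
    (101, 'OIL', (1.5, 2.5)),
    (102, 'GAS', (3.0, 4.0)),
]

lines = []
# The parenthesized (x, y) target unpacks the inner coordinate tuple.
for well_id, well_type, (x, y) in rows:
    lines.append('{0}, {1}, {2}, {3}'.format(well_id, well_type, x, y))

for line in lines:
    print(line)
```

The trade-off is the same as before: the names document what each position means, at the cost of the loop breaking loudly if the field list and the target list ever fall out of sync.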

The Python for statement, with statement, and iterable/sequence unpacking; all essentials when working with the iterable cursor.

About the Author
I am currently a Geospatial Systems Engineer within the Geospatial Branch of the Forest Service's Chief Information Office (CIO). The Geospatial Branch of the CIO is responsible for managing the geospatial platform (ArcGIS Desktop, ArcGIS Enterprise, ArcGIS Online) for thousands of users across the Forest Service. My position is hosted on the Superior National Forest. The Superior NF comprises 3 million acres in northeastern MN and includes the million-acre Boundary Waters Canoe Area Wilderness (BWCAW).