This is the first in a multi-part series on ArcPy cursors, particularly working with ArcPy cursors as iterable objects in Python. The first part looks at some of the important components of iteration in Python. The second part looks at iterating and looping over ArcPy Data Access cursors. The third part looks at using several Python built-in and itertools functions with ArcPy Data Access cursors. The fourth part will look at using generators or generator expressions to separate selection or filtering logic for code re-use. Any parts beyond the fourth are undecided at this point.
This post and the ones that follow focus on ArcPy Data Access cursors in the context of iteration, hence the title of the series. Now, it might seem silly to talk about iterable cursors since a cursor is basically worthless without iteration, but my experience scripting with ArcPy cursors and responding to questions on GeoNet has motivated me to share some Pythonic ways of thinking about and working with cursors.
When talking about iteration in Python, there are numerous terms and expressions that can be relevant to a discussion. Five terms that I believe to be especially important, and relevant to this series of blog posts, are:
generator: A function which returns an iterator. It looks like a normal function except that it contains yield statements for producing a series of values usable in a for-loop or that can be retrieved one at a time with the next() function. Each yield temporarily suspends processing, remembering the location execution state (including local variables and pending try-statements). When the generator resumes, it picks-up where it left-off (in contrast to functions which start fresh on every invocation).
generator expression: An expression that returns an iterator. It looks like a normal expression followed by a for expression defining a loop variable, range, and an optional if expression. The combined expression generates values for an enclosing function:
>>> sum(i*i for i in range(10)) # sum of squares 0, 1, 4, ... 81
285
iterable: An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict and file and objects of any classes you define with an __iter__() or __getitem__() method. Iterables can be used in a for loop and in many other places where a sequence is needed (zip(), map(), ...). When an iterable object is passed as an argument to the built-in function iter(), it returns an iterator for the object. This iterator is good for one pass over the set of values. When using iterables, it is usually not necessary to call iter() or deal with iterator objects yourself. The for statement does that automatically for you, creating a temporary unnamed variable to hold the iterator for the duration of the loop. See also iterator, sequence, and generator.
iterator: An object representing a stream of data. Repeated calls to the iterator’s next() method return successive items in the stream. When no more data are available a StopIteration exception is raised instead. At this point, the iterator object is exhausted and any further calls to its next() method just raise StopIteration again. Iterators are required to have an __iter__() method that returns the iterator object itself so every iterator is also iterable and may be used in most places where other iterables are accepted. One notable exception is code which attempts multiple iteration passes. A container object (such as a list) produces a fresh new iterator each time you pass it to the iter() function or use it in a for loop. Attempting this with an iterator will just return the same exhausted iterator object used in the previous iteration pass, making it appear like an empty container.
sequence: An iterable which supports efficient element access using integer indices via the __getitem__() special method and defines a len() method that returns the length of the sequence. Some built-in sequence types are list, str, tuple, and unicode. Note that dict also supports __getitem__() and __len__(), but is considered a mapping rather than a sequence because the lookups use arbitrary immutable keys rather than integers.
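To make the glossary terms a little more concrete, here is a minimal sketch using only plain Python objects. The countdown function and all variable names are made up for illustration; nothing here touches ArcPy.

```python
def countdown(n):
    """Generator function: yields n, n-1, ..., 1."""
    while n > 0:
        yield n      # execution pauses here until the next value is requested
        n -= 1

gen = countdown(3)   # calling the function returns a generator, which is an iterator
first = next(gen)    # 3; the generator runs up to its first yield
rest = list(gen)     # [2, 1]; it resumes where it left off
again = list(gen)    # []; the generator is exhausted after one pass

# Generator expression feeding an enclosing function
squares = sum(i * i for i in range(10))

# A list is a sequence, so every iter() call hands back a fresh iterator
letters = ["a", "b", "c"]
pass_one = list(iter(letters))
pass_two = list(iter(letters))
```

The second pass over the generator comes back empty, while the second pass over the list does not. That difference between an iterator and a container is the one to keep in mind when working with cursors.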
Compared to the old glossary from ArcGIS Resources, now renamed GIS Dictionary and housed over at Esri Support, the Python Glossary is quite substantial. That said, the Python Glossary is also one of those references that makes sense to people who already know the answer, but it can be a bit of a reach for people new to the language.
From looking over the glossary excerpt above, one can tease out that a few special/magic/dunder methods are very important to iteration: __iter__(), __getitem__(), and next() (or __next__() if one is working in Python 3.x). There are also a couple of important built-in functions that interact with those methods: iter() and next(). And last but not least, there are the for loop and the yield statement. There are more methods, functions, statements, and control structures involved with iteration, but the aforementioned ones are central to any discussion. For those readers interested in learning more about iterators, generators, sequences, etc., there are numerous primers and tutorials that have already been written and are just a quick Google search away.
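As a rough illustration of how those pieces fit together, the sketch below spells out by hand roughly what a for loop does with iter(), next(), and StopIteration. The function name manual_for is hypothetical, and this is a simplification of the real loop machinery rather than a description of the interpreter's implementation.

```python
def manual_for(iterable):
    """Collect items roughly the way a for loop would walk over them."""
    items = []
    # iter() uses __iter__(), or falls back to __getitem__() if that is all there is
    iterator = iter(iterable)
    while True:
        try:
            # built-in next() calls __next__() in Python 3 (next() in Python 2)
            item = next(iterator)
        except StopIteration:
            # the iterator signals that it has no more items
            break
        items.append(item)
    return items

result = manual_for([10, 20, 30])
```

The same helper works on a list, a string, or a generator, because all the loop needs is something that iter() can turn into an iterator.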
Lists are so common in Python, and newcomers to the language get exposure to them so early, that I will use Python list examples for context alongside the ArcPy Data Access search cursor examples. The Python Glossary states that a list is "a built-in Python sequence," and a sequence is "an iterable which supports efficient element access using integer indices...." The ArcPy Data Access SearchCursor documentation states the search cursor "returns an iterator of tuples." Since iterators are also iterable, one gets a sense that built-in functions and expressions commonly used for manipulating lists may also be used with ArcPy cursors.
Although one can use the built-in dir() function to attempt to return a list of valid attributes for an object, I am going to forgo inspecting the objects that way because it will introduce clutter from all of the attributes that aren't related to iteration. Instead, I will rely on the built-in isinstance() function along with built-in abstract base classes (ABCs) to look at iteration traits.
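>>> #import arcpy and the relevant ABCs from the collections module
>>> #(in Python 3, the ABCs live in collections.abc)
>>> import arcpy
>>> from collections import Iterable, Iterator, Sequence
>>> #create a sample list and arcpy.da.SearchCursor
>>> #(fc is assumed to hold the path to an existing feature class)
>>> l = [10, 20, 30, 40, 50]
>>> cur = arcpy.da.SearchCursor(fc, ["OID@", "SHAPE@"])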
>>> #look at iteration traits of sample objects
>>> abcs = (Iterable, Iterator, Sequence)
>>> [isinstance(l, abc) for abc in abcs]
[True, False, True] #Iterable, not Iterator, Sequence
>>> [isinstance(cur, abc) for abc in abcs]
[True, True, False] #Iterable, Iterator, not Sequence
>>> #look at the types for sample objects and iterators of sample objects
>>> it_l = iter(l)
>>> type(l), type(it_l)
(<type 'list'>, <type 'listiterator'>)
>>> it_cur = iter(cur)
>>> #look at identity of cursor objects
>>> it_cur is cur
True
As one can see above, which basically demonstrates what is stated in the Python Glossary, a list is both an iterable and a sequence but not an iterator, whereas an ArcPy Data Access search cursor is both an iterable and an iterator but not a sequence. Calling iter() on a list returns a new object of a new type, i.e., a listiterator. Calling iter() on a search cursor returns a search cursor instead of a new iterator object, and not just any search cursor: the very same search cursor object is returned. This design pattern of having an iterable be its own iterator is fairly common in Python.
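For readers who want to see that design pattern outside of ArcPy, here is a minimal sketch of a class that is its own iterator. RowStream is a hypothetical stand-in that I made up for illustration; it is not how arcpy.da.SearchCursor is implemented, it only mimics the behavior described above.

```python
class RowStream(object):
    """Hypothetical stand-in for a cursor: an iterable that is its own iterator."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._index = 0

    def __iter__(self):
        # no new object is created; the iterator is the object itself
        return self

    def __next__(self):
        if self._index >= len(self._rows):
            raise StopIteration
        row = self._rows[self._index]
        self._index += 1
        return row

    next = __next__  # Python 2 spelling of the same method

stream = RowStream([(1, "a"), (2, "b")])
same_object = iter(stream) is stream   # iter() hands back the object it was given
first_pass = list(stream)              # consumes every row
second_pass = list(stream)             # nothing left on a second pass
```

Because iter() hands back the same object, a second pass finds the stream already exhausted, which is the behavior the Python Glossary describes for iterators.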
In the next post in this series, we will move beyond the components of iteration and start actually iterating over Python objects, including ArcPy Data Access cursors.