
This is the first in a multi-part series on ArcPy cursors, specifically on working with ArcPy cursors as iterable objects in Python.  The first part looks at some of the important components of iteration in Python.  The second part looks at iterating and looping over ArcPy Data Access cursors.  The third part looks at using several Python built-in and itertools functions with ArcPy Data Access cursors.  The fourth part will look at using generators or generator expressions to separate selection or filtering logic for code re-use.  Fifth or following parts are unknown at this point.


This post and the ones that follow in the series focus on ArcPy Data Access cursors in the context of iteration, hence the title of the series.  Now, it might seem silly to talk about iterable cursors since a cursor is basically worthless without iteration, but my experience scripting with ArcPy cursors and responding to questions on GeoNet has motivated me to share some Pythonic ways of thinking about and working with cursors.


When talking about iteration in Python, numerous terms and expressions come up.  Five terms that I believe to be especially important and relevant to this series of blog posts, quoted here from the Python Glossary, are:



generator

A function which returns an iterator. It looks like a normal function except that it contains yield statements for producing a series of values usable in a for-loop or that can be retrieved one at a time with the next() function. Each yield temporarily suspends processing, remembering the location execution state (including local variables and pending try-statements). When the generator resumes, it picks-up where it left-off (in contrast to functions which start fresh on every invocation).


generator expression

An expression that returns an iterator. It looks like a normal expression followed by a for expression defining a loop variable, range, and an optional if expression. The combined expression generates values for an enclosing function:

>>> sum(i*i for i in range(10))         # sum of squares 0, 1, 4, ... 81
285



iterable

An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict and file and objects of any classes you define with an __iter__() or __getitem__() method. Iterables can be used in a for loop and in many other places where a sequence is needed (zip(), map(), ...). When an iterable object is passed as an argument to the built-in function iter(), it returns an iterator for the object. This iterator is good for one pass over the set of values. When using iterables, it is usually not necessary to call iter() or deal with iterator objects yourself. The for statement does that automatically for you, creating a temporary unnamed variable to hold the iterator for the duration of the loop. See also iterator, sequence, and generator.



iterator

An object representing a stream of data. Repeated calls to the iterator’s next() method return successive items in the stream. When no more data are available a StopIteration exception is raised instead. At this point, the iterator object is exhausted and any further calls to its next() method just raise StopIteration again. Iterators are required to have an __iter__() method that returns the iterator object itself so every iterator is also iterable and may be used in most places where other iterables are accepted. One notable exception is code which attempts multiple iteration passes. A container object (such as a list) produces a fresh new iterator each time you pass it to the iter() function or use it in a for loop. Attempting this with an iterator will just return the same exhausted iterator object used in the previous iteration pass, making it appear like an empty container.



sequence

An iterable which supports efficient element access using integer indices via the __getitem__() special method and defines a len() method that returns the length of the sequence. Some built-in sequence types are list, str, tuple, and unicode. Note that dict also supports __getitem__() and __len__(), but is considered a mapping rather than a sequence because the lookups use arbitrary immutable keys rather than integers.

Compared to the old glossary from ArcGIS Resources, now renamed GIS Dictionary and housed over at Esri Support, the Python Glossary is quite substantial.  That said, the Python Glossary is also one of those that makes sense to people that already know the answer, but it can be a bit of a reach for people new to the language.


From looking over the glossary excerpt above, one can tease out that a few special/magic/dunder methods are very important to iteration:  __iter__(), __getitem__(), and next() (or __next__() if one is working in Python 3.x).  There are also a couple of important built-in functions that interact with those methods:  iter() and next().  And last but not least, there are the for loop and the yield statement.  There are more methods, functions, statements, and control structures involved with iteration, but the aforementioned ones are central to any discussion.  For readers interested in learning more about iterators, generators, and sequences, numerous primers and tutorials have already been written and are just a quick Google search away.
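To make those pieces a bit more concrete, here is a minimal sketch in plain Python (no ArcPy required) of roughly what a for loop does with iter() and next(), followed by a small generator function.  The names manual_loop and countdown are my own illustrations and do not come from the glossary.

```python
def manual_loop(iterable):
    """Collect items the way a for loop does: iter() once, then next() until StopIteration."""
    results = []
    iterator = iter(iterable)       # ask the iterable for an iterator
    while True:
        try:
            item = next(iterator)   # calls next() in 2.x or __next__() in 3.x
        except StopIteration:       # raised once the iterator is exhausted
            break
        results.append(item)
    return results


def countdown(start):
    """Generator function: each yield suspends execution until the next value is requested."""
    while start > 0:
        yield start
        start -= 1


squares = manual_loop(i * i for i in range(4))   # works with a generator expression
letters = manual_loop("abc")                     # works with a sequence
ticks = list(countdown(3))                       # generator drained by list()
```

In everyday code one would simply write the for loop; spelling it out this way only shows where iter(), next(), and StopIteration fit in.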


Lists are so common in Python, and newcomers to the language get exposure to them so early, that I will use Python list examples for context alongside the ArcPy Data Access search cursor examples.  The Python Glossary states that a list is "a built-in Python sequence," and a sequence is "an iterable which supports efficient element access using integer indices...."  The ArcPy Data Access SearchCursor documentation states the search cursor "returns an iterator of tuples."  Since iterators are also iterable, one gets a sense that built-in functions and expressions commonly used for manipulating lists may also be used with ArcPy cursors.
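As a rough illustration of that last point without needing ArcPy, the sketch below uses a generator of tuples as a stand-in for a search cursor.  The fake_cursor function and its (OID, name, area) rows are invented for this example; a real arcpy.da.SearchCursor would return tuples matching whatever fields were requested.

```python
def fake_cursor():
    """Hypothetical stand-in for a search cursor: an iterator of (oid, name, area) tuples."""
    rows = [(1, "plot_a", 12.5), (2, "plot_b", 40.0), (3, "plot_c", 7.25)]
    for row in rows:
        yield row


# Built-ins that accept a list accept any iterable, including an iterator.
largest = max(fake_cursor(), key=lambda row: row[2])
total_area = sum(row[2] for row in fake_cursor())
names = sorted(row[1] for row in fake_cursor())

# Unlike a list, an iterator is good for one pass only.
cur_like = fake_cursor()
first_pass = list(cur_like)
second_pass = list(cur_like)   # empty: the iterator is already exhausted
```

The one-pass behavior in the last three lines is the main practical difference to keep in mind when treating a cursor like a list.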


Although one can use the built-in dir() function to attempt to return a list of valid attributes for an object, I am going to forgo inspecting the objects that way because it will introduce clutter from all of the attributes that aren't related to iteration.  Instead, I will rely on the built-in isinstance() function along with built-in abstract base classes (ABCs) to look at iteration traits.


>>> #import arcpy and relevant ABCs from collections module (Python 2.x)
>>> import arcpy
>>> from collections import Iterable, Iterator, Sequence
>>> #create a sample list and arcpy.da.SearchCursor
>>> #(fc is assumed to hold the path to an existing feature class)
>>> l = [10, 20, 30, 40, 50]
>>> cur = arcpy.da.SearchCursor(fc, ["OID@", "SHAPE@"])
>>> #look at iteration traits of sample objects
>>> abcs = (Iterable, Iterator, Sequence)
>>> [isinstance(l, abc) for abc in abcs]
[True, False, True]  #Iterable, not Iterator, Sequence
>>> [isinstance(cur, abc) for abc in abcs]
[True, True, False]  #Iterable, Iterator, not Sequence
>>> #look at the types for sample objects and iterators of sample objects
>>> it_l = iter(l)
>>> type(l)
<type 'list'>
>>> type(it_l)
<type 'listiterator'>
>>> it_cur = iter(cur)
>>> type(cur)
<type 'da.SearchCursor'>
>>> type(it_cur)
<type 'da.SearchCursor'>
>>> #look at identity of cursor objects
>>> id(cur)
>>> id(it_cur)


As one can see above, which basically demonstrates what is stated in the Python Glossary, a list is both an iterable and a sequence but not an iterator, whereas an ArcPy Data Access search cursor is both an iterable and an iterator but not a sequence.  From the type() calls on l and it_l, we see that calling iter() on a list returns a new object of a new type, i.e., the listiterator.  From the type() calls on cur and it_cur, we see that calling iter() on a search cursor returns a search cursor instead of a new iterator object.  Not only is a search cursor returned by calling iter(), but the id() calls on cur and it_cur show that the same search cursor object is returned when doing so.  This design pattern of having an iterable be its own iterator is fairly common in Python.
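A minimal sketch of that design pattern in plain Python follows.  The Countdown class is a made-up example, not how Esri implements the search cursor; it only shows an object whose __iter__() returns itself, which produces the same isinstance() and identity results described for the cursor.

```python
try:
    from collections.abc import Iterable, Iterator, Sequence  # Python 3.3+
except ImportError:
    from collections import Iterable, Iterator, Sequence      # Python 2.x


class Countdown(object):
    """Hypothetical iterable that is its own iterator."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self                  # same object, not a new iterator

    def __next__(self):              # Python 3.x spelling
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

    next = __next__                  # Python 2.x spelling


cd = Countdown(3)
traits = [isinstance(cd, kind) for kind in (Iterable, Iterator, Sequence)]
same_object = iter(cd) is cd         # identity check, like comparing id() values
values = list(cd)
```

Here traits comes out as Iterable and Iterator but not Sequence, and iter(cd) hands back cd itself, mirroring what the session above shows for the search cursor.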


In the next post in this series, we will move beyond the components of iteration and start actually iterating over Python objects, including ArcPy Data Access cursors.

I was on the fence about whether to write a blog post or simply open a discussion in the Python place.  It has been a while since I contributed to my blog, so I am going with the former.


The ArcPy Data Access Walk function (arcpy.da.Walk) is a real workhorse function, and I rely on it quite a bit.  The ArcPy Walk function is an example where Esri got it right with their Python implementation.  I can't say what their design goal was, but it seems rather apparent the aim was to create a geospatially-aware version of Python's os.walk function.  The names are the same, many of the parameters are the same, the results are similar, etc....  I think emulating this long-standing, native Python functionality in ArcPy was a win because it didn't reinvent a perfectly good wheel, i.e., users familiar with os.walk can easily transition to using arcpy.da.Walk.


As much as I like the ArcPy Walk function, there are a couple of minor issues I have with it.  The first is a documentation issue, or should I say lack of documentation.  One of the Walk function's parameters is datatype.  The documentation has a table that lists all of the acceptable arguments for datatype, but there is no actual documentation of the data types themselves.  Looking over the data type names, it seems most of them are obvious, so maybe Esri decided they didn't need to document them.


As descriptive as a name might seem at first glance, lack of documentation usually leads to ambiguity and confusion.  For example, what is covered under "FeatureClass"?  Obviously one would assume a feature class in a geodatabase is covered, but what about shape files?  Are shape files feature classes?  In Esri-land, the answer is usually "Yes" but not always.  The real riddle is the "Geo" datatype since it includes shape files but not feature classes in a geodatabase.  Feature classes aren't "geo"?  As important as documentation is for libraries/APIs, it isn't the main reason for writing today.


One of my favorite ArcPy Walk patterns is to call Python's built-in next() function once to return a list of shape files in a folder or feature classes in a geodatabase or feature classes in a feature dataset.

>>> workspace = #path to folder or geodatabase or feature dataset
>>> _, _, filenames = next(arcpy.da.Walk(workspace, datatype="FeatureClass"))
>>> filenames
[u'canadwshed_p.shp', u'plots.shp']
>>> #or looping over feature classes or shape files
>>> for name in next(arcpy.da.Walk(workspace, datatype="FeatureClass"))[2]:
...     print name


In the first example, I create a throw-away Workspace Walker object that I don't have to bother keeping around or deconstructing.  In the second example, calling next() in the for loop allows me to cut out an extra loop when I am only interested in one level of geospatial data.
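Because arcpy.da.Walk mirrors os.walk, the same next() pattern can be tried with the standard library alone.  The sketch below builds a throw-away folder with tempfile and lists only its top-level files; the folder layout and file names are placeholders I made up, and with ArcPy available one would presumably swap os.walk(top) for arcpy.da.Walk(top, datatype="FeatureClass").

```python
import os
import shutil
import tempfile

# Build a throw-away folder: two files at the top level, one in a subfolder.
top = tempfile.mkdtemp()
os.mkdir(os.path.join(top, "sub"))
for relative in ("a.txt", "b.txt", os.path.join("sub", "c.txt")):
    open(os.path.join(top, relative), "w").close()

try:
    # A single next() call returns only the first (top-level) tuple of the walk.
    dirpath, dirnames, filenames = next(os.walk(top))
    top_level_files = sorted(filenames)
finally:
    shutil.rmtree(top)   # clean up the temporary folder
```

The file inside the subfolder never shows up in top_level_files because the walker is discarded after its first result.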


As is the case with most things in life, there isn't just one way to list feature classes in a geodatabase or shape files in a folder.  In fact, the ArcPy Walk function is the new kid on the block, having been introduced in ArcGIS 10.1 SP1.  Prior to ArcPy Walk, a user could use one of the many ArcPy listing functions (ListDatasets, ListFeatureClasses, ListFiles, ListRasters, ListTables, and ListWorkspaces) or the ArcPy Describe function.  I tend to prefer ArcPy Walk over the others because of its ease of use and similarity to built-in Python functionality.


The main reason for this blog post is to share one area where the ArcPy Walk function stumbles, i.e., in-memory data sources.  Whereas the older methods of listing data sources work with in-memory workspaces, that is not the case with ArcPy Walk.

>>> arcpy.CreateFeatureclass_management('in_memory', 'test_fc')
<Result 'in_memory\\test_fc'>
>>> arcpy.CreateTable_management('in_memory', 'test_tbl')
<Result 'in_memory\\test_tbl'>
>>> arcpy.CreateRasterDataset_management('in_memory','test_rd')
<Result 'in_memory\\test_rd'>
>>> #using ArcPy Walk
>>> next(arcpy.da.Walk('in_memory'))[2]
>>> #using ArcPy listing functions
>>> arcpy.env.workspace = 'in_memory'
>>> arcpy.ListFeatureClasses()
>>> arcpy.ListTables()
>>> arcpy.ListRasters()
>>> arcpy.ListDatasets()
>>> #using ArcPy Describe function
>>> [child.name for child in arcpy.Describe('in_memory').children]
[u'test_tbl', u'test_fc', u'test_rd']


Not supporting in-memory workspaces isn't much more than a stumble, but it is a stumble nonetheless.  After all, the documentation does say the first parameter is the "top-level workspace" and yet no mention is made that in-memory workspaces aren't supported.  Fortunately for users, there are at least two other ways to list in-memory data sources.


UPDATE 06/2017:


Since writing this blog post nearly 18 months ago (I know "time flies" but still, 18 months already?), I have come to discover the issue with ArcPy Walk and in-memory workspaces is more nuanced than I originally thought.  Let me demonstrate:

>>> arcpy.CreateFeatureclass_management('in_memory', 'test_fc')
<Result 'in_memory\\test_fc'>
>>> arcpy.CreateTable_management('in_memory', 'test_tbl')
<Result 'in_memory\\test_tbl'>
>>> arcpy.CreateRasterDataset_management('in_memory','test_rd')
<Result 'in_memory\\test_rd'>
>>> # using ArcPy Walk with "GPInMemoryWorkspace" rather than common "in_memory"
>>> next(arcpy.da.Walk('GPInMemoryWorkspace'))[2]
[u'test_tbl', u'test_fc', u'test_rd']


So, it turns out that ArcPy Walk works just fine with in-memory workspaces when it actually knows you are pointing it to an in-memory workspace.  The really frustrating part of this, and even a bit lamentable, is that this is simply about semantics and Esri still can't manage to fix it.  Functionally, ArcPy Walk already works with in-memory workspaces; the function just doesn't know that everyone else and every other tool refers to those spaces as "in_memory" instead of "GPInMemoryWorkspace".
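One possible workaround, sketched under the assumption that the behavior shown above holds for your ArcGIS version, is a tiny helper that translates the common name before calling Walk.  The walk_name function is hypothetical and not part of ArcPy.

```python
def walk_name(workspace):
    """Return the workspace name to hand to arcpy.da.Walk.

    Hypothetical helper: maps the common 'in_memory' name to
    'GPInMemoryWorkspace' and leaves every other workspace path unchanged.
    """
    if workspace.strip().lower() == "in_memory":
        return "GPInMemoryWorkspace"
    return workspace


# Intended use (requires ArcPy, so it is left commented out here):
# _, _, names = next(arcpy.da.Walk(walk_name("in_memory")))
```

Keeping the translation in one place means the rest of a script can keep saying "in_memory" like every other tool does.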