2016

February 2016

# The Single Multipart:  Polygons With Interior Boundaries

Posted by bixb0012 Feb 20, 2016

The ArcPy Geometry classes have been around since ArcPy was introduced in ArcGIS 10.0.  ArcGIS 10.1 brought some noticeable enhancements to ArcPy, including additional properties and methods to the Geometry classes.  There have been bug fixes and enhancements with newer releases, as well as a Call for More Pythonic ArcPy Geometries, but the classes haven't changed much over the past 4 years.  This is why I was a bit surprised when I ran into an issue with the ArcPy Polygon class recently.  The surprise wasn't that I ran into an issue, but more that I hadn't run into it or noticed it earlier than now.

The ArcPy Polygon isMultipart and partCount properties have been around since ArcGIS 10.0.  Interestingly enough, the documentation for them hasn't changed one word in that time.

The issue I ran into can be demonstrated by two square polygons, one of which has a hole in it:

``>>> # Create the 2 polygons>>> poly = arcpy.FromWKT('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))')>>> poly_hole = arcpy.FromWKT('POLYGON((12 0, 22 0, 22 10, 12 10, 12 0),'...                                   '(15 3, 15 7, 19 7, 19 3, 15 3))')...                                   >>> # Check the number of parts>>> poly.partCount1>>> poly_hole.partCount1>>> >>> # Check the multipart property>>> poly.isMultipartFalse>>> poly_hole.isMultipartTrue>>>``

Wait a second, something doesn't add up here.  Both polygons have 1 part, but the polygon with a hole in it is a multipart?  It seems we are witnessing the multi-part single-part polygon, a sideshow favorite that you don't even have to visit the circus to see.

The facile explanation is that the polygon with a hole in it is a multipart because it has an exterior and interior boundary, as opposed to the other polygon that only has an exterior boundary.  Multiple parts, see, simple.  The explanation does make sense, but it also implies that partCount has been broken since ArcGIS 10.0.  Well, maybe partCount has been working all along and the documentation has been wrong for 6 years.  Then again, maybe isMultipart has been broken for 6 years.  Who knows, not me, but I just can't believe I haven't been bitten by this bug long before now.

In terms of what next, assuming Esri Development acknowledges this is either a software or documentation bug, my money is on the documentation getting updated, eventually.  It is always easier to add additional footnotes and asterisks to documentation than it is to update code.  That said, I think there is actually a larger problem with the ArcPy Geometry classes.

I spent some time trying the example above in different spatial systems, even other Esri software components outside of ArcPy, and I have come to the conclusion the real problem is in the labels themselves.  Specifically, the problem lies in the use of "part" to talk about geometries.  Take a look at the List of SQL functions for Esri's ST_Geometry data type, the OGC Methods on Geometry Instances for Microsoft's geometry data type, or the PostGIS Reference; you will not find a single function, method, or property with the word "part" in it.  There are accessor functions for counting geometries, but they count "geometries" and not "parts."  There are accessor functions for counting interior rings, but they count "interior rings" and not "parts."  You see where this is going.

The problem with the ArcPy Geometry classes, at least one of them, is that Esri replaced multiple, specific geometry components with a single generic one, the "part."  By abstracting geometries and rings with parts, not only did they deviate from geospatial standards and norms, they created the sideshow-worthy multi-part single-part polygon.

# The Iterable Cursor:  Python Built-ins & Itertools

Posted by bixb0012 Feb 2, 2016

This is the third in a multi-part series on ArcPy cursors; particularly, working with ArcPy cursors as iterable objects in Python.  The first part in the series looks at some of the important components of iteration in Python.  The second part in the series looks at iterating and looping over ArcPy Data Access cursors.  The third part in the series looks at using several Python built-in and itertool functions with ArcPy Data Access cursors.  The fourth part in the series will look at using generators or generator expressions to separate selection or filtering logic for code re-use.  Fifth or following parts are unknown at this point.

The first two parts in this series cover iteration components of Python and iterating/looping in Python, both of which are crucial to understanding and working with iterables.  Beyond manually iterating or looping over an iterable, there are numerous Python built-in functions that work with iterables and sequences.  Starting back in version 2.3, an itertools module was introduced that contains "functions creating iterators for efficient looping."

The focus of this series is treating ArcPy Data Access cursors as Python iterables.  Like most things in life, there is more than one way to answer a question using ArcGIS, and I want to provide some contrasting examples along with Pythonic examples.  As much as I like writing idiomatic Python, there will be plenty of times when using native geoprocessing tools will outperform straight Python.  Writing fast code is great, but it's a separate discussion for a different day.

There are so many built-in and itertool functions that work with iterables, I can't possibly demonstrate them all, but I will demonstrate a handful to illustrate how such functions work nicely with ArcPy Data Access cursors.  For this and future parts in the series, I will leave the previous examples behind in favor of a real-world dataset that readers can download and experiment with themselves.  Specifically, I will use the USA States layer package included with Esri Data & Maps and available on ArcGIS.com.

The same ArcPy Data Access SearchCursor will be used for most of the examples below:

``>>> layer = r'USA States\USA States (below 1:3m)'>>> fields = ["STATE_ABBR", "SHAPE@AREA", "SUB_REGION", "POP2010"]>>> cursor = arcpy.da.SearchCursor(layer, fields)>>>``

One question that comes up from time to time in the forums/GeoNet is how to count the number of records in a data set or selection set using cursors:

``>>> # Example 1: Get record count using ArcGIS geoprocessing tool>>> arcpy.GetCount_management(layer)<Result '52'>>>>>>> # Example 2: Get record count using variable as counter>>> with cursor:...     i = 0...     for row in cursor:...         i = i + 1   # or i += 1...     print i...52>>>>>> # Example 3: Get record count using built-in list and len functions>>> with cursor:...     print len(list(cursor))...52>>>>>> # Example 4: Get record count using built-in sum function>>> with cursor:...     print sum(1 for row in cursor)...52>>>``

Looking over the record counting examples:

• Example 1 uses ArcGIS's Get Count geoprocessing tool.
• The Get Count tool retrieves the number of records in a feature class, table, layer, or raster; it does not operate against cursors.  It can also be used in the GUI.
• Example 2 loops over the cursor using a variable as a counter.
• A counting loop structure is common across a wide range of programming languages.
• Example 3 uses built-in list and len functions.
• The list function converts the entire cursor into a Python list, and the len function returns the length of the list, which corresponds to the number of records in the cursor.
• Creating a Python list does require copying the entire contents of the cursor into memory, which could become an issue for extremely large data sets.
• Example 4 uses the built-in sum function along with a generator expression.
• Using a Python generator expression instead of a list does not copy the entire contents of the cursor into memory.

I included Example 1 because it is the highest performing approach to getting record counts of data sets and selection sets within the ArcGIS framework, but it doesn't deal with a cursor as a Python iterable, which is the focus of this series.  Example 2 is functionally and syntactically correct, although I would argue it isn't the most Pythonic.  Examples 3 and 4 both use Python built-in functions that treat a cursor as an iterable, but it is arguable whether Example 3 or 4 is more Pythonic because each approach has strengths and weaknesses.

Another question that comes up occasionally is how to retrieve a record from a data set based on the minimum or maximum values of one of the fields in the data set.  This next set of examples will retrieve the record/row for the state with the highest population in 2010 (POP2010):

``>>> # Retrieve table name for data source of layer>>> desc = arcpy.Describe(layer)>>> fc_name = desc.featureClass.baseName>>>>>> # Example 5: Get maximum population record using ArcGIS geoprocessing tools>>> summary_table = arcpy.Statistics_analysis(layer,...                                           "in_memory/summary_max",...                                           "POP2010 MAX")...                                      >>> arcpy.AddJoin_management(layer,...                          "POP2010",...                          summary_table,...                          "MAX_POP2010",...                          "KEEP_COMMON")...                     >>> joined_fields = [(field if "@" in field else ".".join([fc_name, field]))...                  for field...                  in fields]>>> cursor_sql_join = arcpy.da.SearchCursor(layer, joined_fields)>>> print next(cursor_sql_join)(u'CA', 41.639274447708424, u'Pacific', 37253956)>>> del cursor_sql_join>>> arcpy.RemoveJoin_management(layer, "summary_max")<Result 'USA States\\USA States (below 1:3m)'>>>>>>> # Example 6: Get maximum population record using SQL subquery>>> sql = "POP2010 IN ((SELECT MAX(POP2010) FROM {}))".format(fc_name)>>> cursor_sql_subqry = arcpy.da.SearchCursor(layer, fields, sql)                                >>> print next(cursor_sql_subqry)(u'CA', 41.639274447708424, u'Pacific', 37253956)>>> del cursor_sql_subqry>>>>>> # Example 7: Get maximum population record by looping and comparing>>> with cursor:...     max_row = next(cursor)...     for row in cursor:...         if row[3] > max_row[3]:...             max_row = row...        >>> print max_row(u'CA', 41.639274447708424, u'Pacific', 37253956)>>>>>> # Example 8: Get maximum population record using built-in max function>>> from operator import itemgetter>>> with cursor:...     print max(cursor, key=itemgetter(3))...(u'CA', 41.639274447708424, u'Pacific', 37253956)>>>``

Looking over the maximum record examples:

• Examples 5 and 6 require the table name for the data source of the layer.
• The table name was found using the ArcPy Describe function and properties of two different Describe objects (the property lookups were chained together on line 03)
• Examples 5 and 6 both need the underlying table name but for different reasons.
• With Example 5, the base table name is needed to re-create the cursor after the summary statistics table is joined to the layer's source table.
• With Example 6, the base table name is needed to create the subquery used in the cursor's SQL WHERE clause.
• Example 5 uses ArcGIS's Summary Statistics and Add Join geoprocessing tools.
• The Summary Statistics tool finds the maximum population value in the dataset.  The Add Join tool allows the result of the Summary Statistics tool to be linked back to the original layer to find the corresponding record with the maximum population.
• Before creating a cursor against the newly joined layer, the original fields need to be updated to prepend the table name to the field name (line 16).
• Example 6 uses an SQL subquery in the WHERE clause of the SQL query.
• The SQL subquery identifies the maximum population value.  The value(s) returned from the SQL subquery are used to select record(s) in the original data set.
• Example 7 loops over the cursor using a variable to hold the record with the maximum value.
• A loop structure is common across a wide range of programming languages.
• Example 8 uses the built-in max function along with the operator.itemgetter function.
• The max function operates on an iterable, but it is necessary to use operator.itemgetter since the cursor represents an iterable of tuples and not an iterable of numeric data types.

I included Example 5 because it only uses ArcGIS geoprocessing tools and can be implemented in the GUI.  Although it can be implemented in the GUI with no scripting skills, Example 5 is also the most cumbersome, i.e., it has the most steps, is the slowest, and creates intermediate products.  With just a little bit of SQL or Python knowledge, the doors open to more eloquent and higher performing approaches.  Similar to Example 2 above, Example 7 is functionally and syntactically correct, although I would argue Example 8 is more Pythonic.

Instead of getting just the State with the largest population in 2010, let's print States and their populations by descending population:

``>>> # Example 9: Print State and population by descending population>>> #            using Sort geoprocessing tool>>> sorted_table = arcpy.Sort_management(layer,...                                      "in_memory/sorted_pop",...                                      "POP2010 DESCENDING")...                                      >>> cursor_sorted_table = arcpy.da.SearchCursor(sorted_table, fields)>>> with cursor_sorted_table:...     for state, area, sub_region, pop2010 in cursor_sorted_table:...         print "{}, {}".format(state, pop2010)...         CA, 37253956TX, 25145561NY, 19378102...DC, 601723WY, 563626PR, -99>>> del cursor_sorted_table>>> >>> # Example 10: Print State and population by descending population>>> #             appending SQL ORDER BY clause>>> sql = "ORDER BY POP2010 DESC">>> cursor_orderby_sql = arcpy.da.SearchCursor(layer, fields, sql_clause=(None, sql))>>> with cursor_orderby_sql:...     for state, area, sub_region, pop2010 in cursor_orderby_sql:...         print "{}, {}".format(state, pop2010)...         CA, 37253956TX, 25145561NY, 19378102...DC, 601723WY, 563626PR, -99>>> del cursor_orderby_sql>>> >>> # Example 11: Print State and population by descending population>>> #             using built-in sorted function>>> with cursor:...     for state, area, sub_region, pop2010 in sorted(cursor,...                                                    key=itemgetter(3),...                                                    reverse=True):...         print "{}, {}".format(state, pop2010)...         CA, 37253956TX, 25145561NY, 19378102....DC, 601723WY, 563626PR, -99>>>``

Looking over the descending population examples:

• Example 9 uses ArcGIS's Sort goeprocessing tool.
• The Sort tool sorts the original table into a new table.  A new cursor needs to be defined using the newly sorted feature class.
• Example 10 uses an SQL ORDER BY clause.
• The SQL ORDER BY clause is passed as part of the sql_clause while creating a new cursor.
• Example 11 uses the built-in sorted function along with the operator.itemgetter function.
• The sorted function operates on an iterable, but it is necessary to use operator.itemgetter since the cursor represents an iterable of tuples and not an iterable of numeric data types.
• The sorted function converts the entire cursor into a Python list, and creating a Python list does require copying the entire contents of the cursor into memory, which could become an issue for extremely large data sets.

I included Example 9 because it uses ArcGIS geoprocessing tools and can be implemented easily in the GUI.  Similar to Example 5 above, Example 9 is cumbersome if one is scripting instead of using the GUI.  Example 10 uses some basic SQL for a straightforward solution, although it does involve having to create another cursor instead of recycling the existing cursor.  Example 11 is idiomatic in that it uses the built-in sorted function and treats the cursor as an iterable.  Since sorted does return a newly sorted Python list from an iterable, using the function could become an issue with extremely large data sets.

There are numerous other examples I thought up, but this post is already longer than I expected.  I believe the 3 sets of examples above demonstrate how Python built-in functions that operate on iterables can be used with ArcPy Data Access cursors to write idiomatic Python for ArcGIS.  The next part of the series looks at using generators and generator expressions with ArcPy Data Access cursors.

By date: By tag: