Skip navigation
All People > bixb0012 > Tilting at Globes > 2015 > May
2015

I think this blog post follows up nicely to Dan Patterson's recent blog post, The specified item was not found.  The conversation about null and empty geometries in ArcGIS isn't new, but it does take some getting your head around.

 

When it comes to creating new feature classes, I am sure most people are quite familiar with the following screen from the New Feature Class wizard:

 

arccatalog_103_new_feature_class_shape_null.PNG

When creating a new feature class using ArcGIS Desktop, either through the GUI or ArcPy, the default settings allow for NULL values in the SHAPE field (see yellow highlight above).  Regardless of whether NULLs in the SHAPE field are a good or bad idea, they are supported and allowed by default, so it is good to understand what does/can happen when geometries aren't populated with the rest of a record in feature classes.

 

After troubleshooting several cases where NULL wasn't NULL, I decided to take a deeper look at what really happens when geometries aren't populated in feature classes.  It turns out, the answer depends both on the tool and type of geodatabase being used to insert, store, and retrieve records.  Presented here are the results for some of the more common tools and types of geodatabases.

 

Table 1 shows what actually gets populated in the SHAPE field when nothing, None, an empty geometry, and a non-empty geometry are inserted into a polygon feature class using three different methods.

 

TABLE 1:  SHAPE Field Values in Feature Class

Storage

Type

Insert

Method

NOTHINGNONEEMPTYPOLYGON

PGDB

ArcMap Editor

NullN/AN/APolygon
PGDBarcpy.InsertCursorNullErrorEmptyPolygon
PGDBarcpy.da.InsertCursorNullNullNullPolygon
FGDBArcMap EditorEmptyN/AN/APolygon
FGDBarcpy.InsertCursorNullErrorEmptyPolygon
FGDBarcpy.da.InsertCursorNullNullNullPolygon
SDE(SQL)ArcMap EditorEmptyN/AN/APolygon
SDE(SQL)arcpy.InsertCursorNullErrorEmptyPolygon
SDE(SQL)arcpy.da.InsertCursorNullNullNullPolygon
SDE(ORA)ArcMap EditorEmptyN/AN/APolygon
SDE(ORA)arcpy.InsertCursorNullErrorEmptyPolygon
SDE(ORA)arcpy.da.InsertCursorNullNullNullPolygon

NOTE:

  1. Storage Type:
    1. PGDB := personal geodatabase
    2. FGDB := file geodatabase
    3. SDE(SQL) := SQL Server enterprise geodatabase
    4. SDE(ORA) := Oracle enterprise geodatabase
  2. Insert Method:
    1. ArcMap Editor := edit session in ArcMap
    2. arcpy.InsertCursor := original/older ArcPy insert cursor
    3. arcpy.da.InsertCursor := ArcPy Data Access insert cursor
  3. Insert Geometry:
    1. NOTHING := no geometry is specified.
      1. In ArcMap edit session, table is populated with no geometry
      2. In ArcPy insert cursors, SHAPE field is not specified with cursor
    2. NONE := Python None object is passed to SHAPE field
    3. EMPTY := empty polygon is passed to SHAPE field
    4. POLYGON := non-empty polygon is created or passed to SHAPE field
  4. SHAPE Value:
    1. N/A := not applicable, i.e., not possible to insert or attempt to insert type of geometry or object
    2. Error := error is generated attempting to insert type of geometry or object
    3. Null := NULL value/marker in SHAPE field
    4. Empty := empty polygon or empty collection in SHAPE field
    5. Polygon := non-empty polygon in SHAPE field

 

Table 2 shows what gets returned by search cursors once the records from Table 1 have been inserted into a feature class.

 

Table 2:  Retrieved Geometry/Object From SHAPE Field

Storage

Type

Retrieve

Method

NULLEMPTYPOLYGON
PGDBarcpy.SearchCursorNoneEmptyPolygon
PGDBarcpy.da.SearchCursorNoneNonePolygon
FGDBarcpy.SearchCursorNoneEmptyPolygon
FGDBarcpy.da.SearchCursorNoneNonePolygon
SDE(SQL)arcpy.SearchCursorNoneEmptyPolygon
SDE(SQL)arcpy.da.SearchCursorNoneNonePolygon
SDE(ORA)arcpy.SearchCursorNoneEmptyPolygon
SDE(ORA)arcpy.da.SearchCursorNoneNonePolygon

NOTE:

  1. Storage Type:
    1. PGDB := personal geodatabase
    2. FGDB := file geodatabase
    3. SDE(SQL) := SQL Server enterprise geodatabase
    4. SDE(ORA) := Oracle enterprise geodatabase
  2. Retrieve Method:
    1. arcpy.SearchCursor := original/older ArcPy search cursor
    2. arcpy.da.SearchCursor := ArcPy Data Access search cursor
  3. SHAPE Value:
    1. NULL := NULL value/marker in SHAPE field
    2. EMPTY := empty polygon or empty collection in SHAPE field
    3. POLYGON := non-empty polygon in SHAPE field
  4. Retrieve Geometry:
    1. None := Python None object
    2. Empty := empty polygon or empty collection
    3. Polygon := non-empty polygon

 

It is clear there are some differences, inconsistencies one might say, with how different tools insert nothing or empty geometries into a feature class that allows NULL values.  Even with the same tool, there are situations where nothing or empty geometries are handled differently between different types of geodatabases.  There are also differences with how different search cursors, and likely update cursors, retrieve empty geometries from feature classes.

 

I can't say what it all means, but there are a few items that stuck out for me.

  • Non-empty geometries are handled consistently across tools and types of geodatabases.
  • "Allow NULL Values" for the SHAPE field doesn't mean missing geometries will be NULL, it means missing geometries may be NULL, they could also be empty.
  • The ArcPy Data Access search cursor returns None whether a SHAPE field is NULL or contains an empty geometry, so None doesn't necessarily mean NULL for geometries.

This is the second in a two-part series on the risks to software users of poor documentation; specifically, the confusion and unexpected results that come from weakly documented spatial operators in GIS software.  The first part in the series looks at how inconsistent and incomplete documentation requires users to guess how spatial operators are implemented.  The second part in the series looks at the inconsistent results that arise from mixing different implementations of spatial operators.

 

One of the many empty geometry bugs I have submitted recently got put on the front burner this week when I noticed Esri updated the status to "Closed:  Will Not be Addressed."  What has ensued is a discussion that is still unfolding over expected behavior versus unexpected results.  Coincidentally, the issue can be framed neatly within the discussion and examples from the first post in this series (What's Within:  When (Esri != Clementini) = ?).

 

The first post in this series discussed how there is no singular definition of spatial relations for geometry libraries and geospatial applications, and how the Dimensionally Extended 9 Intersection Model (DE-9IM) became the prevailing 2D definition after inclusion in the OpenGIS Implementation Specification for Geographic information - Simple feature access - Part 1: Common architecture.  The post focused on how incomplete documentation of spatial operators forces users to guess, and likely incorrectly at times, how software works instead of knowing how the software should work.  As unfortunate for users as incomplete documentation can be, mixing different definitions of spatial relations within the same spatial operator is much worse, and that appears to be what Esri has done.

 

Borrowing from the examples in the previous post, let's look at three features and the ArcPy Geometry.within() method.

 

>>> #Create square polygon
>>> polygon = arcpy.FromWKT('POLYGON((0 0, 3 0, 3 3, 0 3, 0 0))')
>>> #Create line that completely lies on polygon boundary
>>> line = arcpy.FromWKT('LINESTRING(1 0, 2 0)')
>>> #Create empty line
>>> line_empty = arcpy.FromWKT('LINESTRING EMPTY')
>>>
>>> #Test whether line is within polygon
>>> line.within(polygon)
False
>>> #Test whether line_empty is within polygon
>>> line_empty.within(polygon)
True

 

For comparative purposes, let's run the same two tests from Lines 9 and 12 using Esri’s ST_Geometry in Oracle:

 

SQL> --Test whether line is within polygon
SQL> SELECT sde.st_within(sde.st_geomfromtext('LINESTRING(1 0, 2 0)', 0),
                          sde.st_geomfromtext('POLYGON((0 0, 3 0, 3 3, 0 3, 0 0))', 0))
       FROM dual;
SDE.ST_WITHIN(SDE.ST_GEOMFROMTEXT('LINESTRING(10,20)',0),SDE.ST_GEOMFROMTEXT('PO'
--------------------------------------------------------------------------------
                                                                               0
SQL> --Test whether line_empty is within polygon
SQL> SELECT sde.st_within(sde.st_geomfromtext('LINESTRING EMPTY', 0),
                          sde.st_geomfromtext('POLYGON((0 0, 3 0, 3 3, 0 3, 0 0))', 0))
       FROM dual;
SDE.ST_WITHIN(SDE.ST_GEOMFROMTEXT('LINESTRINGEMPTY',0),SDE.ST_GEOMFROMTEXT('POLY'
--------------------------------------------------------------------------------
                                                                               0

 

Interesting, the results from using ST_Geometry differ from using ArcPy.  Knowing that ST_Geometry functions are OGC simple feature access and SQL compliant, which means they implement DE-9IM, the results from ST_Geometry in Oracle are expected because a line solely on the boundary of a polygon is not considered within the polygon, and an empty geometry cannot be within another geometry.

 

When looking at ArcPy results, Line 10 is correct only if the ArcPy Geometry Classes implement Clementini's definition since we saw in the first post of this series that Esri's definition of Within is True in this situation.  Unfortunately, the ArcPy documentation doesn’t state one way or another which definition it is implementing.  The result on Line 13 is correct only if the ArcPy Geometry Classes implement Esri's definition because the Clementini definition does not allow for an empty geometry to be within another geometry.  Clear as mud, right?

 

Esri's stance, to date, is that everything is working as designed.  What?!  If that is the case, Esri is implicitly acknowledging they are implementing different definitions of a spatial relation within the same spatial operator, it just depends what geometries you pass it!!  Between the closed source code, incomplete documentation, and seemingly arbitrary implementation; ArcPy Geometry Classes are use at your own risk.  Caveat utilitor.

This is the first in a two-part series on the risks to software users of poor documentation; specifically, the confusion and unexpected results that come from weakly documented spatial operators in GIS software.  The first part in the series looks at how inconsistent and incomplete documentation requires users to guess how spatial operators are implemented.  The second part in the series looks at the inconsistent results that arise from mixing different implementations of spatial operators.

 

Nine months since my last blog post, I can't say that was the pace I was aiming for when GeoNet got stood up (at least I don't give more weight to GeoNet resolutions than I do New Year's ones).  I don't know if my drought is busted, but a raw nerve has been hit hard enough by Esri to make it rain, at least for the moment.  Of all the soapboxes that litter my closets, poor documentation and its consequences is one of the most tattered.  As much as Honest Abe says you can't trust everything you read on the internet, I do think software users should be able to trust a company's online help/documentation.

 

One cannot spend too much time in the world of spatial relations without coming across the Dimensionally Extended 9 Intersection Model (DE-9IM).  The DE-9IM was developed by Clementini and others in the mid-'90s as an evolution of the 4 Intersection Model (4IM) and 9 Intersection Model (9IM).  Although the DE-9IM isn't the only definition of spatial relationships, it became the prevailing 2D definition after inclusion in the OpenGIS Implementation Specification for Geographic information - Simple feature access - Part 1: Common architecture.

 

References to and discussions of DE-9IM used to be found in various Esri documentation, but those references and documentation are becoming harder to find in current ArcGIS documentation.  For example, between the new 10.3 ArcGIS for Desktop, ArcGIS for Server, and ArcGIS for Developers sites, Clementini is only referenced on a couple handful of pages and DE-9IM is only referenced and discussed on one page:  Relational functions for ST_Geometry.  Whereas Python has a we're-all-grown-ups philosophy, Esri seems to be going the other direction and feeding us Pablum, without the vitamin fortification.

 

Although DE-9IM is the basis for 2D spatial predicates/relations in many geometry libraries and geospatial applications, there are two overlay types for the Select Layer By Location tool where Esri's default implementation differs from Clementini:  Contains, Within.  In both cases, the default or unqualified overlay type implies Esri's definition (blue underline) while the Clementini definition (red underline) is handled through qualifiers.

arcmap_103_select_layer_by_location_clementini.PNG

So how does the Esri definition of Contains and Within differ from the Clementini definition?  Forgoing mathematical notation and illustration matrices, the difference boils down to how boundaries of geometries are handled.  For example, geometry a that is entirely on the boundary of geometry b is considered within geometry b using Esri's definition but not within geometry b using Clementini's definition.  Let's create simple polygon and line features to demonstrate:

 

>>> polygon = arcpy.FromWKT('POLYGON((0 0, 3 0, 3 3, 0 3, 0 0))')
>>> line = arcpy.FromWKT('LINESTRING(1 0, 2 0)')
>>> arcpy.CopyFeatures_management(polygon, 'in_memory/polygon')
<Result 'in_memory\\polygon'>
>>> arcpy.CopyFeatures_management(line, 'in_memory/line')
<Result 'in_memory\\line'>

 

 

arcmap_simple_polygon_line_labeled.PNG

Executing the Select Layer By Location tool using both definitions of Within:

 

>>> arcpy.SelectLayerByLocation_management("line", "WITHIN", "polygon")
<Result 'line'>
>>> arcpy.GetCount_management("line")
<Result '1'>
>>> arcpy.SelectLayerByLocation_management("line", "WITHIN_CLEMENTINI", "polygon")
<Result 'line'>
>>> arcpy.GetCount_management("line")
<Result '0'>

 

So far, so good.  Beyond the fact that Esri's default definitions of Contains and Within differ from most other geometry libraries and geospatial applications, including the OGC simple feature standards, at least the results match the sparse documentation available online.

 

At this point, it is really important to point out something that is easily overlooked.  Esri's ST_Geometry functions are compliant with the OGC simple feature access and SQL standards, which means ST_Within adheres to the Clementini definition and not the Esri definition.

 

SQL> SELECT sde.st_within(sde.st_geomfromtext('LINESTRING(1 0, 2 0)', 0),
  2                       sde.st_geomfromtext('POLYGON((0 0, 3 0, 3 3, 0 3, 0 0))', 0))
  3    FROM dual;
SDE.ST_WITHIN(SDE.ST_GEOMFROMTEXT('LINESTRING(10,20)',0),SDE.ST_GEOMFROMTEXT('PO
--------------------------------------------------------------------------------
                                                                               0

 

Up until this point, I would argue the Esri documentation pertaining to spatial relations has been weak because it relies heavily on inference.  The attentive user might notice there are multiple Within overlay types in the drop-down box for the Select Layer By Location tool, and the inquisitive user might go one step further to read about overlay types to understand the differences between them.  The really attentive and knowledgeable user might understand that a single line in the What is the ST_Geometry storage type? documentation stating ST_Geometry implements the SQL 3 specification means the ST_Within function adheres to Clementini's definition instead of Esri's from the Select Layer By Location tool.  In short, there is no documentation that explicitly acknowledges there are different definitions of certain spatial relations within various parts of Esri's own software.

 

Confused yet?  Just wait, the fun really starts when we dive into the ArcPy Geometry Classes because the documentation goes from weak to really weak.  The only references to OGC in the ArcPy Geometry Classes documentation are for the WKB and WKT properties, and there are no references to DE-9IM or Clementini.  Let's take a look at the documentation for the within method:

arcgis_103_geometry_class_within_documentation.PNG

So, the geometry is within another geometry if it is within that geometry.  Got it.  Oh wait, are they asking me whether a geometry is within another geometry?  Although none of the illustrations captures a situation that differentiates the Esri and Clementini definitions, the lack of any reference to OGC, DE-9IM, or Clementini makes one think it is Esri's definition being used.  Let's check:

 

>>> line.within(polygon)
False

 

Ouch, that smarts.  It is pretty clear there is a lack of consistency with how certain spatial operators are implemented in various parts of Esri's software, but the worst part is the documentation doesn't even point it out.  Users are left to infer, and likely incorrectly at times, how the software works instead of being informed about how it works.  Caveat utilitor.