What's Within:  When (Esri != Clementini) = ?

Blog Post created by bixb0012 Champion on May 14, 2015

This is the first in a two-part series on the risks to software users of poor documentation; specifically, the confusion and unexpected results that come from weakly documented spatial operators in GIS software.  The first part in the series looks at how inconsistent and incomplete documentation requires users to guess how spatial operators are implemented.  The second part in the series looks at the inconsistent results that arise from mixing different implementations of spatial operators.


Nine months since my last blog post, I can't say that was the pace I was aiming for when GeoNet got stood up (at least I don't give more weight to GeoNet resolutions than I do New Year's ones).  I don't know if my drought is busted, but a raw nerve has been hit hard enough by Esri to make it rain, at least for the moment.  Of all the soapboxes that litter my closets, poor documentation and its consequences is one of the most tattered.  As much as Honest Abe says you can't trust everything you read on the internet, I do think software users should be able to trust a company's online help/documentation.


One cannot spend too much time in the world of spatial relations without coming across the Dimensionally Extended 9 Intersection Model (DE-9IM).  The DE-9IM was developed by Clementini and others in the mid-'90s as an evolution of the 4 Intersection Model (4IM) and 9 Intersection Model (9IM).  Although the DE-9IM isn't the only definition of spatial relationships, it became the prevailing 2D definition after inclusion in the OpenGIS Implementation Specification for Geographic information - Simple feature access - Part 1: Common architecture.


References to and discussions of DE-9IM used to be found in various Esri documentation, but those references and documentation are becoming harder to find in current ArcGIS documentation.  For example, between the new 10.3 ArcGIS for Desktop, ArcGIS for Server, and ArcGIS for Developers sites, Clementini is only referenced on a couple handful of pages and DE-9IM is only referenced and discussed on one page:  Relational functions for ST_Geometry.  Whereas Python has a we're-all-grown-ups philosophy, Esri seems to be going the other direction and feeding us Pablum, without the vitamin fortification.


Although DE-9IM is the basis for 2D spatial predicates/relations in many geometry libraries and geospatial applications, there are two overlay types for the Select Layer By Location tool where Esri's default implementation differs from Clementini:  Contains, Within.  In both cases, the default or unqualified overlay type implies Esri's definition (blue underline) while the Clementini definition (red underline) is handled through qualifiers.


So how does the Esri definition of Contains and Within differ from the Clementini definition?  Forgoing mathematical notation and illustration matrices, the difference boils down to how boundaries of geometries are handled.  For example, geometry a that is entirely on the boundary of geometry b is considered within geometry b using Esri's definition but not within geometry b using Clementini's definition.  Let's create simple polygon and line features to demonstrate:


>>> polygon = arcpy.FromWKT('POLYGON((0 0, 3 0, 3 3, 0 3, 0 0))')
>>> line = arcpy.FromWKT('LINESTRING(1 0, 2 0)')
>>> arcpy.CopyFeatures_management(polygon, 'in_memory/polygon')
<Result 'in_memory\\polygon'>
>>> arcpy.CopyFeatures_management(line, 'in_memory/line')
<Result 'in_memory\\line'>




Executing the Select Layer By Location tool using both definitions of Within:


>>> arcpy.SelectLayerByLocation_management("line", "WITHIN", "polygon")
<Result 'line'>
>>> arcpy.GetCount_management("line")
<Result '1'>
>>> arcpy.SelectLayerByLocation_management("line", "WITHIN_CLEMENTINI", "polygon")
<Result 'line'>
>>> arcpy.GetCount_management("line")
<Result '0'>


So far, so good.  Beyond the fact that Esri's default definitions of Contains and Within differ from most other geometry libraries and geospatial applications, including the OGC simple feature standards, at least the results match the sparse documentation available online.


At this point, it is really important to point out something that is easily overlooked.  Esri's ST_Geometry functions are compliant with the OGC simple feature access and SQL standards, which means ST_Within adheres to the Clementini definition and not the Esri definition.


SQL> SELECT sde.st_within(sde.st_geomfromtext('LINESTRING(1 0, 2 0)', 0),
  2                       sde.st_geomfromtext('POLYGON((0 0, 3 0, 3 3, 0 3, 0 0))', 0))
  3    FROM dual;


Up until this point, I would argue the Esri documentation pertaining to spatial relations has been weak because it relies heavily on inference.  The attentive user might notice there are multiple Within overlay types in the drop-down box for the Select Layer By Location tool, and the inquisitive user might go one step further to read about overlay types to understand the differences between them.  The really attentive and knowledgeable user might understand that a single line in the What is the ST_Geometry storage type? documentation stating ST_Geometry implements the SQL 3 specification means the ST_Within function adheres to Clementini's definition instead of Esri's from the Select Layer By Location tool.  In short, there is no documentation that explicitly acknowledges there are different definitions of certain spatial relations within various parts of Esri's own software.


Confused yet?  Just wait, the fun really starts when we dive into the ArcPy Geometry Classes because the documentation goes from weak to really weak.  The only references to OGC in the ArcPy Geometry Classes documentation are for the WKB and WKT properties, and there are no references to DE-9IM or Clementini.  Let's take a look at the documentation for the within method:


So, the geometry is within another geometry if it is within that geometry.  Got it.  Oh wait, are they asking me whether a geometry is within another geometry?  Although none of the illustrations captures a situation that differentiates the Esri and Clementini definitions, the lack of any reference to OGC, DE-9IM, or Clementini makes one think it is Esri's definition being used.  Let's check:


>>> line.within(polygon)


Ouch, that smarts.  It is pretty clear there is a lack of consistency with how certain spatial operators are implemented in various parts of Esri's software, but the worst part is the documentation doesn't even point it out.  Users are left to infer, and likely incorrectly at times, how the software works instead of being informed about how it works.  Caveat utilitor.