In the Hole lies outside shell and arcpy thread, the OP had geometries that were invalid but Esri tools were failing to detect them as such. After some investigation, I discovered a shared-origin edge case where the Check Geometry will fail to detect an invalid geometry.
The following is a simplified example of the OP's geometries from the Hole lies outside shell and arcpy thread.
In the example above, polygon a is valid while polygon b is invalid because of "hole" lies outside of the bounding geometry. When run through Check Geometry, it will identify polygon b as invalid. When the two parts are combined into a two-part multi-polygon, the Check Geometry tools fails to identify the polygon as invalid even though one of the two parts is invalid.
The following code creates a feature class containing three polygons (a, b, and ab), and then checks the geometry of all three:
import arcpy
SR = arcpy.SpatialReference(4326)
polys = {
"a":("MULTIPOLYGON(((10 5, 10 10, 0 10, 0 0, 10 0, 10 5)))"),
"b":("MULTIPOLYGON(((10 5, 15 0, 20 5, 15 10, 10 5),(10 5, 7.5 2.5, 5 5, 7.5 7.5, 10 5)))"),
"ab":("MULTIPOLYGON(((10 5, 10 10, 0 10, 0 0, 10 0, 10 5)),"
"((10 5, 15 0, 20 5, 15 10, 10 5),(10 5, 7.5 2.5, 5 5, 7.5 7.5, 10 5)))")
}
fc = arcpy.CopyFeatures_management(
[arcpy.FromWKT(polys[p], SR) for p in polys],
arcpy.CreateScratchName(workspace="memory")
)
chkGeom = arcpy.CheckGeometry_management(fc, arcpy.CreateScratchName(workspace="memory"))
print(*arcpy.da.SearchCursor(chkGeom, ["FEATURE_ID","PROBLEM"]), sep="\n")
The result is that just one of the three polygons is identified as invalid:
>>> print(*arcpy.da.SearchCursor(chkGeom, ["FEATURE_ID","PROBLEM"]), sep="\n")
(2, 'incorrect ring ordering')
>>>
Since the third polygon is a combined polygon, it should also be identified as invalid.
Nice...
So polygon 'a' is counterclockwise, so shouldn't it be invalid as well?
polygon 'b' s first ring is counterclockwise and 'b's second ring is clockwise.
also the areas are a clue [-100.0, -50.0, 12.5] (a, b1, b2)
'a's area is negative
'b's counterclockwise ring (b1) is larger and negative than the clockwise ring (b2)
'b's ring extents don't overlap either
Aren't they all wrong or is there some implicit 'correction' made to the coordinates during the 'arc'ing.
In the early versions of the OpenGIS Simple feature access standard, no polygon ring order was specified. Whether that was an oversight, or an accommodation of the fact that different geometry models use different ring orders, I do not know. Starting with the 1.2 version in 2006, the standard specifically identified a ring order for polygons:
The exterior boundary LinearRing defines the “top” of the surface which is the side of the surface from which the exterior boundary appears to traverse the boundary in a counter clockwise direction. The interior LinearRings will have the opposite orientation, and appear as clockwise when viewed from the “top”,
The underlying library that ArcGIS uses for constructing geometry from WKT supports both ring orders. For canonical/normal WKT using counter-clockwise ring order, the constructor rebuilds the ring order to Esri's clockwise ring order to create a valid ArcGIS polygon.
>>> import arcpy
>>> pg_ccw = arcpy.FromWKT('POLYGON((0 0, 10 0, 10 10, 0 10, 0 0))')
>>> pg_cw = arcpy.FromWKT('POLYGON((0 0, 0 10, 10 10, 10 0, 0 0))')
>>>
>>> pg_ccw.area
100.0
>>> pg_cw.area
100.0
>>>
>>> pg_ccw.JSON
'{"rings":[[[0,0],[0,10],[10,10],[10,0],[0,0]]],"spatialReference":{"wkid":null}}'
>>> pg_cw.JSON
'{"rings":[[[0,0],[0,10],[10,10],[10,0],[0,0]]],"spatialReference":{"wkid":null}}'
>>>
ArcGIS Pro does the opposite when outputting WKT, i.e., it changes the ring order to the canonical/normal WKT.
>>> pg_ccw.WKT
'MULTIPOLYGON (((0 0, 10 0, 10 10, 0 10, 0 0)))'
>>> pg_cw.WKT
'MULTIPOLYGON (((0 0, 10 0, 10 10, 0 10, 0 0)))'
>>>
Note that the WKT geometry format stores Open Geospatial Consortium (OGC) geometry types.
The ArcGIS Polygon geometry type is defined as a collection of rings. Exterior rings are clockwise, and holes are counterclockwise, there can be no self-intersections. In a most basic way, a valid Esri polygon must have such structure that the interior of the polygon can always be unambiguously determined to be to the right of every segment. This definition is also enough to represent any polygonal structure that OGC MULTIPOLYGON can represent.
OGC standard definition is more complicated than Esri's. It has a MULTIPOLYGON type that is a collection of POLYGON types, and a POLYGON type consists of one exterior ring and zero or more holes. The OGC POLYGON interior must be topologically path-connected, that is it should be possible to trace a continuous path between any two interior points of a single OGC POLYGON without touching the boundary.
When WKT definition in question is converted to Esri polygon, it becomes an Esri polygon that has three rings. These three rings have proper orientation and have no self-intersections, the resulting Esri polygon is simple from the point of view of Esri features. Therefore check geometry will not flag it.
There are other cases when OGC standard is more restrictive than Esri simple polygon. For example, cases when a ring is self-tangent are not allowed by OGC standard, but are perfectly valid for Esri features.
To address the differences there is the OGC option in the Repair Geometry and Check Geometry tools available in ArcGIS Pro. This option was introduced specifically to allow validation and repair of features so that they can be exported into valid OGC shapes. The tool will ensure that the Esri polygon rings are sorted such that it is easy to separate them into OGC POLYGON groups, and also it ensures that the whole structure, when converted to OGC MULTIPOLYGON, is OGC simple.
Sergey Tolstov, thanks for the reply and additional information about differences between the OGC and Esri geometry models.
Even taking into consideration this new information, I believe there is still a defect in the Check Geometry tool. Leaving OGC and WKT behind and only looking at the polygons from the Esri geometry model, the Check Geometry tool provides different results when polygons a and b have a shared vertex versus when they don't. Using the following code to generate five polygons (a, a2, b, ab, a2b):
wkid = {"wkid":4326}
rings = {
"a" : [[[10,5],[10,0],[0,0],[0,10],[10,10],[10,5]]],
"a2" : [[[10,10],[10,0],[0,0],[0,10],[10,10]]],
"b" : [[[10,5],[15,10],[20,5],[15,0],[10,5]],[[10,5],[7.5,7.5],[5,5],[7.5,2.5],[10,5]]]
}
polys = ["a"], ["a2"], ["b"], ["a","b"], ["a2", "b"]
polys_json = [
{"rings": sum([rings[p] for p in poly], []), "spatialReference": wkid}
for poly
in polys
]
fc = arcpy.CopyFeatures_management(
[arcpy.AsShape(json, True) for json in polys_json],
arcpy.CreateScratchName(workspace="memory")
)
Check geometry identifies 2 of the 5 polygons as invalid instead of 3 of the 5 polygons:
>>> chkGeom = arcpy.CheckGeometry_management(fc, arcpy.CreateScratchName(workspace="memory"))
>>> print(*arcpy.da.SearchCursor(chkGeom, ["FEATURE_ID","PROBLEM"]), sep="\n")
(3, 'incorrect ring ordering')
(5, 'self intersections')
>>>
If the results of Check Geometry are correct, can you explain why combining a valid and invalid polygon into a multi-part polygon is valid if they share a vertex but invalid if they don't share a vertex?
For an Esri polygon to be simple, all intersections have to occur at vertices. In general, it follows from "a valid Esri polygon must have such structure that the interior of the polygon can always be unambiguously determined to be to the right of every segment", that:
When you throw in the three rings together, the case when the un-rotated square ring has no vertex at the common point of the diamond shapes is a violation of "segments can touch other segments only at the end points". This is also not allowed by the OGC standard.
Sergey Tolstov, thanks for the follow-up reply.
The additional information you provided is quite helpful to understanding the underlying Esri geometry model, but I am having a hard time finding where Esri's geometry model is documented. I am aware of the Geometry objects—Common Data Types | ArcGIS for Developers documentation, but it seems incomplete, or at least abridged and more focused on the semantics of writing a geometry in JSON format than describing the underlying geometry model. Ironically, the 20+ year old Shapefile Technical Description document (https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf) contains more specifics about Esri geometries than the "current" geometry objects documentation online. I also found some additional information in Java API Polygon on GitHub, but it seems odd to bury a geometry model in a Java API on GitHub.
Getting back to the specific example, I still don't understand why an invalid single-part polygon can be part of a valid multi-part polygon. I understand that without the shared-vertex the new multi-part polygon creates a new violation that didn't exist before, i.e., without a shared vertex the two single-part polygons violate the "segments can touch other segments only at the end points" rule. However, even with a shared vertex, one of the two polygons that makes up the new multi-part polygon is invalid. Isn't an Esri polygon invalid if one of the parts is invalid by itself?
> Isn't an Esri polygon invalid if one of the parts is invalid by itself?
What is a part in the Esri polygon if the Esri polygon is a collection of rings?
If you would make a separate polygon from each ring, the polygons that were made from the holes would be invalid, because they would have incorrect orientation.
Sergey Tolstov and Kory Kramer, any thoughts on my question about where the Esri geometry model is documented?
Not sure if this is what is going on, but there are 2 rules identified in the check/repair geometry
1 remove sequential duplicate points
2 close open polygons
If I take two of your polygon shapes, one CW and one CCW ring, you end up with the sequence of potential steps
Line 23, the array points ends up being a multipart shape, 2 shapes with the same number of points.
a1 = np.array([[10., 5.], [15., 10.], [20., 5.], [15., 0.], [10., 5.]])
ccw = np.array([[10., 5.], [7.5, 7.5], [5., 5.], [7.5, 2.5], [10., 5.]])
tmp = np.asarray([a1, ccw])
z_3 = np.vstack(tmp)
dif = np.diff(z_3, axis=0)
z_3b = z_3[np.nonzero(dif)].reshape(-1, 2)
z_3b = np.concatenate((z_3b, [z_3[-1]]))
a1 # ----- clockwise ring (line 1)
array([[10., 5.],
[15., 10.],
[20., 5.],
[15., 0.],
[10., 5.]])
ccw # ----- counterclockwise ring (line 2)
array([[10. , 5. ],
[ 7.5, 7.5],
[ 5. , 5. ],
[ 7.5, 2.5],
[10. , 5. ]])
tmp # ---- combined (line 3), shape (2, 5, 2)
array([[[10. , 5. ],
[15. , 10. ],
[20. , 5. ],
[15. , 0. ],
[10. , 5. ]],
[[10. , 5. ],
[ 7.5, 7.5],
[ 5. , 5. ],
[ 7.5, 2.5],
[10. , 5. ]]])
z_3 # ---- stacked (line 4), shape (10, 2)
array([[10. , 5. ],
[15. , 10. ],
[20. , 5. ],
[15. , 0. ],
[10. , 5. ], # ---- rule...
[10. , 5. ], # remove duplicate sequential points
[ 7.5, 7.5],
[ 5. , 5. ],
[ 7.5, 2.5],
[10. , 5. ]])
z_3b # ---- lines 5-7
array([[10. , 5. ],
[15. , 10. ],
[20. , 5. ],
[15. , 0. ],
[10. , 5. ],
[ 7.5, 7.5],
[ 5. , 5. ],
[ 7.5, 2.5], # ---- rule...
[10. , 5. ]]) # close sequential polygons
Now the interesting thing is that that sequence yields a 'mobius loop' kind of shape (z_3b) since the common point (10, 5) is duplicated, for both parts, but removed by the second rule. This might account for the seemingly hole outside, just being a continuation of the outer ring, closing back in on itself on line 57
I created 2 versions of the above sequence of points. One a singlepart featureclass and the other a multipart featureclass by using Dissolve. I then converted both versions out to arrays, using FeatureClassToNumPyArray.
Both contain the same points but the order of the ring inclusion is altered. Note that the duplicate [10, 5] in the middle two points of the shape are NOT removed. They shouldn't be sinceone is needed to close the first ring and the other to start the second ring.
This raises two questions
# ---- point sequence of singlepart version
ar1
array([[10. , 5. ],
[15. , 10. ],
[20. , 5. ],
[15. , 0. ],
[10. , 5. ],
[10. , 5. ],
[ 7.5, 2.5],
[ 5. , 5. ],
[ 7.5, 7.5],
[10. , 5. ]])
# ---- point sequence of dissolved version
ar2
array([[10. , 5. ],
[ 7.5, 2.5],
[ 5. , 5. ],
[ 7.5, 7.5],
[10. , 5. ],
[10. , 5. ],
[15. , 10. ],
[20. , 5. ],
[15. , 0. ],
[10. , 5. ]])
Interesting to say the least.