I have a vector layer containing thoudsands of polygons. Some of these polygons are (highly) overlapped with one another.
I have two purposes:
identify polygons that are highly overlapped (spatially overlapped and areas are quite similar/ the same) ;
group these highly overlapped polygons (assign each group a common value in a column to indicate such group relationship.
How may I implement these in ArcMap/ Arcpy/ geopandas? Any hints are welcome!
I am assuming non-geometric polygonal shapes (eg not all squares, circles, etc).
First, give
Find Identical (Data Management)—ArcGIS Pro | Documentation
a try on the Shape field, using a the xy_tolerance set to some reasonable acceptable distance for variation in the vertices potential differences. This may work in some cases.
If that doesn't, try simple tests like symbolizin your data on the shape_area, then shape_length. If a spatial pattern appears, then you can move on to a refinement step.
If the simple test doesn't yield any pattern, add a column to you table and calculate shape_area/shape_length
this will give you another shape parameter to guage geometry similarity beyond just area or perimeter alone. Does it reveal any spatial pattern?
Refinement, is to denote location. Export your featureclass using
Feature To Point (Data Management)—ArcGIS Pro | Documentation
to produce a point featureclass of the geometry centroids.
Most people don't have the spatial statistics toolset which would be useful for clustering analysis to provide the next step in refinement
An overview of the Mapping Clusters toolset—ArcGIS Pro | Documentation
so you can use
Aggregate Points (Cartography)—ArcGIS Pro | Documentation
to form polygons from any apparent centroid clusters..
If there is a spatial pattern, then you can use those clusters in a spatial join to investigate what the underlying original polygons are like. If each cluster contained similar/identical polygons, then you can use that for a your final classification.
With tools:
Example (yellow polygons labeled with yellow OBJECTIDs, black outlined Polygons labeled with black OBJECTIDs, groups not symbolized but labeled with red OBJECTIDs. Polygons without group don't have enough overlap.)
You can do the same with arcpy, but the field mappings are a hassle. It's easier using Cursors:
fc_1 = "TestPolygons"
fc_2 = "TestPolygons_1"
out_folder = "memory"
out_name = "Groups"
# read fcs as {oid: geometry}
geo_1 = dict([row for row in arcpy.da.SearchCursor(fc_1, ["OID@", "SHAPE@"])])
geo_2 = dict([row for row in arcpy.da.SearchCursor(fc_2, ["OID@", "SHAPE@"])])
# intersect the fcs and read the result as [ [ geometry, fid_1, fid_2 ] ]
intersection = arcpy.analysis.Intersect([fc_1, fc_2], "memory/intersect", "ONLY_FID")
fid_fields = [f.name for f in arcpy.ListFields(intersection) if f.type=="Integer"]
intersect = [row for row in arcpy.da.SearchCursor(intersection, ["SHAPE@"] + fid_fields)]
# create output
out_table = arcpy.management.CreateFeatureclass(out_folder, out_name, "POLYGON", spatial_reference=fc_1)
for fid in fid_fields:
arcpy.management.AddField(out_table, fid, "LONG")
with arcpy.da.InsertCursor(out_table, ["SHAPE@"] + fid_fields) as cursor:
for shp, fid_1, fid_2 in intersect:
# get the corresponding original polygons
shp_1 = geo_1[fid_1]
shp_2 = geo_2[fid_2]
# if they are sufficiently same-ish, write the fids into the output table
if (shp.area / shp_1.area > 0.9) and (shp.area / shp_2.area > 0.9):
cursor.insertRow([shp, fid_1, fid_2])
Hi Johannes, thanks for your reply. But in my case, these thoudsands of polygons are all in one layer rather than two layers. Any further suggestion?
Ah OK. You have to do just a little bit more.
fc = "TestPolygons"
out_folder = "memory"
out_name = "Groups"
# read fc as {oid: geometry}
geo = dict([row for row in arcpy.da.SearchCursor(fc, ["OID@", "SHAPE@"])])
# intersect the fc with itself and read the result as [ [ geometry, fid_1, fid_2 ] ]
intersection = arcpy.analysis.Intersect([fc, fc], "memory/intersect", "ONLY_FID")
fid_fields = [f.name for f in arcpy.ListFields(intersection) if f.type=="Integer"]
intersect = [row for row in arcpy.da.SearchCursor(intersection, ["SHAPE@"] + fid_fields)]
# create output
out_table = arcpy.management.CreateFeatureclass(out_folder, out_name, "POLYGON", spatial_reference=fc)
arcpy.management.AddField(out_table, "FID_1", "LONG")
arcpy.management.AddField(out_table, "FID_2", "LONG")
with arcpy.da.InsertCursor(out_table, ["SHAPE@", "FID_1", "FID_2"]) as cursor:
for shp, fid_1, fid_2 in intersect:
# skip if fids are the same
if fid_1 == fid_2:
continue
# get the corresponding original polygons
shp_1 = geo[fid_1]
shp_2 = geo[fid_2]
# if they are sufficiently same-ish, write the fids into the output table
if (shp.area / shp_1.area > 0.9) and (shp.area / shp_2.area > 0.9):
cursor.insertRow([shp, fid_1, fid_2])
arcpy.management.DeleteIdentical(out_table, "Shape")
Hi Johannes,
Great point! We adopted your strategy and did make some progress. But we also found out that some identical polygons (polygons with exactly the same shape and area overlapping together) somehow did not get grouped at all.
For instance, the three polygons with highlighted yellow color in pic 1 (南山区-2021-0588号, 南山区-2020-0268号,南山区-2019-0885号) overlapped with each other 100%. They additionally overlapped with some smaller polygons (see pic 2). However, the tool "intersect" only returned intersected results of these three polygons with those smaller polygons, rather than additionally returning intersected results among the three polygons themselves. Any hints to solve this? Thanks in advance!
Pic 1: the three polygons overlapping each other 100% but not grouped into a family by the intersect tool
Pic 2: Take polygon 南山区-2019-0885号 for instance, the intersect result only returned its intersected areas with small polygons rather than returning its intersected results with 南山区-2021-0588号 and 南山区-2020-0268号 as well.