How to identify and group polygons highly overlapped with one another within the same vector layer?

BeichenTian · ‎04-15-2023

I have a vector layer containing thoudsands of polygons. Some of these polygons are (highly) overlapped with one another.

I have two purposes:

identify polygons that are highly overlapped (spatially overlapped and areas are quite similar/ the same) ;
group these highly overlapped polygons (assign each group a common value in a column to indicate such group relationship.
How may I implement these in ArcMap/ Arcpy/ geopandas? Any hints are welcome!

DanPatterson · ‎04-15-2023

I am assuming non-geometric polygonal shapes (eg not all squares, circles, etc).

First, give

Find Identical (Data Management)—ArcGIS Pro | Documentation

a try on the Shape field, using a the xy_tolerance set to some reasonable acceptable distance for variation in the vertices potential differences. This may work in some cases.

If that doesn't, try simple tests like symbolizin your data on the shape_area, then shape_length. If a spatial pattern appears, then you can move on to a refinement step.

If the simple test doesn't yield any pattern, add a column to you table and calculate shape_area/shape_length

this will give you another shape parameter to guage geometry similarity beyond just area or perimeter alone. Does it reveal any spatial pattern?

Refinement, is to denote location. Export your featureclass using

Feature To Point (Data Management)—ArcGIS Pro | Documentation

to produce a point featureclass of the geometry centroids.

Most people don't have the spatial statistics toolset which would be useful for clustering analysis to provide the next step in refinement

An overview of the Mapping Clusters toolset—ArcGIS Pro | Documentation

so you can use

Aggregate Points (Cartography)—ArcGIS Pro | Documentation

to form polygons from any apparent centroid clusters..

If there is a spatial pattern, then you can use those clusters in a spatial join to investigate what the underlying original polygons are like. If each cluster contained similar/identical polygons, then you can use that for a your final classification.

... sort of retired...

JohannesLindner · ‎04-15-2023

With tools:

Intersect the both feature classes
use Join Field to copy the !Shape_Area! fields from the fcs into the intersection
use Calculate Field to calculate how much the intersection covers the respective features
1. Percentage_1 = Area_Intersection / Area_FC1 * 100
2. Percentage_2 = Area_Intersection / Area_FC2 * 100
Select Layer By Attribute to select those features where both percentages are above a certain threshold, eg 90%
export those features. The OBJECTID is the GroupID, the two FID fields are the OBJECTIDs of the features that are part of the group. You can use Join Field to get the GroupID Into the original feature classes

Example (yellow polygons labeled with yellow OBJECTIDs, black outlined Polygons labeled with black OBJECTIDs, groups not symbolized but labeled with red OBJECTIDs. Polygons without group don't have enough overlap.)

You can do the same with arcpy, but the field mappings are a hassle. It's easier using Cursors:

fc_1 = "TestPolygons"
fc_2 = "TestPolygons_1"
out_folder = "memory"
out_name = "Groups"

# read fcs as {oid: geometry}
geo_1 = dict([row for row in arcpy.da.SearchCursor(fc_1, ["OID@", "SHAPE@"])])
geo_2 = dict([row for row in arcpy.da.SearchCursor(fc_2, ["OID@", "SHAPE@"])])

# intersect the fcs and read the result as [ [ geometry, fid_1, fid_2 ] ]
intersection = arcpy.analysis.Intersect([fc_1, fc_2], "memory/intersect", "ONLY_FID")
fid_fields = [f.name for f in arcpy.ListFields(intersection) if f.type=="Integer"]
intersect = [row for row in arcpy.da.SearchCursor(intersection, ["SHAPE@"] + fid_fields)]

# create output
out_table = arcpy.management.CreateFeatureclass(out_folder, out_name, "POLYGON", spatial_reference=fc_1)
for fid in fid_fields:
    arcpy.management.AddField(out_table, fid, "LONG")
    
with arcpy.da.InsertCursor(out_table, ["SHAPE@"] + fid_fields) as cursor:
    for shp, fid_1, fid_2 in intersect:
        # get the corresponding original polygons
        shp_1 = geo_1[fid_1]
        shp_2 = geo_2[fid_2]
        # if they are sufficiently same-ish, write the fids into the output table
        if (shp.area / shp_1.area > 0.9) and (shp.area / shp_2.area > 0.9):
            cursor.insertRow([shp, fid_1, fid_2])

Have a great day!
Johannes

BeichenTian · ‎04-15-2023

Hi Johannes, thanks for your reply. But in my case, these thoudsands of polygons are all in one layer rather than two layers. Any further suggestion?

JohannesLindner · ‎04-16-2023

Ah OK. You have to do just a little bit more.

Intersect the fc with itself
delete all features of the intersection where FID1 = FID2
use Join Field to copy the Shape_Area field into the intersection. Do this twice (join with FID1 and with FID2).
use Calculate Field to calculate how much the intersection covers the respective features
1. Percentage_1 = Area_Intersection / Area_1 * 100
2. Percentage_2 = Area_Intersection / Area_2 * 100
Select Layer By Attribute to select those features where both percentages are above a certain threshold, eg 90%
export those features. The OBJECTID is the GroupID, the two FID fields are the OBJECTIDs of the features that are part of the group. You can use Join Field to get the GroupID Into the original feature class
use Delete Identical on the exported fc with the Shape field

fc = "TestPolygons"
out_folder = "memory"
out_name = "Groups"

# read fc as {oid: geometry}
geo = dict([row for row in arcpy.da.SearchCursor(fc, ["OID@", "SHAPE@"])])

# intersect the fc with itself and read the result as [ [ geometry, fid_1, fid_2 ] ]
intersection = arcpy.analysis.Intersect([fc, fc], "memory/intersect", "ONLY_FID")
fid_fields = [f.name for f in arcpy.ListFields(intersection) if f.type=="Integer"]
intersect = [row for row in arcpy.da.SearchCursor(intersection, ["SHAPE@"] + fid_fields)]

# create output
out_table = arcpy.management.CreateFeatureclass(out_folder, out_name, "POLYGON", spatial_reference=fc)
arcpy.management.AddField(out_table, "FID_1", "LONG")
arcpy.management.AddField(out_table, "FID_2", "LONG")

    
with arcpy.da.InsertCursor(out_table, ["SHAPE@", "FID_1", "FID_2"]) as cursor:
    for shp, fid_1, fid_2 in intersect:
        # skip if fids are the same
        if fid_1 == fid_2:
            continue
        # get the corresponding original polygons
        shp_1 = geo[fid_1]
        shp_2 = geo[fid_2]
        # if they are sufficiently same-ish, write the fids into the output table
        if (shp.area / shp_1.area > 0.9) and (shp.area / shp_2.area > 0.9):
            cursor.insertRow([shp, fid_1, fid_2])
arcpy.management.DeleteIdentical(out_table, "Shape")

Have a great day!
Johannes

BeichenTian · ‎04-23-2023

Hi Johannes,

Great point! We adopted your strategy and did make some progress. But we also found out that some identical polygons (polygons with exactly the same shape and area overlapping together) somehow did not get grouped at all.

For instance, the three polygons with highlighted yellow color in pic 1 (南山区-2021-0588号, 南山区-2020-0268号,南山区-2019-0885号) overlapped with each other 100%. They additionally overlapped with some smaller polygons (see pic 2). However, the tool "intersect" only returned intersected results of these three polygons with those smaller polygons, rather than additionally returning intersected results among the three polygons themselves. Any hints to solve this? Thanks in advance!

Pic 1: the three polygons overlapping each other 100% but not grouped into a family by the intersect tool

Pic 2: Take polygon 南山区-2019-0885号 for instance, the intersect result only returned its intersected areas with small polygons rather than returning its intersected results with 南山区-2021-0588号 and 南山区-2020-0268号 as well.