I'm trying to solve a distance problem where I need to get a set of distances between polygons and their next closest neighboring polygon. For example, for the blue polygons I need to know the distance between polygon #2 and #10, then #10 and #5 (a multipart polygon), then #5 to #19, #19 to #8, and so on.
Using Generate Near Table in ArcGIS, I can get the closest polygons to each other, represented by the red lines, but there are gaps between clusters of polygons. Obviously, I can create the whole set of pairwise distances between all the polygons, but then I'm stuck on how to subset that to just the pairs needed to fill in the gaps (#10-#5, #8-#16, #16-#13, #17-#3, and #11-#7).
Any ideas on how to solve this problem? I need to do this for a few thousand sets of polygons, so looking for an automated solution.
It's quite hard to follow your question, because your screenshot doesn't fit your text (polygons aren't labeled, colors are wrong)...
Are the blue lines what you want?
Whoops -- uploaded the wrong image. Yes, your interpretation is correct. Correct image below:
I don't understand what the #numbers are referring to in the image, where any blue polygons are in the image, or exactly what you need to achieve from your explanation so far.
Apologies - accidentally uploaded the wrong image. See the above reply.
Phew, turns out to be more complicated than I expected. Or there is some simple solution I overlooked...
import arcpy

fc = "TestPoints"
# get a near table with all pairs
near_table = arcpy.analysis.GenerateNearTable(fc, fc, r"memory/NearTable", closest="ALL")
# extract the rank 1 pairs
rank_1 = [
    row
    for row in arcpy.da.SearchCursor(near_table, ["IN_FID", "NEAR_FID"], "NEAR_RANK = 1")
]
# save these connections for later
feature_connections = list(rank_1)
# find the clusters of those pairs
clusters = []
while len(rank_1) > 0:
    cluster = set()  # new empty cluster
    test_fids = set(rank_1[0])  # start with the first available fid pair
    while len(test_fids) > 0:
        cluster.update(test_fids)  # add all test_fids to the cluster
        # find the next test_fids:
        # all fid pairs where either fid is in the current test_fids
        # and not already in the cluster
        next_fids = set()
        for r in rank_1:
            if r[0] in test_fids or r[1] in test_fids:
                next_fids.update(set(r))
        test_fids = next_fids.difference(cluster)
    clusters.append(cluster)
    # remove all fid pairs of this cluster from the fid pair list
    rank_1 = [
        r
        for r in rank_1
        if r[0] not in cluster and r[1] not in cluster
    ]
print("Detected the following clusters:")
print(clusters)
# get the cluster connections
cluster_connections = []
for cluster in clusters:
    start = set(cluster)
    not_stop = set(cluster)
    while True:
        # find nearest point in another cluster
        where = f"IN_FID IN {tuple(start)} AND NEAR_FID NOT IN {tuple(not_stop)}"
        sql = (None, "ORDER BY NEAR_DIST")
        with arcpy.da.SearchCursor(near_table, ["IN_FID", "NEAR_FID"], where, sql_clause=sql) as cursor:
            connection = next(cursor, None)
        # no candidate left (e.g. only one cluster exists), so stop searching
        if connection is None:
            break
        # if the reverse connection already exists, get the second nearest
        # cluster, else you could get "super clusters" of clusters that are
        # each other's closest clusters
        rev_connection = tuple(reversed(connection))
        if rev_connection in cluster_connections:
            other_cluster = [c for c in clusters if connection[1] in c][0]
            not_stop.update(other_cluster)
        # if the connection does not yet exist, we're done
        else:
            cluster_connections.append(connection)
            break
print("Detected the following connections between clusters:")
print(cluster_connections)
# put all the connections together
unique_feature_connections = []
for connection in feature_connections:
    if tuple(reversed(connection)) not in unique_feature_connections:
        unique_feature_connections.append(connection)
connections = unique_feature_connections + cluster_connections
print("\n\nDetected the following connections between the features:")
print(connections)
Input (points, but it should work with any geometry type):
Connections generated from a simple Generate Near Table:
And the result of my script with connections between the clusters:
There are some cluster connections that aren't really necessary, for example between points 161/166, or 156/163.
Sadly, you can't just connect to the closest cluster, because then you could get "super clusters" of clusters that are each other's closest clusters, for example the two clusters in the lower left. That's why I look for the second closest cluster if I detect a cluster connection multiple times. The "correct" way would be to check if the cluster is connected to the greater network of clusters and if so skip the detection of a new cluster connection, but I'm too lazy to figure out a proper solution for that...
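For what it's worth, the connectivity check described above could be sketched with a union-find (disjoint-set) structure. This is only a rough idea, not something I have run against ArcGIS: it assumes the near table has already been read into plain Python tuples of (IN_FID, NEAR_FID, NEAR_DIST), and the function name `connect_all` and the sample coordinates are made up for illustration. It keeps every rank 1 connection, then walks the remaining pairs from shortest to longest and keeps a pair only if the two features are not connected yet.

```python
from itertools import permutations


def connect_all(pairs, rank_1_pairs):
    """Return the rank 1 connections plus the shortest extra links that
    join the resulting clusters into one network.

    pairs: iterable of (in_fid, near_fid, near_dist) for all pairs
    rank_1_pairs: iterable of (in_fid, near_fid) nearest-neighbor pairs
    """
    parent = {}

    def find(fid):
        # follow parent links to the representative of fid's group
        parent.setdefault(fid, fid)
        while parent[fid] != fid:
            parent[fid] = parent[parent[fid]]  # path halving
            fid = parent[fid]
        return fid

    def union(a, b):
        # merge the two groups; return False if they were already connected
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return False
        parent[root_a] = root_b
        return True

    connections = []
    seen = set()
    # keep every nearest-neighbor connection once, ignoring direction
    for a, b in rank_1_pairs:
        key = frozenset((a, b))
        if key not in seen:
            seen.add(key)
            connections.append((a, b))
        union(a, b)

    # shortest pairs first: keep a pair only if it joins two separate groups
    for a, b, _dist in sorted(pairs, key=lambda p: p[2]):
        if union(a, b):
            connections.append((a, b))
    return connections


# hypothetical sample: six features along a line, forming three clusters
positions = {1: 0, 2: 1, 3: 5, 4: 6, 5: 20, 6: 21}
pairs = [(a, b, abs(positions[a] - positions[b])) for a, b in permutations(positions, 2)]
rank_1_pairs = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
result = connect_all(pairs, rank_1_pairs)
print(result)
```

In this sample the three rank 1 pairs are kept and only two gap-filling links are added (between features 2 and 3, and between 4 and 5), so no redundant cluster connections appear. Whether that matches the connections you want in every real-world layout is something you would have to check on your data.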