I've been trying to figure out a way to de-duplicate a list of point geometries so that any coincident points are removed and I end up with a list of just the unique locations (unique meaning more than 1' from any other point on the list). It looks like I can union individual points to each other, but that would only work if they were exactly coincident, no? My initial idea was to construct a multipoint geometry object from the list of point geometries and union it to itself to remove duplicates, but that doesn't seem to work. I think this problem must be deceptively simple and I'm just having tunnel vision as to what else to try.
To put it more simply, I need to detect coincident vertices in a polyline and return the location of anywhere that two or more points are coincident. I can currently do all of this except return only one location for each coincident area; right now I return every instance of a coincident location (3 coincident points results in 3 returned coincident locations).
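For illustration, here is roughly the shape of what I'm after: a hypothetical helper over a plain list of arcpy.PointGeometry objects (the name and tolerance are placeholders):

import arcpy

def dedupe_points(points: list[arcpy.PointGeometry], tolerance: float = 1.0) -> list[arcpy.PointGeometry]:
    # keep a point only if it is farther than the tolerance from every point already kept
    # (distanceTo measures in the units of the geometry's spatial reference)
    unique = []
    for pt in points:
        if all(pt.distanceTo(kept) > tolerance for kept in unique):
            unique.append(pt)
    return unique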
Hi @Glasnoct
You can use the FindIdentical tool with SHAPE as the input field.
Here is a workflow for deleting coincident points and having unique points as the output.
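As a minimal sketch of that workflow (the paths are placeholders, and the 1-foot tolerance is my assumption from the original post):

import arcpy

in_fc = r"C:\data\points.gdb\points"     # placeholder point feature class
out_table = r"C:\data\points.gdb\ident"  # placeholder output table

# Report records whose geometries match within the XY tolerance
arcpy.management.FindIdentical(in_fc, out_table, ["Shape"],
                               xy_tolerance="1 Feet",
                               output_record_option="ONLY_DUPLICATES")

# Or delete the coincident records outright, keeping one point per location
arcpy.management.DeleteIdentical(in_fc, ["Shape"], xy_tolerance="1 Feet")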
This takes a table or feature class. I'm working with a list of bare point geometries, since I'm breaking apart polylines and comparing the vertices on a feature-by-feature basis.
Does the below meet your needs?
import arcpy
from collections import Counter

fc = r"path\to\fc"

## get a list (set) of unique ids
unique_ids = {row[0] for row in arcpy.da.SearchCursor(fc, "OBJECTID")}

## use the unique ids to iterate over each feature
for unique_id in unique_ids:
    ## get all vertices (x, y)
    all_pts = [row[0] for row in arcpy.da.SearchCursor(fc, "SHAPE@XY", f"OBJECTID = {unique_id}", explode_to_points=True)]
    ## count the number of times an (x, y) is present (> 1 is a duplicate)
    count_dict = Counter(all_pts)
    ## we only want the duplicates
    duplicates = {key: value for key, value in count_dict.items() if value > 1}
    ## if duplicates found, print them to screen
    if duplicates:
        print(unique_id, duplicates)
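One caveat: Counter only flags vertices whose (x, y) tuples match exactly. To catch points within the 1-foot tolerance, one option (my suggestion, not part of the snippet above) is to snap coordinates to a grid inside the loop before counting, with the usual caveat that two points straddling a grid boundary can land in different cells:

    ## snap each coordinate to the tolerance grid so near-coincident vertices hash together
    tolerance = 1.0  # assuming a foot-based spatial reference
    snapped = [(round(x / tolerance), round(y / tolerance)) for x, y in all_pts]
    count_dict = Counter(snapped)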
We're currently migrating to the UN (Utility Network) and need to do something similar: delete coincident vertices (or vertices so close to one another that the UN treats them as coincident) in order to avoid having tons of dirty areas through which we can't do tracing. We're investigating several possible solutions, one of which may include a Python script.
I haven't started working on it, but my initial thinking was an algorithm something like:
Obviously, there would be a lot of details to hammer out, but I think this logic could work. We currently have an analyst who is also testing an FME solution with a transformer called Generalizer (safe.com). Not sure if that's available to you, but it might be something to check out if it is.
I won't be able to get back to this thread until Tuesday, but you gave me a good idea on how I might solve this with the segmentation of the cable. I was being tripped up by segments generated from coincident points: the length would be 0 and thus the shape is invalid or something (I cannot call any methods of the geometry object). I got around this by generating a slightly offset temporary point from the first point using pointFromAngleAndDistance and then constructing a polyline object from those two points. I'll come back to this next week and flesh it out, but the order of operations is roughly:
for each polyline:
    segments = create_segments_from_polyline_func(polyline)
    for s in segments:
        if s.length <= coincident_tolerance:  # currently 1' for my needs
            get centroid of segment for inserting a new feature at that location (creating a note feature for client)
            insertCursor on note FC with shape equal to the centroid
If you wanted to reconstruct the geometry without the duplicate vertices:
    for s in segments:
        if s.length > coincidence_tolerance:
            if s is the last segment in the list, append its first and last points to the list
            otherwise append its first point only
    generate a polyline object from the list of appended segment points, run an update cursor on the entry
Ok, this should work (or at least be 90% there since I haven't run it to test)
import itertools
import arcpy

def return_duplicate_vertice_locations(polylist: list[arcpy.Polyline],
                                       coincident_tolerance: int = 1) -> tuple[list[arcpy.PointGeometry], list[arcpy.Polyline]]:
    """
    Takes a list of Polyline objects and returns a list of PointGeometries at locations where multiple polyline
    vertices were detected, as well as a copy of the input list with the duplicate vertices merged
    :param polylist: list of polyline geometries
    :param coincident_tolerance: distance within which two points are considered coincident. uses the polyline's spatial reference units. default of 1
    :return: list of coincident vertex locations, polylist with duplicate vertices cleaned up
    """
    coincident_locations = []
    new_polylist = []
    for polyline in polylist:
        sr = polyline.spatialReference
        deduped_points = []
        segment_list = []
        pairs = itertools.pairwise(polyline[0])  # generates point pairs e.g. (0, 1), (1, 2), (2, 3) etc.
        for point_pair in pairs:
            segment = arcpy.Polyline(arcpy.Array(point_pair), spatial_reference=sr)
            if segment.length == 0.0:
                # create a tiny dummy line smaller than the tolerance distance just so the geometry doesn't return as None
                first_point = arcpy.PointGeometry(point_pair[0], spatial_reference=sr)
                second_point = first_point.pointFromAngleAndDistance(0, coincident_tolerance - 0.1).firstPoint
                new_pair = [point_pair[0], second_point]
                segment = arcpy.Polyline(arcpy.Array(new_pair), spatial_reference=sr)
            segment_list.append(segment)
        for i, segment in enumerate(segment_list):
            if segment.length <= coincident_tolerance:
                location = arcpy.PointGeometry(segment.centroid, sr)
                # only record one location per coincident cluster
                if not any(location.distanceTo(x) <= coincident_tolerance for x in coincident_locations):
                    coincident_locations.append(location)
            else:
                if i == len(segment_list) - 1:  # if last segment, append beginning and end points
                    deduped_points.extend(segment[0])
                else:  # otherwise append first point only
                    deduped_points.append(segment[0][0])
        new_polylist.append(arcpy.Polyline(arcpy.Array(deduped_points), spatial_reference=sr))
    return coincident_locations, new_polylist
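A quick usage sketch (the feature class path is a hypothetical placeholder, not from the post above):

lines = [row[0] for row in arcpy.da.SearchCursor(r"C:\data\lines.gdb\cables", "SHAPE@")]
locations, cleaned = return_duplicate_vertice_locations(lines, coincident_tolerance=1)
print(f"{len(locations)} coincident vertex locations found")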
Try this.
import arcpy

# Set the input point feature class
input_fc = r"your_points"  # Replace with your point feature class

# Spatial reference, adjust based on your data
spatial_ref = arcpy.Describe(input_fc).spatialReference

# Create lists to store unique points and coincident OBJECTIDs
unique_points = []
coincident_objectids = []

# Create a search cursor to iterate over points and track their OBJECTIDs
with arcpy.da.SearchCursor(input_fc, ["OBJECTID", "SHAPE@XY"]) as search_cursor:
    for row in search_cursor:
        objectid = row[0]
        point_geom = arcpy.PointGeometry(arcpy.Point(row[1][0], row[1][1]), spatial_ref)
        is_unique = True
        # Check if the point is within 1 foot of any existing unique points
        for unique_point in unique_points:
            if point_geom.distanceTo(unique_point[1]) <= 1.0:  # 1 foot tolerance
                is_unique = False
                coincident_objectids.append((unique_point[0], objectid))  # Track the coincident OBJECTIDs
                break
        if is_unique:
            unique_points.append((objectid, point_geom))

# Print out all the duplicate/coincident OBJECTIDs
if coincident_objectids:
    print("OBJECTIDs of points that are duplicates/coincident (within 1 foot):")
    for pair in coincident_objectids:
        print(f"OBJECTID 1: {pair[0]}, OBJECTID 2: {pair[1]}")
else:
    print("No duplicates/coincident points found.")
Assuming your duplicates are all sequential, you could use something like this:
import arcpy

def remove_duplicate_points(polyline: arcpy.Polyline, *, tolerance: float = 0.1) -> arcpy.Polyline:
    previous_point = polyline[0][0]
    points = [previous_point]
    for point in polyline[0]:
        # keep the point if it is outside the tolerance on either axis
        if (abs(point.X - previous_point.X) > tolerance) or (abs(point.Y - previous_point.Y) > tolerance):
            points.append(point)
            previous_point = point
    return arcpy.Polyline(arcpy.Array(points), spatial_reference=polyline.spatialReference)
Here's a more verbose version with some tests:
import arcpy
import random
import timeit
from functools import reduce

def remove_duplicate_points(polyline: arcpy.Polyline, *, tolerance: float = 0.1) -> arcpy.Polyline:
    """Remove duplicate points from a polyline.

    Args:
        polyline: A polyline object (single- or multipart).
        tolerance: The distance between points (in feature units) to consider them duplicates. Default is 0.1.

    Returns:
        A polyline object with duplicate points removed.
    """
    if polyline.isMultipart:
        # Recursively call remove_duplicate_points on each part of the polyline
        # (each part is an arcpy.Array, so wrap it back into a single-part Polyline first),
        # then union the parts to prevent segments being added between parts
        return reduce(
            lambda acc, pl: acc.union(pl),
            [remove_duplicate_points(arcpy.Polyline(part, spatial_reference=polyline.spatialReference),
                                     tolerance=tolerance)
             for part in polyline]
        )
    # First point
    previous_point = polyline[0][0]
    # Unique points list
    points = [previous_point]
    for point in polyline[0]:
        # Only append the point if it is further than the tolerance from the previous point on either axis
        if (abs(point.X - previous_point.X) > tolerance) or (abs(point.Y - previous_point.Y) > tolerance):
            points.append(point)
            previous_point = point
    # Return a new polyline object with the unique points
    return arcpy.Polyline(arcpy.Array(points), spatial_reference=polyline.spatialReference)

def generate_randomized_dupe_polyline() -> arcpy.Polyline:
    """Generate a polyline with duplicate points.

    Returns:
        A polyline object with duplicate points.
    """
    # Generate a random number of points
    r_points = [arcpy.Point(random.randint(0, 10), random.randint(0, 10)) for _ in range(random.randint(3, 100))]
    points = []
    for p in r_points:
        # Add the point to the points list
        points.append(p)
        # Add a duplicate point to the points list
        points.append(p)
    return arcpy.Polyline(arcpy.Array(points))

def main():
    runs = 1000
    duration = timeit.timeit(lambda: remove_duplicate_points(generate_randomized_dupe_polyline()), number=runs)
    avg_duration = duration / runs
    print(f"Average time to remove duplicate points from a polyline: {avg_duration:0.5f} seconds per polyline")

if __name__ == "__main__":
    main()
This one runs in about 1 ms per polyline. The trade-off of the optimization is that it only compares against the last unique point, so if a point is duplicated later in the line it will still be added, since the previous point is outside the culling tolerance.
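If non-sequential duplicates do matter for your data, a variant along these lines (my sketch, single-part lines only) compares each vertex against every point kept so far, at the cost of O(n^2) comparisons. Note it will also drop vertices where the line legitimately revisits a location, so it changes the shape of self-intersecting lines:

def remove_duplicate_points_global(polyline: arcpy.Polyline, *, tolerance: float = 0.1) -> arcpy.Polyline:
    # keep a vertex only if it is outside the tolerance of every vertex kept so far
    points = []
    for point in polyline[0]:
        if all((abs(point.X - kept.X) > tolerance) or (abs(point.Y - kept.Y) > tolerance) for kept in points):
            points.append(point)
    return arcpy.Polyline(arcpy.Array(points), spatial_reference=polyline.spatialReference)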