Cluster Analysis

BarryYoung · ‎08-04-2023

Hi, I'm working with a point dataset of road defects and have been asked to identify duplicate reported defects so that we only have a dataset with unique active defects.

The points aren't always in the exact same location so I've used Cluster Analysis to identify points that are within 5m of each other, which has worked fine.

The next step is to look at each of the clusters and identify if any of the defects have been repaired within that cluster. I have a field that identifies this simply as Yes/No with a repaired date in a separate field. However, if there are any defects that have a reported date that is after a defect that has been repaired then we need to keep the most recent record.

The final goal is to retain (or identify) a single point within each cluster where there is still an active defect.

Each defect record has the following fields to utilise:

FID (Unique Feature ID)

CLUSTER_ID (Each cluster has its own ID and all the individual points within the cluster are allocated this ID)

Date Created (Date Reported)

Completion Date (Date defect repaired)

Closed (Yes/No if case closed or not)

If anyone has done something similar or has suggestions on how to achieve this I'd be hugely grateful!

Thanks, Barry