Find Duplicates - Accounting for Related Records

542
2
03-26-2019 09:08 AM
MicahBabinski
Regular Contributor

Greetings,

I am in need of a way to efficiently identify features/records that are duplicates within a feature class or table, while also accounting for any child records that might exist. Right now, I am checking for duplicates using Find Identical. I exclude fields that are OBJECTID, GLOBALID, or GUID, but otherwise evaluate all other fields for duplicates. We have datasets that participate in one or more relationship classes, sometimes as many as two relates deep. I'd like to avoid flagging parent features/records as duplicates if they have child records which differ from another otherwise-identical feature in the parent feature class.

This is part of a suite of QA/QC tools that I've created using, Python, arcpy, the geoprocessing toolbox, plus a couple custom bits (e.g. if the feature class is above a certain number of features I bring it into sqlite and check for duplicates there for performance), so any solution that leverages those capabilities would be preferable.

Thanks,

Micah

0 Kudos
2 Replies
RobertBorchert
Frequent Contributor III

Is it an ongoing thing or something you need to do to clean up now what you have?

Are you looking to find features where ALL attributes are duplicate or find those that have at least a single attribute duplicated.  Do all feature share the same geometry?

0 Kudos
MicahBabinski
Regular Contributor

Hey Robert, thanks for the reply. This would be part of a process that would be used in an ongoing fashion. It's part of a regular QA/QC process we run on our corporate data.

I'd like to flag features where all the geometry and attributes are the same except for:

  • OBJECTID
  • GLOBALID
  • GUIDs

But, if the features have related child records, I'd only like to flag those as duplicates if they have identical related records. We use either GLOBALID or GUID fields as the primary keys in relationship classes. Suppose the following Feature Class A relates to Table B in a one-to-many relationship:

Feature Class A

GeometryField 1Field 2Field 3GUID (PK)
SameAllTheSame{GUID1}
SameAllTheSame{GUID2}

Table B

Field 4Field 5Field 6GUID (FK)
SameValuesHere{GUID1}
AndSameHere{GUID1}
SameValuesHere{GUID2}
AndSameHere{GUID2}

So, in Feature Class A, the two features have the same geometry, and same attributes except for GUID. Each one has two related records in Table B - but they are the same set of related records. So in this case I would like to flag the two features as duplicates. But in this case:

Feature Class A

GeometryField 1Field 2Field 3GUID (PK)
SameAllTheSame{GUID1}
SameAllTheSame{GUID2}

Table B

Field 4Field 5Field 6GUID (FK)
SameValuesHere{GUID1}
AndSameHere{GUID1}
SameValuesHere{GUID2}
ButI'mDifferent!{GUID2}

The records related to the feature with GUID2 are different than those related to the feature with GUID1, so I would like to not flag the two features as duplicate.

Make sense? I might not be doing a good job explaining it. Basically I want a way to identify features with the same geometry, attributes, and identical related records, excluding any OBJECTID, GLOBALID, or GUID fields which would be unique by default.

Micah

0 Kudos