select duplicate records in FGDB

JustinWolff · ‎10-06-2017

I understand I cannot use a SQL query to select multiple records that have duplicate values within a FGDB (only personal or SDE). Does anyone have a workaround? My goal is to select features that have duplicate attribute values in a particular field and merge them together to create multipart polylines...then delete the original, separate features and replace them with the new, multipart features. (I have to do this for a point feature class as well, but I know that will take additional effort to create a multipoint feature class and integration with single point feature class is difficult.)

Right now I'm stuck on go just trying to select the duplicate records.

Thanks

Justin

JoshuaBixby · ‎10-06-2017

I think this is more a Python‌ question.

Have you looked into using Find Identical—Help | ArcGIS Desktop ? Does it not work for your situation?

View solution in original post

JoshuaBixby · ‎10-06-2017

I think this is more a Python‌ question.

Have you looked into using Find Identical—Help | ArcGIS Desktop ? Does it not work for your situation?

JustinWolff · ‎10-06-2017

Yes, thanks - i meant to put this in Python, not the API

JoshuaBixby · ‎10-08-2017

Although the reason for asking the question is being addressed in a different thread, merge features with duplicate attributes in FGDB , I thought I would share there is a fairly simple way to select distinct, duplicate, triplicate, etc... records in a layer without having to create an additional table.

The following code snippet will select non-distinct records in a layer/table view based on comparison/selection fields.

def select_plicates_layer(layer, selection_fields, value=2, condition='ge'):
    import operator as operator
    from collections import defaultdict
    
    if isinstance(selection_fields, basestring):
        selection_fields = [selection_fields]
    
    plicates = defaultdict(list)
    with arcpy.da.SearchCursor(layer, ["OID@"] + selection_fields) as cursor:
        for row in cursor:
            plicates[row[1:]] +=  row[:1]
    
    condition_oids = [oids 
                      for oids 
                      in plicates.values()
                      if getattr(operator, condition)(len(oids), value)]
    if condition_oids:
        oid_list = [oid for oids in condition_oids for oid in oids]
        layer.setSelectionSet("NEW", oid_list)
    
    return condition_oids‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

The real value of the function is less what it selects and the results it returns, i.e., the function returns a list of OID lists. Each OID list contains the OIDs of records that share the same values for the selection_fields. The list of lists can be used to determine how many duplicates, triplicates, quadruplicates, etc... are in the layer and what the makeup of them looks like.

The code is designed to operate on layers, so any existing selection of a layer is honored. The code operators more in a "SUBSET_SELECTION" manner than a "NEW_SELECTION" manner.

JoeBorgione · ‎10-06-2017

I've used this to find duplicates: Find duplicate field values in ArcGIS using Python | Don Meltz

That should just about do it....

JustinWolff · ‎10-06-2017

Thanks Joe. I've wanted to avoid adding a new field, and just use the selected, duplicates in a geoprocessing action. But, a new field may be what it takes.

Ipad1Avion · ‎08-08-2019

Also, exporting to Excel works well to highlight duplicates on a non-advanced license.