select duplicate records in FGDB

3186
6
Jump to solution
10-06-2017 09:16 AM
JustinWolff
Occasional Contributor

I understand I cannot use a SQL query to select multiple records that have duplicate values within a FGDB (only personal or SDE).  Does anyone have a workaround?  My goal is to select features that have duplicate attribute values in a particular field and merge them together to create multipart polylines...then delete the original, separate features and replace them with the new, multipart features.  (I have to do this for a point feature class as well, but I know that will take additional effort to create a multipoint feature class and integration with single point feature class is difficult.) 

Right now I'm stuck on go just trying to select the duplicate records.

Thanks

Justin

0 Kudos
1 Solution

Accepted Solutions
JoshuaBixby
MVP Esteemed Contributor

I think this is more a Python‌ question.

Have you looked into using Find Identical—Help | ArcGIS Desktop ?  Does it not work for your situation?

View solution in original post

6 Replies
JoshuaBixby
MVP Esteemed Contributor

I think this is more a Python‌ question.

Have you looked into using Find Identical—Help | ArcGIS Desktop ?  Does it not work for your situation?

JustinWolff
Occasional Contributor

Yes, thanks - i meant to put this in Python, not the API

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

Although the reason for asking the question is being addressed in a different thread, merge features with duplicate attributes in FGDB , I thought I would share there is a fairly simple way to select distinct, duplicate, triplicate, etc... records in a layer without having to create an additional table.

The following code snippet will select non-distinct records in a layer/table view based on comparison/selection fields.

def select_plicates_layer(layer, selection_fields, value=2, condition='ge'):
    import operator as operator
    from collections import defaultdict
    
    if isinstance(selection_fields, basestring):
        selection_fields = [selection_fields]
    
    plicates = defaultdict(list)
    with arcpy.da.SearchCursor(layer, ["OID@"] + selection_fields) as cursor:
        for row in cursor:
            plicates[row[1:]] +=  row[:1]
    
    condition_oids = [oids 
                      for oids 
                      in plicates.values()
                      if getattr(operator, condition)(len(oids), value)]
    if condition_oids:
        oid_list = [oid for oids in condition_oids for oid in oids]
        layer.setSelectionSet("NEW", oid_list)
    
    return condition_oids

The real value of the function is less what it selects and the results it returns, i.e., the function returns a list of OID lists.  Each OID list contains the OIDs of records that share the same values for the selection_fields.  The list of lists can be used to determine how many duplicates, triplicates, quadruplicates, etc... are in the layer and what the makeup of them looks like.

The code is designed to operate on layers, so any existing selection of a layer is honored.  The code operators more in a "SUBSET_SELECTION" manner than a "NEW_SELECTION" manner.

JoeBorgione
MVP Emeritus

I've used this to find duplicates:  Find duplicate field values in ArcGIS using Python | Don Meltz 

That should just about do it....
JustinWolff
Occasional Contributor

Thanks Joe.  I've wanted to avoid adding a new field, and just use the selected, duplicates in a geoprocessing action.  But, a new field may be what it takes.

0 Kudos
Ipad1Avion
New Contributor II

Also, exporting to Excel works well to highlight duplicates on a non-advanced license.