
Script for Find Identical tool

12-14-2022 03:07 AM
CFrost
by
Occasional Contributor

I want to write a script that will find duplicate polygons based on shape, defined attribute, and location.  I think that the Find Identical tool will provide what I need, but I do not have the license to use it.  

Does anyone have any suggestions?

 

Thanks

JohannesLindner
MVP Alum
import arcpy

# input parameters
in_table = "path_or_layer_name"
out_table_folder = "memory"
out_table_name = "identical"
fields = ["IntegerField", "Shape"]
only_duplicate = False


# create output table
out_table = arcpy.management.CreateTable(out_table_folder, out_table_name)
arcpy.management.AddField(out_table, "IN_FID", "LONG")
arcpy.management.AddField(out_table, "FEAT_SEQ", "LONG")

# read and group in_table
groups = dict()
for i, f in enumerate(fields):
    if f == "Shape":
        fields[i] = "SHAPE@WKT"
with arcpy.da.SearchCursor(in_table, ["OID@"] + fields) as cursor:
    for row in cursor:
        oid = row[0]
        key = tuple(row[1:])
        try:
            groups[key].append(oid)
        except KeyError:
            groups[key] = [oid]

# write groups into out_table
with arcpy.da.InsertCursor(out_table, ["IN_FID", "FEAT_SEQ"]) as cursor:
    for seq, key in enumerate(groups.keys()):
        oids = groups[key]
        if only_duplicate and len(oids) < 2:
            continue
        for oid in oids:
            cursor.insertRow([oid, seq])
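
For readers without ArcGIS at hand, the grouping step in the script above can be illustrated on plain tuples. This is only a sketch: `group_identical` and the sample rows are made up for illustration and stand in for what the search cursor would yield, with a mix of integer and string values in the key.

```python
from collections import defaultdict

def group_identical(rows):
    """Group record IDs by the tuple of their remaining values.

    rows: iterable of (oid, value1, value2, ...) tuples, standing in for
    the rows a search cursor would return.
    Returns (oid, seq) pairs, mirroring the IN_FID / FEAT_SEQ columns.
    """
    groups = defaultdict(list)
    for oid, *values in rows:
        # Rows with the same values end up under the same dictionary key.
        groups[tuple(values)].append(oid)
    result = []
    for seq, oids in enumerate(groups.values()):
        for oid in oids:
            result.append((oid, seq))
    return result

# Hypothetical sample rows: (OID, integer field, string field)
sample_rows = [
    (1, 10, "park"),
    (2, 10, "lake"),
    (3, 20, "park"),
    (4, 10, "park"),
]
pairs = group_identical(sample_rows)
print(pairs)  # [(1, 0), (4, 0), (2, 1), (3, 2)]
```

OIDs 1 and 4 share every value, so they receive the same sequence number. The key is an ordinary tuple, so any hashable field values (integers, strings, dates) can be mixed in it.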

Have a great day!
Johannes
CFrost
by
Occasional Contributor

Hi, thanks for this, much appreciated! I've amended the above code to look at shape_length and shape_area as a test, but I am receiving the following error message:

Traceback (most recent call last):
File "<string>", line 21, in <module>
RuntimeError: Request canceled

 

My amended code is as follows: 

# input parameters
in_table = "temp_layer"
out_table_folder = "memory"
out_table_name = "identical"
fields = ["Shape_length", "SHAPE_Area"]
only_duplicate = False


# create output table
arcpy.env.overwriteOutput = True 
out_table = arcpy.management.CreateTable(out_table_folder, out_table_name)
arcpy.management.AddField(out_table, "IN_FID", "LONG")
arcpy.management.AddField(out_table, "FEAT_SEQ", "LONG")

# read and group in_table
groups = dict()
for i, f in enumerate(fields):
    if f == "Shape":
        fields[i] = "SHAPE@WKT"
with arcpy.da.SearchCursor(in_table, ["OID@"] + fields) as cursor:
    for row in cursor:
        oid = row[0]
        key = tuple(row[1:])
        try:
            groups[key].append(oid)
        except KeyError:
            groups[key] = [oid]

# write groups into out_table
with arcpy.da.InsertCursor(out_table, ["IN_FID", "FEAT_SEQ"]) as cursor:
    for seq, key in enumerate(groups.keys()):
        oids = groups[key]
        if only_duplicate and len(oids) < 2:
            continue
        for oid in oids:
            cursor.insertRow([oid, seq])

 

Do you have any suggestions?  Also, I would like to compare string fields in addition to integer fields - will this code work regardless of the field type?  Thanks! 

JohannesLindner
MVP Alum

Hmm... "Request canceled" is an error I have never seen before, and I can't find it online, either. There is always the tried and true "restart ArcGIS", but other than that I have no idea.

Checking Shape_Length and Shape_Area should work (I could do it when testing), but maybe start with a simple test table.

Yes, you should be able to find identical Strings, Integers, Doubles (excluding rounding errors), Dates, and Geometries.
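
The rounding caveat for Doubles can be shown in plain Python. The sketch below is one possible workaround, not part of the script above: `make_key` is a hypothetical helper, and the precision of 3 decimal places is an assumption that would need to match the data.

```python
def make_key(values, ndigits=3):
    """Build a grouping key, rounding floats so tiny differences do not split groups.

    ndigits is an assumed precision; choose it to suit the units of the data.
    """
    return tuple(round(v, ndigits) if isinstance(v, float) else v for v in values)

a = (0.1 + 0.2, "park")  # the float is stored as 0.30000000000000004
b = (0.3, "park")

print(a == b)                      # False: exact comparison treats them as different
print(make_key(a) == make_key(b))  # True: rounded keys match
```

Rounding is not a complete fix: two values that sit on either side of a rounding boundary will still land in different groups.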

 

Good catch with overwriteOutput. That defaults to True in my setup, so I didn't think about that.


Have a great day!
Johannes
CFrost
by
Occasional Contributor

Thanks for the response. Knowing that it's an error you aren't familiar with is helpful, so I'll try the usual things as you suggest. Thanks for the help, really appreciate it!

Update - managed to resolve the error (not sure how).  The output looks like this:

OBJECTID IN_FID FEAT_SEQ
1 1 0
2 2 1
3 3 2
4 4 3

 

I know that OID 1 and OID 4 are completely identical (it is a test dataset to see how the output would work), but it's not clear to me how the output shows this. Would you mind explaining? Thanks!

 

Update: I've managed to resolve the issue with the output by setting only_duplicate = True. This generates an output similar to the Overlapping Features tool, listing the feature IDs that are duplicated. Oddly, when I changed it back to False, the FEAT_SEQ field showed duplicate values where the feature is duplicated, so it looks like it works.
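
For anyone else reading the output table: rows that share a FEAT_SEQ value are the identical features. A small sketch of pulling the duplicate sets out of such a table, using made-up rows rather than real output (`duplicates_from_output` is a hypothetical helper):

```python
from collections import defaultdict

def duplicates_from_output(rows):
    """rows: iterable of (in_fid, feat_seq) pairs.

    Returns the lists of IN_FID values that share a FEAT_SEQ.
    """
    by_seq = defaultdict(list)
    for in_fid, feat_seq in rows:
        by_seq[feat_seq].append(in_fid)
    # Only sequences with more than one member are duplicates.
    return [fids for fids in by_seq.values() if len(fids) > 1]

# Hypothetical output for a 4-feature test set where OIDs 1 and 4 match
output_rows = [(1, 0), (4, 0), (2, 1), (3, 2)]
dupes = duplicates_from_output(output_rows)
print(dupes)  # [[1, 4]]
```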

 

Thanks for this solution - very useful! 

 
