Select to view content in your preferred language

my python script to check for duplicates is sometimes removing data.

249
10
Wednesday
Laura_m_Conner
Regular Contributor

My script is sometimes removing data from my map. I have developed a script to check a key field in several feature classes for duplicates. The script does work sometimes, but other times, it throws an error, error 000732. This mainly happens to 2 feature classes, one in which duplicates are found, and the other is the one that needs to test 2 different fields.  Before running the script, I verified the layer was in the TOC and on the map. However, it was gone when I checked the TOC after the error.   The strange thing is it can run the find identical in the loop fine without error, but it throws the error when it comes to the add join function. This also happens with the layer in which I need to check two fields. The find identical for the first field runs fine, but the find identical on the 2nd field can return the error. Any insights would be appreciated.

my code is:

F=("Sanitary Sewer Manholes","Water Network Structures","Water Fittings","Water System Valves","Water Service Connections","Water Hydrants","Sewer Gravity Mains","Sewer Pressurized Mains","Storm Drain Pipes", "Water Lateral Lines", "Water Mains")
b =0 

for e in F:
    a=F[b]
    s=str(a).replace(" ", "_")
    o= "C:\\Users\\lconner\\Documents\\ArcGIS\\Projects\\edit_map6\\edit_map6.gdb\\" +s+"_findidencal"+str(9)
    m="C:\\Users\\lconner\\Documents\\ArcGIS\\Projects\\edit_map6\\edit_map6.gdb\\MBID_duplicate" + str(1)
   
    arcpy.management.FindIdentical(
    in_dataset= a,
    out_dataset=o,
    fields="FACILITYID",
    xy_tolerance=None,
    z_tolerance=0,
    output_record_option="ONLY_DUPLICATES")

    count_result = arcpy.management.GetCount(o)
    count = int(count_result[0])
    
    
    if count > 0:
        print("Duplicate entries in "+str (a) )
        print(f"Number of duplicates found: {count}")
        arcpy.management.AddJoin(
            in_layer_or_view=a,
            in_field="OBJECTID",
            join_table=o,
            join_field="IN_FID",
            join_type="KEEP_ALL",
            index_join_fields="NO_INDEX_JOIN_FIELDS",
            rebuild_index="NO_REBUILD_INDEX",
            join_operation="JOIN_ONE_TO_FIRST"
        )


        arcpy.management.SelectLayerByAttribute(
            in_layer_or_view= a,
            selection_type="NEW_SELECTION",
            where_clause="IN_FID IS NOT NULL",
            invert_where_clause=None
            )
    

    else:
        print("no Duplicate in " + str(a) )

    
    b=b+1


arcpy.management.SelectLayerByAttribute(
    in_layer_or_view="Water Service Connections",
    selection_type="NEW_SELECTION",
    where_clause="MBID IS NOT NULL",
    invert_where_clause=None
)

arcpy.management.FindIdentical(
    in_dataset="Water Service Connections",
    out_dataset=m,
    fields="MBID",
    xy_tolerance=None,
    z_tolerance=0,
    output_record_option="ONLY_DUPLICATES"
)

if count > 0:
        print("Duplicate MBID in Meters " )
        print(f"Number of duplicates found: {count}")
        arcpy.management.AddJoin(
            in_layer_or_view="Water Service Connections",
            in_field="OBJECTID",
            join_table=m,
            join_field="IN_FID",
            join_type="KEEP_ALL",
            index_join_fields="NO_INDEX_JOIN_FIELDS",
            rebuild_index="NO_REBUILD_INDEX",
            join_operation="JOIN_ONE_TO_FIRST"
        )


        arcpy.management.SelectLayerByAttribute(
            in_layer_or_view= "Water Service Connections",
            selection_type="NEW_SELECTION",
            where_clause="IN_FID IS NOT NULL",
            invert_where_clause=None
            )

else:
        print("no Duplicate MBIDs ")

 

Pro 3.5.1

data source: enterprise geodata base 

0 Kudos
10 Replies
TonyAlmeida
MVP Regular Contributor

I think one of your issues is with the AddJoin and SelectLayerByAttribute, it can create the appears of deleted features. 

join_type="KEEP_ALL" keeps all records from the left table (a) and includes only matching rows from the join table (o) where OBJECTID = IN_FID.

Filtering with "IN_FID IS NOT NULL" isolates features identified as duplicates.

Unmatched features aren’t deleted they're just unselected or excluded.

 

Always work with feature layers not feature classes, and clear the selection after your done. 

clear join after SelectLayerByAttribute

 

HaydenWelch
MVP Regular Contributor

I'm noticing that you have a primary loop that loops over features followed by the same thing flattened. Is that intentional?

You're also not changing the name of the out `MBID_duplicate` layer. it's always evaluating to "...MBID_duplicate1". It can also help to use verbose variable names. Here's a modified version of your code that tries to simplify the main loop so you can identify your error:

import arcpy
from pathlib import Path
from functools import partial

features=(
    "Sanitary Sewer Manholes",
    "Water Network Structures",
    "Water Fittings",
    "Water System Valves",
    "Water Service Connections",
    "Water Hydrants",
    "Sewer Gravity Mains",
    "Sewer Pressurized Mains",
    "Storm Drain Pipes",
    "Water Lateral Lines",
    "Water Mains",
)

gdb = Path(r"C:\Users\lconner\Documents\ArcGIS\Projects\edit_map6\edit_map6.gdb")


p_find = partial(
    arcpy.management.FindIdentical(
    fields="FACILITYID",
    z_tolerance=0,
    output_record_option="ONLY_DUPLICATES",
    )
)

p_join = partial(
    arcpy.management.AddJoin(
            in_field="OBJECTID",
            join_field="IN_FID",
            join_type="KEEP_ALL",
            index_join_fields="NO_INDEX_JOIN_FIELDS",
            rebuild_index="NO_REBUILD_INDEX",
            join_operation="JOIN_ONE_TO_FIRST",
    )
)

p_select = partial(
    arcpy.management.SelectLayerByAttribute(
        selection_type="NEW_SELECTION",
        where_clause="IN_FID IS NOT NULL",
    )
)

for feature in features:    
    feature=feature.replace(" ", "_")
    
    # This was originally writing all mbid_duplicate tables to the
    # same features `MBID_duplicate1`
    out_feature = gdb / f"FindIdentical_{feature}"
    mbid_feature = gdb / f"MBID_duplicate_{feature}"
    
    # Originally you were counting the input features, not the identical table
    identicals, _ = p_find(in_dataset=feature, out_dataset=out_feature)
    count = sum(1 for _ in arcpy.da.SearchCursor(identicals, ['OID@']))
    
    # Continue the loop if no identicals found
    if count == 0:
        print(f"No duplicate in {feature}" )
        continue
    
    # Log identical counts and add join, then select the features that don't have a null IN_FID
    print(f"{count} duplicate entries in {feature}")
    p_join(in_layer_or_view=feature, join_table=out_feature)
    p_select(in_layer_or_view=feature)

 

You can ignore or remove the `partial` blocks, I just did that to pre-fill the constant arguments in the tool calls and bubble them up to the top of the script in case you want to change them later.

0 Kudos
Laura_m_Conner
Regular Contributor

@ TonyAlmeida, The problem is not that features are disappearing from the feature layer; the problem is that the feature layer is being removed from my map in the middle of running the script. I know that when I open the attribute table, I just see the selected duplicates. Would  the add join or  SelectLayerByAttribute cause the feature layer to be removed from the map?

@HaydenWelch, Yes, the loop and then rechecking the one file are intentional. All feature layers need the FID checked for duplicates. However, only the meters have the MBID field, and thus, they are the only feature layer where that needs to be checked, too.

As of now, the names of the outputs are not causing a problem. I thought that might be the issue.  I added the + str(1) on the end as a crude way to get a unique output name, as I could change the number in the (). It would be best to generate a unique output name every time. It would be appreciated if you could point me to some article or explanation on how to do that, but I don't think that is causing the problem.

This is the first I have heard of partial functions, so I will need to read up on them before I use them. Do you have any good resources for that? Would the absence of the partial function cause it to remove a feature layer from the map in the middle of a script? Right now, I am focused on finding what is causing the problem. I am not sure how simplifying the loop will help with that.

Here is the printout and error message I am getting:

the script returns:

no Duplicate in Sanitary Sewer Manholes
no Duplicate in Water Network Structures
no Duplicate in Water Fittings
no Duplicate in Water System Valves
no Duplicate in Water Service Connections
no Duplicate in Water Hydrants
no Duplicate in Sewer Gravity Mains
no Duplicate in Sewer Pressurized Mains
no Duplicate in Storm Drain Pipes
Duplicate entries in Water Lateral Lines
Number of duplicates found: 6

The error in full i am getting is: 

ExecuteError                              Traceback (most recent call last)
Cell In[5], line 25
     23     print("Duplicate entries in "+str (a) )
     24     print(f"Number of duplicates found: {count}")
---> 25     arcpy.management.AddJoin(
     26         in_layer_or_view=a,
     27         in_field="OBJECTID",
     28         join_table=o,
     29         join_field="IN_FID",
     30         join_type="KEEP_ALL",
     31         index_join_fields="NO_INDEX_JOIN_FIELDS",
     32         rebuild_index="NO_REBUILD_INDEX",
     33         join_operation="JOIN_ONE_TO_FIRST"
     34     )
     37     arcpy.management.SelectLayerByAttribute(
     38         in_layer_or_view= a,
     39         selection_type="NEW_SELECTION",
     40         where_clause="IN_FID IS NOT NULL",
     41         invert_where_clause=None
     42         )
     45 else:

File C:\Program Files\ArcGIS\Pro\Resources\ArcPy\arcpy\management.py:13600, in AddJoin(in_layer_or_view, in_field, join_table, join_field, join_type, index_join_fields, rebuild_index, join_operation)
  13598     return retval
  13599 except Exception as e:
> 13600     raise e

File C:\Program Files\ArcGIS\Pro\Resources\ArcPy\arcpy\management.py:13582, in AddJoin(in_layer_or_view, in_field, join_table, join_field, join_type, index_join_fields, rebuild_index, join_operation)
  13578 from arcpy.arcobjects.arcobjectconversion import convertArcObjectToPythonObject
  13580 try:
  13581     retval = convertArcObjectToPythonObject(
> 13582         gp.AddJoin_management(
  13583             *gp_fixargs(
  13584                 (
  13585                     in_layer_or_view,
  13586                     in_field,
  13587                     join_table,
  13588                     join_field,
  13589                     join_type,
  13590                     index_join_fields,
  13591                     rebuild_index,
  13592                     join_operation,
  13593                 ),
  13594                 True,
  13595             )
  13596         )
  13597     )
  13598     return retval
  13599 except Exception as e:

File C:\Program Files\ArcGIS\Pro\Resources\ArcPy\arcpy\geoprocessing\_base.py:533, in Geoprocessor.__getattr__.<locals>.<lambda>(*args)
    531 val = getattr(self._gp, attr)
    532 if callable(val):
--> 533     return lambda *args: val(*gp_fixargs(args, True))
    534 else:
    535     return convertArcObjectToPythonObject(val)

ExecuteError: Failed to execute. Parameters are not valid.
ERROR 000732: Layer Name or Table View: Dataset Water Lateral Lines does not exist or is not supported
Failed to execute (AddJoin).

P.S. is there any thing i could have added to my original post to help clarify it?

Thanks

Laura

 

 

 

HaydenWelch
MVP Regular Contributor

Simplifying the loop just makes it easier to see exactly where the code is failing. I believe your issue is still in the fact that you are looping over the features and essentially doing nothing. Each loop overwrites the output of the previous. I fixed that and commented it in my version. When you go to join, you're joining the last table in the loop.

 

You are also not using the updated name (something I also fixed in my version) and are calling the space delimited version of the feature name which seems to not exist.

 

I only did the partials and the loop simplification to make it clear that your core logic was incorrect. Sometimes cleaning things up like that can make the issue super obvious.

0 Kudos
TonyAlmeida
MVP Regular Contributor

I also have some scripts that remove the layer for what ever reason, annoying ESRI bug is my guess. the issue is the Water Later lines, AddJoin requires the input to be a layer in the current map, not just a feature class name. So create feature layres, something like this, 

F = ("Sanitary Sewer Manholes", "Water Network Structures", "Water Fittings", 
     "Water System Valves", "Water Service Connections", "Water Hydrants", 
     "Sewer Gravity Mains", "Sewer Pressurized Mains", "Storm Drain Pipes", 
     "Water Lateral Lines", "Water Mains")

for a in F:  # No need for counter 'b', just iterate directly
    s = str(a).replace(" ", "_")
    o = "C:\\Users\\lconner\\Documents\\ArcGIS\\Projects\\edit_map6\\edit_map6.gdb\\" + s + "_findidencal" + str(9)
    
    # First create a feature layer from the feature class
    layer_name = a + "_layer"
    arcpy.MakeFeatureLayer_management(a, layer_name)
    
    arcpy.management.FindIdentical(
        in_dataset=a,
        out_dataset=o,
        fields="FACILITYID",
        xy_tolerance=None,
        z_tolerance=0,
        output_record_option="ONLY_DUPLICATES")

 

This how I would write the code, untested but hopefully it helps.

F = ("Sanitary Sewer Manholes", "Water Network Structures", "Water Fittings", 
     "Water System Valves", "Water Service Connections", "Water Hydrants", 
     "Sewer Gravity Mains", "Sewer Pressurized Mains", "Storm Drain Pipes", 
     "Water Lateral Lines", "Water Mains")

for a in F:  # No need for counter 'b', just iterate directly
    s = str(a).replace(" ", "_")
    o = "C:\\Users\\lconner\\Documents\\ArcGIS\\Projects\\edit_map6\\edit_map6.gdb\\" + s + "_findidencal" + str(9)
    
    # First create a feature layer from the feature class
    layer_name = a + "_layer"
    arcpy.MakeFeatureLayer_management(a, layer_name)
    
    arcpy.management.FindIdentical(
        in_dataset=a,
        out_dataset=o,
        fields="FACILITYID",
        xy_tolerance=None,
        z_tolerance=0,
        output_record_option="ONLY_DUPLICATES")

    count_result = arcpy.management.GetCount(o)
    count = int(count_result[0])
    
    if count > 0:
        print("Duplicate entries in " + str(a))
        print(f"Number of duplicates found: {count}")
        
        # Use the layer instead of the feature class name
        arcpy.management.AddJoin(
            in_layer_or_view=layer_name,
            in_field="OBJECTID",
            join_table=o,
            join_field="IN_FID",
            join_type="KEEP_ALL",
            index_join_fields="NO_INDEX_JOIN_FIELDS",
            rebuild_index="NO_REBUILD_INDEX",
            join_operation="JOIN_ONE_TO_FIRST"
        )

        arcpy.management.SelectLayerByAttribute(
            in_layer_or_view=layer_name,
            selection_type="NEW_SELECTION",
            where_clause="IN_FID IS NOT NULL",
            invert_where_clause=None
        )
    else:
        print("no Duplicate in " + str(a))

# For the MBID check, you should similarly create a layer first
arcpy.MakeFeatureLayer_management("Water Service Connections", "Water_Service_Connections_layer")

m = "C:\\Users\\lconner\\Documents\\ArcGIS\\Projects\\edit_map6\\edit_map6.gdb\\MBID_duplicate" + str(1)

arcpy.management.SelectLayerByAttribute(
    in_layer_or_view="Water_Service_Connections_layer",
    selection_type="NEW_SELECTION",
    where_clause="MBID IS NOT NULL",
    invert_where_clause=None
)

arcpy.management.FindIdentical(
    in_dataset="Water_Service_Connections_layer",
    out_dataset=m,
    fields="MBID",
    xy_tolerance=None,
    z_tolerance=0,
    output_record_option="ONLY_DUPLICATES"
)

count_result = arcpy.management.GetCount(m)  # You need to get the count here
count = int(count_result[0])

if count > 0:
    print("Duplicate MBID in Meters")
    print(f"Number of duplicates found: {count}")
    arcpy.management.AddJoin(
        in_layer_or_view="Water_Service_Connections_layer",
        in_field="OBJECTID",
        join_table=m,
        join_field="IN_FID",
        join_type="KEEP_ALL",
        index_join_fields="NO_INDEX_JOIN_FIELDS",
        rebuild_index="NO_REBUILD_INDEX",
        join_operation="JOIN_ONE_TO_FIRST"
    )

    arcpy.management.SelectLayerByAttribute(
        in_layer_or_view="Water_Service_Connections_layer",
        selection_type="NEW_SELECTION",
        where_clause="IN_FID IS NOT NULL",
        invert_where_clause=None
    )
else:
    print("no Duplicate MBIDs")

 

Laura_m_Conner
Regular Contributor

I am taking that there is no way to mitigate the problem of the script removing the layer. Is there some where I can give the details to Esri so they can work on a fix for the problem.

0 Kudos
HaydenWelch
MVP Regular Contributor

No, you're just using a constant string value for the output layer and so its being overwritten. If you want the layer to persist, you need the name to change for each feature. I Did that in the first few lines of my simplified loop by adding the feature name to the end of the layer name.

 

to clarify, `str(1)` is not unique. It's the equvalent of just putting a "1" in there. You need that value to change.

0 Kudos
HaydenWelch
MVP Regular Contributor

Okay, I think I was initially confused by your question. I now see that you're trying to check for duplicate IDs in all feature classes. I'm gonna revise my initial response with one that actually tries to solve your problem.

 

I did manage to get it all done using SearchCursors and Python's built in mapping. All you need to do is create a dictionary of FACILITYID keys with a sequence of something that contains the name of the feature class and the OBJECTID for the duplicate feature (for selection later).

After filtering out the facility records that only have one hit, you can then re-map those duplicates to a new mapping of Feature Class to Object IDs (since the original check was against all Feature Classes). From there we can print out the feature classes that have no duplicates as well as the count of duplicates in each feature class that does contain collisions:

from pathlib import Path
from arcpy.da import SearchCursor
from arcpy.mp import Layer
from arcpy.management import (
    SelectLayerByAttribute,
    CopyFeatures,
)

feature_classes=(
    "Sanitary Sewer Manholes",
    "Water Network Structures",
    "Water Fittings",
    "Water System Valves",
    "Water Service Connections",
    "Water Hydrants",
    "Sewer Gravity Mains",
    "Sewer Pressurized Mains",
    "Storm Drain Pipes",
    "Water Lateral Lines",
    "Water Mains",
)

gdb = Path(r"C:\Users\lconner\Documents\ArcGIS\Projects\edit_map6\edit_map6.gdb")

# Initialize a dictionary that will contain duplicate entries
# in the form: {<facility>: [(<feature_class>, <oid>), ...]}
facility_mapping: dict[int|str, list[tuple[str, int]]] = {}

# Build the mapping of facilities
for feature_class in feature_classes:
    # Sanitize name
    feature_class = feature_class.replace(' ', '_')
    
    # The `feature_class` in this call might need to be interpolated to see your features, 
    # Possibly with the gdb global above if that's where they live
    for facility_id, object_id in SearchCursor(feature_class, ['FACILITYID', 'OID@']):
        facility_mapping.setdefault(facility_id, [])
        facility_mapping[facility_id].append((feature_class, object_id))

# Filter the mapping to only have facilities with duplicates
facility_mapping = {
    facility: duplicates
    for facility, duplicates in facility_mapping.items()
    if len(duplicates) > 1
}

# Clear current selection
SelectLayerByAttribute(selection_type='CLEAR_SELECTION')

# Build a selection map for selecting duplicates in all FCs
selections: dict[str, list[int]] = {}
for _, record in facility_mapping.items():
    # Unpack the record
    fc, oid = record
    
    # Add the OID to the selection mapping
    selections.setdefault(fc, [])
    selections[fc].append(oid)

# Final loop for printing and selecting
for fc in feature_classes:
    # Make sure this points to your features!
    fc = fc.replace(' ', '_')
    if fc not in selections:
        print(f"No duplicates found in {fc.replace('_', ' ')}")
        continue
    
    print(f"Found {len(selections[fc])} duplicates in {fc.replace('_', ' ')}")
    SelectLayerByAttribute(
        in_layer_or_view=fc, 
        selection_type='ADD_TO_SELECTION', 
        where_clause=f"OBJECTID IN ({','.join(selections[fc])})",
    )
    
    # Uncomment to create a new layer with duplicates
    # CopyFeatures(fc, f"{fc}_duplicates")
Laura_m_Conner
Regular Contributor

ok, I have a few questions. 

1. would this approach keep arc pro from sometimes removing  Water Lateral Lines from the project while in the middle of executing the script.

2 Is there a way or approach that generally stops  arc pro form arbitrarily removing data from the project in the middle of running the script?

3. Is there a reason you are not using the built in find identical tool instead of using long hand to specify the operation?

I am trying tying to see if i can get your script to work for me. this is my 1st time with the Search Cursor function so it might take some time to figure it out.

0 Kudos