Script for removing append duplicates returning unexpected results

07-10-2023 07:05 AM
Laura_m_Conner
New Contributor III

I am cleaning up a curbed streets feature class. The feature class was appended to as sections were completed. I modified the remove duplicate appends script, and it ran without error. The script is:

import arcpy

fc = r"N:\laura\edit_map4\edit_map4.gdb\StormCurbedStr_ExportFeature2"
feilds = ["FULLNAME","curb","SHAPE"]
keepList = list()

with arcpy.da.UpdateCursor(fc, feilds) as cursor:

    for row in cursor:
        row_val = row[0] + row[1] + str(row[2])
    
        if row_val not in keepList:
            keepList.append(row_val)

        elif row_val in keepList:
            cursor.deleteRow()
    
        else:
            pass
print("done")
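Stripped of arcpy, the script's logic is: build a text key for each row, keep the first row with a given key, and delete every later row whose key has already been seen. The sketch below models that logic in plain Python so it can run without arcpy; the sample rows are hypothetical, and a set replaces the list (same result, faster lookups). Whatever the key leaves out, or gets wrong, decides which rows are deleted.

```python
def rows_to_delete(rows, make_key):
    """Return the indexes of rows the script would delete: every row whose
    key was already produced by an earlier row (the first one is kept)."""
    seen = set()      # set membership is O(1); the original script used a list
    doomed = []
    for i, row in enumerate(rows):
        key = make_key(row)
        if key in seen:
            doomed.append(i)  # stands in for cursor.deleteRow()
        else:
            seen.add(key)
    return doomed


def original_key(row):
    # Same construction as the script: FULLNAME + curb + str(shape value)
    return row[0] + row[1] + str(row[2])


# Hypothetical rows: (FULLNAME, curb, shape value as read by the cursor)
rows = [
    ("MAIN ST", "Y", (1.0, 2.0)),
    ("MAIN ST", "Y", (1.0, 2.0)),        # exact repeat, so it is deleted
    ("MAIN ST", "Y", (1.0, 2.0000001)),  # tiny difference, so it is kept
    ("OAK AVE", "N", (5.0, 6.0)),
]
print(rows_to_delete(rows, original_key))  # [1]
```

Row 2 survives even though its point differs only in the seventh decimal place: the key is compared as exact text, with no tolerance.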

The results are being checked to verify that the script worked as expected. Several features were deleted, as expected. To verify further, I ran the Find Identical tool on the resulting features. I used the same three fields as parameters, in exactly the same order as in the original remove duplicate appends script, and checked the option to output only duplicates. Find Identical found 66 duplicates in pairs, and I am investigating why the script did not catch them.

I checked the FULLNAME and curb fields, and they were identical in each of the 33 pairs, i.e., no leading or trailing spaces, misspellings, or even differing capitalization, so I ruled those two fields out. I then used the Select By Location tool to check the geometry, with the curbed streets feature class as both the input features and the selecting features and the relationship set to "Are identical to". All 66 features were returned. To see how the script handled the SHAPE field, I ran a second script that prints the values it reads for that field. The script is:

import arcpy

fc = r"N:\laura\edit_map4\edit_map4.gdb\StormCurbedStr_invegstate_3"
feilds = ["FULLNAME","curb","SHAPE"]

with arcpy.da.UpdateCursor(fc, feilds) as cursor:

    for row in cursor:
        # print the value the cursor returns for the SHAPE field
        print(str(row[2]))

It returned some unexpected results. First, it printed a list of coordinate points, even though this is a line feature class. I copied the output into Excel to highlight duplicates, and none were flagged. It seems as if the script picked a different point along these lines each time.

It boils down to three questions:

Why is the script returning coordinates when it is a line feature?

How/what is the update cursor object using to satisfy the shape field?

How do I amend the remove duplicate appends script to get all the duplicates?

5 Replies
DuncanHornby
MVP Notable Contributor

If you study the help file for the SearchCursor and look at the syntax section, you will see that the geometry tokens all contain an @ character. So if you want to return a geometry object, you need to set the field name to SHAPE@; you are using SHAPE.

I would suggest exploring the Find Identical and Delete Identical tools to simplify your code; you don't need cursors for this task.

Laura_m_Conner
New Contributor III

Changing

feilds = ["FULLNAME","curb","SHAPE"]

to

feilds = ["FULLNAME","curb","SHAPE@"]

did not work. It deleted both features in each duplicate pair, and it also deleted several features that were not part of a duplicate pair.

The Delete Identical tool did remove the duplicates without removing both features of a pair. However, I am trying to understand why my script acted as it did. The more I understand now, the better I will be at troubleshooting in the future.

by Anonymous User
Not applicable

Your test script points to a different dataset. Is that why you are seeing a different list of coordinates from the dataset you used in the first script?

Why is the script returning coordinates when it is a line feature?

SHAPE returns just a single coordinate pair (details and theory below). As the JSON and WKT properties of the geometry object show, a geometry is at its core just an array of points.

[Screenshot: the JSON and WKT properties of a geometry object]
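Because the geometry is an array of points, a key built from every vertex reflects the whole line rather than a single coordinate. A minimal sketch, assuming the cursor field is the SHAPE@JSON token, which returns the geometry as an Esri JSON string whose polyline vertices sit in a "paths" array; the JSON text below is a hypothetical value, not taken from the thread's data.

```python
import json


def vertex_key(esri_json):
    """Build a hashable key from every vertex of an Esri JSON polyline."""
    geom = json.loads(esri_json)
    # "paths" is a list of parts; each part is a list of [x, y, ...] vertices.
    # Only x and y are kept; any z or m values are ignored here.
    return tuple(
        tuple((pt[0], pt[1]) for pt in path)
        for path in geom["paths"]
    )


# Hypothetical SHAPE@JSON value for a two-vertex line
line = '{"paths": [[[0.0, 0.0], [10.0, 5.0]]], "spatialReference": {"wkid": 4326}}'
print(vertex_key(line))  # (((0.0, 0.0), (10.0, 5.0)),)
```

With the field list ["FULLNAME", "curb", "SHAPE@JSON"], the key could become (row[0], row[1], vertex_key(row[2])). It is still an exact comparison, so copies whose vertices differ in the last decimal place, or that have extra vertices, will not match.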

How/what is the update cursor object using to satisfy the shape field?

When you use SHAPE, it returns a point calculated from the geometry somehow; I couldn't match the tuple coordinate to anything in the example I tested on. SHAPE@ returns the complete geometry object.

When you use the @, you are accessing the geometry object, so when you cast it to a string you get the string value '<geoprocessing describe geometry object object at xXxx>'. The xXxx part is a memory address, and the rest of the string, 'geoprocessing describe geometry object object at', is always the same. Since memory is recycled, a later row's geometry object can end up at an address an earlier one used, giving two unrelated features the same string. That could be causing false-positive duplicates and deleted rows. For example, when looking at it in the debugger, your row_val is:

[Screenshot: the value of row_val in the debugger]
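This behavior can be reproduced with any Python class that does not define its own __str__. FakeGeometry below is a hypothetical stand-in, not the arcpy geometry class: two objects holding the same coordinates give different strings, and neither string contains the coordinates.

```python
class FakeGeometry:
    """Hypothetical stand-in for a geometry object with no custom __str__."""

    def __init__(self, points):
        self.points = points


a = FakeGeometry([(0.0, 0.0), (10.0, 0.0)])
b = FakeGeometry([(0.0, 0.0), (10.0, 0.0)])  # same coordinates as a

print(str(a))            # e.g. <__main__.FakeGeometry object at 0x7f...>
# The default str() embeds the memory address, not the coordinates,
# so identical shapes give different strings while both objects are alive.
print(str(a) == str(b))  # False
print("10.0" in str(a))  # False

# Once an object is freed, CPython may give its address to a new,
# unrelated object, whose string can then equal an old key.
addr = id(a)
del a
c = FakeGeometry([(99.0, 99.0)])
print(id(c) == addr)     # may be True or False; reuse is not guaranteed
```

Because the key is built from the address rather than the geometry, true duplicates are usually missed and unrelated features can collide, which fits what happened after switching to SHAPE@.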

If you look at the SHAPE@ value at row[2] in the debugger, you can see its object properties:

[Screenshot: the properties of the SHAPE@ geometry object in the debugger]

Going back to plain SHAPE: that row[2] tuple does not appear anywhere in the SHAPE@ properties.

[Screenshot: the SHAPE@ properties in the debugger]

It would be interesting to see where that coordinate plots to along the line...

How do I amend the remove duplicate appends script to get all the duplicates?

If you want to continue with the script, access the SHAPE@ geometry properties, such as firstPoint, lastPoint, and trueCentroid, and use those for the comparison.
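A sketch of that approach, assuming the field list uses SHAPE@ so that row[2] is a geometry object with firstPoint, lastPoint and trueCentroid properties (each a Point with X and Y). To keep it runnable without arcpy, fake_line is a hypothetical stand-in built with SimpleNamespace.

```python
from types import SimpleNamespace


def summary_key(fullname, curb, shape):
    """Duplicate key from a line's first point, last point and true centroid.

    `shape` is expected to behave like a SHAPE@ geometry whose firstPoint,
    lastPoint and trueCentroid properties are Points with .X and .Y.
    """
    pts = (shape.firstPoint, shape.lastPoint, shape.trueCentroid)
    # A tuple keeps the fields separate; string concatenation would make
    # ("AB", "C") and ("A", "BC") both produce "ABC".
    return (fullname, curb) + tuple((p.X, p.Y) for p in pts)


def point(xy):
    """Hypothetical stand-in for a Point with X and Y attributes."""
    return SimpleNamespace(X=xy[0], Y=xy[1])


def fake_line(first, last, centroid):
    """Hypothetical stand-in for a polyline geometry, for testing without arcpy."""
    return SimpleNamespace(firstPoint=point(first), lastPoint=point(last),
                           trueCentroid=point(centroid))


a = summary_key("MAIN ST", "Y", fake_line((0, 0), (100, 0), (50, 0)))
b = summary_key("MAIN ST", "Y", fake_line((0, 0), (100, 0), (50, 0)))
print(a == b)  # True
print(a)       # ('MAIN ST', 'Y', (0, 0), (100, 0), (50, 0))
```

In the cursor loop this would be key = summary_key(row[0], row[1], row[2]), checked against a set of seen keys. It is a heuristic: a reversed copy of a line gives a different key, and two different lines could in principle share endpoints and centroid.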

Laura_m_Conner
New Contributor III

Thank you, that was really good information. I now understand the script and the general way Esri handles geometry. This answers most of my questions; however...

I tried amending the script to:

feilds = ["FULLNAME","curb","SHAPE@trueCentroid"]

The script ran and did not delete any false positives; however, it left 66 duplicates in pairs, likely the same ones as before. I still don't know why.
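One possible explanation, which is an assumption and not confirmed in this thread: Find Identical and Select By Location compare geometry within an XY tolerance, while the script compares exact floating-point values. If the appended copies differ from the originals far below that tolerance, the tools see them as identical but the script does not. A sketch of snapping vertices to a tolerance-sized grid before comparing; grid_key is a hypothetical helper, and the coordinates and the 0.001 tolerance are made up for illustration.

```python
def grid_key(points, tolerance):
    """Key a list of (x, y) vertices by the tolerance-sized grid cell each falls in.

    Returning integer cell indexes (not floats) keeps the key exact and hashable.
    """
    return tuple((round(x / tolerance), round(y / tolerance)) for x, y in points)


# Hypothetical vertices of an original line and its appended copy,
# differing only far below a 0.001 tolerance
original = [(2035123.456789, 741852.123456), (2035223.456789, 741852.123456)]
copy = [(2035123.4567891, 741852.1234559), (2035223.456789, 741852.123456)]

print(original == copy)                                    # False: exact floats differ
print(grid_key(original, 0.001) == grid_key(copy, 0.001))  # True
```

Grid snapping is not a true tolerance test: two values just either side of a cell boundary (0.0004 and 0.0006 with a 0.001 cell, say) land in different cells even though they are close, so a few pairs can still slip through. The Delete Identical tool, with its XY tolerance setting, is probably the more robust choice.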

P.S. How do you get to the window you took the debugging screenshots of?

by Anonymous User
Not applicable

Not sure if it's case sensitive, but the docs show the token in all caps: SHAPE@TRUECENTROID. You might try that case, and if it still doesn't work as expected, I'd approach it differently, using SHAPE@ and shape_desc = arcpy.Describe(row[2]) to access the details.

The screenshots are from the debugger in PyCharm. I'm not sure which IDE you are using, but there are tutorials for setting up a debugger to step through code. Run a web search for '<your IDE> debugging python' and you'll find something to point you in the right direction.
