How to indentify duplicate or unique value in Pro RuntimeError: Error 999999

JaredPilbeam2 · ‎02-23-2021

I'm using the How to identify duplicate or unique values in Pro code provided by ERSI verbatim, and I'm getting an error. I have a feature class table that I'm looking for duplicates in. The technical document doesn't say whether to use the field name or alias, so I tried both. When I run it with the field name the code throws an exception. I stopped the script and opened Pro. The field gets created in the table but it's all NULLs.

This is the line where the exception is being thrown:

i=row.getValue(field_in)

Exception/Error:

  Message=ERROR 999999: Something unexpected caused the tool to fail. Contact Esri Technical Support (http://esriurl.com/support) to Report a Bug, and refer to the error help for potential solutions or workarounds.
  Source=\\GISFILE\GISSTAFF\Jared\Python Scripts\ArcGISPro\DuplicateFields.py
  StackTrace:
  File "\\GISFILE\GISSTAFF\Jared\Python Scripts\ArcGISPro\DuplicateFields.py", line 17, in <module>
    i=row.getValue(field_in)

When I run it with the alias it doesn't finish, or at least I've never let it because it's taking a remarkably long time. However, if I stop the script and open Pro to view the table, a lot of the 190K+ fields are written to. But, like I said it never finishes...

Code:

import arcpy

'''
This script will count the number of occurences of a value in a field ("field_in") and write them to a 
new field ("field_out")
'''

arcpy.env.workspace = r"C:\Users\jpilbeam\Downloads\DuplicateTesting.gdb\DuplicateTesting.gdb" #path to GDB goes here
infeature = "backup_02232021" #name of feature class goes here
field_in = "name" #this is the alias of the field. Name: is the actual name
field_out = "COUNT_"+field_in
arcpy.AddField_management(infeature, field_out,"SHORT")

lista= []
cursor1=arcpy.SearchCursor(infeature)
for row in cursor1:
    i=row.getValue(field_in) #<-- where exception is thrown using field name in field_in variable
    lista.append(i)
del cursor1, row

cursor2=arcpy.UpdateCursor(infeature)
for row in cursor2:
    i=row.getValue(field_in)
    occ=lista.count(i)
    row.setValue(field_out, occ)
    cursor2.updateRow(row)
del cursor2, row
print("----done----")

EDIT: Using the alias, I let the code run for like 15-20 mins and it finished with no errors! So, I guess it just needs that long with all these records?

DavidPike · ‎02-23-2021

Well if it works... Still confusing that it's taking an alias rather than the fieldname but the old cursors are pretty alien to me.

I'd recommend looking at Dan's suggestion as it may be faster.

If not and you're still looking for speed, I'm not sure the script itself is designed for performance.

First it has an old cursor SearchCursor—ArcGIS Pro | Documentation rather than the faster arcpy.da cursor has superceded it. Also in the cursor initiation, you can probably get a boost by limiting the fields argument to just the field(s) you need - at the moment, the entire row is returned.

Perhaps the .count method on the list is optimal, but I do wonder if there's a better way than appending everything to a list then counting it after (although this is exactly how I'd write my own junk code just to do something then gather dust for eternity!).

View solution in original post

DanPatterson · ‎02-23-2021

Find Identical (Data Management)—ArcGIS Pro | Documentation is useful as well, or for comparison

... sort of retired...

DavidPike · ‎02-23-2021

Well if it works... Still confusing that it's taking an alias rather than the fieldname but the old cursors are pretty alien to me.

I'd recommend looking at Dan's suggestion as it may be faster.

If not and you're still looking for speed, I'm not sure the script itself is designed for performance.

First it has an old cursor SearchCursor—ArcGIS Pro | Documentation rather than the faster arcpy.da cursor has superceded it. Also in the cursor initiation, you can probably get a boost by limiting the fields argument to just the field(s) you need - at the moment, the entire row is returned.

Perhaps the .count method on the list is optimal, but I do wonder if there's a better way than appending everything to a list then counting it after (although this is exactly how I'd write my own junk code just to do something then gather dust for eternity!).

DanPatterson · ‎02-23-2021

For unique, duplicates etc, numpy is the way to go.

import arcpy
from arcpy.da import TableToNumPyArray, NumPyArrayToTable
f = r"C:\arcpro_npg\npg\Project_npg\npgeom.gdb\sample_10k"  # --- a 10k -ish table
arr = TableToNumPyArray(f, "*")  # --- produce the array
flds = arr.dtype.names  # ---- to get all the fields
sub = arr[['County', 'Town_class', 'Facility']]  # --- subsample of the  fields
uni, idx, cnts = np.unique(sub, True, return_counts=True)  # --- perform unique

uni # ---- unique records for those 3 fields
array([('A', 'A_', 'Hall'), ('A', 'A_', 'Hosp'), ('A', 'B_', 'Hall'),
       ('A', 'B_', 'Hosp'), ('A', 'C_', 'Hall'), ('A', 'C_', 'Hosp'),
       ('B', 'A_', 'Hall'), ('B', 'A_', 'Hosp'), ('B', 'B_', 'Hall'),
       ('B', 'B_', 'Hosp'), ('B', 'C_', 'Hall'), ('B', 'C_', 'Hosp'),
       ('C', 'A_', 'Hall'), ('C', 'A_', 'Hosp'), ('C', 'B_', 'Hall'),
       ('C', 'B_', 'Hosp'), ('C', 'C_', 'Hall'), ('C', 'C_', 'Hosp'),
       ('D', 'A_', 'Hall'), ('D', 'A_', 'Hosp'), ('D', 'B_', 'Hall'),
       ('D', 'B_', 'Hosp'), ('D', 'C_', 'Hall'), ('D', 'C_', 'Hosp')],
      dtype={'names':['County','Town_class','Facility'], 'formats': 
                     ['<U4','<U12','<U16'], 'offsets':[8,24,72],
                      'itemsize':2184})

cnts # ---- counts for each unique record
array([590, 605, 465, 448, 442, 481, 554, 583, 446, 426, 491, 469, 607,
       621, 433, 468, 448, 457, 183, 198, 121, 162, 158, 140], dtype=int64)

sum(cnts)
9996

# ----
# NumPyArrayToTable  =====>  and back to Arc-world

... sort of retired...

JaredPilbeam2 · ‎02-24-2021

Thanks for the replies! I didn't even notice I was using the old cursor (was in a rush). So, now that I'm revamping this code, I'm suck with an error :

infeature = r"C:pathtofile\file.gdb\file" #name of feature class goes here
fields = ["Name:", "Location", "Email Address", "Cell Phone Number:", 
          "Home Phone Number:"] 

with arcpy.da.SearchCursor(infeature, fields) as cursor1:
    for row in cursor1:
        print(row)

Traceback (most recent call last):
  File "\pathto\DuplicateFields_updated.py", line 18, in <module>
    for row in cursor:
RuntimeError: An invalid SQL statement was used. [SELECT OBJECTID,Name:,Location,Email Address,Cell Phone Number:,Home Phone Number: FROM backup_02232021_NEW]

I'm using the fieldnames, not the aliases. Because they're strings, I don't believe the spaces matter?

DavidPike · ‎02-24-2021

I wouldn't think they're the correct fieldnames, as many of them contain spaces and colons, which I believe makes them invalid. I would double-check the actual fieldnames in the first instance, as the error seems to allude to that.

JaredPilbeam2 · ‎02-24-2021

The weird thing is, these are unmistakably the fieldnames. This feature class was created by Survey123 Connect. And, I can't change them as they're part of a live survey.

DavidPike · ‎02-24-2021

Can you send a screenshot?

JaredPilbeam2 · ‎02-24-2021

This is from the table. I copied the field name directly from the table and pasted into my list.

JaredPilbeam2 · ‎02-24-2021

Apparently, I'm braindead? I guess they were the aliases... whoops.