arcpy.da.UpdateCursor() quirk for large (~16K records) dataset

09-30-2014 09:08 AM
PerryKesselbaum
Deactivated User

I'm running a Python script that uses the data access update cursor on a versioned feature class with 16,814 rows, stored in a SQL Server SDE. A for loop goes through each row and increments a count variable on every pass. When I don't call the updateRow() method, things are as expected: the count equals the number of rows. When I do call updateRow(), the count gets much higher and is inconsistent from one execution to the next, anywhere from 20K to 30K. I suspect this has something to do with the time it takes to manipulate so many records, similar to this question: arcpy - Scaling DA UpdateCursor to large datasets? - Geographic Information Systems Stack Exchange
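To make the symptom easier to pin down, here is a minimal diagnostic sketch that records the OBJECTID of every row the cursor hands back and then compares rows visited against distinct OBJECTIDs. The helper names (summarize_visits, visited_oids, edit_row) are made up for illustration, and the sketch assumes the caller has already opened whatever edit session the versioned data requires.

```python
from collections import Counter


def summarize_visits(oids):
    """Return (rows_visited, distinct_oids, repeated) for an iterable of OIDs.

    `repeated` maps each OID seen more than once to its visit count.
    """
    counts = Counter(oids)
    visited = sum(counts.values())
    repeated = {oid: n for oid, n in counts.items() if n > 1}
    return visited, len(counts), repeated


def visited_oids(feature_class, field_names, edit_row=None):
    """Yield the OID of every row returned by arcpy.da.UpdateCursor.

    Hypothetical helper: `edit_row` is a caller-supplied function that takes
    the row (OID first, then field_names) and returns the row to write back.
    Pass None to iterate without calling updateRow().
    Assumption: any required edit session is already open.
    """
    import arcpy  # imported here so the counting helper works without ArcGIS

    with arcpy.da.UpdateCursor(feature_class, ["OID@"] + list(field_names)) as cursor:
        for row in cursor:
            oid = row[0]
            if edit_row is not None:
                cursor.updateRow(edit_row(row))
            yield oid
```

Calling summarize_visits(visited_oids(fc, fields, edit_row)) once with edit_row=None and once with a real edit function should show whether the extra iterations are repeat visits to the same OBJECTIDs.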

When I run the same script on a smaller subset of the data (~1100 records), things are consistent and normal. This is also the case when running the script on the full dataset exported to a local file geodatabase (instead of in the SDE.)

Has anyone else run into this issue? Can anyone explain what is happening with more certainty? I'm not sure of the ideal workaround just yet: either use SQL in a while loop to break up the records returned by the cursor (as was suggested in the linked question), or run the script on local data and then copy it up to the SDE. If anyone has other ideas, I'm all ears.
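The first workaround (breaking the cursor up with SQL) could be sketched roughly as follows. This is untested against the SDE and is not confirmed to avoid the problem; the function names, the edit_row callback, and the chunk size of 1000 are my own assumptions, the last one chosen only because a subset of about 1100 records behaved normally.

```python
def oid_chunk_clauses(oids, chunk_size, oid_field="OBJECTID"):
    """Yield SQL where clauses covering the given OIDs in sorted chunks."""
    ordered = sorted(set(oids))
    for start in range(0, len(ordered), chunk_size):
        chunk = ordered[start:start + chunk_size]
        yield "{0} >= {1} AND {0} <= {2}".format(oid_field, chunk[0], chunk[-1])


def update_in_chunks(feature_class, field_names, edit_row, chunk_size=1000):
    """Run one short-lived UpdateCursor per OID range.

    Hypothetical helper: `edit_row` takes a row and returns the row to write.
    Assumption: any required edit session is already open.
    """
    import arcpy  # imported here so the clause builder works without ArcGIS

    oid_field = arcpy.Describe(feature_class).OIDFieldName
    # Read all OIDs up front with a read-only cursor.
    with arcpy.da.SearchCursor(feature_class, ["OID@"]) as cursor:
        oids = [row[0] for row in cursor]

    for clause in oid_chunk_clauses(oids, chunk_size, oid_field):
        with arcpy.da.UpdateCursor(feature_class, field_names, clause) as cursor:
            for row in cursor:
                cursor.updateRow(edit_row(row))
```

For example, list(oid_chunk_clauses([5, 1, 2, 9, 7], 2)) gives three clauses covering OBJECTIDs 1 to 2, 5 to 7, and 9.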

Thanks!

15 Replies
PerryKesselbaum
Deactivated User

This is now filed as a bug:

BUG-000082426: arcpy.da.UpdateCursor() returns duplicate rows when run against large, versioned SDE feature classes

From my understanding, it appears to affect only datasets larger than 10K or so records, and only those stored in SQL Server. The best workaround appears to be to manipulate the data directly in the DB environment through SQL.

JoshuaBixby
MVP Esteemed Contributor

10K records isn't that many, really. I assume the limit or problem is tied to the number of records being returned by the UpdateCursor and not the number of records in the underlying dataset, right? Also, have you tried the older arcpy.UpdateCursor instead of arcpy.da.UpdateCursor? I know the former is slower, but that may be acceptable if it works and doesn't generate duplicate rows.

NickG
Emerging Contributor

The old cursor as well as .NET ArcObjects have the same issue. I did find that the number of fields returned with arcpy.da had some effect on the number of records returned.

JoshuaBixby
MVP Esteemed Contributor

Seems like a pretty deep or pervasive problem. One would think that gives it some urgency, but it could just as easily become the excuse for not addressing the issue at all. I have seen the latter more than a handful of times.

Thanks for pursuing with Esri Support and getting a bug logged.

IanBroad
Deactivated User

Esri, are you putting out a patch soon to solve this problem? Is this still present in 10.3?

VisheshMaskey
Deactivated User

I have 35K records, and the update cursor had just crossed a count of 65K and was still running. I stopped it because it takes more than 3 hours to run. I will figure out a way to filter out the records, but this is very annoying. I'm using 10.2.2, and this started happening a few days ago on an existing script that had been working fine.
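One way to filter might be to skip any row whose OBJECTID has already been handled and to bail out once the cursor has returned far more rows than expected. This is only a sketch under assumptions: iter_unique, update_once, edit_row, and the 2x cap are hypothetical, and skipping repeats does not by itself guarantee the cursor ever finishes, which is why the cap is there.

```python
def iter_unique(rows, key_index=0, max_rows=None):
    """Yield each row once, keyed on row[key_index]; skip repeat keys.

    Raises RuntimeError if more than max_rows rows are pulled from `rows`.
    """
    seen = set()
    for visited, row in enumerate(rows, start=1):
        if max_rows is not None and visited > max_rows:
            raise RuntimeError("cursor returned more than %d rows" % max_rows)
        key = row[key_index]
        if key in seen:
            continue  # already processed this OID; do not update it again
        seen.add(key)
        yield row


def update_once(feature_class, field_names, edit_row, expected_count):
    """Update each OID at most once (hypothetical helper).

    Assumption: any required edit session is already open, and `edit_row`
    returns a row with the OID still in the first position.
    """
    import arcpy  # imported here so iter_unique works without ArcGIS

    with arcpy.da.UpdateCursor(feature_class, ["OID@"] + list(field_names)) as cursor:
        # Arbitrary safety cap: stop after twice the expected row count.
        for row in iter_unique(cursor, max_rows=2 * expected_count):
            cursor.updateRow(edit_row(row))
```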
