arcpy.da.UpdateCursor() quirk for large (~16K records) dataset

09-30-2014 09:08 AM
PerryKesselbaum
New Contributor II

I'm running a Python script that uses the data access update cursor on a versioned feature class with 16,814 rows stored in a SQL Server SDE. I use a for loop to go through each row and increment a count variable on each pass. When I don't call the updateRow() method, things are as expected: my count equals the number of rows. However, when I do call updateRow(), the count is much higher and varies from one execution to the next, anywhere from 20K to 30K. I suspect this has something to do with the time it takes to manipulate so many records, similar to this question: arcpy - Scaling DA UpdateCursor to large datasets? - Geographic Information Systems Stack Exchange

When I run the same script on a smaller subset of the data (~1100 records), the count is consistent and correct. The same is true when I run the script on the full dataset exported to a local file geodatabase (instead of in the SDE).

Has anyone else run into this issue? Can anyone explain what is happening with more certainty? I'm not sure of the ideal workaround just yet (either using SQL in a while loop to break up the records returned by the cursor, as was suggested in the linked question, or running the script on local data and then copying it up to the SDE), so if anyone has other ideas, I'm all ears.
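For reference, here is a minimal sketch of the first workaround: splitting the update into several short-lived cursors by ObjectID range. The function names, the "OBJECTID" field name, and the chunk size of 2000 are assumptions for illustration only, and it is not confirmed that chunking avoids the inflated count.

```python
def oid_range_clauses(min_oid, max_oid, chunk_size, oid_field="OBJECTID"):
    """Yield SQL where clauses that split [min_oid, max_oid] into chunks.

    oid_field is an assumed field name; use the real ObjectID field.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    start = min_oid
    while start <= max_oid:
        end = min(start + chunk_size - 1, max_oid)
        yield "{0} >= {1} AND {0} <= {2}".format(oid_field, start, end)
        start = end + 1


def update_in_chunks(table, fields, new_value, min_oid, max_oid, chunk_size=2000):
    """Open one UpdateCursor per chunk (hypothetical helper, needs arcpy)."""
    import arcpy  # imported lazily so the clause builder works without ArcGIS
    for clause in oid_range_clauses(min_oid, max_oid, chunk_size):
        # The where clause is the third argument of arcpy.da.UpdateCursor.
        with arcpy.da.UpdateCursor(table, fields, clause) as cursor:
            for row in cursor:
                row[0] = new_value
                cursor.updateRow(row)
```

The minimum and maximum ObjectID would have to be looked up separately, and on versioned data the calls would still need to run inside an edit session.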

Thanks!

15 Replies
RichardFairhurst
MVP Honored Contributor

The updateRow action in an Edit Session (which you had to set up for a versioned feature class) fires all triggers for fields that respond to Modified edit events, like Feature Linked Annotation label expression fields. Basically, this happens if a Composite Relationship Class exists with a listener for any field that you included in the UpdateCursor field list. Every time a trigger fires and causes an update elsewhere, the same record is processed by the cursor twice: once before the trigger and once after.

The reason it is not consistent is that not all records have a relationship to an actual feature in another feature class. Features with no Feature Linked Annotation are seen by the cursor only once, but features connected to Feature Linked Annotation are processed twice. Additionally, when a record is processed twice, changes take place where you may not expect them. For example, Feature Linked Annotation triggers rebuild the annotation from the label expression every time updateRow fires on the feature they are linked to, provided a field in their label expression was visible to the UpdateCursor. So you are probably destroying most of the manual adjustments you have made to your annotation.

I destroyed about two-thirds of all the manual edits ever made to my Annotation before I realized what was happening, and I had to restore it from a backup. I have complained to Esri that this behavior is not documented and is not at all similar to how the Editor responds in an interactive Edit session. They have not admitted it is a bug and will most likely post it in a blog no one will read rather than in the cursor documentation. It is a huge gotcha for editors of Feature Linked Annotation.

You cannot include any field in your update cursor field list that the Editor is monitoring for Modified Edit events through a listener that triggers a Composite Relationship Class behavior. Including the field in the field list and calling updateRow within an Editor session on a feature is all that is required to fire the trigger, nothing else. The trigger fires even if the cursor never directly reads the field's value and never modifies it.

This does not happen if you can use the UpdateCursor outside of an edit session, but that is not allowed with Feature Linked Annotation Feature Classes, so in that case avoiding the Editor is not an option.  See this post as well.

I mentioned that the double processing of a feature by the cursor in this situation is another problem for routines that need to count edited features, but was told I would have to log a separate issue to have Esri look at it.  Since I now will only use the UpdateCursor on fields that the Editor never listens to so that I never again destroy my annotation, I won't be triggering the double count behavior again either.
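For routines that do need a reliable count, one possible approach is to count distinct ObjectIDs rather than loop iterations. The following is a minimal sketch under the assumption that the "OID@" token is included in the cursor's field list; update_and_count, oid_index and apply_change are hypothetical names, not part of arcpy.

```python
def update_and_count(cursor, oid_index, apply_change):
    """Update every row and return (loop iterations, distinct ObjectIDs).

    cursor       -- an open update cursor (anything iterable with updateRow)
    oid_index    -- position of the "OID@" token in the cursor's field list
    apply_change -- hypothetical callback that edits a row and returns it
    """
    seen = set()
    iterations = 0
    for row in cursor:
        iterations += 1
        seen.add(row[oid_index])  # a repeated ObjectID is only counted once
        cursor.updateRow(apply_change(row))
    return iterations, len(seen)
```

If the two returned numbers differ, some records were handed to the loop more than once. Note that this only corrects the count; it does not stop the trigger from firing.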

PerryKesselbaum
New Contributor II

Richard,

Thanks for your reply. The script is being run on a feature class in a Composite Relationship (Attachments enabled using GlobalID as the relate field); however, we are not including the GlobalID field in the field list for the UpdateCursor. We've also run it on a feature class that does not participate in any relationships, with the same results.

I've enabled attachments on my 1100 record subset (in the SDE) of the larger feature class and the script still returns the proper count when run on it.

Do you think the relationship could still be the cause of the inconsistent count?

Another workaround I am considering is using the legacy update cursor. This will allow me to run the script without starting an edit session. Do you know if the legacy cursor behaves in the same way when updating records in a feature class that participates in a relationship?

RichardFairhurst
MVP Honored Contributor

Perry:

The GlobalID field does not have to be in the field list, only a field that the Composite Relationship actually responds to. For example, suppose I had a Relationship Class set up like yours, and as part of the relationship class behavior any change in a field called Name results in changes to the related features. If Name is in the field list, any feature with actual related features is processed by the cursor at least twice, once before the trigger is fired and once after. Additionally, those related features change in whatever way the Name field makes them change, even though the Name value did not change. So manual overrides get destroyed.

I know the cursor processes these features at least twice, but I did not check to see if it processes only two times no matter how many related features exist or one more time for each and every related feature.

I believe the legacy cursor will behave the same way, but I have not checked. If you can use the legacy cursor outside of an Editor session, then the behavior will not be triggered, since it requires Editor listener events to be active. Or, in my example, if I excluded the field Name from the field list and no other field in the field list was connected to a relationship class listener, then the updateRow method would return a correct count even in an Editor session, since the listener would never be triggered. I think ObjectID or GlobalID could be included in the field list, since the update cursor does not attempt to update these values.
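As a rough illustration of excluding such fields, the sketch below drops any field that is assumed to be watched by a listener before the cursor is opened. The helper name and the "monitored" list are hypothetical; the list has to be compiled by hand from what you know about your relationship classes and label expressions, and treating field names as case-insensitive is also an assumption.

```python
def safe_update_fields(requested, monitored):
    """Return the requested fields minus any assumed to be monitored.

    requested -- field names you would like to pass to the UpdateCursor
    monitored -- hand-compiled names of fields tied to an Editor listener
    """
    blocked = {name.upper() for name in monitored}
    return [name for name in requested if name.upper() not in blocked]
```

For the example above, safe_update_fields(["OID@", "Name", "Zoning"], ["NAME"]) would leave only "OID@" and "Zoning" ("Zoning" is a made-up field name) in the list passed to the cursor.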

NickG
New Contributor II

"We've also run it on a feature class that does not participate in any relationships with the same results."

-Perry

Sounds like he's running into issues on non-related feature classes as well.

RichardFairhurst
MVP Honored Contributor

I can only say that my tests have proven to my satisfaction that one cause of inaccurate counts is Editor listener events related to Composite Relationship Classes, but that may not be the only possible cause. If he has an Editor extension of any kind enabled, it could produce similar behavior without any relationship class being present. I bet the key is that he has always called updateRow inside an Editor session when he has observed the miscount.

PerryKesselbaum
New Contributor II

Richard,


Thanks for your comments. Correct me if I am wrong, but I was under the impression that the da cursor needs to run in an edit session. I have accomplished this as in the code snippet below:

    edit = arcpy.da.Editor(WORKSPACE)
    edit.startEditing(False, True)  # with_undo=False, multiuser_mode=True
    edit.startOperation()

    ... (Cursor runs here) ...

    edit.stopOperation()
    edit.stopEditing(True)  # True saves the edits

Aside from this, I have no other Editors or extensions working on the feature class. As far as I'm aware, the relationship is static and has no listener events. It only accesses the GlobalID field of the related feature class to match it to rows in the attachment table.

I will try re-writing the script to use the legacy cursors (which I know will work outside of an edit session) and report back if it still generates the same count issues. Thanks again for all your input.

RichardFairhurst
MVP Honored Contributor

da cursors require an Editor session set up the way you showed in your code for versioned SDE data and complex feature classes, like Feature Linked Annotation, Composite Networks, topologies, etc.  Simple feature classes that are in a shapefile/non-versioned geodatabase do not require an Editor session when you use a da cursor.  I never used the old cursors much, since they are so slow and the Field Calculator was faster.  I thought they behaved the same.
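As an aside, the start/stop calls of such an Editor session can be wrapped so that edits are discarded if the cursor loop raises an error. This is only a sketch: edit_operation is a hypothetical helper, and it assumes the object passed in exposes the arcpy.da.Editor methods startEditing, startOperation, stopOperation, abortOperation and stopEditing.

```python
from contextlib import contextmanager


@contextmanager
def edit_operation(editor, with_undo=False, multiuser_mode=True):
    """Wrap one edit operation: save on success, discard on any exception.

    editor -- an arcpy.da.Editor (or any object with the same methods)
    """
    editor.startEditing(with_undo, multiuser_mode)
    editor.startOperation()
    try:
        yield editor
    except Exception:
        editor.abortOperation()
        editor.stopEditing(False)  # False: do not save partial edits
        raise
    else:
        editor.stopOperation()
        editor.stopEditing(True)  # True: save the edits
```

It would be used as "with edit_operation(arcpy.da.Editor(WORKSPACE)):" around the cursor loop. The defaults mirror the startEditing(False, True) call shown earlier in the thread.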

JamesCrandall
MVP Frequent Contributor

" I am using a for loop to go through each row and iterating a count variable each time through the loop."

That sounds like a taxing event.  Hard to tell with no code to review, but have you attempted to issue your SQL when constructing your UpdateCursor?

PerryKesselbaum
New Contributor II

James,

Here's the code snippet of the UpdateCursor() loop:

    edit = arcpy.da.Editor(WORKSPACE)
    edit.startEditing(False, True)
    edit.startOperation()

    count = 0
    with arcpy.da.UpdateCursor(parcels, fields) as updateCur:
        for row in updateCur:
            row[0] = newValue
            ...
            updateCur.updateRow(row)
            count += 1

    edit.stopOperation()
    edit.stopEditing(True)

I need to update the entire dataset, so I haven't tried to use an SQL where clause to limit the number of records returned.
