Order of da.UpdateCursor

SilvanStöckli · ‎04-07-2015

Hello

I'd like to manipulate the order the update cursor loops through my attribute table.

At the moment, my code looks as follows:

with arcpy.da.UpdateCursor(lyr, column) as cursor:

for row in cursor:

row[0] = i

i = i+1

cursor.updateRow(row)

where "lyr" is my layer and "column" is my column where the cursor needs to update the data. at the moment loops through and adds 1 as it needs but it does it in the order of the Object ID. I have an other column named "Nummer" where I have Numbers (Doulbe). The coursor should update the rows following those numbers (from 1 to x). How can I do it?

I tried "with arcpy.da.UpdateCursor(lyrkartenausschn, seitenzkartenausschn,"","","", sql_clause=(None, 'ORDER_BY Nummer A') as cursor: "

And also without all of those "", "", and also with "ORDER BY" and with/without the "A". But it always gives me an error.

Can anyone help me fixing this part of the code?

thanks! 😄

JoshuaBixby · ‎04-08-2015

What version of ArcGIS are you using? I am running ArcGIS 10.3, and I don't have any issues with using ORDER BY in a SQL clause for update cursors. That said, I do seem to recall problems with it back in ArcGIS 10.1 and possibly 10.2.0. Anyhow, I am just interested in whether your error might be related to the version you are running.

RichardFairhurst · ‎04-08-2015

Please be clear. Original update cursors and data access (da) module update cursors are two different things and may operate differently when it comes to the ORDER BY option. da update cursors are fast and the only cursor type worth using. So are you using the ORDER BY with a da module update cursor and if so what is the syntax? The documentation does not cover this aspect of using the da cursor at all and it would be nice to know how to use it if it is an option.

Having said that, dictionaries can out perform the ORDER BY operation by a factor of at least 10 times the speed, since no disk access is involved, whereas Updating using the Order By operator has to retrieve and write to each record in order, which is very slow in my experience and severely impacted by the overall size of the table, whereas unsorted reading to a dictionary and sorting the dictionary is extremely fast.

JoshuaBixby · ‎04-09-2015

Richard Fairhurst wrote:
Having said that, dictionaries can out perform the ORDER BY operation by a factor of at least 10 times the speed, since no disk access is involved, whereas Updating using the Order By operator has to retrieve and write to each record in order, which is very slow in my experience and severely impacted by the overall size of the table, whereas unsorted reading to a dictionary and sorting the dictionary is extremely fast.

10 times? What backend storage are you using?

As a quick test, I created a table with one million records and two fields. One field held a randomly generated number and the other just an empty field to be updated in order of the other field. When using an in_memory table, the ORDER BY approach was 300% faster. When using a file geodatabases, the ORDER BY approach was 7% slower. When using a personal geodatabases, the ORDER BY approach was the same. Disregarding the in_memory example, file-based geodatabases don't take that much of a hit using ORDER BY with an update cursor (arcpy.da).

The sorting of a dictionary might not involve any disk access, but the overall approach of the dictionary method surely does involve disk access. Iterating over a search cursor to populate the dictionary hits the disk, and iterating over an update cursor to update records hits the disk again. Although there is some overhead in having the database engine order the records, there is also overhead in iterating over the records twice instead of once.

I don't have access to Oracle or SQL Server this morning, so maybe ORDER BY does perform significantly worse in this case. At least for file-based geodatabases, I don't see a significant difference in performance.

RichardFairhurst · ‎04-09-2015

I am basing my statements on ArcObjects cursors, not Python cursors, since I don't know how to make a da cursor sort. da cursors seem more optimized than ArcObjects cursors, so that may be the reason you see such fast performance. Several years ago I gave up on sorted cursors and never was willing to try them again, especially since there is virtually no documentation on how to set it up (figuring out ArcObjects was a pain and disappointment).

ORDER BY is not documented with da cursors. Document how to use it in this post. I am at a disadvantage, since I have explained my code so you can test my approach, but you have not shown or explained your da cursor syntax as far as I have seen so I cannot test your approach (if I missed it, please repeat it).

My technique is actually more of a way to avoid doing joins and field calculations than it is to avoid sorting within a single feature class, but it fits this need as well. In the case of joins and data transfers between feature classes/tables my approach is much faster than all other methods I have tried.

JoshuaBixby · ‎04-10-2015

Fair enough, now I understand the perspective you are coming from.

If you haven't seen it already, you might be interested in the How is the data access cursor performance so enhanced compared to previous versions? thread over at StackExchange. The first answer is from Jason Scheirer, one of the arcpy.da developers. In his response he explains how they optimized the performance of the new cursors. There is also a link to some benchmark results.

Although the sql_clause documentation has remained the same, the arcpy.da SearchCursor code samples have been expanded with ArcGIS 10.3 to include an SQL ORDER BY example. The SearchCursor code sample is transferrable to the UpdateCursor.

The sql_clause was buggy when the data access module was first released at ArcGIS 10.1. I believe it was ArcGIS 10.2.1 where a couple bugs related to the sql_clause were patched.

Below is a code snippet that creates a table with randomly generated values and then sorts against those values to update another field in order.

import arcpy
import numpy
import random

fc = #table location in file geodatabase

random.seed(3.14)
l = []
for i in range(1000000):
    l.append((random.randint(0,100000000), 0.0))
narr = numpy.array(l, dtype=([('rand','i4'),('numb', 'f8')]))
arcpy.da.NumPyArrayToTable(narr, fc)

i = 1000
with arcpy.da.UpdateCursor(fc, ["rand", "numb"],sql_clause=(None,"ORDER BY rand")) as cur:
    for row in cur:
        i += 1 
        row[1] = i
        cur.updateRow(row)

RichardFairhurst · ‎04-10-2015

I appreciate getting information that brings me up to date on what is happening with da cursors in the latest versions of ArcGIS. I don't have 10.3, and only recently got 10.2.1 installed. I tried the Order By option at 10.1 with the release of the da cursor, but I gave up on it. I have not tried it since then.

This has happened to me several times where I try something that should work in an early release and discover that it has serious issues that make it either fail or perform horribly. Once I give up on code that fails me it can be hard to give Esri another chance. Generally figuring out a workaround took so much effort that I resent the original method that should have worked for wasting so much of my time. If a workaround is very effective, then even if the workaround is complex, I forget to even consider the possibility that the original approach that is simpler might have been fixed in the latest release. Anyway, thanks for bringing me up to date.

JoshuaBixby · ‎04-10-2015

I understand, completely. The fact that some of the sql_clause bugs took until 10.2.1 to get addressed is, well, frustrating. Oddly enough, I just tried my code snippet on an ArcGIS 10.1 SP1 machine and it worked. I thought ORDER BY was also impacted by NIM094797, but that doesn't seem to be the case.

Overall, I have found the ArcPy documentation in the new 10.3 help to be either improved or expanded for quite a few objects, methods, functions, etc....

SilvanStöckli · ‎04-08-2015

Hello Joshua

I'm using 10.3 aswell. So I think this isn't the problem.