using multiple cursors

7326
18
04-15-2016 10:02 AM
ChrisLowrie
New Contributor

I need to be able to search through a set of data points, each with a time stamp, and insert rows between two data points that are more than 5 seconds apart.  The first pass with the cursor is used to turn the time stamp into an integer so that I can work with it, but after that it won't reset and attempts to open another cursor, even after deleting the first, don't work.  Any ideas of how to do this?

0 Kudos
18 Replies
RichardFairhurst
MVP Honored Contributor

In ArcGIS databases designed by Esri, a concept of order and gaps is in a database has been created by the automatic creation and maintenance of the FID/ObjectID, which uniquely identifies each record at the time it is created.  New numbers are always appended to the end of the number order and any deleted record creates a gap in the FID/ObjectID numbers that cannot be filled, without transferring the data into a new table/FC.  Search cursors do in fact read the records in the FID/ObjectID order by default, which is the fastest order based on how the records are indexed and stored.  Cursors can be sorted, but all other record orders can only be read at the cost of a substantial performance hit when read directly from disk.  Most record sorting of tables initially is read in ObjectID order and is only resorted in memory through things like a tableview before being presented to the user to overcome this performance hit.  No other automatically maintained record numbering is associated with the Esri databases.

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

Richard Fairhurst wrote:

Search cursors do in fact read the records in the FID/ObjectID order by default, ....

I don't recall seeing any Esri documentation that explicitly states ArcGIS or ArcPy cursors include a default ORDER BY clause on the FID/OID field for any kind of data store.  If you have found some, I am interested in the link.

0 Kudos
RichardFairhurst
MVP Honored Contributor

The only documentation about the FID/ObjectID I have seen is here.  It is not a technical discussion of the implementation of the ObjectID, nor is it an exhaustive discussion of the implications of the ObjectID for a programmer, and it in no way contradicts the behavior I am reporting that I have repeatedly observed for a Search Cursor.  Few things are fully documented for programmers about the internal mechanisms Esri has implemented, and most everything that goes beyond the exposed interfaces that may have critical implications for the programmer has to be deduced through experimentation and testing and only gets documented through the forum.

I do not have a document reference for this behavior.  However, having used Search Cursors for years on a daily basis it has been empirically established to a degree that I am certain that the FID/ObjectID is the established default order of the Search Cursor.  Clues that Esri databases with FID/ObjectIDs actively maintain some kind of internal order involving the FID/ObjectID for performance include the fact that FID/ObjectID fields are always automatically indexed by the system.  Search Cursor record counting is accurate and predictable and always ordered by the FID/ObjectID when used without any Order By parameter by the user.  FID/ObjectID lists returned by methods that access record Selection Sets always list the FID/ObjectID values in order.

I will acknowledge that Update Cursors have sometimes shown erratic behavior as far as reading records, especially if combined with a parallel Insert Cursor operation.  Update Cursors have been known to read the same record two or more times during a single pass of the cursor, usually reading the same record again immediately after an updateRow operation during consecutive for loop passes, but this is a bug as far as I am concerned and is a behavior that has never been documented by Esri or explained.  That Update Cursor behavior is a source of complaint on the forum since it screws up all record counting procedures with an Update Cursor and no one has been able to make sense of it.

0 Kudos
curtvprice
MVP Esteemed Contributor

Back early in ArcGIS time (the summer of 2000) I was warned directly to never never assume record order in ArcGIS like we used to in INFO (a brutally simple [and very fast) database that composed the database part of "Arc/INFO").  I was told different data formats (notably RDBMS data sources) may return data in an unpredictable order due to the RDBMS doing parallel processing that Esri does not control.  ORDER BY is supported for all geodatabase data sources. I also have made use of sorting results coming out of a search cursor as in : sorted(arcpy.da.SearchCursor(tbl, field)).

RichardFairhurst
MVP Honored Contributor

I only edit File geodatabase with cursors for performance reasons and download SDE data to that format for all geoprocessing.  The predictable order that the file geodatabase format provides only leads me to say that I have one more reason to continue doing what I do.  Under any circumstances, ObjectIDs do represent the order that records are created in a geodatabase and cannot be reused once issued or deleted without risking database corruption, and no other order is represented or maintained within the data automatically in ArcGIS without a custom written program.  When working with personal geodatbases I have seen the effect when a user corrupted the order and uniqueness of the ObjectID field through an Access update query run outside of ArcGIS, but nothing works correctly in ArcGIS if that happens.

0 Kudos
DanPatterson_Retired
MVP Emeritus

This whole thread has me seeing issues with 'bits' of the process getting parsed out into separate blocks of discontiguous memory or storage location on disk.  This will surely slow down the whole process.  I must agree with finishing one task, consolidating access in memory or on disk, then proceeding to the next task as being the best approach rather than to shuffle cursors within cursors.  Numpy has an 'iscontiguous' property when working with data arrays, which I use prior to running a second process... if it isn't, I consolidate before continuing.  Do File geodatabases or cursors in general, not employ such a protocol?

0 Kudos
JoshuaBixby
MVP Esteemed Contributor

Richard, thanks for providing a link.  Although the text hasn't appeared to change from ArcGIS 10.1 to ArcGIS 10.4, there is a corresponding web page in the new online documentation:  Fundamentals of ObjectID fields.  Unfortunately, but not completely surprising, that page on ObjectIDs doesn't mention anything about sorted-ness or ordered-ness.

The ArcObjects Help for .NET Developers has a Geodatabase library page that I think sheds some light on the this topic:

The Geodatabase library provides the application programming interface (API) for the geodatabase. The geodatabase is a repository of geographic data built on standard industry relational and object relational database technology. The objects in the library provide a unified programming model for all supported data sources in ArcGIS. The Geodatabase library defines many of the interfaces that are implemented by data source providers higher in the architecture.

....

A table has one or more columns, referred to as fields, and contains an unordered collection of rows. For each field, each row has exactly one value in the data type of the field.

For databases, at least relational databases, tables represent unordered collections of rows.  That said, tables can be ordered to some extent by adding certain types of indexes, like a clustered index in SQL Server, but that is an implementation matter addressing performance and not a design principle of ordered-ness.

In the above quote, the Geodatabase library page is talking about the API and not the file geodatabase storage itself.  It seems reasonable enough that Esri could have implemented a cluster-like index on ObjectID, it would explain a fair amount of behavior you see, but I have never seen or found documentation that speaks to the issue directly.  Even if that is the case, it doesn't change the fact that a user accesses the contents of a file geodatabase through the API in one form or another, and that API doesn't guarantee any order on the rows of a table, unless using ORDER BY.

What you have seen empirically with ObjectID ordering and Search Cursors doesn't surprise me, because I do think there is a cluster-like index on ObjectID in file geodatabases, but what you have seen with Update Cursors also doesn't surprise me because order isn't guaranteed in the Geodatabase API unless ORDER BY clauses are present.

0 Kudos
RichardFairhurst
MVP Honored Contributor

I do not base many processes on the order of ObjectIDs, but everything I see says that Order By on any other column is slower, which probably has more to do with the fact the ObjectID is always the first column in a geodatabase than any internal storage of the records relative to each other.  It is guaranteed that a sort on the 40th column in a database will be much slower when read from disk using Order By than an Order By on the ObjectID in the first column.  Most sorted processing I do is through in memory processing of lists or dictionaries, or through Summary Statistics outputs that position the sorted fields at the beginning of the schema.  Order By also should never be done on a column that is not indexed.

In any event, order can play an important part of any process (drawing order, analysis, etc), and if I reorganize a database using the Sort tool or create a Summary Statistics table to make the ObjectID column reflect that order it absolutely does benefit performance of tasks that use that order.  Again this may be more because the ObjectID is always the first column in the schema, it is always indexed for quick access, and it has a known field type all of which optimize it for the quickest access and processing.  Constantly processing an SQL Order By on a random field column in the database that the user may have failed to index that has to repeatedly determine its field position and value type each time you want to run a sort algorithm is bound to be slower.  Column position in a database is not random and the first column is the fastest column to access.

The documentation makes this a black box, so all we can argue is whether or not empirically it is useless to get an ordered ObjectID or not.  I say it is useful where quick access to ordered records is essential based on everything I have experienced, whether or not I can explain why that is occurring to your satisfaction.  I care far more about giving advice on the practical effect of doing or not something than I do about the actual internal mechanism used to achieve that effect in a Black Box that I cannot access.  So if my guesses on the Black Box are wrong, but the practical outcome is real nonetheless, my advice on the forum will always be of more value for the second than the first, and I am satisfied with that.  I also do not advise anyone to constantly recreate their feature class every time they want to insert a single record into the middle of the ObjectID order, since that is very inefficient, but periodically optimizing for order can be useful if you don't rely on the ObjectID representing the same record each time you access the database.  I actually don't sort most of my data, because in most cases the value of having an ObjectID in my master database consistently return the same record every time I access it is more valuable to me than having it represent a constantly changing order. Also the design of the ObjectID is primarily intended to support consistent access to the same record in a feature class, although that is only reliable if a feature class that has never been moved or recreated.

NeilAyres
MVP Alum

I didn't expect my few lines of comment to generate such a rich & long discussion on the underlying nature of the data tables.

Curtis, I remember Info well. Did a lot with that in the depths of info processing. Yes, Info ran on ordered lists. So was not a rdms in any modern sense. Do you recall odd / even record processing.....?

0 Kudos