
Cursors... a cursory overview

Blog Post created by Dan_Patterson Champion on Nov 4, 2017

Curses at cursors... pretty well a regular occurrence on this site.  People love them and love to hate them.

They try to nest them, and nest them within 'for' loops and 'with' statements with calls that sound poetic ( ... for row in rows, with while for arc thou ... ).

 

There are old cursors and new cursors (the da-less and the da cursors).  Cursors appear in other guises such as the new, and cleverly named, 'arcgis' module (digression # 1 ... really? something else with arcgis in it! Who is in charge of branding?).

 

Perhaps cursors are cloaked in other arcpy and data access module methods (i.e., blank-to-NumPyArray and NumPyArray-to-blank).  Who knows for sure, since much is locked in arcgisscripting.pyd.

 

Sadly, we deal in a world of mixed data types.  Our tables contain columns of attributes, organized sequentially by rows.  Sometimes the row order has meaning, sometimes not.   Each column contains one data type in a well-formed data structure.  This is why spreadsheets are purely evil for trying to create and maintain data structure, order and form (you can put anything anywhere).
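To make the "one data type per column" idea concrete, here is a minimal sketch using a NumPy structured array.  The three-row table is made up for illustration (it only borrows a few field names from the sample table used later), so treat it as a toy rather than real data.

```python
import numpy as np

# Hypothetical three-row table: each field (column) carries exactly one dtype.
dt = np.dtype([('OBJECTID', '<i4'), ('County', '<U4'), ('Time', '<i4')])
tbl = np.array([(1, 'A', 10), (2, 'B', 25), (3, 'A', 7)], dtype=dt)

print(tbl.dtype.names)      # the field names
print(tbl['County'].dtype)  # text column, 4 characters wide
print(tbl['Time'].mean())   # column math works because the column is uniformly typed
```

Try putting a string into the 'Time' column and NumPy will complain, which is exactly the discipline a spreadsheet lacks.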

 

The interlude... An example of searchcursors and their array roots
import arcpy
import numpy as np

in_tbl = r"C:\Git_Dan\arraytools\Data\numpy_demos.gdb\sample_10k"

sc = arcpy.da.SearchCursor(in_tbl, "*")             # the plain searchcursor
a = arcpy.da.SearchCursor(in_tbl, "*")._as_narray() # using one of its magic methods
b = arcpy.da.TableToNumPyArray(in_tbl, "*")         # using the cloak

np.all(a == b)  # True           ..... everything is equal .......

sc._dtype           # .... here are the fields and dtype in the table ....
  dtype([('OBJECTID', '<i4'), ('f0', '<i4'), ('County', '<U4'),
         ('Town', '<U12'), ('Facility', '<U16'), ('Time', '<i4'),
         ('Test', '<U24')])

sc.fields       # .... just the field names, no need for arcpy.ListFields(...) .....
  ('OBJECTID', 'f0', 'County', 'Town', 'Facility', 'Time', 'Test')


# ---- Some timing... 10,000 records ----
%timeit arcpy.da.SearchCursor(in_tbl, "*")
153 ms ± 12.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit arcpy.da.SearchCursor(in_tbl, "*")._as_narray()
40.4 ms ± 970 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit arcpy.da.TableToNumPyArray(in_tbl, "*")
52.1 ms ± 9.32 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

 

If someone can explain why the plain searchcursor is slower than its dressed up (or down?) counterparts, I would love to hear about it.

 

 

Back to the main event

Harkening back to the fields of mathematics, arrays are assemblages of data, in 1, 2, 3 or more dimensions.  If an array of any dimension has a uniform data type, then life is easier from a structural and usage perspective (this is one reason why Remote Sensing is easier than GIS ... bring on the mail).  We need to maintain an index which ties our geometry to our attributes so what goes where, and where is what, doesn't get mixed up (digression # 2... I am sure this isn't what the branders meant by The Science of Where but we can only hope).

 

Nerd Stuff 

Enough with the boring stuff... bring on the code.

Some of this has been looked at from a slightly different perspective in

 

Get to the Points... arcpy, numpy, pandas

We need some data to work with so... how about a square.

    in_fc = r"C:\Your_spaceless_path\Your.gdb\square"

The 'Describe' object

 

The 'describe' object does just that: describes an object, in this case a FeatureClass.
desc = arcpy.da.Describe(in_fc)
In the newer arcpy.da module, the values can be accessed from a dictionary.  A quick way to get the sorted dictionary keys is to use a list comprehension.  If you want the values, then you can obtain them in a similar fashion.
sk = sorted([k for k in desc.keys()])  # sorted keys
kv = [(k, desc[k]) for k in sk]  # key/value pairs
kv = "\n".join(["{!s:<20} {}".format(k, desc[k]) for k in sk])
Some useful keys associated with featureclasses are listed below, with appropriate snips in the full list:

[..., 'MExtent', 'OIDFieldName', 'ZExtent', 'aliasName', 'areaFieldName',
  'baseName', ... 'catalogPath', ... 'dataElementType', 'dataType',
  'datasetType', ... 'extent', 'featureType', 'fields', 'file', ... 'hasM',
  'hasOID', 'hasSpatialIndex', 'hasZ', 'indexes', ... 'lengthFieldName',
  ... 'name', 'path', ... 'shapeFieldName', 'shapeType', 'spatialReference',
  ...]
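If you don't have a featureclass handy, the key/value formatting can be tried on any dictionary.  In this sketch a plain dict stands in for what arcpy.da.Describe returns; the keys are taken from the list above but the values are invented for illustration.

```python
# A plain dict standing in for the arcpy.da.Describe result.
# The values here are made up; only the key names come from the list above.
desc = {
    'shapeType': 'Polygon',
    'OIDFieldName': 'OBJECTID',
    'hasZ': False,
    'baseName': 'square',
}

sk = sorted(desc)  # sorted keys; uppercase letters sort ahead of lowercase
report = "\n".join("{!s:<20} {}".format(k, desc[k]) for k in sk)
print(report)
```

The `{!s:<20}` part converts each key to a string and left-justifies it in a 20-character column, so the values line up.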

The Cursor object
A cursor gives access to the geometry and attributes of a featureclass.  It is always recommended to use the spatial reference when creating the cursor.  Historically (fixed?), if it was omitted, geometry discrepancies would arise. This information can easily be obtained from the describe object from the previous section.
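SR = desc['spatialReference']  # spatial reference from the describe object
flds = "*"
# in_table, field_names, where_clause, spatial_reference, explode_to_points, sql_clause
args = [in_fc, flds, None, SR, True, (None, None)]
cur = arcpy.da.SearchCursor(*args)  # get the search cursor object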
See what it reveals...
dir(cur)

['__class__', '__delattr__', '__dir__', '__doc__', '__enter__', '__eq__',
'__esri_toolinfo__', '__exit__', '__format__', '__ge__', '__getattribute__',
'__getitem__', '__gt__', '__hash__', '__init__', '__iter__', '__le__',
'__lt__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__',
'__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__',
'_as_narray', '_dtype', 'fields', 'next', 'reset']
 
Individual properties are:
cur.__class__
<class 'da.SearchCursor'>

cur.__class__.__mro__
(<class 'da.SearchCursor'>, <class 'object'>)
The search cursor inherits from 'object'.  The properties and methods that da.SearchCursor adds on top of 'object' can be determined as follows:
mro = cur.__class__.__mro__
s0 = set(dir(mro[0]))
s1 = set(dir(mro[1]))

sorted(list(set.difference(s0, s1)))

 ['__enter__', '__esri_toolinfo__', '__exit__', '__getitem__', '__iter__',
  '__next__', '_as_narray', '_dtype', 'fields', 'next', 'reset']
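The same set-difference trick works on any class, so it can be tried without arcpy.  The sketch below uses a hypothetical ToyCursor class, which is not how the real cursor is implemented; it only shows the mechanics.

```python
class ToyCursor:
    """Hypothetical stand-in for a cursor class, not the arcpy implementation."""
    fields = ('OBJECTID', 'Shape')

    def __iter__(self):
        return iter(())

    def reset(self):
        pass


mro = ToyCursor.__mro__      # (ToyCursor, object)
s0 = set(dir(mro[0]))
s1 = set(dir(mro[1]))
added = sorted(s0 - s1)      # names the class adds on top of 'object'
print(added)
```

For a pure Python class the result also contains a few housekeeping names that Python attaches to every class, alongside 'fields', 'reset' and '__iter__'.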
 
The cursor offers the means to process its objects.
cur.__esri_toolinfo__
 ['FeatureLayer|Table|TableView|Dataset|FeatureDataset::::', 'String::*::',
  'Python::None::', 'CoordinateSystem::::']
 
This handy one returns a numpy structured/recarray
type(cur._as_narray())
 <class 'numpy.ndarray'>
How about information on the attributes of in_fc (more about this later)?
cur._as_narray().__array_interface__

 {'version': 3,
  'strides': None,
  'shape': (0,),
  'typestr': '|V36',
  'descr': [('OBJECTID', '<i4'),
            ('Shape', '<f8', (2,)),
            ('Shape_Length', '<f8'),
            ('Shape_Area', '<f8')],
  'data': (2044504703824, False)}
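The 'typestr' of '|V36' is just the record size in bytes.  A small sketch, assuming the same dtype as the square featureclass, reproduces it with an empty NumPy array and no cursor at all.

```python
import numpy as np

# Same field layout as reported for the square featureclass above.
dt = np.dtype([('OBJECTID', '<i4'), ('Shape', '<f8', (2,)),
               ('Shape_Length', '<f8'), ('Shape_Area', '<f8')])
empty = np.zeros((0,), dtype=dt)   # an array with the right fields but no rows

ai = empty.__array_interface__
print(ai['shape'])     # no rows
print(ai['typestr'])   # a 'void' record of 4 + 2*8 + 8 + 8 bytes
print(dt.itemsize)     # bytes per record
print(dt.names)        # field names
```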
 
We got a glimpse of the field names and data types from the __array_interface__, but this information can be accessed directly as well.
cur.fields
 ('OBJECTID', 'Shape', 'Shape_Length', 'Shape_Area')

cur._dtype
dtype([('OBJECTID', '<i4'),
       ('Shape', '<f8', (2,)),
       ('Shape_Length', '<f8'),
       ('Shape_Area', '<f8')])    
Now, the gotchas.  We created our search cursor at the beginning and each record was cycled through until it reached the end.  If we attempt to get its properties we may be in for a surprise, so we need to 'reset' the cursor back to the start.
cur._as_narray()  # try to get its properties, all we get is the dtype
array([],
    dtype=[('OBJECTID', '<i4'), ('Shape', '<f8', (2,)),
     ('Shape_Length', '<f8'), ('Shape_Area', '<f8')])
Once the cursor is reset, the array values for the square are revealed with the appropriate data type.
cur.reset()       # reset to the beginning
cur._as_narray()

array([(1, [342000.0, 5022000.0], 4000.0, 1000000.0),
       (1, [342000.0, 5023000.0], 4000.0, 1000000.0),
       (1, [343000.0, 5023000.0], 4000.0, 1000000.0),
       (1, [343000.0, 5022000.0], 4000.0, 1000000.0),
       (1, [342000.0, 5022000.0], 4000.0, 1000000.0)],
    dtype=[('OBJECTID', '<i4'), ('Shape', '<f8', (2,)),
           ('Shape_Length', '<f8'), ('Shape_Area', '<f8')])
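The exhaust-then-reset behaviour is the same as for any one-pass iterator.  Here is a minimal sketch with a hypothetical ToyCursor class that only mimics the behaviour described above; the rows are invented and it makes no claim about how arcpy does it internally.

```python
class ToyCursor:
    """Hypothetical one-pass iterator with a reset(), mimicking a search cursor."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._rows):
            raise StopIteration
        row = self._rows[self._pos]
        self._pos += 1
        return row

    def reset(self):
        self._pos = 0   # move back to the first row


cur = ToyCursor([(1, 4000.0), (2, 1200.0)])
first_pass = list(cur)    # consumes every row
second_pass = list(cur)   # nothing left, so this is empty
cur.reset()
third_pass = list(cur)    # the rows are available again
print(first_pass, second_pass, third_pass)
```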
    
There is no automatic reset, so be careful.  You can print the objects in the array in a couple of ways.
cur.reset()
for row in cur:
    print(("{} "*len(row)).format(*row))  # print individual elements

 1 (342000.0, 5022000.0) 4000.0 1000000.0
 1 (342000.0, 5023000.0) 4000.0 1000000.0
 1 (343000.0, 5023000.0) 4000.0 1000000.0
 1 (343000.0, 5022000.0) 4000.0 1000000.0
 1 (342000.0, 5022000.0) 4000.0 1000000.0
Reset the cursor and print again.
cur.reset()
for row in cur:
    print(row)  # print the whole row as a tuple

   (1, (342000.0, 5022000.0), 4000.0, 1000000.0)
   (1, (342000.0, 5023000.0), 4000.0, 1000000.0)
   (1, (343000.0, 5023000.0), 4000.0, 1000000.0)
   (1, (343000.0, 5022000.0), 4000.0, 1000000.0)
   (1, (342000.0, 5022000.0), 4000.0, 1000000.0)
Of course since generator-like objects can be converted to a list, that can be done as an alternative, particularly if you have the memory and wish to deal with list objects instead.
cur.reset()
list(cur)
 [(1, (342000.0, 5022000.0), 4000.0, 1000000.0),
  (1, (342000.0, 5023000.0), 4000.0, 1000000.0),
  (1, (343000.0, 5023000.0), 4000.0, 1000000.0),
  (1, (343000.0, 5022000.0), 4000.0, 1000000.0),
  (1, (342000.0, 5022000.0), 4000.0, 1000000.0)]
 
So if you know the data type of the components of the cursor, you can go to the ndarray in an indirect fashion.
cur.reset()
dt = cur._dtype
c_lst = list(cur)

a = np.asarray(c_lst, dtype=dt)
a

array([(1, [342000.0, 5022000.0], 4000.0, 1000000.0),
       (1, [342000.0, 5023000.0], 4000.0, 1000000.0),
       (1, [343000.0, 5023000.0], 4000.0, 1000000.0),
       (1, [343000.0, 5022000.0], 4000.0, 1000000.0),
       (1, [342000.0, 5022000.0], 4000.0, 1000000.0)],
    dtype=[('OBJECTID', '<i4'), ('Shape', '<f8', (2,)),
           ('Shape_Length', '<f8'), ('Shape_Area', '<f8')])
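For those following along without the featureclass, the list-to-array step can be reproduced from the printed values.  This sketch hard-codes the five rows and the dtype shown above, so nothing here touches a cursor.

```python
import numpy as np

dt = np.dtype([('OBJECTID', '<i4'), ('Shape', '<f8', (2,)),
               ('Shape_Length', '<f8'), ('Shape_Area', '<f8')])

# The rows exactly as list(cur) printed them: plain tuples with a nested (x, y).
c_lst = [(1, (342000.0, 5022000.0), 4000.0, 1000000.0),
         (1, (342000.0, 5023000.0), 4000.0, 1000000.0),
         (1, (343000.0, 5023000.0), 4000.0, 1000000.0),
         (1, (343000.0, 5022000.0), 4000.0, 1000000.0),
         (1, (342000.0, 5022000.0), 4000.0, 1000000.0)]

a = np.asarray(c_lst, dtype=dt)
print(a.shape)            # one record per point
print(a['Shape'].shape)   # the coordinates as an N x 2 block
```

The dtype does the heavy lifting: without it, NumPy has no way to know the nested tuple is a two-element coordinate field.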
The ndarray can be viewed as a record array.  Since the data type and structure remain the same, a 'view' of the array as a record array (recarray) is all that is needed.  Record arrays allow the user to slice the array using conventional array slicing or by object dot notation.
a = a.view(np.recarray)

a.Shape == a['Shape']  # check to see if slicing equals dot notation

array([[ True,  True],
       [ True,  True],
       [ True,  True],
       [ True,  True],
       [ True,  True]], dtype=bool)
Or more simply...
np.all(a.Shape == a['Shape'])
True       

a.Shape  # or a['Shape']

array([[  342000.,  5022000.],
       [  342000.,  5023000.],
       [  343000.,  5023000.],
       [  343000.,  5022000.],
       [  342000.,  5022000.]])
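One thing worth knowing is that a view shares its data with the original array rather than copying it.  A small sketch with a cut-down, two-row version of the array (an assumption made to keep it short) shows this.

```python
import numpy as np

dt = np.dtype([('OBJECTID', '<i4'), ('Shape', '<f8', (2,))])
a = np.array([(1, (342000.0, 5022000.0)),
              (1, (342000.0, 5023000.0))], dtype=dt)

r = a.view(np.recarray)                       # same data, dot notation enabled
same_values = bool(np.all(r.Shape == a['Shape']))
shares = np.shares_memory(a, r)               # a view, not a copy

r.OBJECTID[0] = 99                            # a change through the view...
print(same_values, shares, a['OBJECTID'][0])  # ...shows up in the original
```

So if you want to tinker with the recarray without touching the original, make a copy first.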
 
You can calculate the properties of the objects simply, but in the case of polygons, the duplicate start/end point should be reduced to a singleton.  In the examples, the object's shape is obtained, then the desired property is derived on a column basis.
pnts = a.Shape[:-1]       # get the unique points

cent = pnts.mean(axis=0)  # return the mean by column

cent

array([  342500.,  5022500.])
 
With some fancy work, and calling one of my previously defined array functions in the 'arraytools' module, you can do things like determine interpoint distances.
import arraytools as art

art.e_dist(cent, pnts)

array([ 707.11,  707.11,  707.11,  707.11])
Which is correct given the square polygon's shape.
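If you don't have 'arraytools' installed, the distances can be checked with plain NumPy.  This is only a stand-in for art.e_dist written for this one case (a single origin and a 2D point array), not the function's actual implementation.

```python
import numpy as np

# The four unique corners of the square, as in a.Shape[:-1].
pnts = np.array([[342000., 5022000.],
                 [342000., 5023000.],
                 [343000., 5023000.],
                 [343000., 5022000.]])

cent = pnts.mean(axis=0)                           # centre of the square
dist = np.sqrt(((pnts - cent) ** 2).sum(axis=1))   # distance from centre to each corner
print(np.round(dist, 2))
```

Each corner of a 1000 m square sits 500 * sqrt(2) from the centre, which is where the 707.11 comes from.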
Another example, to demonstrate array functions.  In the case of polygons, it is useful to have the first and last point (i.e., duplicates) retained to ensure closure of the polygon.
poly = a.Shape

art.e_leng(poly)  # method to return polygon perimeter/length, total, then by segment

 (4000.0, [array([[ 1000.,  1000.,  1000.,  1000.]])])

art.e_area(poly)

 1000000.0
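Both numbers can be verified with a few lines of NumPy.  The sketch below sums the segment lengths for the perimeter and uses the shoelace formula for the area; it is a substitute for art.e_leng and art.e_area for checking purposes, not a copy of how they are written.

```python
import numpy as np

# The closed polygon: the first point is repeated at the end.
poly = np.array([[342000., 5022000.],
                 [342000., 5023000.],
                 [343000., 5023000.],
                 [343000., 5022000.],
                 [342000., 5022000.]])

seg = np.diff(poly, axis=0)                # dx, dy for each segment
seg_len = np.hypot(seg[:, 0], seg[:, 1])   # segment lengths
perimeter = seg_len.sum()

x, y = poly[:, 0], poly[:, 1]
# Shoelace formula; abs() removes the sign that depends on ring direction.
area = 0.5 * abs(np.sum(x[:-1] * y[1:] - x[1:] * y[:-1]))
print(perimeter, seg_len, area)
```

This is where keeping the duplicate closing point pays off: without it the last segment, and its share of the area, would be missing.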
 
------------------------------------------------------------------------------ 
Working with cursors
--------------------
Cursors can access the columns in the tabular data in a variety of ways.  One of the easiest to follow is to simply refer to the columns by the order in which they appear when they were retrieved.  This is fine if one writes scripts in the same way.  In the example that follows, the list of fields to be used with the cursor operations is defined as:
in_flds = ['OID@', 'SHAPE@X', 'SHAPE@Y', 'Int_fld', 'Float_fld', 'String_fld']

 

When using the above notation, the position of the fields is used to reference their values.   So you may see code that uses ' for row in cursor '  with row[0] being the feature object id (OID@) and row[3] being the value from an integer field (Int_fld).  If you are like me, anything beyond 2 means you are finger counting, remembering that Python counting is zero-based.  I now prefer to spend the extra time assigning variable names rather than using positional notation.  You can see this in the 'with' and 'for' statements (lines 12-13) below.

in_fc = r'C:\Folder\path_to\A_Geodatabase.gdb\FeatureClass'   # or Table

desc = arcpy.Describe(in_fc)
SR = desc.spatialReference
in_flds = ['OID@', 'SHAPE@X', 'SHAPE@Y', 'Int_fld', 'Float_fld', 'String_fld']
where_clause = None
spatial_reference = SR
explode_to_points = True
sql_clause = (None, None)

results = []
with arcpy.da.SearchCursor(in_fc, in_flds, where_clause, spatial_reference, explode_to_points, sql_clause) as cursor:
    for oid, x, y, i_val, f_val, s_val in cursor:  # 'oid' avoids shadowing the builtin 'id'
        if oid > 10:
            pass  # do stuff
        else:
            pass  # do other stuff
        results.append((oid, x, y, i_val, f_val, s_val))  # ... put the stuff here ...
# return results   (if this is wrapped in a function)
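The difference between positional and named access is easy to see on a couple of made-up rows.  The list of tuples below is a hypothetical stand-in for what a cursor would yield with the field list above.

```python
# Hypothetical rows in the order:
# OID@, SHAPE@X, SHAPE@Y, Int_fld, Float_fld, String_fld
rows = [(5, 342000.0, 5022000.0, 3, 1.5, 'a'),
        (12, 343000.0, 5023000.0, 7, 2.5, 'b')]

# Positional: you have to remember that 0 is the id and 3 is the integer field.
by_position = [row[3] for row in rows if row[0] > 10]

# Named: each row is unpacked into variables that say what they hold.
by_name = [i_val for oid, x, y, i_val, f_val, s_val in rows if oid > 10]

print(by_position, by_name)
```

Same answer either way, but the second version still makes sense when you come back to the script six months later.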

-----------------------------------------------------------------------------
arcgisscripting
---------------
arcgisscripting can be located in your ArcGIS Pro distribution once everything is imported (arcgisscripting.__file__).  It is located in the installation path (substitute C:\\ArcPro for your Pro path, everything else is the same)
'C:\\ArcPro\\bin\\Python\\envs\\arcgispro-py3\\lib\\site-packages\\arcgisscripting.pyd'
Importing arcpy also imports parts of arcgisscripting, along with the geoprocessor from
C:\ArcPro\Resources\ArcPy\arcpy\geoprocessing\__init__
which imports _base.py, which uses the Geoprocessor class as 'gp'.

dir(arcgisscripting)
['ExecuteAbort', 'ExecuteError', 'ExecuteWarning', 'NumPyArrayToRaster',
 'Raster', 'RasterToNumPyArray', '__cleanup__', '__doc__', '__file__',
 '__loader__', '__name__', '__package__', '__spec__', '_addTimeInterval',
 '_analyzeForSD', '_chart', '_convertWebMapToMapDocument',
 '_createGISServerConnectionFile', '_createGeocodeSDDraft',
 '_createMapSDDraft', '_createimageservicesddraft',
 '_getImageEXIFProperties', '_getUTMFromLocation',
 '_hasLocalFunctionRasterImplementation', '_listDateTimeStringFormats',
 '_listStyleItems', '_listTimeZones', '_mapping', '_ss',
 '_wrapLocalFunctionRaster', '_wrapToolRaster', 'arcgis', 'create', 'da',
 'getmytoolboxespath', 'getsystemtoolboxespath', 'getsystemtoolboxespaths',
 'na']

dir(arcgisscripting.da)
['Describe', 'Domain', 'Editor', 'ExtendTable', 'FeatureClassToNumPyArray',
 'InsertCursor', 'ListDomains', 'ListFieldConflictFilters', 'ListReplicas',
 'ListSubtypes', 'ListVersions', 'NumPyArrayToFeatureClass',
 'NumPyArrayToTable', 'Replica', 'SearchCursor', 'TableToNumPyArray',
 'UpdateCursor', 'Version', 'Walk', '__doc__', '__loader__', '__name__',
 '__package__', '__spec__', '_internal_eq', '_internal_sd', '_internal_vb']
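The same poking around works on any importable module.  In this sketch the standard library's json module is used purely as a stand-in, since arcgisscripting is only available inside an ArcGIS Pro Python environment.

```python
import json  # stand-in only; swap in arcgisscripting inside a Pro environment

module_file = json.__file__     # where the module lives on disk
public = [n for n in dir(json) if not n.startswith('_')]   # skip private names

print(module_file)
print(public)
```

Filtering out the names that start with an underscore is a quick way to separate what you are meant to use from what is internal.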
References
----------

 

Other discussions
-----------------

https://community.esri.com/docs/DOC-10416-are-searchcursors-brutally-slow-they-need-not-be


More later...
