Begin da.SearchCursor at a specific row

BrookeHodge · 12-14-2020

Hi,

I'm trying to start reading through a da.SearchCursor at a specific, randomly selected row, but I can't figure out how to specify that in a search cursor. I want to do something similar to a slice of a list (https://stackoverflow.com/questions/509211/understanding-slice-notation), but that syntax doesn't seem to work on a SearchCursor.

Basically, I want to start running through my SearchCursor at a random row. The table has to be sorted, though, so I can't just randomize the rows within the table. Instead, I need to first sort the table based on attributes, then generate a random number between 0 and the number of records in the table (I can figure that much out), and then tell the SearchCursor to start at that random number. For instance, say I have 20 records in my table and the random number is 3: I want to start my search cursor at row 3 (and skip the first 2 rows). This is where I'm stuck. Does anyone have any ideas how to do this?

In addition, once I figure that out, I'll want to loop back around and start at the beginning of the search cursor (the actual first row). Any suggestions on that would be welcome too.

BlakeTerhune · 12-14-2020

Interesting problem. Here's my first thought...

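import arcpy
import random

fc = r"path\to\your\table"   # placeholder: path to your feature class or table
fields = ["OID@"]            # placeholder: the fields you want to read

row_count = int(arcpy.GetCount_management(fc).getOutput(0))
random_rownum = random.randint(1, row_count)  # 1-based start row, inclusive of both ends

# pass sql_clause=(None, "ORDER BY <field>") here if the rows must be sorted
with arcpy.da.SearchCursor(fc, fields) as cursor:
    # First pass: process rows from the random start row to the end
    for rownum, row in enumerate(cursor, start=1):
        if rownum >= random_rownum:
            pass  # do something with row
    cursor.reset()  # move the cursor back to the first row
    # Second pass: wrap around and process the rows before the start row
    for rownum, row in enumerate(cursor, start=1):
        if rownum < random_rownum:
            pass  # do something with row
        else:
            break  # reached the start row, so every row has been processed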

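This isn't ultra efficient, so if performance matters and you have many (millions of) rows, it might be a little slow, because some rows are read twice: every row is read on the first pass, and the leading rows are read again after reset().

EDIT:

I added a break in the second loop so it stops as soon as it reaches the random start row. The second pass now only rereads the rows before the start row (about half the table on average) instead of all of them.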
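To check the visiting order without a real table, here is a minimal sketch of the same two-pass logic run against `FakeCursor`, a hypothetical stand-in class (not part of arcpy) that is iterable and has a `reset()` method. The row values are made up.

```python
class FakeCursor:
    """Hypothetical stand-in for arcpy.da.SearchCursor: iterable, with reset()."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._it = iter(self._rows)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def reset(self):
        # Rewind to the first row, mimicking the cursor's reset() method
        self._it = iter(self._rows)


def wrap_around(cursor, start):
    """Collect rows in the order the two-pass loop visits them (start is 1-based)."""
    visited = []
    # First pass: rows from the start row to the end
    for rownum, row in enumerate(cursor, start=1):
        if rownum >= start:
            visited.append(row)
    cursor.reset()
    # Second pass: rows before the start row, stopping once the start row is reached
    for rownum, row in enumerate(cursor, start=1):
        if rownum < start:
            visited.append(row)
        else:
            break
    return visited


rows = ["r1", "r2", "r3", "r4", "r5"]
print(wrap_around(FakeCursor(rows), 3))  # ['r3', 'r4', 'r5', 'r1', 'r2']
print(wrap_around(FakeCursor(rows), 1))  # ['r1', 'r2', 'r3', 'r4', 'r5']
```

With a start row of 1, the second loop breaks on its very first row, so the edge case where the random row is the first row needs no extra logic.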

BrookeHodge · 12-15-2020

Thanks for your reply and your suggestion! This worked just as I wanted it to. After the reset, I just continued with a standard 'for row in cur:' loop, since I wanted it to start from the beginning and didn't need to test whether the row number was less than the random number. Thanks again!

Kara_Shindle · 12-14-2020

Have you got any code?

I'm a newbie, but my first thought is that you would generate and store your random cursor row number in a variable.

I thought SearchCursor might take a parameter specifying which row to start with. The cursor runs until the end of the table; at that point, to go back to the beginning and run up to the row you specified, I believe you would either have to re-create the cursor object or call the cursor's reset() method, which I think puts it back at the first row. I'm guessing you would also have to build in logic for the case where your randomly generated row ends up being the first row?

Another thought: perhaps you could figure out your starting row, extract all the rows you want to iterate through into their own list, and then loop through that list from there?
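The list idea can be sketched in plain Python, assuming the whole sorted table fits in memory: read the cursor into a list once (for example `rows = list(cursor)` on a cursor that already applies the sort), then use ordinary list slicing to start at the random row and wrap to the front. `rotate_from` is a hypothetical helper name, and the 20 rows below are made-up data matching the "20 records, start at row 3" example from the question.

```python
def rotate_from(rows, start):
    """Return rows beginning at 1-based position `start`, wrapping to the front."""
    if not 1 <= start <= len(rows):
        raise ValueError("start must be between 1 and len(rows)")
    i = start - 1  # convert the 1-based row number to a 0-based list index
    return rows[i:] + rows[:i]


rows = [(oid,) for oid in range(1, 21)]  # stand-in for 20 cursor rows
result = rotate_from(rows, 3)
print(result[:3])   # [(3,), (4,), (5,)]
print(result[-2:])  # [(1,), (2,)]
```

Because the rows are already in memory, the wrap-around costs no second read of the table, at the price of holding every row at once.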

DanPatterson · 12-14-2020

What are you doing with the data?

What you describe sounds like a job for Python and NumPy.

TableToNumPyArray—ArcGIS Pro | Documentation

NumPy arrays are designed to work with tabular data and excel at slicing, dicing, sorting and statistical summaries.

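import numpy as np

d = tbl_data(fc3)   # ----- just some data (tbl_data is not an arcpy function; arcpy.da.TableToNumPyArray gives a similar structured array)
names = d.dtype.names  # ---- the column names
points = d['PNT_COUNT']   # ---- a column
srted = np.sort(d, axis=0, order='PNT_COUNT')  # ---- sort on that column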

names
('OBJECTID', 'ids', 'CENTROID_X', 'CENTROID_Y', 'INSIDE_X',
 'INSIDE_Y', 'PART_COUNT', 'PNT_COUNT', 'Sort_')

d
array([(1, 11,  ...,   1.000,  11.000,    3),
       (2, 12,  ...,   1.000,   8.000,    2),
       (3, 13,  ...,   1.000,   6.000,    1),
       (4, 14,  ...,   1.000,   5.000,    0),
       (5, 15,  ...,   1.000,  10.000, -999),
       (6, 16,  ...,   1.000,  10.000, -999),
       (9, 19,  ...,   1.000,   5.000, -999)],
      dtype=[('OBJECTID', '<i4'), ('ids', '<i4'), ...,
             ('PART_COUNT', '<f8'), ('PNT_COUNT', '<f8'), ('Sort_', '<i4')])

points
array([ 11.000,   8.000,   6.000,   5.000,  10.000,  10.000,   5.000])

srted
array([(4, 14,  ...,   1.000,   5.000,    0),
       (9, 19,  ...,   1.000,   5.000, -999),
       (3, 13,  ...,   1.000,   6.000,    1),
       (2, 12,  ...,   1.000,   8.000,    2),
       (5, 15,  ...,   1.000,  10.000, -999),
       (6, 16,  ...,   1.000,  10.000, -999),
       (1, 11,  ...,   1.000,  11.000,    3)],
      dtype=[('OBJECTID', '<i4'), ('ids', '<i4'), ...,
             ('PART_COUNT', '<f8'), ('PNT_COUNT', '<f8'), ('Sort_', '<i4')])

d[:3]  # ---- a slice of the first 3 rows
array([(1, 11,  ...,   1.000,  11.000, 3),
       (2, 12,  ...,   1.000,   8.000, 2),
       (3, 13,  ...,   1.000,   6.000, 1)],
      dtype=[('OBJECTID', '<i4'), ('ids', '<i4'),...,
             ('PART_COUNT', '<f8'), ('PNT_COUNT', '<f8'), ('Sort_', '<i4')])

sub = d[['OBJECTID', 'PNT_COUNT', 'Sort_']]  # ---- a slice of 3 columns
sub
array([(1,  11.000,    3), (2,   8.000,    2), (3,   6.000,    1),
       (4,   5.000,    0), (5,  10.000, -999),
       (6,  10.000, -999), (9,   5.000, -999)],
      ....
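Applied to the original question, a sorted structured array can be started at any row and wrapped around with plain slicing. This is a sketch only: the seven-row array is made up to mirror the OBJECTID and PNT_COUNT columns above, and in practice it would come from something like arcpy.da.TableToNumPyArray. Slicing plus np.concatenate and np.roll give the same order.

```python
import numpy as np

# Made-up table standing in for arcpy.da.TableToNumPyArray output
d = np.array(
    [(1, 11.0), (2, 8.0), (3, 6.0), (4, 5.0), (5, 10.0), (6, 10.0), (9, 5.0)],
    dtype=[("OBJECTID", "<i4"), ("PNT_COUNT", "<f8")],
)

# Sort on the attribute first; ties are broken by the remaining fields (OBJECTID)
srted = np.sort(d, order="PNT_COUNT")
start = 3       # 1-based start row (random in practice)
i = start - 1   # 0-based index

# Option 1: start row to the end, then wrap to the front
wrapped = np.concatenate((srted[i:], srted[:i]))

# Option 2: np.roll with a negative shift moves elements left by i positions
rolled = np.roll(srted, -i)

print(wrapped["OBJECTID"])  # [3 2 5 6 1 4 9]
```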

... sort of retired...

JoshuaBixby · 12-14-2020

Something along the lines of:

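from arcpy import da
from itertools import dropwhile

fc = r"path\to\feature_class"   # placeholder: path to feature class
flds = ["OID@"]                  # placeholder: list of fields
orderby = "ORDER BY <field>"     # placeholder: SQL ORDER BY clause for sorting
rand_row = 3                     # placeholder: row number to start cursor

with da.SearchCursor(fc, flds, sql_clause=(None, orderby)) as cur:
    # Skip items while their 1-based row number is below rand_row;
    # each item is a (rownum, row) tuple produced by enumerate
    for row in dropwhile(lambda x: x[0] < rand_row, enumerate(cur, 1)):
        print(row)

    cur.reset()  # back to the first row
    for row in cur:
        print(row)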
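To see what dropwhile() hands back, here is a self-contained run on a plain list standing in for the cursor (the rows are made up). Note that each item is a (rownum, row) pair from enumerate, so the cursor row itself is the second element of each item.

```python
from itertools import dropwhile

rows = [("a",), ("b",), ("c",), ("d",), ("e",)]  # stand-in for cursor rows
rand_row = 3

# dropwhile skips items until the predicate is first False,
# then yields that item and everything after it
kept = list(dropwhile(lambda x: x[0] < rand_row, enumerate(rows, 1)))
print(kept)  # [(3, ('c',)), (4, ('d',)), (5, ('e',))]

# Unpack the pair to get the row number and the row tuple separately
for rownum, row in kept:
    print(rownum, row[0])
```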

BlakeTerhune · 12-14-2020

I've never heard of itertools.dropwhile(). Just another clever tool I'll have to remember for the future!
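Another itertools function, islice(), is probably the closest match to the list slice mentioned in the question: islice(iterable, start, stop) behaves like iterable[start:stop] on any iterator, so it should also apply to a cursor, e.g. `for row in islice(cur, rand_row - 1, None):`. A minimal sketch on a made-up iterator follows. islice does not accept negative indices, and it still reads (and discards) the skipped rows.

```python
from itertools import islice

rows = iter([("r1",), ("r2",), ("r3",), ("r4",), ("r5",)])  # stand-in for a cursor
start = 3  # 1-based row number to start at

# Skip the first start - 1 items, then yield the rest (stop=None means "to the end")
tail = list(islice(rows, start - 1, None))
print(tail)  # [('r3',), ('r4',), ('r5',)]
```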

BrookeHodge · 12-15-2020

Thanks for your reply and suggestion! I did try this, but I was getting an index error. My guess is that this solution would work if I understood dropwhile() better, and then I could probably have troubleshot my error. I ended up trying BlakeTerhune's suggestion, which worked. But I had never heard of itertools before and can see great utility in it for the future, so I'm glad you made the suggestion.