UpdateCursor instead of SelectLayerByLocation and CalculateField? (arcpy)

JoshuaBailey1 · 03-05-2019

I am new to #arcpy and I'm making my first script tool. Can I use an UpdateCursor with a SHAPE@ token and disjoint() instead of SelectLayerByLocation_management() followed by CalculateField_management()? Consider the code below, in which inputcampus is a polygon and lineoutput is a line feature class. I want to identify the lines that cross inputcampus and for which startcampus==0 and endcampus==0. I could then categorize a trip as 'offcampus' if it starts and ends off campus without ever entering campus, or as 'cross' if it starts off campus, passes through campus, and then ends off campus.

with arcpy.da.UpdateCursor(lineoutput, cursorfields) as cursor:
  for row in cursor:
#   the order of these variables is determined by the cursorfields list
#   start_time=       row[0]
#   completed_time=   row[1]
#   trip_minutes=     row[2]
#   distance_meters=  row[3]  
#   startcampus=      row[4]
#   endcampus=        row[5]
#   triptype=         row[6]
#   metersperminute=  row[7]
#   milesperhour=     row[8]
#   SHAPE@ (geometry) row[9]
    tripminutes= round(((parser.parse(row[1]) - parser.parse(row[0])).total_seconds() / 60), 2)
    row[2]= tripminutes
    if tripminutes < 1:            row[6]= 'zerotime'
    elif row[3]==0:                row[6]= 'zerodistance'
    elif row[4]==1 and row[5]==1:  row[6]= 'oncampus'
    elif row[4]==1 and row[5]==0:  row[6]= 'outgoing'
    elif row[4]==0 and row[5]==1:  row[6]= 'incoming'
    elif row[4]==0 and row[5]==0:  row[6]= 'offcampus'
#     Because I don't know the proper syntax for the following, I did it outside of the cursor (the Select/Calculate block below)
#     That is sure to be less efficient, so I hope to do it in the cursor instead
#     row[9] uses a SHAPE@ token
#   elif row[4]==0 and row[5]==0:
#     if row[9].disjoint(inputcampus):
#       row[6]= 'offcampus'
#     else:
#       row[6]= 'cross'

    if tripminutes > 0:  # a zero duration would raise ZeroDivisionError
      row[7]= round(( row[3] / tripminutes), 2)            # meters per minute, rounded to 2 decimal places
      row[8]= round(((row[3] / tripminutes) / 26.8224), 2) # (meters per minute) -> (miles per hour); 1 mph = 26.8224 m/min
    cursor.updateRow(row)

# I'd rather do the following in the cursor above, so the script is more efficient
arcpy.SelectLayerByAttribute_management('pylinelyr', "NEW_SELECTION", "startcampus = 0 AND endcampus = 0")
arcpy.SelectLayerByLocation_management( 'pylinelyr', "INTERSECT", inputcampus, selection_type="SUBSET_SELECTION")
arcpy.CalculateField_management('pylinelyr', 'triptype', '"cross"')  # a text value must be quoted inside the expression
arcpy.SelectLayerByAttribute_management('pylinelyr', 'CLEAR_SELECTION')
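In the commented-out attempt, `row[9].disjoint(inputcampus)` passes `inputcampus`, which is the path string from `sys.argv[2]`, but arcpy's `Geometry.disjoint()` compares two geometry objects. One pattern is to read the campus shape(s) once before opening the update cursor, for example `campus_shapes = [r[0] for r in arcpy.da.SearchCursor(inputcampus, ['SHAPE@'])]` (`campus_shapes` is a hypothetical name), and then test each row's line against that geometry. The plain-Python sketch below does not use arcpy. It only illustrates what the test decides for a straight two-point trip line against one polygon ring, assuming both are in the same coordinate system and ignoring holes and multipart shapes. A line is disjoint from the polygon when neither endpoint lies inside it and the segment touches none of its edges. A trip with both ends off campus that is *not* disjoint is a 'cross' trip.

```python
def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1 left turn, -1 right turn, 0 collinear."""
    val = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (val > 0) - (val < 0)


def _on_segment(p, q, r):
    """True if a point r that is collinear with p-q lies within the segment's bounding box."""
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(a, b, c, d):
    """True if segment a-b touches or crosses segment c-d."""
    o1, o2 = _orient(a, b, c), _orient(a, b, d)
    o3, o4 = _orient(c, d, a), _orient(c, d, b)
    if o1 != o2 and o3 != o4:
        return True
    # Collinear special cases: an endpoint lies on the other segment
    return ((o1 == 0 and _on_segment(a, b, c)) or (o2 == 0 and _on_segment(a, b, d))
            or (o3 == 0 and _on_segment(c, d, a)) or (o4 == 0 and _on_segment(c, d, b)))


def point_in_polygon(pt, ring):
    """Even-odd ray casting; ring is a list of (x, y) vertices."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def line_disjoint_from_polygon(start, end, ring):
    """Rough stand-in for Geometry.disjoint() on a two-point line vs one polygon ring."""
    if point_in_polygon(start, ring) or point_in_polygon(end, ring):
        return False
    n = len(ring)
    for i in range(n):
        if segments_intersect(start, end, ring[i], ring[(i + 1) % n]):
            return False
    return True


campus = [(0, 0), (10, 0), (10, 10), (0, 10)]  # made-up square "campus"
print(line_disjoint_from_polygon((-5, 5), (15, 5), campus))  # False: the line passes through
```

Inside the real cursor, the equivalent decision for the both-ends-off-campus branch would be to set 'offcampus' when the trip line is disjoint from the campus geometry and 'cross' otherwise.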

I hope this makes sense. For context (and in case you'd like to help me optimize my first script tool), here is the full code as it is currently written:

# This script was made to map and process bicycle trip data (WGS 1984) from CSV files with expected structure (see script parameters)
# Joshua Bailey, University of Central Florida, 3/5/2019

from dateutil import parser
import sys, arcpy

# Store parameters as variables to make code more manageable and readable
# Be sure ArcGIS tool dialog is configured in the correct order:
triptable=       sys.argv[1]  # input table/CSV containing trip data
inputcampus=     sys.argv[2]  # input campus polygon
startoutput=     sys.argv[3]  # start points output
endoutput=       sys.argv[4]  # end points output
lineoutput=      sys.argv[5]  # line output
tripid=          sys.argv[6]  # default= trip_id from triptable
distancemeters=  sys.argv[7]  # default= distance_meters from triptable
starttime=       sys.argv[8]  # default= start_time from triptable
completedtime=   sys.argv[9]  # default= completed_time from triptable
startlong=       sys.argv[10] # default= start_longitude from triptable
startlat=        sys.argv[11] # default= start_latitude from triptable
endlong=         sys.argv[12] # default= end_longitude from triptable
endlat=          sys.argv[13] # default= end_latitude from triptable

# Make feature classes from table/CSV
arcpy.MakeXYEventLayer_management(triptable, startlong, startlat, 'startXYlayer')
arcpy.MakeXYEventLayer_management(triptable, endlong,   endlat,   'endXYlayer')
arcpy.CopyFeatures_management('startXYlayer', startoutput)   # I think the event layers need to be exported so they have OIDs
arcpy.CopyFeatures_management('endXYlayer',   endoutput)
arcpy.MakeFeatureLayer_management(startoutput, 'pystartlyr') # I think these layers have to be initialized for SelectLayerByLocation_management() 
arcpy.MakeFeatureLayer_management(endoutput,   'pyendlyr')
arcpy.XYToLine_management(triptable, lineoutput, startlong, startlat, endlong, endlat, id_field=tripid)
arcpy.MakeFeatureLayer_management(lineoutput,  'pylinelyr') # lineoutput must exist before a layer can be made from it

# Add fields
arcpy.AddField_management(lineoutput,    'trip_minutes',    'FLOAT')
arcpy.AddField_management(lineoutput,    'metersperminute', 'FLOAT')
arcpy.AddField_management(lineoutput,    'milesperhour',    'FLOAT')
arcpy.AddField_management('pystartlyr',  'startcampus',     'SHORT')
arcpy.AddField_management('pyendlyr',    'endcampus',       'SHORT')
arcpy.AddField_management(lineoutput,    'triptype',        'TEXT')

# Mark start points within polygon: 1=within polygon; 0=not within polygon
arcpy.SelectLayerByLocation_management('pystartlyr', 'WITHIN', inputcampus)
arcpy.CalculateField_management('pystartlyr', 'startcampus', '1')
arcpy.SelectLayerByAttribute_management('pystartlyr', 'SWITCH_SELECTION')
arcpy.CalculateField_management('pystartlyr', 'startcampus', '0')
arcpy.SelectLayerByAttribute_management('pystartlyr', 'CLEAR_SELECTION')

arcpy.SelectLayerByLocation_management('pyendlyr',   'WITHIN', inputcampus)
arcpy.CalculateField_management('pyendlyr',   'endcampus',   '1')
arcpy.SelectLayerByAttribute_management('pyendlyr',   'SWITCH_SELECTION')
arcpy.CalculateField_management('pyendlyr',   'endcampus',   '0')
arcpy.SelectLayerByAttribute_management('pyendlyr',   'CLEAR_SELECTION')

arcpy.JoinField_management(lineoutput, tripid, startoutput, tripid, [starttime, completedtime, distancemeters, 'startcampus'])
arcpy.JoinField_management(lineoutput, tripid, endoutput,   tripid, 'endcampus')

cursorfields= [starttime, completedtime, 'trip_minutes', distancemeters, 'startcampus', 'endcampus', 'triptype', 'metersperminute', 'milesperhour', 'SHAPE@']

with arcpy.da.UpdateCursor(lineoutput, cursorfields) as cursor:
  for row in cursor:
#   the order of these variables is determined by the cursorfields list
#   start_time=       row[0]
#   completed_time=   row[1]
#   trip_minutes=     row[2]
#   distance_meters=  row[3]  
#   startcampus=      row[4]
#   endcampus=        row[5]
#   triptype=         row[6]
#   metersperminute=  row[7]
#   milesperhour=     row[8]
#   SHAPE@ (geometry) row[9]
    tripminutes= round(((parser.parse(row[1]) - parser.parse(row[0])).total_seconds() / 60), 2)
    row[2]= tripminutes
    if tripminutes < 1:            row[6]= 'zerotime'
    elif row[3]==0:                row[6]= 'zerodistance'
    elif row[4]==1 and row[5]==1:  row[6]= 'oncampus'
    elif row[4]==1 and row[5]==0:  row[6]= 'outgoing'
    elif row[4]==0 and row[5]==1:  row[6]= 'incoming'
    elif row[4]==0 and row[5]==0:  row[6]= 'offcampus'
#     Because I don't know the proper syntax for the following, I did it outside of the cursor (the Select/Calculate block below)
#     That is sure to be less efficient, so I hope to do it in the cursor instead
#     row[9] uses a SHAPE@ token
#   elif row[4]==0 and row[5]==0:
#     if row[9].disjoint(inputcampus):
#       row[6]= 'offcampus'
#     else:
#       row[6]= 'cross'

    if tripminutes > 0:  # a zero duration would raise ZeroDivisionError
      row[7]= round(( row[3] / tripminutes), 2)            # meters per minute, rounded to 2 decimal places
      row[8]= round(((row[3] / tripminutes) / 26.8224), 2) # (meters per minute) -> (miles per hour); 1 mph = 26.8224 m/min
    cursor.updateRow(row)

# I'd rather do the following in the cursor above, so the script is more efficient
arcpy.SelectLayerByAttribute_management('pylinelyr', "NEW_SELECTION", "startcampus = 0 AND endcampus = 0")
arcpy.SelectLayerByLocation_management( 'pylinelyr', "INTERSECT", inputcampus, selection_type="SUBSET_SELECTION")
arcpy.CalculateField_management('pylinelyr', 'triptype', '"cross"')  # a text value must be quoted inside the expression
arcpy.SelectLayerByAttribute_management('pylinelyr', 'CLEAR_SELECTION')

# Create new field holding special ArcGIS data type (to be used in time-enabled maps)
arcpy.ConvertTimeField_management(lineoutput, starttime,     'yyyy-MM-ddTHH:mm:ss.000+00:00', 'start_DATE', 'DATE')
arcpy.ConvertTimeField_management(lineoutput, completedtime, 'yyyy-MM-ddTHH:mm:ss.000+00:00', 'end_DATE',   'DATE')

# Both joins share these fields, so they are stored in one list variable
# Each JoinField_management() call adds one more field (endcampus or startcampus) to a copy of that list
pointsjoin= ['trip_minutes', 'metersperminute', 'milesperhour', 'triptype', 'start_DATE', 'end_DATE']
arcpy.JoinField_management(startoutput, tripid, lineoutput, tripid, pointsjoin+['endcampus'])
arcpy.JoinField_management(endoutput,   tripid, lineoutput, tripid, pointsjoin+['startcampus'])
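Two details in the cursor's time and speed calculation are worth spelling out. A trip whose start and end timestamps are identical gives a duration of 0 minutes, so `row[3] / tripminutes` raises `ZeroDivisionError` even though the row is already classified as 'zerotime'. The factor 26.8224 comes from 1 mile = 1609.344 m and 60 minutes per hour (1609.344 / 60 = 26.8224 m/min for 1 mph). The sketch below swaps `dateutil.parser` for the standard library's `datetime.fromisoformat` (Python 3.7+; ArcGIS Desktop's arcpy runs on Python 2.7, where the script's `dateutil` approach is the practical one), assuming timestamps shaped like the `ConvertTimeField` pattern, e.g. `2019-03-05T14:00:00.000+00:00`. The function names are hypothetical.

```python
from datetime import datetime

METERS_PER_MINUTE_PER_MPH = 1609.344 / 60  # 26.8224 m/min in 1 mph


def trip_minutes(start_iso, end_iso):
    """Trip duration in minutes, rounded to 2 decimals (mirrors the trip_minutes field)."""
    start = datetime.fromisoformat(start_iso)
    end = datetime.fromisoformat(end_iso)
    return round((end - start).total_seconds() / 60, 2)


def speeds(distance_m, minutes):
    """Return (meters per minute, miles per hour), or (None, None) if minutes <= 0."""
    if minutes <= 0:
        return None, None  # leave the speed fields empty instead of dividing by zero
    mpm = distance_m / minutes
    return round(mpm, 2), round(mpm / METERS_PER_MINUTE_PER_MPH, 2)


m = trip_minutes('2019-03-05T14:00:00.000+00:00', '2019-03-05T14:15:30.000+00:00')
print(m, speeds(2682.24, 10))  # 15.5 (268.22, 10.0)
```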

Related: https://www.reddit.com/r/gis/comments/aw53a0/update_cursor_vs_calculate_field_speed/ https://community.esri.com/blogs/richard_fairhurst/2014/11/08/turbo-charging-data-manipulation-with-...
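Folding the 'cross' test into the existing if/elif chain gives a single decision per row. The plain-Python sketch below restates the categories from this post as one function. `crosses_campus` is a hypothetical boolean standing in for "the trip line is not disjoint from the campus geometry", which in arcpy would require reading the campus shape from `inputcampus` first. As in the original chain, rows that match no branch (for example, a missing campus flag) get no category.

```python
def classify_trip(trip_minutes, distance_m, start_on, end_on, crosses_campus):
    """Return the triptype category, or None if no rule applies.

    start_on / end_on mirror the startcampus / endcampus flags (1 or 0).
    crosses_campus is a hypothetical flag: True if the straight trip line
    intersects the campus polygon.
    """
    if trip_minutes < 1:
        return 'zerotime'
    if distance_m == 0:
        return 'zerodistance'
    if start_on == 1 and end_on == 1:
        return 'oncampus'
    if start_on == 1 and end_on == 0:
        return 'outgoing'
    if start_on == 0 and end_on == 1:
        return 'incoming'
    if start_on == 0 and end_on == 0:
        # Both ends off campus: split on whether the line passes through campus
        return 'cross' if crosses_campus else 'offcampus'
    return None  # e.g. a null flag: leave triptype unset


print(classify_trip(12.5, 3000, 0, 0, True))  # cross
```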

JoshuaBixby · 03-05-2019

The short answer is yes, but I don't think it is worth the effort unless this is a school exercise. Set-based geoprocessing operations are very efficient because the geoprocessing tools are compiled code, and the tools also use spatial and attribute indexes when the data have them. I don't see how looping over a data set with a cursor and running a geometry operation on each row individually will reduce run times for what you are trying to accomplish.
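Part of why index-backed tools are fast is that they can discard geometries that are nowhere near each other before running any exact test. A hand-written cursor loop can approximate the cheapest form of that idea with a bounding-box (envelope) check. The plain-Python sketch below is a hypothetical illustration of such a prefilter only, not a description of how ArcGIS implements its spatial indexes. A trip line whose envelope does not overlap the campus envelope cannot intersect the campus, so the exact test can be skipped for it.

```python
def envelope(points):
    """(xmin, ymin, xmax, ymax) of a list of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs), max(ys)


def envelopes_overlap(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes touch or overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def candidate_trips(trips, campus_ring):
    """Yield ids of trips whose straight line *might* touch campus.

    trips: iterable of (trip_id, (x1, y1), (x2, y2)).
    Trips that fail the box test are certainly disjoint; the ones yielded
    still need an exact geometry test.
    """
    campus_box = envelope(campus_ring)
    for trip_id, start, end in trips:
        if envelopes_overlap(envelope([start, end]), campus_box):
            yield trip_id
```

The prefilter never rejects a trip that actually touches campus, but it can pass trips that only come near it (for example, a diagonal line clipping the corner of the box), which is why the exact test is still needed afterwards.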

Setting the should-you-do-it question aside, I am a bit confused by your data structure when looking at the two code snippets. Are your trips defined only by start and end points in a CSV table? If so, is the assumption that each trip was a straight line between the two points? Or do you have a line feature class representing the trips?

JoshuaBailey1 · 03-05-2019

I thought it would be more efficient to put all of the updates into one cursor. The trips are defined only by start and end points in a CSV, which also contains a trip distance measurement. The straight lines will only be used for categorizing and visualizing the trips.

I should have mentioned that.

My alternative approach would be a series of Select and Calculate calls. Would that be better?
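For reference, each trip in this setup is one CSV row holding a start coordinate, an end coordinate, and a recorded distance. The plain-Python sketch below reads such rows with the standard `csv` module, assuming the default field names given as script parameters (`trip_id`, `start_longitude`, `start_latitude`, `end_longitude`, `end_latitude`, `distance_meters`). The two sample rows are made up, and `read_trip_endpoints` is a hypothetical helper. It only shows the shape of the input that XY To Line turns into one line per trip.

```python
import csv
import io

# Made-up sample rows using the script's default field names
SAMPLE = """trip_id,start_longitude,start_latitude,end_longitude,end_latitude,distance_meters
a1,-81.2000,28.6000,-81.1900,28.6050,1450
a2,-81.2100,28.5900,-81.2100,28.5900,0
"""


def read_trip_endpoints(f):
    """Map trip_id -> ((start_x, start_y), (end_x, end_y), distance_m)."""
    trips = {}
    for row in csv.DictReader(f):
        start = (float(row['start_longitude']), float(row['start_latitude']))
        end = (float(row['end_longitude']), float(row['end_latitude']))
        trips[row['trip_id']] = (start, end, float(row['distance_meters']))
    return trips


trips = read_trip_endpoints(io.StringIO(SAMPLE))
```

The hypothetical row `a2` has identical start and end coordinates and a recorded distance of 0, the kind of trip the 'zerodistance' category is meant to catch.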

JoshuaBixby · 03-05-2019

I would use XY To Line, then Select Layer By Location to find lines that intersect (or don't intersect) the campus, and finally Calculate Field (all three are tools in the ArcGIS Desktop Data Management toolbox). You can invert (switch) the selection to finish attributing the data.
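The select-and-calculate workflow amounts to set operations on feature IDs. In arcpy the steps would be `SelectLayerByAttribute_management` with `NEW_SELECTION`, `SelectLayerByLocation_management` with `selection_type='SUBSET_SELECTION'`, `CalculateField_management`, and then a switch of the selection. The plain-Python sketch below mirrors that logic on made-up in-memory records (`attribute_trips` is a hypothetical name). One caveat it makes visible: `SWITCH_SELECTION` inverts against the whole layer, so switching after the subset selection would also pick up trips that do not start and end off campus. Either reselect by attribute first or, as the original script does, assign 'offcampus' to all such trips beforehand and let the location step overwrite the crossing ones with 'cross'.

```python
def attribute_trips(records, intersects_campus):
    """Assign triptype to trips that start and end off campus.

    records: dict oid -> (startcampus, endcampus)
    intersects_campus: set of oids whose line intersects campus
        (what SelectLayerByLocation 'INTERSECT' would select)
    """
    # Attribute selection: startcampus = 0 AND endcampus = 0
    both_off = {oid for oid, (s, e) in records.items() if s == 0 and e == 0}
    # Location selection as a subset of the attribute selection
    cross = both_off & intersects_campus
    # "Switch", restricted to the attribute selection rather than the whole layer
    offcampus = both_off - cross
    triptype = {oid: 'cross' for oid in cross}
    triptype.update({oid: 'offcampus' for oid in offcampus})
    return triptype
```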

JoshuaBailey1 · 03-05-2019

For further context, I expect the script to be used monthly to process tens of thousands of new entries. Should I add indexes?
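On the attribute side of that question, the idea behind an index on a join field (`trip_id` here) can be illustrated with a plain-Python analogy. A join that scans the whole other table for every row grows with the product of the two table sizes, while a lookup keyed on the join field needs one pass to build the key map and then constant-time lookups. This sketch is only an analogy for what an index on the join field can buy, not a description of how `JoinField_management` works internally. In arcpy, attribute and spatial indexes are created with the `AddIndex_management` and `AddSpatialIndex_management` tools. The helper names below are hypothetical.

```python
def join_nested(lines, points, key, fields):
    """Naive join: for each line, scan all points (cost ~ len(lines) * len(points))."""
    out = []
    for line in lines:
        match = next((p for p in points if p[key] == line[key]), None)
        extra = {f: (match[f] if match else None) for f in fields}
        out.append({**line, **extra})
    return out


def join_indexed(lines, points, key, fields):
    """Keyed join: build a dict on the join field once, then look each line up."""
    index = {}
    for p in points:
        index.setdefault(p[key], p)  # keep the first match, like the scan does
    out = []
    for line in lines:
        match = index.get(line[key])
        extra = {f: (match[f] if match else None) for f in fields}
        out.append({**line, **extra})
    return out
```

Both functions return the same rows, and unmatched lines get `None` for the joined fields; only the amount of work per line differs.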