I am trying to update fields in a feature class using attributes from other feature classes. For this, I am currently leveraging spatial joins on feature layers, search cursors, and update cursors. The script runs successfully on small datasets within a few minutes but chugs on large ones. It seems like a good candidate for Richard Fairhurst's /blogs/richard_fairhurst/2014/11/08/turbo-charging-data-manipulation-with-python-cursors-and-diction... technique. However, I am still getting a grasp on how best to use dictionaries to optimize script performance, and I am unsure where these techniques would help most here. Any suggestions on how to boost the speed of this script? Maybe there is another way to optimize it?
The script uses three feature classes (subLayer, transLayer, and ppLayer) to update new fields in the subLayer feature class based on spatial and attribute relationships. I have attached and commented a simplified version of the code here. Some of the blocks are run multiple times for different attributes, and I have stripped those repeats out to slim down the code.
Thanks in advance! Any advice will be hugely appreciated.
import arcpy
from arcpy import env
import time
start = time.time()
arcpy.env.overwriteOutput = True
defaultGdbPath = 'C:\Topo_Check_V5.gdb'
subLayer='C:\Topo_Check_V5.gdb\Subs' #point feature class
transLayer='C:\Topo_Check_V5.gdb\TLS' # line feature class
ppLayer='C:\Topo_Check_V5.gdb\PPS' #point feature class
###### Line Counts: Count of connecting lines within distance of point feature class ################
#add in Line_Count field
arcpy.AddField_management(subLayer, "Line_Count", "SHORT", "", "", "", "", "NULLABLE", "NON_REQUIRED", "")
TLineCountField = "Line_Count"
#MakeFeatureLayer
arcpy.MakeFeatureLayer_management(transLayer, "transLayer_lyr")
with arcpy.da.UpdateCursor (subLayer, [TLineCountField, "SHAPE@"]) as LineCountList:
    for subrow in LineCountList:
        arcpy.SelectLayerByLocation_management("transLayer_lyr", 'WITHIN_A_DISTANCE', subrow[1], .000002, "NEW_SELECTION")
        result=int(arcpy.GetCount_management("transLayer_lyr").getOutput(0))
        #print result
        subrow[0] = result
        LineCountList.updateRow(subrow)
print "Line_Count Updated"
del LineCountList
###### Line Counts 13.8 : Line counts based on attribute, repeated 9 more times for various attributes ###########
#add in Line_Count_13_8 field
arcpy.AddField_management(subLayer, "Line_Count_13_8", "SHORT", "", "", "", "", "NULLABLE", "NON_REQUIRED", "")
TLineCountField_13_8 = "Line_Count_13_8"
#VLTSField = "VLTS"
#Make feature layer
arcpy.MakeFeatureLayer_management(transLayer, "transLayer_13_8")
#where clause
where_13_8 = ' "VLTS" = 13.8 '
with arcpy.da.UpdateCursor (subLayer, [TLineCountField_13_8, "SHAPE@"]) as LineCountList_13_8:
    for subrow in LineCountList_13_8:
        arcpy.SelectLayerByLocation_management("transLayer_13_8", 'WITHIN_A_DISTANCE', subrow[1], .000002, "NEW_SELECTION")
        arcpy.SelectLayerByAttribute_management ("transLayer_13_8", "SUBSET_SELECTION", where_13_8)
        result=int(arcpy.GetCount_management("transLayer_13_8").getOutput(0))
        #print result
        subrow[0] = result
        LineCountList_13_8.updateRow(subrow)
print "Line_Count_13_8 Updated"
del LineCountList_13_8
######################################################################################################################
############## *** repeat 'Line Counts' for all VLTSs ##########
############################## GU Count --count of integer field from ppLayer###############################
arcpy.AddField_management(subLayer, "GU_Count", "FLOAT", "", "", "", "", "NULLABLE", "NON_REQUIRED", "")
GUCountField = "GU_Count"
#Make feature layer
arcpy.MakeFeatureLayer_management(ppLayer, "ppLayerGU")
#Make feature layer
arcpy.MakeFeatureLayer_management(transLayer, "transLayerGU")
pp_trans_GUJoin ='C:\Topo_Check_V5.gdb\pp_trans_GU_SpatialJoin'
SubLayer_pp_trans_GUJoin = 'C:\Topo_Check_V5.gdb\SubLayer_pp_trans_GUSpatialJoin'
#spatial join of feature layers
arcpy.SpatialJoin_analysis ("transLayerGU", "ppLayerGU", pp_trans_GUJoin, 'JOIN_ONE_TO_MANY', 'KEEP_ALL', '#', 'WITHIN_A_DISTANCE', .0005)
#add new count field for distinct count for PP count
arcpy.AddField_management(pp_trans_GUJoin, "Join_Count_PP", "LONG", "", "", "", "", "NULLABLE", "NON_REQUIRED", "")
#calculate field == Join_Count
arcpy.CalculateField_management (pp_trans_GUJoin, "Join_Count_PP", "!Join_Count!", "PYTHON_9.3")
#Make feature layer
arcpy.MakeFeatureLayer_management(subLayer, "SubLayer_Layer")
#Make feature layer
arcpy.MakeFeatureLayer_management(pp_trans_GUJoin, "pp_trans_GUJoin_Layer")
# create a list of fields to sum from pp_trans_GUJoin
fieldNamesToSum = ['Join_Count','GEN_UNITS', 'Join_Count_PP']
# create the field mapping object
fieldMappings = arcpy.FieldMappings()
# populate the field mapping object with the fields from both feature classes
fieldMappings.addTable("SubLayer_Layer")
fieldMappings.addTable("pp_trans_GUJoin_Layer")
# merge rule to apply to each field (by default the merge rule is 'First')
mergeRules = {'Join_Count': 'Sum', 'GEN_UNITS': 'Sum', 'Join_Count_PP': 'Max'}
# loop through the field names and set each merge rule once
for fieldName in fieldNamesToSum:
    # get the field map index of this field and get the field map
    fieldIndex = fieldMappings.findFieldMapIndex(fieldName)
    fieldMap = fieldMappings.getFieldMap(fieldIndex)
    # update the field map with the new merge rule
    fieldMap.mergeRule = mergeRules[fieldName]
    # replace with the updated field map
    fieldMappings.replaceFieldMap(fieldIndex, fieldMap)
#spatial join of feature layers
arcpy.SpatialJoin_analysis ("SubLayer_Layer", "pp_trans_GUJoin_Layer", SubLayer_pp_trans_GUJoin, 'JOIN_ONE_TO_ONE', 'KEEP_ALL', fieldMappings, 'WITHIN_A_DISTANCE', .000002)
#Make feature layer
arcpy.MakeFeatureLayer_management(SubLayer_pp_trans_GUJoin, "JoinLayer")
with arcpy.da.UpdateCursor (subLayer, [GUCountField, "SHAPE@"]) as GUCalcCursor:
    for subrow in GUCalcCursor:
        arcpy.SelectLayerByLocation_management("JoinLayer", 'WITHIN_A_DISTANCE', subrow[1], .000002, "NEW_SELECTION")
        if int(arcpy.GetCount_management(in_rows="JoinLayer").getOutput(0))==1:
            with arcpy.da.SearchCursor("JoinLayer",['JOIN_COUNT','GEN_UNITS']) as JoinLayerCursor:
                for joinsubrow in JoinLayerCursor:
                    GEN_UNITS = joinsubrow[1]
                    PPTLINES = joinsubrow[0]
                    if joinsubrow[1] > 0 and joinsubrow[0] > 0:
                        # float() avoids integer division in Python 2
                        result = float(GEN_UNITS)/PPTLINES
                    else:
                        result = None
                    #print result
                    subrow[0] = result
                    GUCalcCursor.updateRow(subrow)
####################### NULL or Zero Line Count ---Based off of 'Double' field in transLayer ##################
#add in NULL_ZERO_LINE_COUNT field
arcpy.AddField_management(subLayer, "NULL_ZERO_LINE_COUNT", "SHORT", "", "", "", "", "NULLABLE", "NON_REQUIRED", "")
FieldLineCountNULL_ZERO = "NULL_ZERO_LINE_COUNT"
#Make feature layer
arcpy.MakeFeatureLayer_management(transLayer, "transLayer_nullzero")
#where clause
NULL_ZERO_VLTS = ' "VLTS" is NULL or "VLTS" = 0 '
with arcpy.da.UpdateCursor (subLayer, [FieldLineCountNULL_ZERO, "SHAPE@"]) as NullZeroCountList:
    for subrow in NullZeroCountList:
        arcpy.SelectLayerByLocation_management("transLayer_nullzero", 'WITHIN_A_DISTANCE', subrow[1], .000002, "NEW_SELECTION")
        arcpy.SelectLayerByAttribute_management ("transLayer_nullzero", "SUBSET_SELECTION",NULL_ZERO_VLTS)
        result=int(arcpy.GetCount_management("transLayer_nullzero").getOutput(0))
        #print result
        subrow[0] = result
        NullZeroCountList.updateRow(subrow)
print "NULL_ZERO_LINE_COUNT Updated"
####################### PP LINE COUNT: from transLayer, ppLayer, subLayer spatial join ##################
#add in PP_LINE_COUNT field
arcpy.AddField_management(subLayer, "PP_LINE_COUNT", "LONG", "", "", "", "", "NULLABLE", "NON_REQUIRED", "")
FieldPPCount = "PP_LINE_COUNT"
with arcpy.da.UpdateCursor (subLayer, [FieldPPCount, "SHAPE@"]) as ppLineCountList:
    for subrow in ppLineCountList:
        arcpy.SelectLayerByLocation_management("JoinLayer", 'WITHIN_A_DISTANCE', subrow[1], .000002, "NEW_SELECTION")
        # JOIN_COUNT is sum of PPs joined to trans lines in the pp_trans_GUJoin spatial join
        with arcpy.da.SearchCursor("JoinLayer",['JOIN_COUNT']) as JoinLayerppLineCountCursor:
            for joinsubrow in JoinLayerppLineCountCursor:
                JOIN_COUNT = joinsubrow[0]
                result = JOIN_COUNT
                subrow[0] = result
                ppLineCountList.updateRow(subrow)
print "PP_LINE_COUNT Updated"
########################### PP_COUNT : from transLayer, ppLayer, subLayer spatial join ######################################
#add in PP_COUNT field
arcpy.AddField_management(subLayer, "PP_COUNT", "LONG", "", "", "", "", "NULLABLE", "NON_REQUIRED", "")
FieldPPCount = "PP_COUNT"
with arcpy.da.UpdateCursor (subLayer, [FieldPPCount, "SHAPE@"]) as ppCountList:
    for subrow in ppCountList:
        arcpy.SelectLayerByLocation_management("JoinLayer", 'WITHIN_A_DISTANCE', subrow[1], .000002, "NEW_SELECTION")
        with arcpy.da.SearchCursor("JoinLayer",['JOIN_COUNT_PP']) as JoinLayerppCountCursor:
            for joinsubrow in JoinLayerppCountCursor:
                JOIN_COUNT_PP = joinsubrow[0]
                result = JOIN_COUNT_PP
                subrow[0] = result
                ppCountList.updateRow(subrow)
print "PP_COUNT Updated"
################################################# unique subs by VLTS - 13.8 : based off of transLayer attributes and transLayer,subLayer spatial join ########################
#add in UNIQUE_SUB_COUNT_13_8 field
arcpy.AddField_management(subLayer, "UNIQUE_SUB_COUNT_13_8", "SHORT", "", "", "", "", "NULLABLE", "NON_REQUIRED", "")
UNIQUE_SUB_COUNT_13_8 = "UNIQUE_SUB_COUNT_13_8"
# Make feature layer
arcpy.MakeFeatureLayer_management(transLayer, "transLayer_uniquesubs")
#SUB1SUB2 = "SUB1SUB2"
arcpy.AddField_management("transLayer_uniquesubs", "SUB1SUB2", "TEXT", "", "", "", "", "NULLABLE", "NON_REQUIRED", "")
#SUB1SUB2_CalcField = ' "SUB_1" + ', ' + "SUB_2" + ', ' '
arcpy.CalculateField_management ("transLayer_uniquesubs", "SUB1SUB2", ' [SUB_1] + "," + [SUB_2] ')
#where clause
where_unique_13_8 = ' "VLTS" = 13.8 '
with arcpy.da.UpdateCursor (subLayer, [UNIQUE_SUB_COUNT_13_8, "SHAPE@"]) as UNIQUE_SUB_COUNT_13_8_List:
    for subrow in UNIQUE_SUB_COUNT_13_8_List:
        arcpy.SelectLayerByLocation_management("transLayer_uniquesubs", 'WITHIN_A_DISTANCE', subrow[1], .000002, "NEW_SELECTION")
        arcpy.SelectLayerByAttribute_management ("transLayer_uniquesubs", "SUBSET_SELECTION", where_unique_13_8)
        # NOTE: DeleteIdentical modifies its input, so duplicate rows are deleted from transLayer itself
        arcpy.DeleteIdentical_management("transLayer_uniquesubs", ['SUB1SUB2'])
        result=int(arcpy.GetCount_management("transLayer_uniquesubs").getOutput(0))
        #print result
        subrow[0] = result
        UNIQUE_SUB_COUNT_13_8_List.updateRow(subrow)
print "UNIQUE_SUB_COUNT_13_8 Updated"
##############repeat 'unique subs by VLTS' for all 9 VLTSs ##########
################################################# unique VLTS : based off of transLayer attributes and transLayer,subLayer spatial join ########################
#add in UNIQUE_VLTS_COUNT field
arcpy.AddField_management(subLayer, "UNIQUE_VLTS_COUNT", "SHORT", "", "", "", "", "NULLABLE", "NON_REQUIRED", "")
UNIQUE_VLTS_COUNT= "UNIQUE_VLTS_COUNT"
# Make feature layer
arcpy.MakeFeatureLayer_management(transLayer, "transLayer_uniqueVLTSs")
with arcpy.da.UpdateCursor (subLayer, [UNIQUE_VLTS_COUNT, "SHAPE@"]) as UNIQUE_VLTS_COUNT_List:
    for subrow in UNIQUE_VLTS_COUNT_List:
        arcpy.SelectLayerByLocation_management("transLayer_uniqueVLTSs", 'WITHIN_A_DISTANCE', subrow[1], .000002, "NEW_SELECTION")
        # NOTE: DeleteIdentical modifies its input, so duplicate rows are deleted from transLayer itself
        arcpy.DeleteIdentical_management("transLayer_uniqueVLTSs", ['VLTS'])
        result=int(arcpy.GetCount_management("transLayer_uniqueVLTSs").getOutput(0))
        #print result
        subrow[0] = result
        UNIQUE_VLTS_COUNT_List.updateRow(subrow)
print "UNIQUE_VLTS_COUNT Updated"
#########################################################################
end = time.time()
print (end - start)/60
There are 300+ lines of code here, a bit complicated to suggest ideas...
I do notice multiple spatial selects inside loops. Each one adds to the execution time.
Have you tried using a near table instead? In other words, run the spatial search just once.
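A rough sketch of the near-table idea, in case it helps. It assumes the Generate Near Table tool is available under your license (in ArcMap it needs an Advanced license, as far as I know) and that its output carries the usual IN_FID and NEAR_FID fields. The function names count_near_features and update_line_counts are made up for illustration, and the arcpy part has not been run against the data in this thread.

```python
from collections import defaultdict


def count_near_features(pairs):
    """Count how many near features were found for each input feature.

    pairs is an iterable of (in_fid, near_fid) tuples, e.g. near-table rows.
    """
    counts = defaultdict(int)
    for in_fid, _near_fid in pairs:
        counts[in_fid] += 1
    return dict(counts)


def update_line_counts(sub_fc, trans_fc, near_table, radius,
                       count_field="Line_Count"):
    """Hypothetical wrapper: one near-table run, one read, one write."""
    # Imported here so the counting logic above works without ArcGIS installed
    import arcpy

    # closest="ALL" keeps every line within the radius, not just the nearest
    arcpy.GenerateNearTable_analysis(sub_fc, trans_fc, near_table, radius,
                                     "NO_LOCATION", "NO_ANGLE", "ALL")
    with arcpy.da.SearchCursor(near_table, ["IN_FID", "NEAR_FID"]) as rows:
        counts = count_near_features(rows)
    # Subs with no line inside the radius do not appear in the table: write 0
    with arcpy.da.UpdateCursor(sub_fc, ["OID@", count_field]) as cursor:
        for oid, _old_value in cursor:
            cursor.updateRow([oid, counts.get(oid, 0)])
```

The spatial search then happens once for the whole feature class, and the per-feature work is only a dictionary lookup.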
I'll look into the near table option for trimming down the time consumed by the spatial selects. Could definitely be something to consider. Thanks! My apologies for the long, dense script.
I agree with Neil. I don't like using layer selections inside of da.cursor loops. Also, the nested da.cursor loops are probably making this script run slowly. Try getting a layer selection first, then running an update/search cursor on that selection.
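One way to avoid the nested cursors, sketched here with plain Python data so that it runs without ArcGIS: read the join output once into a dictionary keyed by the sub's ID, then do a dictionary lookup per row instead of a selection plus a search cursor. The names build_lookup and gu_per_line are hypothetical, and keying on TARGET_FID assumes the spatial join output carries that field.

```python
def build_lookup(rows):
    """Map each key (first column) to the remaining columns of its row."""
    return {row[0]: tuple(row[1:]) for row in rows}


def gu_per_line(gen_units, line_count):
    """Generating units per line, or None unless both values are positive."""
    if gen_units is None or line_count is None:
        return None
    if gen_units > 0 and line_count > 0:
        # float() keeps the ratio fractional under Python 2 as well
        return float(gen_units) / line_count
    return None


# Stand-in for rows read once with a SearchCursor on the join output,
# using fields like ["TARGET_FID", "Join_Count", "GEN_UNITS"] (assumed names)
join_rows = [(1, 4, 10), (2, 0, 5), (3, 2, None)]
lookup = build_lookup(join_rows)

# Stand-in for the object IDs an UpdateCursor on the subs would yield
results = {}
for oid in [1, 2, 3, 4]:
    line_count, gen_units = lookup.get(oid, (None, None))
    results[oid] = gu_per_line(gen_units, line_count)
```

In the real script the last loop would be the single UpdateCursor pass, writing each value with updateRow.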
If I move the arcpy.SelectLayerByLocation_management layer selection outside of the update cursor for the first block of the code, the script updates the Line_Count field in 34 minutes vs. 80 minutes. Thanks for this advice! However, this is still a time-consuming update. Any other suggestions for how to optimize this? The subLayer feature class contains 7000 records, transLayer contains 7500 records, and ppLayer contains 800 records, so these aren't massive feature classes, though they are continually growing.
import arcpy
from arcpy import env
import time
start = time.time()
arcpy.env.overwriteOutput = True
defaultGdbPath = 'C:\Topo_Check_V5.gdb'
subLayer='C:\Topo_Check_V5.gdb\Subs' #point feature class
transLayer='C:\Topo_Check_V5.gdb\TLS' # line feature class
ppLayer='C:\Topo_Check_V5.gdb\PPS' #point feature class
###### Line Counts: Count of connecting lines within distance of point feature class ################
#add in Line_Count field
arcpy.AddField_management(subLayer, "Line_Count", "SHORT", "", "", "", "", "NULLABLE", "NON_REQUIRED", "")
TLineCountField = "Line_Count"
#MakeFeatureLayer
arcpy.MakeFeatureLayer_management(transLayer, "transLayer_lyr")
arcpy.SelectLayerByLocation_management("transLayer_lyr", 'WITHIN_A_DISTANCE', subLayer, .000002, "NEW_SELECTION")
with arcpy.da.UpdateCursor (subLayer, [TLineCountField, "SHAPE@"]) as LineCountList:
    for subrow in LineCountList:
        #arcpy.SelectLayerByLocation_management("transLayer_lyr", 'WITHIN_A_DISTANCE', subrow[1], .000002, "NEW_SELECTION")
        result=int(arcpy.GetCount_management("transLayer_lyr").getOutput(0))
        #print result
        subrow[0] = result
        LineCountList.updateRow(subrow)
print "Line_Count Updated"
del LineCountList
end = time.time()
print (end - start)/60
Try this, maybe:
Remove the variable for the field, and just put it directly in the cursor:
with arcpy.da.UpdateCursor (subLayer, ["Line_Count", "SHAPE@"]) as LineCountList:
Create a layer out of the Sub FC, then call it within the cursor?
#MakeFeatureLayer
arcpy.MakeFeatureLayer_management(subLayer, "subLayer_lyr")
arcpy.MakeFeatureLayer_management(transLayer, "transLayer_lyr")
arcpy.SelectLayerByLocation_management("transLayer_lyr", 'WITHIN_A_DISTANCE', "subLayer_lyr", .000002, "NEW_SELECTION")
with arcpy.da.UpdateCursor ("subLayer_lyr", [TLineCountField, "SHAPE@"]) as LineCountList:
    for subrow in LineCountList:
        result=int(arcpy.GetCount_management("transLayer_lyr").getOutput(0))
        #print result
        subrow[0] = result
        LineCountList.updateRow(subrow)
print "Line_Count Updated"
del LineCountList
Lastly, you're not using the "SHAPE@" field within the cursor now, so you can remove it from the cursor's field list.
It's weird that this script is taking over 30 minutes to run... Is this the entire script?
This is the entirety of the non-commented out code that I am running now. I altered it to include your suggestions (Thank you so much by the way!), and it is running at nearly the exact same speed for the Line_Count field update. It is running from a network connection vs. the C:\ drive, so this could be adding to the execution time. Though, it still seems to be taking a bit longer than I had anticipated/hoped, especially for a single field calculation.
import arcpy
from arcpy import env
import time
start = time.time()
arcpy.env.overwriteOutput = True
defaultGdbPath = 'C:\Topo_Check_V5.gdb'
subLayer='C:\Topo_Check_V5.gdb\Subs' #point feature class
transLayer='C:\Topo_Check_V5.gdb\TLS' # line feature class
ppLayer='C:\Topo_Check_V5.gdb\PPS' #point feature class
#add in Line_Count field
arcpy.AddField_management(subLayer, "Line_Count", "SHORT", "", "", "", "", "NULLABLE", "NON_REQUIRED", "")
#TLineCountField = "Line_Count"
#MakeFeatureLayer
arcpy.MakeFeatureLayer_management(subLayer, "subLayer_lyr")
arcpy.MakeFeatureLayer_management(transLayer, "transLayer_lyr")
arcpy.SelectLayerByLocation_management("transLayer_lyr", 'WITHIN_A_DISTANCE', "subLayer_lyr", .000002, "NEW_SELECTION")
with arcpy.da.UpdateCursor ("subLayer_lyr", ["Line_Count"]) as LineCountList:
    for subrow in LineCountList:
        result=int(arcpy.GetCount_management("transLayer_lyr").getOutput(0))
        #print result
        subrow[0] = result
        LineCountList.updateRow(subrow)
print "Line_Count Updated"
del LineCountList
end = time.time()
print (end - start)/60
I would try not running it on the network to see if you get faster results.
Also, try putting the result variable outside of the cursor/row loop.
#MakeFeatureLayer
arcpy.MakeFeatureLayer_management(subLayer, "subLayer_lyr")
arcpy.MakeFeatureLayer_management(transLayer, "transLayer_lyr")
arcpy.SelectLayerByLocation_management("transLayer_lyr", 'WITHIN_A_DISTANCE', "subLayer_lyr", .000002, "NEW_SELECTION")
result=int(arcpy.GetCount_management("transLayer_lyr").getOutput(0))
arcpy.CalculateField_management("subLayer_lyr", "Line_Count", "%s" %result, "PYTHON_9.3", "")
##with arcpy.da.UpdateCursor ("subLayer_lyr", ["Line_Count"]) as LineCountList:
## for subrow in LineCountList:
## #print result
## subrow[0] = result
## LineCountList.updateRow(subrow)
##print "Line_Count Updated"
##del LineCountList
If I'm reading it correctly, you're only updating the 'Line_Count' field with a single value. Since the field does not depend on other variables or exceptions, you can use the CalculateField tool, which will hopefully be quicker.
Give that a try.
This certainly speeds it up (total time of 40 seconds on the network), but it is populating the Line_Count field (for subLayer vs. subLayer_lyr) with the total count of TransLayer features vs. the total count of TransLayer features within the specified distance of each feature in subLayer.
The SelectByLocation process above only selects the Trans_Lyr features that fall within the specified distance of any Sub_Lyr feature. If you need a count of the Trans_Lyr features that are within that distance of each feature in the Sub_Lyr, you will have to put the selection inside a loop. I think you should iterate through the Sub_Lyr features and make a separate layer for each unique feature.
This is the only way I know to do this:
# Assumes subLayer has a text field named Unique_ID that identifies each feature
arcpy.MakeFeatureLayer_management(transLayer, "transLayer_lyr")
with arcpy.da.UpdateCursor ("subLayer_lyr", ["Unique_ID", "Line_Count"]) as LineCountList:
    for subrow in LineCountList:
        # make a one-feature layer for the current sub
        arcpy.MakeFeatureLayer_management(subLayer, "oneSub_lyr", """Unique_ID = '%s'""" %subrow[0])
        arcpy.SelectLayerByLocation_management("transLayer_lyr", 'WITHIN_A_DISTANCE', "oneSub_lyr", .000002, "NEW_SELECTION")
        result=int(arcpy.GetCount_management("transLayer_lyr").getOutput(0))
        #print result
        subrow[1] = result
        LineCountList.updateRow(subrow)
        arcpy.Delete_management("oneSub_lyr")
print "Line_Count Updated"
del LineCountList
I apologize, I didn't quite understand the original question.
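For the attribute-based and unique-value counts in the original script, a dictionary pass along the lines of the technique linked in the question may avoid the per-feature selections altogether. The sketch below is only an illustration under assumptions: it takes (sub ID, line ID) pairs, for example from a near table, plus a dictionary of VLTS values per line, and the function name summarize_lines is invented here. It runs on plain Python data; reading the pairs and writing the results back with cursors is left out.

```python
from collections import Counter, defaultdict


def summarize_lines(pairs, vlts_by_line):
    """Return {sub_id: summary} from one pass over (sub_id, line_id) pairs."""
    per_sub = defaultdict(Counter)
    for sub_id, line_id in pairs:
        # A line with no VLTS entry is counted under None (NULL)
        per_sub[sub_id][vlts_by_line.get(line_id)] += 1

    summaries = {}
    for sub_id, by_vlts in per_sub.items():
        summaries[sub_id] = {
            "line_count": sum(by_vlts.values()),
            "by_vlts": dict(by_vlts),
            "null_zero": by_vlts.get(None, 0) + by_vlts.get(0, 0),
            # None (NULL) counts as one distinct value in this sketch
            "unique_vlts": len(by_vlts),
        }
    return summaries


# Made-up sample data: sub 1 touches three lines, sub 2 touches one
pairs = [(1, 10), (1, 11), (1, 12), (2, 10)]
vlts_by_line = {10: 13.8, 11: 13.8, 12: None}
summary = summarize_lines(pairs, vlts_by_line)
# summary[1]["by_vlts"].get(13.8, 0) would be the value for Line_Count_13_8
```

Because every voltage count comes out of the same pass, the nine repeated blocks would collapse into one read and one update cursor.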