Hello,
I have the following script. It works as expected, but it takes around 1.5 seconds to run each time. It doesn't seem to matter whether it's running against 20 parcels or 5,000. Are there any suggestions for how I can improve the performance of this script?
Thanks
import arcpy
import numpy as np

try:
    arcpy.env.overwriteOutput = True

    # Create in-memory tables:
    arcpy.CreateTable_management("in_memory", "tableSelRecs")
    arcpy.AddField_management(r'in_memory\tableSelRecs', "USE_CATEGORY", "TEXT", field_length=35)
    arcpy.AddField_management(r'in_memory\tableSelRecs', "SHAPE_Area", "Double")
    arcpy.CreateTable_management("in_memory", "tableSumRecs")
    arcpy.AddField_management(r'in_memory\tableSumRecs', "USE_CATEGORY", "TEXT", field_length=35)
    arcpy.AddField_management(r'in_memory\tableSumRecs', "SHAPE_Area", "Double")

    mxd = arcpy.mapping.MapDocument('CURRENT')
    df = mxd.activeDataFrame
    totalArea = 0
    totalResArea = 0
    totalParcels = 0
    totalResParcels = 0
    lu_layer = arcpy.mapping.ListLayers(mxd, "LandUse_UseCategory", df)[0]

    # Use da.SearchCursor to access the necessary fields from the selected records
    with arcpy.da.SearchCursor(lu_layer, ['USE_CATEGORY', 'SHAPE@AREA']) as cursor:
        # Set up an insert cursor to insert selected polygons into tableSelRecs
        insCursor = arcpy.da.InsertCursor(r'in_memory\tableSelRecs', ['USE_CATEGORY', 'SHAPE_Area'])
        # Insert selected rows via insCursor;
        # get the total area of the selected records;
        # get a count of total parcels
        for row in cursor:
            if row[0] != 'Invalid':
                insCursor.insertRow((row[0], row[1]))
                totalArea += row[1]
                totalParcels += 1
                # If the block has a 'Residential' USE_CATEGORY, add its area to the Residential DI variables
                if row[0] in ('Residential - Low Density', 'Residential - Medium Density', 'Residential - High Density'):
                    totalResArea += row[1]
                    totalResParcels += 1
        del insCursor  # release the insert cursor so the table is free for Summary Statistics

    # Use Summary Statistics against tableSelRecs to get totals per USE_CATEGORY and write the results to tableSumRecs
    arcpy.Statistics_analysis(r'in_memory\tableSelRecs', r'in_memory\tableSumRecs', [["SHAPE_Area", "SUM"]], "USE_CATEGORY")

    with arcpy.da.SearchCursor(r'in_memory\tableSumRecs', ['USE_CATEGORY', 'SUM_SHAPE_Area']) as selCur:
        # List to store the per-USE_CATEGORY terms of the DI calculation
        interimValue = [0]
        # List to store the per-USE_CATEGORY terms of the RDI calculation
        interimResValue = [0]
        for sRow in selCur:
            arcpy.AddMessage('USE_CATEGORY: {}; AREA: {}'.format(sRow[0], sRow[1]))
            # Divide the USE_CATEGORY area by the total community area, square it, and append it to the list
            interimValue.append(np.square(sRow[1] / totalArea))
            if sRow[0] in ('Residential - Low Density', 'Residential - Medium Density', 'Residential - High Density'):
                interimResValue.append(np.square(sRow[1] / totalResArea))

    # Sum the values in each list, then subtract that sum from 1 to get the DI and RDI for the selected blocks
    DI = 1 - sum(interimValue)
    RDI = 1 - sum(interimResValue)

    arcpy.AddMessage('Diversity Index: ' + str(DI))
    arcpy.AddMessage('Residential Diversity Index: ' + str(RDI))
    arcpy.AddMessage('Total Blocks Selected: ' + str(totalParcels))
    arcpy.AddMessage('Total Blocks Area: ' + str(totalArea))
    arcpy.AddMessage('Total Res Blocks Selected: ' + str(totalResParcels))
    arcpy.AddMessage('Total Res Blocks Area: ' + str(totalResArea))
    arcpy.AddMessage('NOTE: Any blocks with a use category of Invalid are not included in any calculations.')

    # Delete the in-memory tables
    arcpy.Delete_management(r'in_memory\tableSelRecs')
    arcpy.Delete_management(r'in_memory\tableSumRecs')
except arcpy.ExecuteError:
    print(arcpy.GetMessages(2))
Ok, thanks Dan. I see how that could help me identify any bottleneck.
So I talked to our server guys, as the plan is to have this script published as a geoprocessing (GP) service that I can then use in Web AppBuilder. They told me that 1.5 seconds is very good, so maybe I should move forward with having the script published. I'm thankful for the comments here, as I have taken away some good learnings and some pointers to help me improve the script. Thanks
Why is there a need to write data to the in_memory tables? It seems like the stats you are calculating rely only on read data. Perhaps a dictionary could serve the purpose of your intermediate in_memory tables?
Significant bottlenecks I see are:
1. Creating the intermediate tables, adding fields to them, and inserting values
2. Writing data to numpy arrays from the cursors. Seems like the np.square() calculations are entirely doable outside of numpy. Certainly more convenient in np though.
3. Using arcpy.mapping to get a hook into the layer. Maybe you could just have a toolbox w/ an input feature layer variable that had a default value set to your desired layer name?
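To illustrate the dictionary idea: a minimal sketch (plain Python, no arcpy) that accumulates per-category area totals in a dict while reading rows, then computes the diversity indexes as 1 minus the sum of squared area shares, replacing both in_memory tables and the Summary Statistics step. The function name and sample rows are made up for illustration; the category names come from the original script.

```python
from collections import defaultdict

RESIDENTIAL = {
    'Residential - Low Density',
    'Residential - Medium Density',
    'Residential - High Density',
}

def diversity_indexes(rows):
    """rows: iterable of (use_category, area) tuples, e.g. from a da.SearchCursor."""
    # Accumulate area per category; this replaces tableSelRecs + Statistics_analysis
    area_by_cat = defaultdict(float)
    for category, area in rows:
        if category != 'Invalid':
            area_by_cat[category] += area

    total_area = sum(area_by_cat.values())
    total_res_area = sum(a for c, a in area_by_cat.items() if c in RESIDENTIAL)

    # DI = 1 - sum of squared shares of the total area
    di = 1 - sum((a / total_area) ** 2 for a in area_by_cat.values())
    rdi = 1 - sum((a / total_res_area) ** 2
                  for c, a in area_by_cat.items() if c in RESIDENTIAL)
    return di, rdi

sample = [('Residential - Low Density', 100.0),
          ('Residential - High Density', 100.0),
          ('Commercial', 200.0),
          ('Invalid', 50.0)]
di, rdi = diversity_indexes(sample)
print(di, rdi)  # 0.625 0.5
```

In the real script the `rows` argument would just be the existing `arcpy.da.SearchCursor`, so the whole thing becomes a single pass over the layer.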
Why is there a need to write data to the in_memory tables? It seems like the stats you are calculating rely only on read data. Perhaps a dictionary could serve the purpose of your intermediate in_memory tables?
I believe when I first started working on this script (and created a different GeoNet thread), the suggestions were to use intermediate tables with Summary Statistics or to use a dictionary. I got the script working with Summary Statistics (with a plan to later try getting it working with a dictionary, but as yet I haven't got back to it).
Significant bottlenecks I see are:
1. Creating the intermediate tables, adding fields to them, and inserting values
Agreed.
2. Writing data to numpy arrays from the cursors. Seems like the np.square() calculations are entirely doable outside of numpy. Certainly more convenient in np though.
Agree on both points.
3. Using arcpy.mapping to get a hook into the layer. Maybe you could just have a toolbox w/ an input feature layer variable that had a default value set to your desired layer name?
This is a change that I have made: I had the Python toolbox published as a GP service to use as a geoprocessing widget in Web AppBuilder, and without inputs and outputs it's basically useless. Making this change to use inputs/outputs, and removing the need to hook into the layer, has brought the script run time down to 0.3-0.4 seconds.
Thanks for your input Chris
Well that's good news... Yeah, I'm finding that some of the arcpy.mapping functions aren't the fastest horses on the track. Onwards and upwards!
Actually, np.sqrt takes no more time than math.sqrt ... both measure in microseconds.
If you want to speed up numeric calculations on fields for large tables (i.e., many records), pull the field out, use numpy to vectorize the calculation, then use arcpy.da.ExtendTable to add the result back.
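As a rough illustration of the vectorized approach (the area values below are made up; in practice they would come from a cursor or arcpy.da.TableToNumPyArray, and arcpy.da.ExtendTable would join results back to the table):

```python
import numpy as np

# Hypothetical per-category area totals pulled out of a table field
areas = np.array([100.0, 100.0, 200.0])

# One vectorized expression replaces the per-row loop:
# square each category's share of the total area, then sum
shares_sq = np.square(areas / areas.sum())
di = 1 - shares_sq.sum()
print(di)  # 0.625
```

The win from vectorizing grows with the number of records, since the loop work moves from Python into numpy's compiled code.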
When you say it takes 1.5 seconds each time, is that the time when freshly initialized and ran, or does it take that long even if run successively in the same process/session?
Depending on which ArcPy components are being used, initializing/loading all of them into a clean Python process/session may take a second. You might be seeing the same times between 20 and 5000 parcels because the tool runs so quickly that the time you are seeing is the overhead of initializing objects and libraries.
If you are going to time your functions, it might be worth timing how long it takes for your import statements as well.
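A simple way to time an import with just the standard library (numpy stands in here for arcpy, whose import would typically dominate):

```python
import time

start = time.perf_counter()
import numpy as np  # in the real script, time `import arcpy` the same way
elapsed = time.perf_counter() - start
print('import took {:.3f} s'.format(elapsed))
```

Running this in a fresh process versus a warm one should show whether the 1.5 seconds is mostly import/initialization overhead.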
When you say it takes 1.5 seconds each time, is that the time when freshly initialized and ran, or does it take that long even if run successively in the same process/session?
It does run a little faster on subsequent runs.
Depending on which ArcPy components are being used, initializing/loading all of them into a clean Python process/session make take a second. You might be seeing the same times between 20 and 5000 parcels because the tool runs so quick that the time you are seeing is the overhead of initializing objects and libraries.
I think you are correct.
If you are going to time your functions, it might be worth timing how long it takes for your import statements as well.
Good point, would be interesting to see.
Thanks Joshua
Hi Chris,
I have had a lot of success using arcpy.SetLogHistory(False) when updating feature classes in the SDE. It disables the logging of geoprocessing tool runs. I think it could improve the speed of your script a little. But 1.5 seconds is very fast!
Best regards
Søren
Thanks for the tip, Søren. Yeah, I'm happy with the performance now. I've moved on to getting it working in the Web AppBuilder environment.