Python Script & Performance Speed

02-05-2018 10:25 AM
ChrisHolmes
Occasional Contributor III

Hello,

I have the following script. It works as expected, but it takes around 1.5 seconds to run each time. It doesn't seem to matter whether it's running against 20 parcels or 5000. Are there any suggestions as to how I can improve the performance of this script?

Thanks

import arcpy
import numpy as np

try:
    arcpy.env.overwriteOutput = True
    #Create in memory tables:
    arcpy.CreateTable_management("in_memory", "tableSelRecs")
    arcpy.AddField_management(r'in_memory\tableSelRecs', "USE_CATEGORY", "TEXT", field_length=35)
    arcpy.AddField_management(r'in_memory\tableSelRecs', "SHAPE_Area", "Double")
    arcpy.CreateTable_management("in_memory", "tableSumRecs")
    arcpy.AddField_management(r'in_memory\tableSumRecs', "USE_CATEGORY", "TEXT", field_length=35)
    arcpy.AddField_management(r'in_memory\tableSumRecs', "SHAPE_Area", "Double")

    mxd = arcpy.mapping.MapDocument('CURRENT')
    df = mxd.activeDataFrame
    totalArea = 0
    totalResArea = 0
    totalParcels = 0
    totalResParcels = 0

    lu_layer = arcpy.mapping.ListLayers(mxd, "LandUse_UseCategory", df)[0]
    #use da.SearchCursor to access necessary fields from selected records
    with arcpy.da.SearchCursor(lu_layer,['USE_CATEGORY','SHAPE@AREA']) as cursor:
        #Setup insert cursor to insert selected polygons into tableSelRecs
        insCursor = arcpy.da.InsertCursor(r'in_memory\tableSelRecs', ['USE_CATEGORY','SHAPE_Area'])
        #Insert selected rows via insCursor;
        #Get total area of selected records;
        #Get a count of total parcels
        for row in cursor:
            if row[0] != 'Invalid':
                insCursor.insertRow((row[0],row[1]))
                totalArea += row[1]
                totalParcels += 1
            #If the block is of a 'Residential' USE_CATEGORY type then add that value into the Residential DI calculation variable
            if row[0] in ('Residential - Low Density', 'Residential - Medium Density', 'Residential - High Density'):
                totalResArea += row[1]
                totalResParcels += 1

    #Release the insert cursor so Summary Statistics can access the in_memory table
    del insCursor

    #Use Summary Statistics against tableSelRecs to get totals per USE_CATEGORY and write the results to tableSumRecs
    arcpy.Statistics_analysis(r'in_memory\tableSelRecs', r'in_memory\tableSumRecs', [["SHAPE_Area", "SUM"]], "USE_CATEGORY")

    with arcpy.da.SearchCursor(r'in_memory\tableSumRecs',['USE_CATEGORY','SUM_SHAPE_Area']) as selCur:
        #list to store calculation for each USE_CATEGORY value used in DI calculation
        interimValue = [0]
        #list to store calculation for each USE_CATEGORY value used in RDI calculation
        interimResValue = [0] 
        for sRow in selCur:
            arcpy.AddMessage('USE_CATEGORY: {}; AREA: {}'.format(sRow[0],sRow[1]))
            #divide the USE_CATEGORY area by the total community area then square that number & append it to the list
            interimValue.append(np.square(sRow[1]/totalArea)) 
            if sRow[0] in ('Residential - Low Density', 'Residential - Medium Density', 'Residential - High Density'):
                interimResValue.append(np.square(sRow[1]/totalResArea))

        #sum the values in the list then subtract that value from 1 to get the DI for the selected blocks
        DI = 1 - sum(interimValue)
        #sum the values in the list then subtract that value from 1 to get the RDI for the selected blocks
        RDI = 1 - sum(interimResValue) 
        
        arcpy.AddMessage('Diversity Index: ' + str(DI))
        arcpy.AddMessage('Residential Diversity Index: ' + str(RDI))
        arcpy.AddMessage('Total Blocks Selected: ' + str(totalParcels))
        arcpy.AddMessage('Total Blocks Area: ' + str(totalArea))
        arcpy.AddMessage('Total Res Blocks Selected: ' + str(totalResParcels))
        arcpy.AddMessage('Total Res Blocks Area: ' + str(totalResArea))
        arcpy.AddMessage('NOTE: Any blocks with a use category of Invalid are not included in any calculations.')
    #Delete in memory tables
    arcpy.Delete_management(r'in_memory\tableSelRecs')
    arcpy.Delete_management(r'in_memory\tableSumRecs')

except arcpy.ExecuteError:
    print(arcpy.GetMessages(2))

20 Replies
ChrisHolmes
Occasional Contributor III

Ok, thanks Dan. I see how that could help me identify any bottleneck.

ChrisHolmes
Occasional Contributor III

So I talked to our server guys, as the plan is to have this script published as a GP service that I can then use in Web AppBuilder. They told me that 1.5 seconds is very good, so maybe I should move forward with having the script published. I'm thankful for the comments here, as I've taken away some good learnings and pointers to help me improve the script. Thanks!

ChrisSnyder
Regular Contributor III

Why is there a need to write data to the in_memory tables? It seems like the stats you are calculating rely only on read data. Perhaps a dictionary could serve the purpose of your intermediate in_memory tables?
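
For what it's worth, here is a minimal sketch of the dictionary approach, assuming the same lu_layer variable and fields as the original script:

import arcpy
from collections import defaultdict

# Sketch only: accumulate per-category areas in a plain dictionary
# instead of writing to an in_memory table and running Summary Statistics.
# 'lu_layer' is assumed to be the same layer variable as in the original script.
area_by_cat = defaultdict(float)
with arcpy.da.SearchCursor(lu_layer, ['USE_CATEGORY', 'SHAPE@AREA']) as cursor:
    for cat, area in cursor:
        if cat != 'Invalid':
            area_by_cat[cat] += area

totalArea = sum(area_by_cat.values())
# Same DI formula as the original, in plain Python instead of numpy
DI = 1 - sum((a / totalArea) ** 2 for a in area_by_cat.values())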

Significant bottlenecks I see are:

1. Creating the intermediate tables, adding fields to them, and inserting values

2. Writing data to numpy arrays from the cursors. It seems like the np.square() calculations are entirely doable outside of numpy, though certainly more convenient in np.

3. Using arcpy.mapping to get a hook into the layer. Maybe you could just have a toolbox with an input feature layer variable whose default value is set to your desired layer name?

ChrisHolmes
Occasional Contributor III

Why is there a need to write data to the in_memory tables? It seems like the stats you are calculating rely only on read data. Perhaps a dictionary could serve the purpose of your intermediate in_memory tables?

I believe when I first started working on this script (and created a different GeoNet thread), the suggestions were to use intermediate tables with Summary Statistics or to use a dictionary. I got the script working with Summary Statistics, with a plan to try the dictionary approach later, but I haven't gotten back to it yet.

Significant bottlenecks I see are:

1. Creating the intermediate tables, adding fields to them, and inserting values

Agreed.

2. Writing data to numpy arrays from the cursors. It seems like the np.square() calculations are entirely doable outside of numpy, though certainly more convenient in np.

Agree on both points.

3. Using arcpy.mapping to get a hook into the layer. Maybe you could just have a toolbox with an input feature layer variable whose default value is set to your desired layer name?

This is a change I have made. I had the Python toolbox published as a GP service to use as a geoprocessing widget in Web AppBuilder, and without inputs and outputs it's basically useless. Making this change to use inputs/outputs, and removing the need to hook into the layer, has brought the script run time down to 0.3 - 0.4 seconds.
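
For reference, a minimal sketch of that change (the parameter index here is hypothetical):

import arcpy

# Sketch only: take the layer as a script tool / GP service parameter
# rather than reaching into the map document via arcpy.mapping.
lu_layer = arcpy.GetParameterAsText(0)  # input feature layer parameter

with arcpy.da.SearchCursor(lu_layer, ['USE_CATEGORY', 'SHAPE@AREA']) as cursor:
    for cat, area in cursor:
        arcpy.AddMessage('{}: {}'.format(cat, area))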

Thanks for your input Chris

ChrisSnyder
Regular Contributor III

Well that's good news... Yeah, I'm finding that some of the arcpy.mapping functions aren't the fastest horses on the track. Onwards and upwards!

DanPatterson_Retired
MVP Emeritus

Actually, np.sqrt takes no more time than math.sqrt; both measure in microseconds.

If you want to speed up numeric calculations on fields for large tables (i.e., many records), pull the field out, use numpy to vectorize the calculation, then use arcpy.da.ExtendTable to add the result back.
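
A rough sketch of that pattern (the table and output field names here are placeholders):

import arcpy
import numpy as np

tbl = r'in_memory\parcels'  # placeholder table name

# Pull the key field and the value field into a numpy array
arr = arcpy.da.TableToNumPyArray(tbl, ('OBJECTID', 'SHAPE_Area'))

# Vectorized calculation over the whole column at once
result = np.empty(arr.shape[0], dtype=[('OBJECTID', np.int32), ('AREA_SQRT', np.float64)])
result['OBJECTID'] = arr['OBJECTID']
result['AREA_SQRT'] = np.sqrt(arr['SHAPE_Area'])

# Join the result back to the table on the key field
arcpy.da.ExtendTable(tbl, 'OBJECTID', result, 'OBJECTID')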

JoshuaBixby
MVP Esteemed Contributor

When you say it takes 1.5 seconds each time, is that the time when freshly initialized and run, or does it take that long even when run successively in the same process/session?

Depending on which ArcPy components are being used, initializing/loading all of them into a clean Python process/session may take a second.  You might be seeing the same times between 20 and 5000 parcels because the tool runs so quickly that the time you are seeing is the overhead of initializing objects and libraries.

If you are going to time your functions, it might be worth timing how long it takes for your import statements as well.
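
For example, a quick way to check, in a fresh Python process:

import time

t0 = time.time()
import arcpy  # typically the slowest import by far
import numpy as np
print('imports took {:.2f} seconds'.format(time.time() - t0))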

ChrisHolmes
Occasional Contributor III

When you say it takes 1.5 seconds each time, is that the time when freshly initialized and run, or does it take that long even when run successively in the same process/session?

It does run a little faster on subsequent runs.

Depending on which ArcPy components are being used, initializing/loading all of them into a clean Python process/session may take a second.  You might be seeing the same times between 20 and 5000 parcels because the tool runs so quickly that the time you are seeing is the overhead of initializing objects and libraries.

I think you are correct.

If you are going to time your functions, it might be worth timing how long it takes for your import statements as well.

Good point, would be interesting to see.

Thanks Joshua

SørenNielsen
New Contributor

Hi Chris,

I have had a lot of success using arcpy.SetLogHistory(False) when updating feature classes in SDE. It disables the logging of geoprocessing tool history. I think it can improve the speed of your script a little. But 1.5 seconds is very fast!
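
A minimal example of where the call goes:

import arcpy

arcpy.SetLogHistory(False)  # turn off geoprocessing history logging
# ... run geoprocessing tools as usual ...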

Best regards

Søren

ChrisHolmes
Occasional Contributor III

Thanks for the tip, Søren. Yeah, I'm happy with the performance now. I've moved on to getting it working in the Web AppBuilder environment.
