Python Script & Performance Speed

02-05-2018 10:25 AM
ChrisHolmes
Occasional Contributor III

Hello,

I have the following script. It works as expected, but it takes around 1.5 seconds to run each time. It doesn't seem to matter whether it's running against 20 parcels or 5000. Are there any suggestions as to how I can improve the performance of this script?

Thanks

import arcpy
import numpy as np

try:
    arcpy.env.overwriteOutput = True
    #Create in memory tables:
    arcpy.CreateTable_management("in_memory", "tableSelRecs")
    arcpy.AddField_management(r'in_memory\tableSelRecs', "USE_CATEGORY", "TEXT", field_length=35)
    arcpy.AddField_management(r'in_memory\tableSelRecs', "SHAPE_Area", "Double")
    arcpy.CreateTable_management("in_memory", "tableSumRecs")
    arcpy.AddField_management(r'in_memory\tableSumRecs', "USE_CATEGORY", "TEXT", field_length=35)
    arcpy.AddField_management(r'in_memory\tableSumRecs', "SHAPE_Area", "Double")

    mxd = arcpy.mapping.MapDocument('CURRENT')
    df = mxd.activeDataFrame
    totalArea = 0
    totalResArea = 0
    totalParcels = 0
    totalResParcels = 0

    lu_layer = arcpy.mapping.ListLayers(mxd, "LandUse_UseCategory", df)[0]
    #use da.SearchCursor to access necessary fields from selected records
    with arcpy.da.SearchCursor(lu_layer,['USE_CATEGORY','SHAPE@AREA']) as cursor:
        #Setup insert cursor to insert selected polygons into tableSelRecs
        insCursor = arcpy.da.InsertCursor(r'in_memory\tableSelRecs', ['USE_CATEGORY','SHAPE_Area'])
        #Insert selected rows via insCursor;
        #Get total area of selected records;
        #Get a count of total parcels
        for row in cursor:
            if row[0] != 'Invalid':
                insCursor.insertRow((row[0],row[1]))
                totalArea += row[1]
                totalParcels += 1
            #If the block is of a 'Residential' USE_CATEGORY type then add that value into the Residential DI calculation variable
            if row[0] in ('Residential - Low Density', 'Residential - Medium Density', 'Residential - High Density'):
                totalResArea += row[1]
                totalResParcels += 1

    #Release the insert cursor so Summary Statistics can access the in_memory table
    del insCursor

    #Use Summary Statistics against tableSelRecs to get totals per USE_CATEGORY and write the results to tableSumRecs
    arcpy.Statistics_analysis(r'in_memory\tableSelRecs', r'in_memory\tableSumRecs', [["SHAPE_Area", "SUM"]], "USE_CATEGORY")

    with arcpy.da.SearchCursor(r'in_memory\tableSumRecs',['USE_CATEGORY','SUM_SHAPE_Area']) as selCur:
        #list to store calculation for each USE_CATEGORY value used in DI calculation
        interimValue = [0]
        #list to store calculation for each USE_CATEGORY value used in RDI calculation
        interimResValue = [0] 
        for sRow in selCur:
            arcpy.AddMessage('USE_CATEGORY: {}; AREA: {}'.format(sRow[0],sRow[1]))
            #divide the USE_CATEGORY area by the total community area then square that number & append it to the list
            interimValue.append(np.square(sRow[1]/totalArea)) 
            if sRow[0] in ('Residential - Low Density', 'Residential - Medium Density', 'Residential - High Density'):
                interimResValue.append(np.square(sRow[1]/totalResArea))

        #sum the values in the list then subtract that value from 1 to get the DI for the selected blocks
        DI = 1 - sum(interimValue)
        #sum the values in the list then subtract that value from 1 to get the RDI for the selected blocks
        RDI = 1 - sum(interimResValue) 
        
        arcpy.AddMessage('Diversity Index: ' + str(DI))
        arcpy.AddMessage('Residential Diversity Index: ' + str(RDI))
        arcpy.AddMessage('Total Blocks Selected: ' + str(totalParcels))
        arcpy.AddMessage('Total Blocks Area: ' + str(totalArea))
        arcpy.AddMessage('Total Res Blocks Selected: ' + str(totalResParcels))
        arcpy.AddMessage('Total Res Blocks Area: ' + str(totalResArea))
        arcpy.AddMessage('NOTE: Any blocks with a use category of Invalid are not included in any calculations.')
    #Delete in memory tables
    arcpy.Delete_management(r'in_memory\tableSelRecs')
    arcpy.Delete_management(r'in_memory\tableSumRecs')

except arcpy.ExecuteError:
    print(arcpy.GetMessages(2))

20 Replies
ChrisHolmes
Occasional Contributor III

Ok, thanks Dan. I see how that could help me identify any bottleneck.

ChrisHolmes
Occasional Contributor III

So I talked to our server guys, as the plan is to have this script published as a GP service that I can then use in Web AppBuilder. They told me that 1.5 seconds is very good, so maybe I should move forward with having the script published. I'm thankful for the comments here, as I've taken away some good learnings and pointers to help me improve the script. Thanks!

ChrisSnyder
Regular Contributor III

Why is there a need to write data to the in_memory tables? It seems like the stats you are calculating rely only on read data. Perhaps a dictionary could serve the purpose of your intermediate in_memory tables?
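
For what it's worth, here is a minimal sketch of the dictionary approach, assuming the same lu_layer variable and fields as the original script:

import arcpy
from collections import defaultdict

# Sketch only: accumulate per-category areas in a plain dictionary
# instead of writing to an in_memory table and running Summary Statistics.
# 'lu_layer' is assumed to be the same layer variable as in the original script.
area_by_cat = defaultdict(float)
with arcpy.da.SearchCursor(lu_layer, ['USE_CATEGORY', 'SHAPE@AREA']) as cursor:
    for cat, area in cursor:
        if cat != 'Invalid':
            area_by_cat[cat] += area

totalArea = sum(area_by_cat.values())
# Same DI formula as the original, in plain Python instead of numpy
DI = 1 - sum((a / totalArea) ** 2 for a in area_by_cat.values())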

Significant bottlenecks I see are:

1. Creating the intermediate tables, adding fields to them, and inserting values

2. Writing data to numpy arrays from the cursors. It seems like the np.square() calculations are entirely doable outside of numpy, though certainly more convenient in np.

3. Using arcpy.mapping to get a hook into the layer. Maybe you could just have a toolbox with an input feature layer variable whose default value is set to your desired layer name?

ChrisHolmes
Occasional Contributor III

Why is there a need to write data to the in_memory tables? It seems like the stats you are calculating rely only on read data. Perhaps a dictionary could serve the purpose of your intermediate in_memory tables?

I believe when I first started working on this script (and created a different GeoNet thread), the suggestions were to use intermediate tables with Summary Statistics or to use a dictionary. I got the script working with Summary Statistics, with a plan to try the dictionary approach later, but I haven't gotten back to it yet.

Significant bottlenecks I see are:

1. Creating the intermediate tables, adding fields to them, and inserting values

Agreed.

2. Writing data to numpy arrays from the cursors. It seems like the np.square() calculations are entirely doable outside of numpy, though certainly more convenient in np.

Agree on both points.

3. Using arcpy.mapping to get a hook into the layer. Maybe you could just have a toolbox with an input feature layer variable whose default value is set to your desired layer name?

This is a change I have made. I had the Python toolbox published as a GP service to use as a geoprocessing widget in Web AppBuilder, and without inputs and outputs it's basically useless. Making this change to use inputs/outputs, and removing the need to hook into the layer, has brought the script run time down to 0.3 - 0.4 seconds.
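
For reference, a minimal sketch of that change (the parameter index here is hypothetical):

import arcpy

# Sketch only: take the layer as a script tool / GP service parameter
# rather than reaching into the map document via arcpy.mapping.
lu_layer = arcpy.GetParameterAsText(0)  # input feature layer parameter

with arcpy.da.SearchCursor(lu_layer, ['USE_CATEGORY', 'SHAPE@AREA']) as cursor:
    for cat, area in cursor:
        arcpy.AddMessage('{}: {}'.format(cat, area))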

Thanks for your input Chris

ChrisSnyder
Regular Contributor III

Well that's good news... Yeah, I'm finding that some of the arcpy.mapping functions aren't the fastest horses on the track. Onwards and upwards!

DanPatterson_Retired
MVP Emeritus

Actually, np.sqrt takes no more time than math.sqrt; both measure in microseconds.

If you want to speed up numeric calculations on fields for large tables (i.e., many records), pull the field out, use numpy to vectorize the calculation, then use arcpy.da.ExtendTable to add the result back.
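
A rough sketch of that pattern (the table and output field names here are placeholders):

import arcpy
import numpy as np

tbl = r'in_memory\parcels'  # placeholder table name

# Pull the key field and the value field into a numpy array
arr = arcpy.da.TableToNumPyArray(tbl, ('OBJECTID', 'SHAPE_Area'))

# Vectorized calculation over the whole column at once
result = np.empty(arr.shape[0], dtype=[('OBJECTID', np.int32), ('AREA_SQRT', np.float64)])
result['OBJECTID'] = arr['OBJECTID']
result['AREA_SQRT'] = np.sqrt(arr['SHAPE_Area'])

# Join the result back to the table on the key field
arcpy.da.ExtendTable(tbl, 'OBJECTID', result, 'OBJECTID')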

JoshuaBixby
MVP Esteemed Contributor

When you say it takes 1.5 seconds each time, is that the time when freshly initialized and run, or does it take that long even when run successively in the same process/session?

Depending on which ArcPy components are being used, initializing/loading all of them into a clean Python process/session may take a second.  You might be seeing the same times between 20 and 5000 parcels because the tool runs so quickly that the time you are seeing is the overhead of initializing objects and libraries.

If you are going to time your functions, it might be worth timing how long it takes for your import statements as well.
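
For example, a quick way to check, in a fresh Python process:

import time

t0 = time.time()
import arcpy  # typically the slowest import by far
import numpy as np
print('imports took {:.2f} seconds'.format(time.time() - t0))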

ChrisHolmes
Occasional Contributor III

When you say it takes 1.5 seconds each time, is that the time when freshly initialized and run, or does it take that long even when run successively in the same process/session?

It does run a little faster on subsequent runs.

Depending on which ArcPy components are being used, initializing/loading all of them into a clean Python process/session may take a second.  You might be seeing the same times between 20 and 5000 parcels because the tool runs so quickly that the time you are seeing is the overhead of initializing objects and libraries.

I think you are correct.

If you are going to time your functions, it might be worth timing how long it takes for your import statements as well.

Good point, would be interesting to see.

Thanks Joshua

SørenNielsen
New Contributor

Hi Chris,

I have had a lot of success using arcpy.SetLogHistory(False) when updating feature classes in SDE. It disables the logging of geoprocessing tool history. I think it can improve the speed of your script a little. But 1.5 seconds is very fast!
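
A minimal example of where the call goes:

import arcpy

arcpy.SetLogHistory(False)  # turn off geoprocessing history logging
# ... run geoprocessing tools as usual ...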

Best regards

Søren

ChrisHolmes
Occasional Contributor III

Thanks for the tip, Søren. Yeah, I'm happy with the performance now. I've moved on to getting it working in the Web AppBuilder environment.
