Python Script & Performance Speed

02-05-2018 10:25 AM
ChrisHolmes
Occasional Contributor III

Hello,

I have the following script. It works as expected, but it takes around 1.5 seconds to run each time; it doesn't seem to matter whether it's running against 20 parcels or 5000. Are there any suggestions as to how I can improve the performance of this script?

Thanks

import arcpy
import numpy as np

try:
    arcpy.env.overwriteOutput = True
    #Create in memory tables:
    arcpy.CreateTable_management("in_memory", "tableSelRecs")
    arcpy.AddField_management(r'in_memory\tableSelRecs', "USE_CATEGORY", "TEXT", field_length=35)
    arcpy.AddField_management(r'in_memory\tableSelRecs', "SHAPE_Area", "Double")
    arcpy.CreateTable_management("in_memory", "tableSumRecs")
    arcpy.AddField_management(r'in_memory\tableSumRecs', "USE_CATEGORY", "TEXT", field_length=35)
    arcpy.AddField_management(r'in_memory\tableSumRecs', "SHAPE_Area", "Double")

    mxd = arcpy.mapping.MapDocument('CURRENT')
    df = mxd.activeDataFrame
    totalArea = 0
    totalResArea = 0
    totalParcels = 0
    totalResParcels = 0

    lu_layer = arcpy.mapping.ListLayers(mxd, "LandUse_UseCategory", df)[0]
    #use da.SearchCursor to access necessary fields from selected records
    with arcpy.da.SearchCursor(lu_layer,['USE_CATEGORY','SHAPE@AREA']) as cursor:
        #Setup insert cursor to insert selected polygons into tableSelRecs
        insCursor = arcpy.da.InsertCursor(r'in_memory\tableSelRecs', ['USE_CATEGORY','SHAPE_Area'])
        #Insert selected rows via insCursor;
        #Get total area of selected records;
        #Get a count of total parcels
        for row in cursor:
            if row[0] != 'Invalid':
                insCursor.insertRow((row[0],row[1]))
                totalArea += row[1]
                totalParcels += 1
            #If the block is of a 'Residential' USE_CATEGORY type then add that value into the Residential DI calculation variable
            if row[0] in ('Residential - Low Density','Residential - Medium Density','Residential - High Density'):
                totalResArea += row[1]
                totalResParcels += 1

        #Release the insert cursor to remove its lock on the in_memory table
        del insCursor

        #use Summary Statistics against tableSelRecs to get totals per USE_CATEGORY and write results to tableSumRecs
        arcpy.Statistics_analysis(r'in_memory\tableSelRecs', r'in_memory\tableSumRecs', [["SHAPE_Area", "SUM"]], "USE_CATEGORY")

    with arcpy.da.SearchCursor(r'in_memory\tableSumRecs',['USE_CATEGORY','SUM_SHAPE_Area']) as selCur:
        #list to store calculation for each USE_CATEGORY value used in DI calculation
        interimValue = [0]
        #list to store calculation for each USE_CATEGORY value used in RDI calculation
        interimResValue = [0] 
        for sRow in selCur:
            arcpy.AddMessage('USE_CATEGORY: {}; AREA: {}'.format(sRow[0],sRow[1]))
            #divide the USE_CATEGORY area by the total community area then square that number & append it to the list
            interimValue.append(np.square(sRow[1]/totalArea)) 
            if sRow[0] in ('Residential - Low Density','Residential - Medium Density','Residential - High Density'):
                interimResValue.append(np.square(sRow[1]/totalResArea))

        #sum the values in the list then subtract that value from 1 to get the DI for the selected blocks
        DI = 1 - sum(interimValue)
        #sum the values in the list then subtract that value from 1 to get the RDI for the selected blocks
        RDI = 1 - sum(interimResValue) 
        
        arcpy.AddMessage('Diversity Index: ' + str(DI))
        arcpy.AddMessage('Residential Diversity Index: ' + str(RDI))
        arcpy.AddMessage('Total Blocks Selected: ' + str(totalParcels))
        arcpy.AddMessage('Total Blocks Area: ' + str(totalArea))
        arcpy.AddMessage('Total Res Blocks Selected: ' + str(totalResParcels))
        arcpy.AddMessage('Total Res Blocks Area: ' + str(totalResArea))
        arcpy.AddMessage('NOTE: Any blocks with a use category of Invalid are not included in any calculations.')
    #Delete in memory tables
    arcpy.Delete_management(r'in_memory\tableSelRecs')
    arcpy.Delete_management(r'in_memory\tableSumRecs')

except arcpy.ExecuteError:
    print(arcpy.GetMessages(2))
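The per-category sums that tableSelRecs and Statistics_analysis produce can also be accumulated in one cursor pass with a plain dictionary, which sidesteps the in_memory tables entirely. A rough sketch of the calculation in pure Python — the rows list below is a hypothetical stand-in for what the da.SearchCursor would yield:

```python
from collections import defaultdict

RES_CATEGORIES = {'Residential - Low Density',
                  'Residential - Medium Density',
                  'Residential - High Density'}

def diversity_indexes(rows):
    """Compute DI and RDI from (USE_CATEGORY, area) pairs in one pass."""
    area_by_cat = defaultdict(float)
    for category, area in rows:
        if category != 'Invalid':          # Invalid blocks are excluded
            area_by_cat[category] += area
    total = sum(area_by_cat.values())
    res_total = sum(a for c, a in area_by_cat.items() if c in RES_CATEGORIES)
    # DI = 1 - sum of squared area shares, one term per category
    di = 1 - sum((a / total) ** 2 for a in area_by_cat.values())
    # RDI uses only the residential categories and the residential total
    rdi = 1 - sum((area_by_cat[c] / res_total) ** 2
                  for c in RES_CATEGORIES if c in area_by_cat)
    return di, rdi

# Hypothetical rows standing in for the SearchCursor output
rows = [('Residential - Low Density', 100.0),
        ('Residential - High Density', 100.0),
        ('Commercial', 200.0),
        ('Invalid', 50.0)]
di, rdi = diversity_indexes(rows)
print(di, rdi)  # 0.625 0.5
```

In the real script the loop body would iterate the da.SearchCursor directly, so the insert cursor, both in_memory tables, and the Statistics_analysis call all drop out.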

Accepted Solutions
DanPatterson_Retired
MVP Emeritus

Lots of 'speed' issues this week, so

  • software, version
  • machine type and operating system
  • RAM, and whether you are using Pro or 64-bit geoprocessing in ArcMap
  • is other software running?
  • are you monitoring resources?
  • is everything indexed?
  • same performance in_memory vs to disk?
  • had you considered collecting in the SearchCursor, then processing in a separate InsertCursor? (It would take some jigging, and some consider it not recommended.)

Suggestions

  • break everything into functions (def)
  • you can use a 'decorator' or profiler on a def to see where the bottleneck is
  • you don't need numpy.
  • throw in timing steps and print statements if you don't want to use a decorator
  • dump the try-except; there is no need, since the errors that are returned are readily identifiable unless they are obscure

More questions to come, but this will be a start.
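The "timing steps and print statements" suggestion can be as simple as bracketing each stage of the script with time.perf_counter calls. A minimal sketch — the stage names and placeholder work are made up for illustration:

```python
import time

timings = {}

t0 = time.perf_counter()
# ... stage 1: e.g. build the in_memory tables ...
sum(range(100000))  # placeholder work
timings['setup'] = time.perf_counter() - t0

t0 = time.perf_counter()
# ... stage 2: e.g. the cursor loop ...
sum(range(100000))  # placeholder work
timings['cursor loop'] = time.perf_counter() - t0

# report which stage ate the time
for stage, dt in timings.items():
    print("{:<12s} {:8.4f} s".format(stage, dt))
```

Comparing the stage times quickly shows whether the overhead is in the table setup, the cursors, or the geoprocessing calls.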

ChrisHolmes
Occasional Contributor III

Thanks Dan for the comments and suggestions.

BlakeTerhune
MVP Regular Contributor

Did any of those suggestions in particular make a significant difference in the performance of your script?

ChrisHolmes
Occasional Contributor III

Hi Blake, yes, I found that removing the reference to the mxd file and instead using parameters to pass in the feature layer improved overall performance, down to around 0.7 seconds. I think the calculations themselves run very fast; it's just the overhead getting there.

JakeSkinner
Esri Esteemed Contributor

but it takes around 1.5 seconds to run each time

Is that correct?  Are you looking to have it run even faster than 1.5 seconds?

DanPatterson_Retired
MVP Emeritus

Good point...

What IDE are you using?

Some IDEs have the option to do 'module reloading' every time a script is run. Check your IDE to see.

That will affect timing a whole load, and you will need a decorator to time functions (I can provide one if you do a lot of testing).

ChrisHolmes
Occasional Contributor III

I have no idea what a decorator is.

ChrisHolmes
Occasional Contributor III

A good point. It just seems odd to me that it's no faster when run with a small number of parcels.

DanPatterson_Retired
MVP Emeritus

Chris... for decorators, you need to have functions... here is an example script setup that basically returns a list of numbers. I put sleepy time in there to get a meaningful time back. The if __name__ == '__main__' section calls and runs the script. If your IDE uses IPython (or similar), you can use things like %timeit to run lines of code or functions etc. Decorators are selective in that you can choose to 'decorate' just the functions that you want (the @time_deco line just above the function is the 'decorator', kind of like a special line above a function).

def time_deco(func):  # timing originally
    """timing decorator function
    :print("\n  print results inside wrapper or use <return> ... ")
    """
    import time
    from functools import wraps

    @wraps(func)
    def wrapper(*args, **kwargs):
        t0 = time.perf_counter()        # start time
        result = func(*args, **kwargs)  # ... run the function ...
        t1 = time.perf_counter()        # end time
        dt = t1 - t0
        print("\nTiming function for... {}".format(func.__name__))
        print("  Time: {: <8.2e}s for {:,} objects".format(dt, len(result)))
        return result                   # return the result of the function
        # (swap in "return dt" instead if you want the elapsed time back)
    return wrapper

@time_deco
def _demo():
    """
    : -
    """
    import time
    time.sleep(1)
    result = [1, 2, 3]
    return result

# ----------------------------------------------------------------------
# __main__ .... code section
if __name__ == "__main__":
    """Optionally...
    : - print the script source name.
    : - run the _demo
    """
#    print("Script... {}".format(script))
    result = _demo()

Now if you don't want a full-fledged timer, look at the t0/t1 lines inside wrapper: you can start and stop a counter, take the difference, and print the result out, just like the decorator does. time.perf_counter is the preferred counter on Windows.
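For one-off timing of a single expression without IPython's %timeit, the standard-library timeit module does the same job. A quick sketch, using a throwaway expression as the thing being timed:

```python
import timeit

# run the expression many times and report the average cost per call
elapsed = timeit.timeit("sum(range(1000))", number=10000)
per_call = elapsed / 10000
print("{:.3e} s per call".format(per_call))
```

For timing a real function, pass it as a callable instead of a string, e.g. timeit.timeit(my_func, number=100).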