
Has anyone used a python profiler on python code in an atbx?

09-25-2023 09:14 AM
DuncanHornby
MVP Notable Contributor

I recently discovered some software called Scalene that is supposed to examine the efficiency of your Python code. I managed to get it to work with a simple "stand-alone" Python script, which I show below:

from scalene import scalene_profiler

# Everything between start() and stop() is profiled.
scalene_profiler.start()

for i in [1, 2, 3, 4, 5, 6, 7, 8, 9]:
    print(i * 10)

scalene_profiler.stop()

You get Scalene to work by running it from the command line, as I show below:

C:\Users\hornb\AppData\Local\ESRI\conda\envs\arcgispro-py3_Spyder>python.exe -m scalene C:\Scratch\testscalene.py
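
If I'm reading the Scalene docs right, you can also pass its --off option so profiling stays disabled until scalene_profiler.start() is reached, which keeps the report focused on just the code between start() and stop():

C:\Users\hornb\AppData\Local\ESRI\conda\envs\arcgispro-py3_Spyder>python.exe -m scalene --off C:\Scratch\testscalene.py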

I'm developing some rather meaty Python tools that are embedded in an atbx toolbox. The user has to run the tools from within ArcGIS Pro because they need to choose layers from a map. I've not used a profiler before, and I can't see a way to use one when the Python code is embedded in a toolbox but Scalene somehow needs to be called from a command prompt.
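
The closest I've got to a workaround (untested, so treat it as a sketch) is printing each layer's data source in the Pro Python window, then hard-coding those paths into a stand-alone copy of the script so Scalene can run it outside Pro:

import arcpy

# Run in the ArcGIS Pro Python window: list each layer's source path
# so the paths can be hard-coded into a stand-alone script.
aprx = arcpy.mp.ArcGISProject("CURRENT")
for lyr in aprx.activeMap.listLayers():
    if lyr.supports("DATASOURCE"):
        print(lyr.name, lyr.dataSource)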

I was wondering if any ArcGIS Pro Python developers out there have tried to use such software and have any advice/pointers/pearls of wisdom?

3 Replies
DanPatterson
MVP Esteemed Contributor

I don't use Scalene, but if it is like any other profiler: don't use print statements while you are profiling, and profile function calls (e.g. defs).
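
For example, rather than profiling a bare loop full of print statements, wrap the work in a function and profile that (a minimal sketch based on the script above):

from scalene import scalene_profiler

def compute(values):
    # Pure computation, with no print calls to distort the timings.
    return [v * 10 for v in values]

scalene_profiler.start()
results = compute(list(range(1, 10)))
scalene_profiler.stop()

# Report the results after profiling has stopped.
print(results)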


... sort of retired...
DavidSolari
Frequent Contributor (Accepted Solution)

Clone the toolbox, export the relevant embedded scripts to Python files, and rewrite the scripts so the critical paths are functions you can call with standard Python data as parameters; then throw your profiler at that. For example, if you have a script like:

import arcpy

# Parameters come from the geoprocessing framework, so this only
# works when run as a script tool inside ArcGIS Pro.
table = arcpy.GetParameterAsText(0)
output = arcpy.GetParameterAsText(1)

table_data = get_table_data(table)
new_data = transform_data(table_data)
write_data(new_data, output)

Export that script and change it to something like:

import arcpy
from scalene import scalene_profiler

def main(table, output):
    # The critical path, now callable with plain paths instead of
    # geoprocessing parameters.
    table_data = get_table_data(table)
    new_data = transform_data(table_data)
    write_data(new_data, output)

if __name__ == "__main__":
    scalene_profiler.start()
    main("/path/to/input", "/path/to/output")
    scalene_profiler.stop()

And so on. The more complex your process is, the more functions you'll need to factor out to find the true critical path. You might also need to tweak your functions so they read and write preformatted data in memory that you prepare before the profiler engages; that eliminates I/O operations from the benchmark as much as possible and makes the slow computations easier to spot. Note that for most arcpy code the true bottleneck is the synchronous I/O functions that arcpy provides, so don't be surprised if your code runs several orders of magnitude faster with the I/O stubbed out. Happy hunting!
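
A rough sketch of what I mean, with a made-up table path and the write step stubbed out so that only the transform runs under the profiler:

import arcpy
from scalene import scalene_profiler

def load_rows(table, fields):
    # All of the read I/O happens here, before profiling begins.
    with arcpy.da.SearchCursor(table, fields) as cursor:
        return [row for row in cursor]

def transform_data(rows):
    # Stand-in for the computation you actually want to measure.
    return [(oid, area * 2.0) for oid, area in rows]

if __name__ == "__main__":
    # Hypothetical dataset; substitute one of your own.
    rows = load_rows(r"C:\Scratch\data.gdb\parcels", ["OID@", "SHAPE@AREA"])
    scalene_profiler.start()
    result = transform_data(rows)  # only this line is on the clock
    scalene_profiler.stop()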

DuncanHornby
MVP Notable Contributor

Thanks for the advice; this was the approach I was hoping to avoid! 😁

Interesting stuff about I/O.
