I am testing parallel processing with arcpy using Python's multiprocessing package; however, I am running into issues when trying to serialize ArcGIS Pro objects. I was wondering if anyone could suggest a better approach for my workflow.
My current workflow is to loop through each layout in an .aprx, extract some info from the map frames (e.g. data layers, broken links) and make some changes to the layout (e.g. relink the map frame at the bottom right to a global map). At the moment, I use `arcpy.mp.ArcGISProject` in my worker function to open a project, select a specific layout, then extract info for that layout. This works but takes quite a long time (~12-15 minutes for a batch of 10 layouts in one .aprx).
Below is test code I hope to get running, which mimics my desired workflow/code.
```python
import os, sys, tempfile
import multiprocessing as mp

import arcpy

arcpy.env.overwriteOutput = True


def execute(layouts):
    # Init pool
    pool = mp.Pool(processes=mp.cpu_count() - 1, maxtasksperchild=10)
    # Use apply_async from multiprocessing
    jobs = {}
    print("starting multiprocessing...")
    for lyt in layouts:
        jobs[lyt.name] = pool.apply_async(worker_function, [lyt])
    pool.close()
    pool.join()
    for lyt_name, result in jobs.items():
        try:
            result = result.get()
            print(result)
        except Exception as e:
            print('{}\n{}'.format(lyt_name, repr(e)))
            # > HCTTiv11027v04 NOCAL Range Complex
            # > TypeError("cannot pickle 'MappingLayoutObject' object")

    # Below does not work either
    # from concurrent.futures import ProcessPoolExecutor
    # print("starting concurrent futures...")
    # with ProcessPoolExecutor() as executor:
    #     results = executor.map(worker_function, layouts)
    #     print(results)
    #     for r in results:
    #         print(r)


def worker_function(in_lyt):
    # Do some "fake" work
    print("worker function on :", in_lyt.name)
    return f"worker function on: {in_lyt.name}"


if __name__ == '__main__':
    # Get params
    fdir = r"C:\path\to\dir"
    aprx_fn = "test.aprx"
    print(aprx_fn)
    # open aprx
    aprx_proj = arcpy.mp.ArcGISProject(os.path.join(fdir, aprx_fn))
    # get a list of layouts to pass as param to function
    layoutList = []
    print("layouts:")
    for lyt in aprx_proj.listLayouts():
        print(lyt.name)
        layoutList.append(lyt)
    print("executing...")
    execute(layoutList)
```
Thanks in advance
Multiprocessing spins up entire Python runtimes and passes arguments from the current runtime into them. This necessitates data serialization via pickle, and virtually none of the arcpy objects that refer to non-Python data structures meet that requirement. You'll need to redesign your worker tasks to open their own copy of the project to extract the layout, which might work, but could also lead to locking issues that necessitate further fixes. In general, nothing in arcpy was designed to handle more than one thread in one process, so don't be surprised if you have to abandon multiprocessing.
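The pickling constraint can be reproduced without arcpy at all. In this sketch, `MappingLayoutObject` is a hypothetical stand-in for the real arcpy layout class (the actual class simply isn't serializable), but it shows why `apply_async` fails on the object while a plain layout name survives the trip to the worker:

```python
import pickle


class MappingLayoutObject:
    """Hypothetical stand-in for an arcpy layout backed by non-Python resources."""

    def __init__(self, name):
        self.name = name

    def __reduce__(self):
        # Mimic arcpy objects, which refuse serialization outright
        raise TypeError("cannot pickle 'MappingLayoutObject' object")


lyt = MappingLayoutObject("NOCAL Range Complex")

# multiprocessing pickles every argument before handing it to a worker;
# this is exactly where apply_async(worker_function, [lyt]) blows up
try:
    pickle.dumps(lyt)
except TypeError as e:
    print(e)  # cannot pickle 'MappingLayoutObject' object

# A plain string round-trips fine, so layout *names* can cross the boundary
assert pickle.loads(pickle.dumps(lyt.name)) == "NOCAL Range Complex"
```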
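A minimal sketch of that redesign, assuming the workers only read the project (concurrent edits to a single .aprx invite the locking issues mentioned above): pass the project path and layout names across the process boundary as plain, picklable strings, and let each worker open its own `arcpy.mp.ArcGISProject`. The path and the per-layout work are placeholders carried over from the question; `listLayouts` treats its argument as a wildcard, so exact layout names are assumed here.

```python
import multiprocessing as mp


def worker_function(aprx_path, lyt_name):
    # Each worker re-opens the project from disk; importing arcpy here means
    # the child process, not the parent, pays the arcpy startup cost
    import arcpy
    aprx = arcpy.mp.ArcGISProject(aprx_path)
    # listLayouts accepts a wildcard, so fetch just the layout we were handed
    lyt = aprx.listLayouts(lyt_name)[0]
    # ... extract map frame info / relink data sources here ...
    return f"worker function on: {lyt.name}"


def execute(aprx_path, layout_names):
    pool = mp.Pool(processes=mp.cpu_count() - 1, maxtasksperchild=10)
    # Only picklable strings cross the process boundary now
    jobs = {name: pool.apply_async(worker_function, (aprx_path, name))
            for name in layout_names}
    pool.close()
    pool.join()
    for name, job in jobs.items():
        try:
            print(job.get())
        except Exception as e:
            print('{}\n{}'.format(name, repr(e)))


# Usage (requires ArcGIS Pro): collect the names once in the parent, then fan out.
# if __name__ == '__main__':
#     import arcpy, os
#     aprx_path = os.path.join(r"C:\path\to\dir", "test.aprx")
#     names = [lyt.name for lyt in arcpy.mp.ArcGISProject(aprx_path).listLayouts()]
#     execute(aprx_path, names)
```

If the workers must also modify layouts, having each one write its result out with `saveACopy` rather than saving the shared project is one way to sidestep contention on the source .aprx.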
Ah, that makes sense. I figured this may be the case 😂...
I appreciate your response!