Using multiple CPUs in Python script

CarstenSchuermann · ‎10-31-2021

Hi, I have developed a Python script that iterates through a list of feature classes in order to apply the same series of geoprocessing tools to each feature class. In the script, I implemented it like this:

fclist = ("fc1", "fc2", "fc3", ...)

for i in fclist:
    arcpy.management.AddField(i, ...)
    arcpy.management.CalculateField(i, ...)
    ...

It works fine, but I am wondering whether processing could be sped up by telling Python to use not just one but multiple CPUs (for example, 50% of the available CPUs). In other words, how can I tell Python to run the "for" loop in parallel?

As the result of the "for" loop for one feature class is neither an input to nor dependent on the results for the other feature classes, I think parallel processing should be possible.

All feature classes are located in the same feature dataset, so I am also wondering whether I will run into "system lock" problems if I parallelize this processing.

I would appreciate it if someone could share their experiences with multi-core / multi-CPU processing and provide some code snippets.

DanPatterson · ‎10-31-2021

Did you see the documentation on CPU and GPU support?

Processor Type (Environment setting)—ArcGIS Pro | Documentation

Parallel Processing Factor (Environment setting)—ArcGIS Pro | Documentation

env—ArcGIS Pro | Documentation


DonMorrison1 · ‎10-31-2021

I've done quite a lot of this, and from what you describe it should work without problems using the Python multiprocessing package. The pseudocode below shows the basic flow. As long as each process only touches its own feature class, there should be no locking problem. My computer has 8 threads and I usually max it out at 8; there is some odd pleasure in seeing the CPU meters all running at 100%. Throughput is typically about 5 times what you get with a single processor. Even at 100% I can still use my system for lightweight interactive work (e.g. browsing web sites).

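import multiprocessing

def <target_function>(feature_class_name):
    # ... your code to process one feature class
    return

# The guard is needed on Windows, where every worker process re-imports this script
if __name__ == '__main__':
    p = multiprocessing.Pool(<number of processors>)
    p.map(<target_function>, <list of feature class names>)
    p.close()
    p.join()  # wait for all workers to finish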
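To fill in the `<number of processors>` placeholder with, say, 50% of the machine's logical CPUs as asked in the original question, one option is a small helper like the one below. This is only a sketch: the function name `worker_count` and its parameters are assumptions for illustration and are not part of `multiprocessing`.

```python
import math
import os


def worker_count(fraction=0.5, total=None):
    """Number of worker processes for a given fraction of the logical CPUs."""
    if total is None:
        # os.cpu_count() can return None when the count cannot be determined
        total = os.cpu_count() or 1
    # Round down, but always keep at least one worker
    return max(1, math.floor(total * fraction))


if __name__ == "__main__":
    print(worker_count(0.5))
```

The result can then be passed as the pool size, for example `multiprocessing.Pool(worker_count(0.5))`.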

One final note: I HAVE had some problems getting this to work in a Python toolbox, but no problems in a standalone script.

And another final note: you need to be careful about naming conflicts if you create temporary feature classes as part of the processing. I usually solve this by appending something like the process ID to the name of the temporary object.
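The process-ID suffix could look roughly like the following minimal sketch. The helper name `temp_name` and the base name `tmp_buffer` are hypothetical; only `os.getpid()` from the standard library is used, and the result is just a string to hand to whichever tool creates the temporary feature class.

```python
import os


def temp_name(base):
    """Return the base name with the current process ID appended (hypothetical helper)."""
    # os.getpid() differs between the worker processes of a pool, so two
    # workers that use the same base name still end up with distinct names.
    return f"{base}_{os.getpid()}"


if __name__ == "__main__":
    print(temp_name("tmp_buffer"))
```

Because a pool worker usually handles several feature classes one after another, the same name comes up again within that worker. The temporary object would therefore need to be deleted before the next task, or the name extended with something task-specific such as the feature class name.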

CarstenSchuermann · ‎11-02-2021

Hi Don, thanks for the pseudocode and the hint about potential naming conflicts with temporary feature classes. I will try it out. So far I have no intention of developing a Python toolbox; a script should be perfect for my purposes.

Anonymous User · ‎10-31-2021

Mine is similar to DonMorrison1's, but it also sets the pythonw.exe to use and returns some information from the tasks:

import math
import multiprocessing

def function_To_Process(fc, args):
    try:
        # Do stuff to the fc
        return {'result': True}

    except Exception as ex:
        # return the message as text so it can always be pickled back to the parent
        return {'result': False, 'error': str(ex)}

if __name__ == '__main__':
    ctx = multiprocessing.get_context('spawn')
    ctx.set_executable(r'path to your pythonw.exe')

    # create list of fc's/args etc that you want processed
    jobs = [(fc1, args), (fc2, args), (fc3, args)]  # etc...

    cpuNum = ctx.cpu_count()
    # throttle cpu to 80% if wanted
    # cpuNum = int(math.ceil(cpuNum * (80 / 100)))

    with ctx.Pool(processes=cpuNum) as pool:
        res = pool.starmap(function_To_Process, jobs)

    # report only the tasks that failed
    for result in res:
        if not result['result']:
            print(f'Failed: {result["error"]}')
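The list returned by `starmap` keeps the order of `jobs`, so each result can be matched back to the input that produced it. Here is a minimal sketch, assuming each task returns a dictionary with a `result` flag and an optional `error` entry as in the code above. The helper name `failed_jobs` and the sample data are made up for illustration.

```python
def failed_jobs(jobs, results):
    """Pair every failed result with the job that produced it."""
    return [
        (job, result.get("error"))
        for job, result in zip(jobs, results)
        if not result.get("result")
    ]


if __name__ == "__main__":
    # Sample data standing in for real feature classes and task results
    jobs = [("fc1", "args"), ("fc2", "args")]
    results = [{"result": True}, {"result": False, "error": "example error"}]
    for job, error in failed_jobs(jobs, results):
        print(f"{job[0]} failed: {error}")
```

This makes it easy to see which feature classes need to be rerun, rather than only printing the error messages.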


CarstenSchuermann · ‎11-02-2021

Thank you for the additional remarks, Jeff.