Python multiprocessing not shutting down child processes

03-30-2016 08:32 AM
FilipKrál
Occasional Contributor III

Hi all,

I wrote a Python script where I use multiprocessing.Pool.map to run a function on different parts of a large dataset in parallel (read only, results are stored in a separate directory for each process).

The Python sub-processes produce the expected results, but they never shut down, so I end up with many python.exe processes running forever (even after the "master" script has finished). Each of these processes consumes around 100 MB of RAM, and I think I will run out of memory when I run this script on a full set of input parameters. I've been running it with 6 CPU cores, yet I end up with many more Python processes than that.

How come they are not shutting down? Have you seen something like that before?

I'm using ArcGIS 10.3.1 with ArcGIS Pro 1.1 and Python 3.4 (64-bit) installed for stand-alone scripts.

While my real script is much more complicated, this is essentially what I am doing:

import arcpy
import multiprocessing


def worker_function(pars):
    """Function to be run in parallel"""
    wd = pars.get('wd')
    # ... do some work with spatial analyst ...
    return {"result": ["spam", "eggs"]}


def main():
    # parameter_sets is a list of dictionaries of primitive Python strings
    parameter_sets = [
        # ...
        {
            "flow_direction_raster": '...',
            "weight_raster": '...',
            "output_raster": '...',
            "relative_to": '...',
            "extent": '...',
            "wd": '...'
        }
        # ...
    ]
    pool = multiprocessing.Pool(6)
    results = pool.map(worker_function, parameter_sets)
    return results

if __name__ == "__main__":
    main()

4 Replies
DanPatterson_Retired
MVP Emeritus

Hmmm... rings a bell for a different situation, but I can't remember the details. Try:

results = main()  # to ensure that the results are returned from the main script

EDIT

It's a bit more difficult than that... see the programming guidelines for multiprocessing here:

17.2. multiprocessing — Process-based parallelism — Python 3.4.4 documentation
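
Those guidelines also suggest explicitly joining every process you start. A quick way to check for leftover workers from the parent script is multiprocessing.active_children(), which as a side effect joins any children that have already finished. A minimal diagnostic sketch (the worker here is illustrative, not from the original script):

import multiprocessing
import time


def worker(x):
    time.sleep(1)
    return x * x


if __name__ == "__main__":
    pool = multiprocessing.Pool(4)
    results = pool.map(worker, range(10))
    # If the pool has not been cleaned up, its idle workers show up here
    # as live Process objects:
    print(multiprocessing.active_children())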

JoshuaBixby
MVP Esteemed Contributor

What about closing out the pool after the results are returned:

results = pool.map(worker_function, parameter_sets)
pool.close()
pool.join()
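
In the context of the original script, the fix slots into the end of main(). A minimal sketch, with the parameter dictionaries reduced to a placeholder (the real ones carry the raster paths from the question):

import multiprocessing


def worker_function(pars):
    """Function to be run in parallel"""
    wd = pars.get('wd')
    # ... do some work with spatial analyst ...
    return {"result": ["spam", "eggs"]}


def main():
    # Placeholder; see the question for the real dictionaries.
    parameter_sets = [{"wd": "."}]
    pool = multiprocessing.Pool(6)
    results = pool.map(worker_function, parameter_sets)
    pool.close()  # tell the pool no further tasks are coming
    pool.join()   # block until every worker process has exited
    return results


if __name__ == "__main__":
    results = main()

Note that map() already blocks until all results are in, so close()/join() exist purely to let the worker processes exit cleanly instead of idling forever.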
FilipKrál
Occasional Contributor III

Thank you, calling pool.close() and pool.join() did the trick.

F.

DanPatterson_Retired
MVP Emeritus

Further down in that section there is an example of how to use map.
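
For reference, the Pool example in those docs is, roughly, the context-manager form (supported since Python 3.3; note that Pool's __exit__() calls terminate() rather than close()/join(), which is fine here because map() has already returned every result):

from multiprocessing import Pool


def f(x):
    return x * x


if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))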
