Python multiprocessing not shutting down child processes

03-30-2016 08:32 AM
FilipKrál
Occasional Contributor III

Hi all,

I wrote a Python script where I use multiprocessing.Pool.map to run a function on different parts of a large dataset in parallel (read only, results are stored in a separate directory for each process).

The Python subprocesses produce the expected results, but they never shut down, so I end up with many python.exe processes running forever (even after the "master" script has finished). Each of these processes consumes around 100 MB of RAM, and I think I will run out of memory when I run this script on the full set of input parameters. I've been running it with 6 CPU cores, but I end up with many more Python processes than that.

How come they are not shutting down? Have you seen something like that before?

I'm using ArcGIS 10.3.1 with ArcGIS Pro 1.1 and Python 3.4 (64 bit) installed for standalone scripts.

While my real script is much more complicated, this is essentially what I am doing:

import arcpy
import multiprocessing


def worker_function(pars):
    """Function to be run in parallel"""
    wd = pars.get('wd')
    # ... do some work with spatial analyst ...
    return {"result": ["spam", "eggs"]}


def main():
    # parameter_sets is a list of dictionaries of primitive Python strings
    parameter_sets = [
        # ...
        {
        "flow_direction_raster": '...',
        "weight_raster": '...',
        "output_raster": '...',
        "relative_to": '...',
        "extent": '...',
        "wd": '...'
        }
        # ...
    ]
    pool = multiprocessing.Pool(6)
    results = pool.map(worker_function, parameter_sets)
    return results

if __name__ == "__main__":
    main()

Accepted Solutions
JoshuaBixby
MVP Esteemed Contributor

What about closing out the pool after the results are returned:

results = pool.map(worker_function, parameter_sets)
pool.close()
pool.join()
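
A minimal sketch of this pattern, assuming a stand-in worker_function that only echoes its input (the real one would do the Spatial Analyst work) and a hypothetical helper named run_in_pool. The try/finally is an addition here, so that the pool is still shut down if map raises:

```python
import multiprocessing


def worker_function(pars):
    """Stand-in worker (hypothetical): returns the working directory it was given."""
    return {"wd": pars.get("wd")}


def run_in_pool(parameter_sets, processes=2):
    """Map worker_function over parameter_sets, then shut the pool down."""
    pool = multiprocessing.Pool(processes)
    try:
        results = pool.map(worker_function, parameter_sets)
    finally:
        pool.close()  # no more tasks will be submitted to the pool
        pool.join()   # wait for the worker processes to exit
    return results


if __name__ == "__main__":
    print(run_in_pool([{"wd": "a"}, {"wd": "b"}]))
```

pool.join() must come after pool.close() (or pool.terminate()); calling it on a pool that is still accepting work raises an error.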


4 Replies
DanPatterson_Retired
MVP Esteemed Contributor

Hmm, this rings a bell from a different situation, but I can't remember the details. Try:

results = main()  # to ensure that the results are returned from the main script

EDIT

It's a bit more difficult than that... see the programming guidelines for multiprocessing here:

17.2. multiprocessing — Process-based parallelism — Python 3.4.4 documentation

FilipKrál
Occasional Contributor III

Thank you, calling pool.close() and pool.join() did the trick.

F.

DanPatterson_Retired
MVP Esteemed Contributor

Further down in that section there is an example of how to use map.
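
For readers without the documentation at hand, here is a small sketch of Pool.map written in the same spirit; it is not copied from the docs, and the square function and squares_in_parallel helper are made up for illustration. Since Python 3.3 a Pool can be used as a context manager. Note that leaving the with block calls terminate() rather than close() and join(), which is harmless here because map has already returned every result by then:

```python
from multiprocessing import Pool


def square(x):
    """Toy worker (hypothetical) standing in for real per-item work."""
    return x * x


def squares_in_parallel(values, processes=2):
    """Run square over values in a pool that is torn down on exit."""
    # Exiting the with block calls pool.terminate(), so no workers linger.
    with Pool(processes) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    print(squares_in_parallel([1, 2, 3]))
```

If workers need to finish cleanly (for example, to flush files they are writing), explicit close() followed by join() is the gentler choice than relying on terminate().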
