Hi all,
I wrote a Python script where I use multiprocessing.Pool.map to run a function on different parts of a large dataset in parallel (read only, results are stored in a separate directory for each process).
The Python sub-processes produce the expected results, but they never shut down, so I end up with many python.exe processes running forever (even after the "master" script has finished). Each of these processes consumes around 100 MB of RAM, and I think I will run out of memory when I run this script on a full set of input parameters. I've been running it with 6 CPU cores, but I end up with many more Python processes than that.
How come they are not shutting down? Have you seen something like that before?
I'm using ArcGIS 10.3.1 with ArcGIS Pro 1.1 and Python 3.4 (64 bit) installed for stand-alone scripts.
While my real script is much more complicated, this is essentially what I am doing:
import arcpy
import multiprocessing


def worker_function(pars):
    """Function to be run in parallel"""
    wd = pars.get('wd')
    # ... do some work with spatial analyst ...
    return {"result": ["spam", "eggs"]}


# parameter_sets is a list of dictionaries of primitive Python strings
def main():
    parameter_sets = [
        # ...
        {
            "flow_direction_raster": '...',
            "weight_raster": '...',
            "output_raster": '...',
            "relative_to": '...',
            "extent": '...',
            "wd": '...'
        }
        # ...
    ]
    pool = multiprocessing.Pool(6)
    results = pool.map(worker_function, parameter_sets)
    return results


if __name__ == "__main__":
    main()
What about closing out the pool after the results are returned:
results = pool.map(worker_function, parameter_sets)
pool.close()
pool.join()
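For anyone who wants to try this outside ArcGIS, here is a minimal sketch of the same idea. It assumes a stand-in worker that does no arcpy work, and the names run_in_pool and demo_sets are made up for illustration. The try/finally is my own addition so that the pool is closed and joined even if map raises.

```python
import multiprocessing


def worker_function(pars):
    """Stand-in for the real worker; it only echoes its working directory."""
    return {"wd": pars.get("wd"), "result": ["spam", "eggs"]}


def run_in_pool(parameter_sets, processes=2):
    """Map worker_function over parameter_sets, then shut the pool down."""
    pool = multiprocessing.Pool(processes)
    try:
        results = pool.map(worker_function, parameter_sets)
    finally:
        pool.close()  # tell the pool that no more tasks will be submitted
        pool.join()   # block until every worker process has exited
    return results


if __name__ == "__main__":
    # Hypothetical parameter sets, only for demonstration.
    demo_sets = [{"wd": "dir_a"}, {"wd": "dir_b"}]
    print(run_in_pool(demo_sets))
```

Note that close() has to come before join(); calling join() on a pool that has not been closed or terminated raises an error.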
Hmm, this rings a bell from a different situation, but I can't remember the details. Try:
results = main() # to ensure that the results are returned from the main script
EDIT
A bit more difficult... see the programming guidelines for multiprocessing here:
17.2. multiprocessing — Process-based parallelism — Python 3.4.4 documentation
Thank you, calling pool.close() and pool.join() did the trick.
F.
Further down in that section there is an example of how to use map.
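A rough sketch of that kind of map usage, written with the pool as a context manager (supported since Python 3.3). The square function and the square_all helper are placeholders of mine, not code from this thread. Keep in mind that leaving the with block calls terminate() on the pool rather than close() and join(), and nobody here has confirmed whether that also gets rid of the lingering python.exe processes under ArcGIS.

```python
import multiprocessing


def square(x):
    """Placeholder worker; the real one would do the spatial analysis."""
    return x * x


def square_all(values, processes=2):
    """Run square over values in a pool that is cleaned up on exit."""
    # Leaving the with block calls pool.terminate(), which stops the workers.
    with multiprocessing.Pool(processes) as pool:
        # map() blocks until all results are ready and keeps input order.
        return pool.map(square, values)


if __name__ == "__main__":
    print(square_all([1, 2, 3]))
```

Because map() only returns once every result is in, terminating the pool straight afterwards does not cut off any work that was submitted through it.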