In Python, what is the best library to do parallel or concurrent processing of GIS workflows?

07-07-2020 11:25 AM
DwightLewis3
New Contributor

I often do big data analytics where the PC or server I use cannot handle a conventional data processing workflow, or where the processing simply takes too long. Both my work environment and my home PC have multiple cores and a decent amount of RAM.

To get big data projects done, I manually run separate instances of Python scripts (e.g., 10 separate python files running at the same time) on input data that I have partitioned, and concatenate the results on the back-end once all models are complete.

Is there a Python library that is built for parallel or concurrent processing and works well with ArcPy? If so, does ESRI have a "help page" that shows users how to perform these kinds of workflows? For example, last week I ran an origin-destination travel-time matrix for all hospitals in the US to every Census Block Group centroid within 2.5 hours. To keep the processing below the computing capacity of the PC I was using, I had 15 separate Python files that looped through the block groups within each county and did the following: 1) create an o-d matrix layer, 2) add the destination locations, 3) add the origin locations of the Block Groups within a county, 4) solve the model, 5) export/append the "lines" layer to a table in a database, 6) truncate the origin and lines layers, and 7) repeat steps 3-6 for the Block Groups in the next county.

Though solutions like this get the job done for me, at times it can be kind of a headache to manage the concurrent processing of 10+ separate Python files. I feel like there's got to be a better solution out there.

Any advice is welcome.

1 Reply
CodyScott
Regular Contributor

A while back I helped someone by making this gist. It uses the multiprocessing library to do parallel calculations on a dataset.

This is generally the setup I use. It is queue-based, so every process keeps pulling its next value from a shared pool until the work runs out.

Multiprocessing ArcGIS/Arcpy · GitHub 
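For anyone who can't open the link, here is a rough sketch of what a queue-based setup like that can look like. It is not the code from the gist. `process_county` is a hypothetical placeholder for the per-county ArcPy work from the question, the county IDs are made up, and stopping the workers with a `None` sentinel is my own choice for this illustration.

```python
import multiprocessing as mp


def process_county(county_id):
    """Hypothetical stand-in for the per-county work (not a real ArcPy call)."""
    # A real worker would import arcpy here, build and solve the OD matrix
    # for this county's block groups, and export the lines.
    return county_id, f"solved {county_id}"


def worker(task_queue, result_queue):
    """Pull county IDs from the shared queue until a None sentinel arrives."""
    while True:
        county_id = task_queue.get()
        if county_id is None:
            break
        result_queue.put(process_county(county_id))


def run_parallel(county_ids, n_workers=4):
    """Process a list of county IDs with n_workers processes."""
    task_queue = mp.Queue()
    result_queue = mp.Queue()
    for county_id in county_ids:
        task_queue.put(county_id)
    for _ in range(n_workers):
        task_queue.put(None)  # one stop signal per worker

    procs = [
        mp.Process(target=worker, args=(task_queue, result_queue))
        for _ in range(n_workers)
    ]
    for p in procs:
        p.start()
    # Collect results before joining so no worker blocks on a full pipe.
    results = dict(result_queue.get() for _ in county_ids)
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    # The guard is required on Windows, where child processes re-import this file.
    print(run_parallel(["01001", "01003", "01005"], n_workers=2))
```

Because each worker takes the next county as soon as it finishes the previous one, slow counties don't hold up the rest, which is the main advantage over splitting the input into fixed chunks by hand.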

If I recall, I later changed it slightly to stop using queue.empty and wrapped the call in a try/except instead, to avoid some other issues. The gist doesn't show that change, but it's easy to modify.
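A minimal sketch of that kind of change, assuming the worker loop currently checks the queue's `empty()` method before each read. The Python docs note that `empty()` is not reliable on a `multiprocessing.Queue`, so catching `queue.Empty` from `get()` is the usual alternative. The names `drain` and `handle` and the one-second timeout are made up for this example.

```python
import queue


def drain(task_queue, handle, timeout=1.0):
    """Process items until the queue stays empty for `timeout` seconds."""
    done = []
    while True:
        try:
            # multiprocessing.Queue.get raises queue.Empty on timeout,
            # the same exception the standard queue.Queue uses.
            item = task_queue.get(timeout=timeout)
        except queue.Empty:
            break  # nothing arrived in time; treat the work as finished
        done.append(handle(item))
    return done
```

The timeout matters: an item that was just put on a `multiprocessing.Queue` may not be visible to readers right away, so a short wait is safer than an immediate non-blocking read.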

Feel free to ask if you have any questions on it.

Update:

arcpy_multiprocessing_template/Toolbox/code at master · namur007/arcpy_multiprocessing_template · Gi... 

This one should have the proper logic.
