
In Python, what is the best library to do parallel or concurrent processing of GIS workflows?

Question asked by ldwight on Jul 7, 2020
Latest reply on Jul 7, 2020 by jcscott

I often do big data analytics where the PC or server I use cannot handle a conventional data processing workflow, or where the workflow simply takes too long to run. Both my work environment and my home PC have multiple cores and a decent amount of RAM.

 

To get big data projects done, I manually run separate instances of Python scripts (e.g., 10 separate python files running at the same time) on input data that I have partitioned, and concatenate the results on the back-end once all models are complete.
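For illustration, the manual "partition, run separately, concatenate" approach described above could be sketched in a single script with the standard library's `concurrent.futures` module. This is only a minimal sketch under stated assumptions: `partition`, `process_chunk`, and `run_partitions` are hypothetical names, and `process_chunk` is a placeholder that doubles numbers instead of doing any real GIS work. Whether ArcPy calls behave well inside worker processes is not something this sketch addresses.

```python
from concurrent.futures import ProcessPoolExecutor


def partition(items, n_chunks):
    """Split items into at most n_chunks contiguous, roughly equal chunks."""
    items = list(items)
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty chunks when n_chunks > len(items)
            chunks.append(items[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """Hypothetical placeholder for the per-partition work (one 'script')."""
    return [value * 2 for value in chunk]


def run_partitions(items, n_workers=4, executor_cls=ProcessPoolExecutor):
    """Process each partition in its own worker and concatenate the results."""
    chunks = partition(items, n_workers)
    with executor_cls(max_workers=n_workers) as pool:
        # pool.map returns results in the same order as the input chunks.
        parts = list(pool.map(process_chunk, chunks))
    return [row for part in parts for row in part]


if __name__ == "__main__":
    # The __main__ guard matters on platforms that spawn worker processes.
    print(run_partitions(range(10), n_workers=3))
```

The `executor_cls` parameter is an assumption added for convenience: passing `concurrent.futures.ThreadPoolExecutor` runs the same code in threads, which can make debugging easier before switching back to processes.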

 

Is there a Python library that is built for parallel or concurrent processing and works well with ArcPy? If so, does ESRI have a "help page" that shows users how to perform this kind of workflow? For example, last week I ran an origin-destination travel-time matrix for all hospitals in the US to every Census Block Group centroid within 2.5 hours. To keep the processing within the computing capacity of the PC I was using, I had 15 separate Python files that looped through the block groups within each county and did the following: 1) created an o-d matrix layer, 2) added the destination locations, 3) added the origin locations of the Block Groups within a county, 4) solved the model, 5) exported/appended the "lines" layer to a table in a database, 6) truncated the origin and lines layers, and 7) repeated steps 3-6 for the Block Groups in the next county.
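The per-county loop above could, in principle, be driven from one script that submits each county as a separate task and collects the rows as tasks finish. The following is a rough sketch, not a tested ArcPy workflow: `group_by_county`, `solve_county`, and `run_counties` are hypothetical names, the input is assumed to be `(county_id, block_group_id)` pairs, and `solve_county` only echoes its input where the real code would build, solve, and export the o-d matrix.

```python
from collections import defaultdict
from concurrent.futures import ProcessPoolExecutor, as_completed


def group_by_county(block_groups):
    """Turn (county_id, block_group_id) pairs into {county_id: [ids, ...]}."""
    groups = defaultdict(list)
    for county_id, block_group_id in block_groups:
        groups[county_id].append(block_group_id)
    return dict(groups)


def solve_county(county_id, block_group_ids):
    """Hypothetical stand-in for the per-county steps (load origins, solve,
    export lines). Here it just returns one row per block group."""
    return [(county_id, bg_id) for bg_id in block_group_ids]


def run_counties(block_groups, n_workers=4, executor_cls=ProcessPoolExecutor):
    """Run one task per county; return (rows, failed) where failed maps a
    county_id to the error raised while processing it."""
    rows, failed = [], {}
    groups = group_by_county(block_groups)
    with executor_cls(max_workers=n_workers) as pool:
        futures = {
            pool.submit(solve_county, county_id, ids): county_id
            for county_id, ids in groups.items()
        }
        # as_completed yields futures as they finish, in no fixed order.
        for future in as_completed(futures):
            county_id = futures[future]
            try:
                rows.extend(future.result())
            except Exception as exc:  # record the failure, keep other counties
                failed[county_id] = repr(exc)
    return rows, failed


if __name__ == "__main__":
    pairs = [("001", "A"), ("001", "B"), ("003", "C")]
    rows, failed = run_counties(pairs, n_workers=2)
    print(sorted(rows), failed)
```

Collecting failures per county, instead of letting one exception stop the run, means a single problematic county can be retried later without repeating the rest. In this sketch the parent process gathers all rows; writing to the database from the parent only is one way to avoid several workers appending to the same table at once.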

 

Though solutions like this get the job done for me, managing the concurrent processing of 10+ separate Python files can be a headache at times. I feel like there's got to be a better solution out there.

 

Any advice is welcome.
