Multiprocessing with ArcPy

Document created by bchastain on Jan 6, 2016. Last modified by bchastain on Jan 6, 2016.

The Problem

Anyone who has scripted complex, repetitive tasks with arcpy knows how long they can take to complete. Even something as simple as exporting a few thousand map documents to images can take hours. Part of the reason is that a Python script, by default, runs in a single thread on a single CPU core. A programmer familiar with other languages, such as Java or C++, might try to address this inefficiency by using multithreading to share the work among all the CPUs. Unfortunately, multithreading in Python does not work in quite the same fashion.

CPython, upon which arcpy is built, employs what is known as the Global Interpreter Lock (GIL), which allows only one thread to execute Python bytecode at a time. This prevents one thread from modifying an object while another thread is working with it, and keeps memory management (reference counting and garbage collection) working smoothly. The downside is that multithreaded Python scripts cannot take full advantage of all available CPUs, so CPU-bound work sees little or no gain in speed.

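To make the effect concrete, here is a minimal sketch that times the same CPU-bound loop run serially and then split across threads. The function names (count_down, timed_serial, timed_threaded) and the workload size are made up for illustration and do not come from the presentation. On a standard CPython build the two timings typically come out close to each other, though the exact numbers depend on the machine and the Python version.

```python
import threading
import time


def count_down(n):
    """Pure-Python, CPU-bound busy loop (illustrative workload only)."""
    while n > 0:
        n -= 1


def timed_serial(n, jobs):
    """Run the loop `jobs` times back to back; return elapsed seconds."""
    start = time.time()
    for _ in range(jobs):
        count_down(n)
    return time.time() - start


def timed_threaded(n, jobs):
    """Run the loop in `jobs` threads at once; return elapsed seconds."""
    threads = [threading.Thread(target=count_down, args=(n,))
               for _ in range(jobs)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start


if __name__ == "__main__":
    n, jobs = 2000000, 4  # arbitrary sizes chosen for the demo
    print("serial:   {0:.2f} s".format(timed_serial(n, jobs)))
    print("threaded: {0:.2f} s".format(timed_threaded(n, jobs)))
```

Threads are still useful when the work spends its time waiting (on disk or network, for example), because the GIL is released while a thread is blocked. The loop above never waits, which is why it is a fair stand-in for CPU-bound processing.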

The Solution

Enter the multiprocessing module. This module uses processes instead of threads; each process has its own interpreter and its own GIL, so the lock no longer serializes the work. Programmers can therefore fully leverage multiple processors on a given machine, and even though processes have greater overhead than threads, significant gains in speed can be achieved.

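As a rough sketch of the idea, the example below hands a list of work items to a pool of worker processes with multiprocessing.Pool. The names process_item, run_serial and run_parallel are hypothetical, and the arithmetic inside process_item is only a stand-in for whatever arcpy call a real script would make per item (exporting one map document, for instance). It is not the code from the presentation.

```python
import multiprocessing
import os


def process_item(item):
    """Hypothetical stand-in for one unit of arcpy work.

    A real script would do its per-item arcpy processing here; this
    version just burns a little CPU so the example runs anywhere.
    """
    total = 0
    for i in range(100000):
        total += (i * item) % 7
    # Return the worker's process ID so the caller can see the spread.
    return item, total, os.getpid()


def run_serial(items):
    """Process every item in the current process, one after another."""
    return [process_item(item) for item in items]


def run_parallel(items, processes=None):
    """Process the items in a pool of worker processes."""
    # With processes=None, Pool sizes itself to the available CPUs.
    pool = multiprocessing.Pool(processes=processes)
    try:
        # map() blocks until every item is done and keeps input order.
        return pool.map(process_item, items)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    items = list(range(1, 9))
    results = run_parallel(items)
    # Compare only item and total; the process IDs differ by design.
    assert [r[:2] for r in results] == [r[:2] for r in run_serial(items)]
    pids = set(r[2] for r in results)
    print("Processed {0} items in {1} worker process(es)".format(
        len(results), len(pids)))
```

The worker function lives at module level so it can be pickled and sent to the child processes, and the pool is only created under the `if __name__ == "__main__":` guard. The guard matters on Windows, where each child process re-imports the script and would otherwise try to start a pool of its own.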

This presentation will show best practices for using the multiprocessing module with arcpy, along with examples of how we are currently using multiprocessing to provide solutions to clients.

Co-authors:

Brett Gaines, Kelvin Fox
