I have a script that processes a large batch of polygons using nested ProcessPoolExecutors. After a certain point, I get this error:
Exception has occurred: RuntimeError (note: full exception trace is shown but execution is paused at: <module>)
The Product License has not been initialized.
File "C:\Program Files\ArcGIS\Pro\Resources\ArcPy\arcpy\geoprocessing\_base.py", line 14, in <module>
import arcgisscripting
File "C:\Program Files\ArcGIS\Pro\Resources\ArcPy\arcpy\geoprocessing\__init__.py", line 14, in <module>
from ._base import *
File "C:\Program Files\ArcGIS\Pro\Resources\ArcPy\arcpy\__init__.py", line 77, in <module>
from arcpy.geoprocessing import gp
File "C:\Users\redacted\example.py", line 1, in <module>
import arcpy
File "<string>", line 1, in <module> (Current frame)
RuntimeError: The Product License has not been initialized.
I have tried several things and still run into this issue. The only "solution" I've found is to keep the total number of processes (max_concurrency * max_children) below the mid-20s, even on machines with plenty of cores and RAM.
Here is an example script:
import arcpy
import itertools
import uuid
import time
import concurrent.futures
import random

arcpy.env.overwriteOutput = True
arcpy.env.workspace = r"C:\path\to\your.gdb"  # point towards your geodatabase
output = "test_number"
max_concurrency = 20
max_children = 20

def process_number(number):
    time.sleep(random.random() * 3)
    return number

def process_numbers_multi(number, num_range):
    print(f"PROCESSING STARTED ON NUMBER: {number}")
    nums = list(range(number, number + num_range))
    input_handler = iter(nums)
    num_results = []
    with concurrent.futures.ProcessPoolExecutor(max_workers=max_children) as child_executor:
        futures = {
            child_executor.submit(process_number, part): part
            for part in itertools.islice(input_handler, max_children)
        }
        while futures:
            done, _ = concurrent.futures.wait(
                futures, return_when=concurrent.futures.FIRST_COMPLETED
            )
            for fut in done:
                original_input = futures.pop(fut)
                try:
                    results = fut.result()
                except Exception as exc:
                    print(f"{original_input} generated an exception: {exc}")
                else:
                    num_results.append(results)
            # islice on an exhausted iterator simply yields nothing;
            # an any() check here would silently consume items
            for part in itertools.islice(input_handler, len(done)):
                fut = child_executor.submit(process_number, part)
                futures[fut] = part
    return num_results

def create_feature_class():
    # Create transect FC. Add fields.
    trans_fc = arcpy.management.CreateFeatureclass(out_path=arcpy.env.workspace,
                                                   out_name=output)
    flds = [("NUMBER_GUID", "GUID"), ("NUMBER", "DOUBLE")]
    for fld_name, fld_type in flds:
        arcpy.management.AddField(in_table=trans_fc, field_name=fld_name,
                                  field_type=fld_type)
    print("CREATED TRANSECT FC")
    return trans_fc

def write_to_db(number, trans_fc):
    flds = ["NUMBER_GUID", "NUMBER"]
    print("WRITING NUMBER")
    # open the cursor once and reuse it for every row
    with arcpy.da.InsertCursor(trans_fc, flds) as icurs:
        for rows in number:
            icurs.insertRow([rows[0], rows[1]])

def main():
    start_time = time.time()
    print("PROCESS STARTING")
    trans_fc = create_feature_class()
    num_list = list(range(0, 200))
    input_handler = iter(num_list)
    num_range = 10
    full_results = []
    """tmp = process_numbers_multi(num_list[0], num_range)
    for i in tmp:
        full_results.append([uuid.uuid4(), i])
    write_to_db(full_results, trans_fc)"""
    with concurrent.futures.ProcessPoolExecutor(max_workers=max_concurrency) as executor:
        futures = {
            executor.submit(process_numbers_multi, part, num_range): part
            for part in itertools.islice(input_handler, max_concurrency)
        }
        while futures:
            done, _ = concurrent.futures.wait(
                futures, return_when=concurrent.futures.FIRST_COMPLETED
            )
            for fut in done:
                original_input = futures.pop(fut)
                try:
                    results = fut.result()
                except Exception as exc:
                    print(f"{original_input} generated an exception: {exc}")
                else:
                    for x in results:
                        full_results.append([uuid.uuid4(), x])
                    write_to_db(full_results, trans_fc)
                    full_results = []
            # refill from the iterator; islice handles exhaustion
            for part in itertools.islice(input_handler, len(done)):
                fut = executor.submit(process_numbers_multi, part, num_range)
                futures[fut] = part
    end_time = time.time()
    print("PROCESS COMPLETE")
    print(f"Elapsed Time: {end_time - start_time}")
    return True

if __name__ == '__main__':
    main()
Lowering the max_concurrency and max_children values makes the error less likely, but I've had it occur even when max_concurrency * max_children <= os.cpu_count(). It just triggers more reliably with a larger number of processes.
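In code, the workaround amounts to budgeting the total before building the pools (the 24 here is just my observed ceiling, not a documented limit):

import os

# keep outer * inner below the ceiling where the license error appears
total_budget = min(24, os.cpu_count() or 1)
max_concurrency = 4
max_children = max(1, total_budget // max_concurrency)  # 6 with a budget of 24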
There was a similar case where a user was trying to run a Python script via Task Scheduler and getting the same RuntimeError. They resolved the issue by doing the following:
- Navigate to the Project tab > Package Manager
- Click the three dots > Clone (this can take some time)
- Activate the cloned environment from the three dots > Activate, then reopen ArcGIS Pro and re-run the script; for them, it then executed without any error
Sadly, this did not work.
When I tried running your code I got 20 processes in (with 200-something VSCode subprocesses) and was using almost 10 GB of RAM! If you're doing so much number crunching that you need 400-odd concurrent processes, you should look for a Python library that manages its own thread pool (polars is my go-to; numpy might also work here). Failing that, you can rewrite your program so that arcpy is imported only after every future has resolved. That cuts the size of each worker process from ~200 MB to ~35 MB on my machine, which kept my run below 8 GB (and it will probably run faster overall, since you're writing all the data through a single cursor).
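A minimal sketch of what I mean (names and paths are placeholders): the workers never touch arcpy, and the parent imports it only after the pool has shut down:

import concurrent.futures
import uuid

def process_number(number):
    # pure-Python work only; arcpy is never imported in the workers
    return number

def main():
    results = []
    with concurrent.futures.ProcessPoolExecutor(max_workers=8) as executor:
        for value in executor.map(process_number, range(200)):
            results.append([uuid.uuid4(), value])
    # import arcpy only now, after every future has resolved
    import arcpy
    arcpy.env.workspace = r"C:\path\to\your.gdb"  # placeholder
    with arcpy.da.InsertCursor("test_number", ["NUMBER_GUID", "NUMBER"]) as icurs:
        for row in results:
            icurs.insertRow(row)

if __name__ == "__main__":
    main()

Because Windows spawns each worker by re-importing the main module, keeping the arcpy import inside main() means the children never pay its ~200 MB cost.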
I already rewrote the nested portion to use shapely instead of arcpy and that improved both memory usage and speed.
I need to output the results of each future as they complete, to avoid a scenario where the script runs for multiple days and then unexpectedly stops. I'm thinking of just outputting WKB to a file to avoid arcpy during the processing stage. I mainly wanted to see whether I was hitting an arcpy issue or whether my approach was flawed.
I'll look into polars and numpy.
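Roughly what I have in mind for the WKB dump (the worker and output path are placeholders; assumes shapely):

import concurrent.futures
from shapely import wkb
from shapely.geometry import Point

def buffer_point(xy):
    # stand-in for the real polygon processing
    return Point(xy).buffer(10.0)

def main():
    coords = [(i, i) for i in range(200)]  # stand-in inputs
    with open(r"C:\temp\results.wkb", "ab") as f, \
            concurrent.futures.ProcessPoolExecutor() as executor:
        futures = [executor.submit(buffer_point, xy) for xy in coords]
        for fut in concurrent.futures.as_completed(futures):
            data = wkb.dumps(fut.result())
            # length-prefix each record so the stream can be re-read later
            f.write(len(data).to_bytes(8, "little"))
            f.write(data)

if __name__ == "__main__":
    main()

That way a crash on day three only loses the in-flight futures, not everything already written.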
Can't help you with the specific error message that you receive, but a couple of remarks:
- If you are still running Windows 10, you may not be able to go beyond 64 processes due to limitations with Windows processor groups:
https://bitsum.com/general/the-64-core-threshold-processor-groups-and-windows/
It is still not entirely clear to me whether and how Windows 11 handles this, and whether it allows truly unlimited process counts. There were changes to this aspect of Windows, but I am still on Windows 10, so I can't verify them.
- Do you really need processes? If you connect to a database, the database itself may turn a threaded application into something close to a process-based multiprocessing solution, while still letting you use threads in your Python application.
For example, in the screenshot below I am using a concurrent.futures.ThreadPoolExecutor to run up to 44 threads issuing SQL statements that generalize data on the database using PostGIS commands. As the inset of the remote desktop session to the server shows, this pushes the PostgreSQL database to a full 100% CPU usage, without any Python processes.
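The pattern is roughly this (hypothetical connection string and table names; assumes psycopg2):

import concurrent.futures
import psycopg2  # assumes a reachable PostGIS database

def generalize(table):
    # each thread opens its own connection; the heavy lifting happens
    # inside PostgreSQL, so the GIL barely matters here
    with psycopg2.connect("dbname=gis user=gis") as conn:
        with conn.cursor() as cur:
            cur.execute(
                f"UPDATE {table} SET geom = ST_SimplifyPreserveTopology(geom, 10.0)"
            )

tables = [f"roads_part_{i}" for i in range(44)]  # hypothetical partitions
with concurrent.futures.ThreadPoolExecutor(max_workers=44) as executor:
    list(executor.map(generalize, tables))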
I'm on Windows 11. I initially used ThreadPoolExecutor, but I'm not I/O bound, and the GIL made it much slower than ProcessPoolExecutor.
@RyJohnsen wrote: I'm on Windows 11. I initially used ThreadPoolExecutor, but I'm not I/O bound, and the GIL made it much slower than ProcessPoolExecutor.
Yes, whether ThreadPoolExecutor or ProcessPoolExecutor is the better solution depends on how much actual CPU work versus I/O you are doing (and on available resources such as RAM, although that is becoming a fairly moot point on modern, powerful desktops, which usually have plenty).
However, in my experience it is pretty hard to be purely CPU bound rather than I/O bound. You really need to do significant work per item to be CPU limited.
Also note that, for Python on Windows, the maximum number of workers a ProcessPoolExecutor can launch is 61, according to the concurrent.futures documentation: https://docs.python.org/3/library/concurrent.futures.html
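So it is worth clamping explicitly, for example:

import concurrent.futures
import os

def main():
    # Windows limits ProcessPoolExecutor to 61 workers, so clamp to that
    workers = min(61, os.cpu_count() or 1)
    with concurrent.futures.ProcessPoolExecutor(max_workers=workers) as executor:
        print(sum(executor.map(abs, range(-100, 100))))

if __name__ == "__main__":
    main()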
By the way, if you don't want irritating pop-ups of command windows while the worker processes run, you can change the Python executable used to spawn them, which prevents the pop-ups and makes each worker behave as if it were a thread (although it is still a process):
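Something along these lines, pointing multiprocessing at pythonw.exe (the exact path depends on your ArcGIS Pro Python environment):

import multiprocessing
import os
import sys

# spawn workers with pythonw.exe so no console window appears per process;
# call this before creating any ProcessPoolExecutor
multiprocessing.set_executable(os.path.join(sys.exec_prefix, "pythonw.exe"))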
Running it with a max_concurrency of 6 and max_children of 10, for a maximum of 60 processes, caused the same issue, so that doesn't seem to be the cause.