Issue calling a subprocess from ArcGIS Pro Toolbox

JonasNeubürger · ‎08-17-2022

Hi all,

I am developing a toolbox that - at some point - has to process a large amount of data and apply a function to each row. For efficiency reasons I already use a pandas dataframe.

I am familiar with pandas and big dataframes, which is why I tried using pandas.Series.apply to efficiently apply the function to each row.

But in my benchmarks, processing a dataframe with 100k rows takes about 100 s, and I don't really want to be stuck with a calculation on 5 million rows taking 50 times longer when it could be sped up by a factor close to the number of my CPU cores. The run time has been even longer from inside ArcGIS Pro, where I was at about 25% completion after 1 hour of computing time.

So I looked into pooling: splitting up my dataframe to use my full CPU power and applying the function to each chunk in its own worker process. I ran into some issues with ArcGIS Pro opening new instances of itself when naively trying to use concurrent.futures, but fixed this by using a subprocess like the large-network-analysis-tool does.

But that led to my current problem:

  File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\Lib\subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\Lib\subprocess.py", line 1420, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
PermissionError: [WinError 5] Access Denied

I have already tried granting myself permissions as described here: Link

Running the args I put into the subprocess call from the command line works perfectly fine, but the subprocess call seems to have some permission issues, even when starting ArcGIS Pro as admin.

I'd appreciate any hints or help with this issue. I am fine with sharing some code snippets and rewriting lots of code if there is a better way of handling this amount of data.

EDIT: Here is a snippet from the subprocess call (I think those double quotes need to be there, because of "Program Files" in the directory path)

create_no_window = 0x08000000
cwd = os.path.dirname(os.path.abspath(__file__))
python_path = os.path.join(sys.exec_prefix, "python.exe")
script_path = os.path.join(cwd, "solveTable.py")
inputs = [
  '"{}"'.format(python_path),
  '"{}"'.format(script_path),
  "--in_df", '"{}"'.format(in_df_csv),
  # some more kwargs
  ]
with subprocess.Popen(
  inputs,
  stdout=subprocess.PIPE, stderr=subprocess.PIPE,
  creationflags=create_no_window) as process:
  # some code for logging and error handling

solveTable.py parses the kwargs from the input, splits in_df into n chunks, calculates the values for each chunk, and saves each result to a .csv file for me to read when everything is done. As previously stated, this works perfectly when executed from the command line.
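
For anyone following along without the script, the chunk-and-pool pattern described here might look roughly like the sketch below. It is not the actual solveTable.py: `chunk_bounds`, `row_function`, `process_chunk`, `solve_table`, the `result` column and the `chunk_{i}.csv` file names are all hypothetical, and it assumes the input CSV fits in memory in the parent process.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def chunk_bounds(n_rows, n_chunks):
    """Return (start, stop) pairs splitting range(n_rows) into nearly equal parts."""
    if n_rows <= 0:
        return []
    n_chunks = max(1, min(n_chunks, n_rows))  # never create empty chunks
    base, extra = divmod(n_rows, n_chunks)
    bounds, start = [], 0
    for i in range(n_chunks):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def row_function(row):
    """Hypothetical placeholder for the real per-row calculation."""
    return len(row)


def process_chunk(chunk, out_csv):
    """Worker: apply row_function to every row of one chunk and save it."""
    chunk = chunk.copy()
    chunk["result"] = chunk.apply(row_function, axis=1)
    chunk.to_csv(out_csv, index=False)
    return out_csv


def solve_table(in_csv, out_dir, n_workers=None):
    """Split the table into one chunk per worker and process them in parallel."""
    import pandas as pd  # imported lazily so the helpers above work without pandas

    n_workers = n_workers or os.cpu_count() or 1
    df = pd.read_csv(in_csv)
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        jobs = []
        for i, (start, stop) in enumerate(chunk_bounds(len(df), n_workers)):
            out_csv = os.path.join(out_dir, f"chunk_{i}.csv")
            jobs.append(pool.submit(process_chunk, df.iloc[start:stop], out_csv))
        return [job.result() for job in jobs]

# On Windows, call solve_table(...) only under `if __name__ == "__main__":`,
# because each worker process re-imports this module when it starts.
```

Each chunk is pickled and sent to a worker, so for very large tables it may be cheaper to have each worker read its own slice of the CSV instead.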

ShaunWalbridge · ‎09-09-2022

My first guess is some issue with the pathing or quoting in the `inputs` list; subprocess can be persnickety about this, particularly on Windows. Perhaps try a simpler `inputs` to start isolating what's causing the issue, or disable the no_window flag and see whether the process call shows any details. Just try launching the python_path and a script that returns a value. I would also check that the path being sent as `in_df_csv` is properly normalized.
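
A minimal sketch of that isolation step, under some assumptions: `build_command`, `run_command` and `smoke_test` are hypothetical helper names, and the example path is made up. It passes plain, unquoted paths in the list and lets subprocess do the quoting. The last lines print how a list element with literal double quotes would be rendered on the Windows command line, using `subprocess.list2cmdline` (the helper subprocess itself uses for list arguments on Windows). Whether that is what triggers the WinError 5 here is only a guess.

```python
import os
import subprocess
import sys

# CREATE_NO_WINDOW only exists on Windows; fall back to 0 (no flags) elsewhere.
CREATE_NO_WINDOW = getattr(subprocess, "CREATE_NO_WINDOW", 0)


def build_command(python_path, script_path, in_df_csv):
    """Build the argument list from normalized paths, without manual quotes."""
    return [
        os.path.normpath(python_path),
        os.path.normpath(script_path),
        "--in_df", os.path.normpath(in_df_csv),
    ]


def run_command(cmd, hide_window=True):
    """Run cmd and return (returncode, stdout, stderr) as text."""
    flags = CREATE_NO_WINDOW if hide_window else 0
    done = subprocess.run(cmd, capture_output=True, text=True,
                          creationflags=flags)
    return done.returncode, done.stdout, done.stderr


def smoke_test(python_path):
    """Launch the interpreter with a trivial script that prints a value."""
    return run_command([python_path, "-c", "print(42)"])


# Inside ArcGIS Pro, sys.executable is not python.exe, so build the
# interpreter path from sys.exec_prefix as in the original snippet.
# Here sys.executable is used only so the sketch runs standalone.
print(smoke_test(sys.executable))

# A list element that already contains literal quotes (hypothetical path):
quoted = '"{}"'.format(r"C:\Program Files\python.exe")
print(subprocess.list2cmdline([quoted]))  # the embedded quotes come out as \"
```

If the smoke test works with `hide_window=False` but the full command does not, add the remaining arguments back one at a time to find the one that breaks it.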