Multiprocess call to Intersect_analysis can't create output to gdb

SzymonPiskula · ‎10-28-2011

Hi

I have a script that calculates intersections between two feature classes. Since there is a lot of data to process, I am trying to run it in parallel batches. The work is split across several .py script processes running concurrently. They all take the same input feature classes but operate on different features. Because I need to store and persist the output of each intersection, each result is saved to my .gdb database as a separate result feature class. To keep the result names unique, they all share the same prefix with a unique batch ID appended. So when I execute two batches with IDs 1 and 2, the results are feature classes named Result_1 and Result_2.
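A minimal sketch of the batching scheme described above, assuming the features are identified by a list of ObjectIDs split into contiguous chunks. make_batches and result_name are hypothetical helpers for illustration, not part of arcpy.

```python
def make_batches(object_ids, batch_count):
    """Split a list of ObjectIDs into batch_count contiguous, roughly equal chunks.

    Returns a dict mapping batch ID (starting at 1) to its list of ObjectIDs.
    """
    if batch_count < 1:
        raise ValueError("batch_count must be at least 1")
    size, remainder = divmod(len(object_ids), batch_count)
    batches = {}
    start = 0
    for batch_id in range(1, batch_count + 1):
        # The first `remainder` batches take one extra feature each
        end = start + size + (1 if batch_id <= remainder else 0)
        batches[batch_id] = object_ids[start:end]
        start = end
    return batches


def result_name(prefix, batch_id):
    """Build the unique output feature class name, e.g. Result_1."""
    return "%s_%d" % (prefix, batch_id)


if __name__ == "__main__":
    for batch_id, oids in sorted(make_batches(list(range(1, 11)), 3).items()):
        print(result_name("Result", batch_id), oids)
```

Each worker process would then restrict its input to its own ObjectIDs (for example through a where clause on a feature layer) and write its Intersect output to result_name("Result", batch_id).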

The problem is that when the output of these simultaneously running processes is directed to the same GDB, for example "C:\data\workspace.gdb", only one process can create its output from arcpy.Intersect_analysis (for example "C:\data\workspace.gdb\Result_1", when the process for batch 1 is the one that succeeds). All the other processes fail with a message like:

 ExecuteError('ERROR 000210: Cannot create output C:\data\workspace.gdb\Result_2\nFailed to execute (Intersect).\n',)

Interestingly, the problem does not occur if I change the output workspace type from GDB to MDB (personal geodatabase). Then everything works, and each process creates its own output within the same workspace without problems, as I would expect.

Are there any steps I should take to resolve this issue? My guess is that once a process gets a handle to the GDB, there must be some sort of index of existing feature classes, and since that process has exclusive access to it, the other processes can't modify it. Therefore only one process at a time can write to the database. But that is only my guess.

Can anyone advise on this or confirm/reject my guesses?

Thanks,
Szymon

AndrewChapkowski · ‎10-28-2011

Szymon,
Since I can't see your whole workflow, here is my guess:

Try using arcpy.Exists(), and if it returns True, use results = arcpy.CreateUniqueName(r"C:\data\workspace.gdb\Result_1"). You could also try the in_memory workspace. If you want to overwrite the output, make sure you set arcpy.env.overwriteOutput = True in each Python process, because child processes do not inherit environment settings from the parent Python process.

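Example of a multiprocessing worker function:

import arcpy
from arcpy import env

def function(args):
    # Environment settings are not inherited from the parent process,
    # so set them explicitly in every worker
    env.overwriteOutput = True
    env.scratchWorkspace = args[0]
    results = None
    # other logic that runs the analysis and sets results
    # ....
    return results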
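The idea behind arcpy.CreateUniqueName (keep appending a number until the name is free) can be illustrated in plain Python. This is only a sketch of the concept: unique_name is a hypothetical helper that checks a given list of names rather than the workspace on disk, and its "_<n>" suffix format is this sketch's own choice, not necessarily what arcpy produces.

```python
def unique_name(base_name, existing_names):
    """Return base_name if it is free, else base_name_<n> with the lowest free n.

    A plain-Python stand-in for the idea behind arcpy.CreateUniqueName;
    the suffix format here is an assumption of this sketch.
    """
    taken = set(existing_names)
    if base_name not in taken:
        return base_name
    counter = 0
    # Keep incrementing until the candidate name is not already taken
    while "%s_%d" % (base_name, counter) in taken:
        counter += 1
    return "%s_%d" % (base_name, counter)
```

A name picked this way is only unique at the moment it is computed: two processes checking the same workspace at the same time can both choose the same name. Giving each batch its own ID, as Szymon already does, is the more robust choice for parallel workers.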

SzymonPiskula · ‎10-28-2011

andrewchap,

Thanks for the answer. As I understand it, your suggestion only shows how to obtain a unique name in the workspace, but I don't have a problem with that; I can already get unique names my way.

The question is why my approach does not work for GDB but does work for MDB. It is definitely not caused by non-unique names. Apparently Intersect_analysis somehow obtains exclusive access to the GDB and prevents other processes from creating their output in it. The same logic works fine for MDB.

Any hints?

AndrewChapkowski · ‎10-28-2011

What version of software + service packs are you using?

Have you tried writing to a different file geodatabase?

Can you post your full code?

ChrisSnyder · ‎10-28-2011

Two things I have been doing since v9.1 to make Python-based parallel overlay processes run (I'm not sure I still "need" to do them in v9.3.1 and/or v10.0, but it doesn't seem to hurt):

1) Write each result to a separate FGDB (in your case named results_1, results_2, etc.). Another option that might work is to write the output to the in_memory workspace and wrap the copy to a single FGDB in a try/except statement: if the copy fails, wait 5 seconds and try again, up to 10 times or so. Something like:

import sys
import time

tryCount = 0
successFlag = False
while tryCount <= 10:
    try:
        # Copy the in_memory result into the shared FGDB
        gp.CopyFeatures_management(blah, blah)
        successFlag = True
        break
    except Exception:
        # The FGDB is probably busy with another process; wait and retry
        time.sleep(5)
        tryCount = tryCount + 1
if not successFlag:
    print("Failed!")
    sys.exit(1)

2) In the child processes, reset the TEMP and TMP environment variables to unique directories. For example:

import os
import time

# Append the process ID so two workers started in the same second
# don't try to create the same folder
newTempDir = r"C:\temp\gptmpenvr_" + time.strftime('%Y%m%d%H%M%S') + "_" + str(os.getpid())
os.mkdir(newTempDir)
os.environ["TEMP"] = newTempDir
os.environ["TMP"] = newTempDir

This redirects the overlaytiles.txt file to separate folders, so the processes aren't all trying to write to the same file.
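Both workarounds above can be packaged as small reusable helpers. This is a sketch, not Chris's code: retry and isolate_temp_dir are hypothetical names, retry catches any exception (in a real worker you would probably narrow it to arcpy.ExecuteError), and isolate_temp_dir swaps the timestamp naming for tempfile.mkdtemp, which always creates a new, uniquely named folder.

```python
import os
import tempfile
import time


def retry(action, attempts=10, delay_seconds=5.0, sleep=time.sleep):
    """Call action() until it succeeds or the attempts run out.

    Returns action()'s result; re-raises the last exception if every
    attempt fails. `sleep` is injectable so tests can skip the delay.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return action()
        except Exception as error:  # e.g. the shared FGDB is busy
            last_error = error
            if attempt < attempts - 1:
                sleep(delay_seconds)
    raise last_error


def isolate_temp_dir(parent_dir=None, prefix="gptmpenvr_"):
    """Create a private temp folder for this process and point TEMP/TMP at it.

    tempfile.mkdtemp never reuses an existing folder, so workers started
    at the same moment cannot collide.
    """
    new_temp_dir = tempfile.mkdtemp(prefix="%s%d_" % (prefix, os.getpid()), dir=parent_dir)
    os.environ["TEMP"] = new_temp_dir
    os.environ["TMP"] = new_temp_dir
    return new_temp_dir
```

In a worker you might call isolate_temp_dir() first thing, before importing arcpy (an assumption: it seems safest if nothing has read TEMP/TMP yet), and later wrap the copy step as retry(lambda: arcpy.CopyFeatures_management(r"in_memory\Result_1", r"C:\data\workspace.gdb\Result_1")).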

BruceHarold · ‎10-31-2011

Hello All

The underlying issue Szymon uncovered is most likely concurrent attempts to write to the geodatabase's system tables. The file GDB itself does not take exclusive locks unless you're doing something like renaming or deleting it. Writing to separate geodatabases will avoid the issue.

Regards
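Following the advice to write to separate geodatabases, here is a minimal sketch of planning one file geodatabase per batch. batch_output_paths and the batch_<n>.gdb naming are hypothetical; the sketch only builds the paths, using ntpath so Windows-style paths come out the same on any OS.

```python
import ntpath


def batch_output_paths(output_folder, batch_ids, prefix="Result"):
    """Map each batch ID to (gdb_path, feature_class_path) in its own FGDB."""
    paths = {}
    for batch_id in batch_ids:
        # One geodatabase per batch, so no two processes share system tables
        gdb_path = ntpath.join(output_folder, "batch_%d.gdb" % batch_id)
        fc_path = ntpath.join(gdb_path, "%s_%d" % (prefix, batch_id))
        paths[batch_id] = (gdb_path, fc_path)
    return paths
```

Each worker could then create its own geodatabase, for example with arcpy.CreateFileGDB_management(output_folder, "batch_1.gdb"), run Intersect into the matching feature class path, and the parent process could combine the results (for example with arcpy.Merge_management) after all workers have finished.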