Python code optimization - licence checkout?

MattRosales · ‎11-22-2010

Hi All,

I have the following script running on roughly 15 GB of shapefiles, with the purpose of importing them into a geodatabase. I had previously tried finding all of the shapefiles first and passing them to "Feature Class to Feature Class" as one block, but that overloaded the tool. This version uses a loop to import them into the geodatabase one by one. The code is stable this way and has already processed a few thousand shapefiles successfully, but I am convinced that the process is taking much longer than it should.

For example, the code takes about 35-40 seconds just to check whether the feature class already exists in the geodatabase. This is far too long, and I suspect it has something to do with checking out the ArcInfo license each time an arcpy function is called. Does anyone have ideas for increasing the speed and efficiency? I was considering trying something like this, but the documentation says it is legacy.

Thanks for the brainstorming,

Matt

(I run this code through a normal Python console, not the one in Arc, for stability reasons.)

# Import system modules
import sys, os, fnmatch, arcpy
from arcpy import env

### Set user variables here, if no user-input desired: ###
searchPath = os.path.abspath("/path/to/data") #path to search
searchParam = "*.shp" #search parameter - wildcards OK
shapefileType = "Polyline" #shapefile type: Polyline, Polygon, Point, etc.
OutputDatabase = os.path.abspath("/path/to/output.gdb")
### End user variables ###

### Check and print the input parameters ###
if os.path.isdir(searchPath):
    print "Search: " + searchPath + " OK"
else:
    print searchPath + "\n is not a valid path!"
    sys.exit(1) # non-zero exit code signals failure

print "Searching for " + searchParam
print "shapefileType = " + shapefileType

if arcpy.Exists(OutputDatabase): # verify that output database exists.
    print "OutputDatabase = " + OutputDatabase + "  OK..."
else:
    print OutputDatabase + "\n is not valid"
    sys.exit(1) # non-zero exit code signals failure
### End variable check ###
 
env.workspace = OutputDatabase # set the current workspace as the target database.
env.overwriteOutput = True 

# Search function
resultList = [] #The list for collecting results with

for root, dirs, files in os.walk(searchPath): # crawl through the search directory
 for f in files: # for each file in the search directory
  if fnmatch.fnmatch(f, searchParam): # check if the filename matches
   shpDescr = arcpy.Describe(os.path.join(root, f)) # reads the shapefile
   shpType = shpDescr.shapeType # reads the shapefile type
   if str(shpType) == shapefileType: # check if the shapefile type matches
    outFeatureClass = arcpy.ValidateTableName("cont" + os.path.splitext(f)[0], OutputDatabase) # generates a database-friendly feature class name (splitext also handles ".SHP")
    if arcpy.Exists(outFeatureClass): # check to see if already imported
     print os.path.join(root, f) + " already imported as " + outFeatureClass 
    else:
     print "Exporting " + f + " to " + outFeatureClass + "..."
     try:
      arcpy.FeatureClassToFeatureClass_conversion(os.path.join(root, f), OutputDatabase, outFeatureClass) #export to geodatabase
     except Exception: # error handling; a bare except would also swallow Ctrl+C
      print "Error with " + os.path.join(root, f)
      print arcpy.GetMessages(2) # print the error-severity (2) geoprocessing messages
      tempString = "'" + os.path.join(root, f) + "'"
      resultList.append(tempString)
      resultList.append(";")
    

if resultList: # only runs the next lines if there were problems with some of the files.
 resultList.pop() #removes last list value (trailing semicolon)
 resultString = "\"" + "".join(resultList) + "\"" #Convert list to a single string
 outFilesMV = resultString.replace("\\","\\\\")

 # Outputs a list of problem files in the searchPath directory:
 outputFilename = searchPath + "/ImportErrors.txt"
 fu = open(os.path.abspath(outputFilename), 'w')
 fu.write(outFilesMV)
 fu.close()
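
One possible way to cut down on per-file arcpy.Exists calls is to list the feature classes already in the geodatabase once and then do the "already imported?" check in plain Python. The following is only a sketch under assumptions: split_by_existing is a hypothetical helper name, the comparison is case-insensitive on the assumption that geodatabase names are not case-sensitive, and arcpy.ListFeatureClasses() only sees feature classes at the top level of the current workspace, not those inside feature datasets.

```python
def split_by_existing(candidate_names, existing_names):
    """Split candidate names into (already_imported, still_to_import).

    Names are compared case-insensitively (an assumption about how the
    geodatabase treats feature class names).
    """
    existing = set(name.lower() for name in existing_names)
    already, todo = [], []
    for name in candidate_names:
        if name.lower() in existing:
            already.append(name)
        else:
            todo.append(name)
    return already, todo


# Hypothetical usage inside the script above (requires arcpy, not run here):
#   env.workspace = OutputDatabase
#   existing = arcpy.ListFeatureClasses() or []
#   done, todo = split_by_existing(["contA", "contB"], existing)
```

Whether this actually helps depends on where the time goes; if the slowness comes from the path to the data rather than from arcpy itself, the gain may be small.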

MattRosales · ‎11-22-2010

Update - I previously had the code pointing to a locally mapped network drive (i.e. 127.0.0.1/some/directory mapped as the letter W:) so that I didn't have to type in a long directory name each time. I changed this to the full directory path from C: and got a 1000% performance improvement, for whatever reason, with each shapefile now importing in a matter of seconds. Wow! So it looks like the issue wasn't with the code, but rather with some sort of Windows networking performance instead?
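
For anyone trying to pin down a slowdown like this, a small timing wrapper can show which call is eating the time before and after a path change. This is a minimal sketch; timed is a hypothetical helper, not part of arcpy, and it measures wall-clock time only.

```python
import time


def timed(func, *args, **kwargs):
    """Call func(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.time()
    result = func(*args, **kwargs)
    return result, time.time() - start


# Hypothetical usage in the import loop (requires arcpy, not run here):
#   exists, secs = timed(arcpy.Exists, outFeatureClass)
#   print("Exists check took %.1f s" % secs)
```

Timing the same call once against the W: path and once against the C: path would show whether the path alone accounts for the difference.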

ChrisSnyder · ‎11-22-2010

Definitely use setproduct(). In v9.x I found this to be a pretty critical step...

If you are moving from .shp to FGDB format, you probably won't have any schema issues, and therefore probably don't need the arcpy.ValidateTableName step.

If you aren't changing the schema around, you might consider just using the CopyFeatures tool (instead of FeatureClassToFeatureClass). It should be a little easier/faster.
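
A rough sketch of the CopyFeatures variant, assuming the shapefile's base name is already a valid feature class name (otherwise the validation step is still needed). copy_shapefile and its copy_func parameter are hypothetical; copy_func exists only so the path handling can be tried without ArcGIS, and by default the sketch calls arcpy.CopyFeatures_management(in_features, out_feature_class).

```python
import os


def copy_shapefile(shp_path, gdb_path, copy_func=None):
    """Copy one shapefile into a geodatabase, keeping its base name.

    Returns the output feature class path that was passed to copy_func.
    """
    if copy_func is None:
        import arcpy  # imported lazily; needs an ArcGIS installation
        copy_func = arcpy.CopyFeatures_management
    name = os.path.splitext(os.path.basename(shp_path))[0]
    out_fc = os.path.join(gdb_path, name)
    copy_func(shp_path, out_fc)
    return out_fc
```

No claim is made here about how much faster this is than FeatureClassToFeatureClass; it would need to be timed on the actual data.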

ChrisSnyder · ‎11-22-2010

Yes, also UNC pathnames (as opposed to mapped drive letters) are ALWAYS the way to go...
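
If a script should warn when it is handed a drive-letter path instead of a UNC path, something along these lines could do the check. This is only a sketch: path_kind is a hypothetical helper, it relies on the standard library's ntpath.splitdrive, and it cannot tell a mapped network drive such as W: from a local drive such as C:, since both are just drive letters.

```python
import ntpath


def path_kind(path):
    """Classify a Windows-style path as 'unc', 'drive', or 'other'."""
    drive, _rest = ntpath.splitdrive(path)
    if drive.startswith("\\\\") or drive.startswith("//"):
        return "unc"    # e.g. \\server\share\...
    if len(drive) == 2 and drive[1] == ":":
        return "drive"  # e.g. W:\... (mapped or local)
    return "other"      # relative or rooted path without a drive
```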

MattRosales · ‎11-22-2010

ChrisSnyder wrote: "Yes, also UNC pathnames (as opposed to mapped drive letters) are ALWAYS the way to go..."

I wonder why that is? I got into the habit of doing this as a number of programs don't recognize relative paths, and so I could switch the drive letter to, for example, an external hard drive or network project depending on what I was working on (collaborating with others) - unfortunate that this has such performance hindrances, but at least I know now. Shapefiles are importing at a rate of 1 per 2-3 seconds now, instead of 1 per 1.5 minutes. (!)

Re: setproduct(), the documentation lists using "import arcinfo" before "import arcpy" as an alternate (legacy) way of doing this. Do you have a preference / notice a difference? Thanks again,

ChrisSnyder · ‎11-22-2010

Not sure why UNC is faster, just something I got into the habit of doing a long time ago. I have a pet peeve against using network drive letters (The S:\ drive?!?! I don't have an S:\ drive, what the hell is that? Oh you mean \\snarf\am\div_lm\ds?, well why didn't you say so!?!). Pet peeve... Anyway...

Setting the license level is also another thing I got into the habit of doing. Like the UNC path thing, I'm not really sure why, just something I started doing, and my scripts always seem to run fine (minus other stuff :)). Here's some v9.3 code that checks out the highest level license (given ArcInfo, ArcEditor, or ArcView):

if gp.CheckProduct("ArcInfo") == "Available":
    gp.SetProduct("ArcInfo")
elif gp.CheckProduct("ArcEditor") == "Available":
    gp.SetProduct("ArcEditor")
elif gp.CheckProduct("ArcView") == "Available":
    gp.SetProduct("ArcView")
else:
    message =  "ERROR: Can't specify ArcGIS license level... Exiting script!"; showPyMessage(); sys.exit()
message =  "Selected an " + gp.ProductInfo() + " license"; showPyMessage()
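
The same "take the highest available level" logic can be written as a small loop, which may be easier to reuse. This is a sketch only: pick_license is a hypothetical helper, and check_product stands in for whatever function reports availability (gp.CheckProduct in the v9.3 snippet above), so the selection logic can be tried without ArcGIS.

```python
def pick_license(check_product, levels=("ArcInfo", "ArcEditor", "ArcView")):
    """Return the first level whose check returns "Available", else None."""
    for level in levels:
        if check_product(level) == "Available":
            return level
    return None


# Hypothetical usage with the v9.3 geoprocessor (not run here):
#   level = pick_license(gp.CheckProduct)
#   if level is None:
#       sys.exit("ERROR: Can't specify ArcGIS license level")
#   gp.SetProduct(level)
```

For arcpy, the thread mentions "import arcinfo" before "import arcpy" as the alternate way to set the level at import time.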