Split records into multiple Feature Datasets

JohnDye · ‎05-28-2013

So, now I'm trying to split records into multiple feature datasets, based on each record's unique value. The end goal is to have a separate feature dataset for each unique record. I'm getting some errors when trying to dynamically create the feature dataset though. I figured the easiest way to accomplish this would be through two separate loops, one to create the Feature Datasets based on the uniqueVal and another to actually perform the split.

workspace = arcpy.env.workspace = r"C:\Users\jdk588\Documents\New File Geodatabase.gdb"
fc = r"C:\Users\jdk588\Documents\New File Geodatabase.gdb\Selected_CBSAs"
uniqueSet = set([r[0] for r in arcpy.da.SearchCursor (fc, ["ID"])])
for uniqueVal in uniqueSet:
     print "Creating Feature Dataset for CBSA " + uniqueVal + "..."
     arcpy.CreateFeatureDataset_management(workspace, "CBSA_" + str(uniqueVal))
     print "Successfully created CBSA " + str(uniqueVal) + " Feature Dataset."

for uniqueVal in uniqueSet:
     featureDataset = arcpy.ListDatasets("*" + str(uniqueVal), "Feature")
     workspace = arcpy.env.workspace = featureDataset
     print "Splitting CBSA " + str(uniqueVal) + "..."
     arcpy.Select_analysis(fc, "CBSA_" + str(uniqueVal) + "_bdy", "ID = " + str(uniqueVal))
     print "Success."

Creation of the feature datasets works fine and it creates them all. However, in the second loop I'm trying to load each feature into its own feature dataset using the Select_analysis tool, which means I need to set the workspace to the appropriate feature dataset with each iteration of the loop. I don't understand why it can't access the workspace.

Runtime error
Traceback (most recent call last):
File "<string>", line 11, in <module>
File "c:\program files (x86)\arcgis\desktop10.1\arcpy\arcpy\geoprocessing\_base.py", line 529, in set_
self[env] = val
File "c:\program files (x86)\arcgis\desktop10.1\arcpy\arcpy\geoprocessing\_base.py", line 581, in __setitem__
ret_ = setattr(self._gp, item, value)
RuntimeError: Object: Error in accessing environment <workspace>

Anonymous User · ‎05-28-2013

I think it may fix it if you try this:

import arcpy, os
arcpy.env.workspace = workspace = r"C:\Users\jdk588\Documents\New File Geodatabase.gdb"
fc = r"C:\Users\jdk588\Documents\New File Geodatabase.gdb\Selected_CBSAs"
uniqueSet = set([r[0] for r in arcpy.da.SearchCursor (fc, ["ID"])])
for uniqueVal in uniqueSet:
     print "Creating Feature Dataset for CBSA " + uniqueVal + "..."
     arcpy.CreateFeatureDataset_management(workspace, "CBSA_" + str(uniqueVal))
     print "Successfully created CBSA " + str(uniqueVal) + " Feature Dataset."

for uniqueVal in uniqueSet:
     featureDataset = arcpy.ListDatasets("*" + str(uniqueVal), "Feature")[0].encode('utf-8')  # remove unicoding
     arcpy.env.workspace = os.path.join(workspace, featureDataset)
     print "Splitting CBSA " + str(uniqueVal) + "..."
     arcpy.Select_analysis(fc, "CBSA_" + str(uniqueVal) + "_bdy", "ID = " + str(uniqueVal))

The way you had it before, ListDatasets was returning your one feature dataset, but inside a list (hence the square brackets), and a list can't be used as the workspace. Using the list index [0] should return just the name of that feature dataset by itself. I also used os.path.join() to join the original workspace with the featureDataset variable so that the workspace gets the full path.
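The list-versus-string point can be checked without ArcGIS. Below is a minimal sketch, assuming a hypothetical helper named dataset_workspace and a plain Python list standing in for what arcpy.ListDatasets returns; the ID value is made up for illustration. ntpath is used so the Windows-style join behaves the same on any machine (on Windows, os.path.join gives the same result).

```python
import ntpath  # Windows path rules regardless of the OS running the sketch


def dataset_workspace(gdb_path, dataset_names):
    """Return the full path to the first dataset named in dataset_names.

    dataset_names stands in for the list returned by arcpy.ListDatasets
    (hypothetical helper, not part of arcpy).
    """
    if not dataset_names:
        raise ValueError("no feature dataset matched the wildcard")
    # Index the list to get a single name, then build the full path.
    return ntpath.join(gdb_path, dataset_names[0])


gdb = r"C:\Users\jdk588\Documents\New File Geodatabase.gdb"
matches = ["CBSA_10180"]  # hypothetical ID; still a list with one match

print(dataset_workspace(gdb, matches))
```

Passing matches itself as the workspace is what triggered the "Error in accessing environment" message in the question; the helper hands back a single string instead.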

JohnDye · ‎05-28-2013

Thanks Caleb!

Just before you replied, I actually figured out that it was easier to just build the workspace path directly from the 'uniqueVal' variable than to use the ListDatasets function. I agree with your point about ListDatasets returning the value inside a list. os.path.join is probably a better way to go.

Once I got the original code running, the first loop took 2 minutes to run and the second loop took 24 minutes, for a total runtime of 26 minutes.

With that, I combined the functions into a single loop, hoping that would result in some level of improvement since it wouldn't need to loop through the dataset twice.

for uniqueVal in uniqueSet:
     print "Creating Feature Dataset for CBSA " + uniqueVal + "..."
     arcpy.CreateFeatureDataset_management(workspace, "CBSA_" + str(uniqueVal))
     print "Successfully created CBSA " + str(uniqueVal) + " Feature Dataset."
     print "Resetting workspace to CBSA " + str(uniqueVal) + "'s Feature Dataset..."
     arcpy.env.workspace = r"C:\Users\jdk588\Documents\New File Geodatabase.gdb\CBSA_" + str(uniqueVal)
     print "Successfully reset Workspace."
     print "Splitting CBSA " + str(uniqueVal) + " into Feature Dataset..."
     arcpy.Select_analysis(fc, "CBSA_" + str(uniqueVal) + "_bdy", '"ID" = ' + "'" + str(uniqueVal) + "'")
     print "Successfully split CBSA " + str(uniqueVal) + " into Feature Dataset."

Even with the functions combined into a single loop, it still took 26 minutes to run the entire process from FD creation to splitting the uniqueVals into their respective FD. A little longer than I think it should take. I wonder if the os.path.join method you mention would result in any performance improvements. My instinct tells me it could, but I suspect it would be negligible since resetting the workspace only takes a second, if that.

Can you or anyone else spot any areas I could revise to boost performance?

Anonymous User · ‎05-28-2013

"Can you or anyone else spot any areas I could revise to boost performance?"

It is indeed more efficient to do it all in one loop. And yes, I have found that the Select_analysis tool is usually very slow. In that case I usually just create a temporary feature layer with a query and then use CopyFeatures_management, which seems to be quite a bit faster. I would run a speed test on this to see if it works any quicker without the Select tool.

for uniqueVal in uniqueSet:
     print "Creating Feature Dataset for CBSA " + uniqueVal + "..."
     arcpy.CreateFeatureDataset_management(workspace, "CBSA_" + str(uniqueVal))
     print "Successfully created CBSA " + str(uniqueVal) + " Feature Dataset."
     print "Resetting workspace to CBSA " + str(uniqueVal) + "'s Feature Dataset..."
     arcpy.env.workspace = r"C:\Users\jdk588\Documents\New File Geodatabase.gdb\CBSA_" + str(uniqueVal)
     print "Successfully reset Workspace."
     print "Splitting CBSA " + str(uniqueVal) + " into Feature Dataset..."
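     query = '"ID" = ' + "'%s'" %uniqueVal  # ID is treated as a text field, so the value is quoted
     tmp = arcpy.MakeFeatureLayer_management(fc, 'tmp_lyr', query)
     arcpy.CopyFeatures_management(tmp, 'CBSA_%s_bdy' %uniqueVal)
     arcpy.Delete_management('tmp_lyr')  # drop the temp layer so the name is free on the next pass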
     print "Successfully split CBSA " + str(uniqueVal) + " into Feature Dataset."
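The query string is the part of this loop that is easiest to get wrong, since a text field needs its value in single quotes and a numeric field does not. Here is a small sketch of a helper that builds the clause either way. The name build_where is hypothetical, and the double-quoted field name follows the file geodatabase style used above; arcpy also provides AddFieldDelimiters for data sources that use other delimiters.

```python
def build_where(field, value):
    """Build a simple equality where clause (hypothetical helper).

    Numeric values are left bare; anything else is treated as text,
    wrapped in single quotes, with embedded single quotes doubled.
    """
    if isinstance(value, (int, float)):
        return '"%s" = %s' % (field, value)
    text = ("%s" % value).replace("'", "''")
    return "\"%s\" = '%s'" % (field, text)


print(build_where("ID", "10180"))  # text field, hypothetical ID value
print(build_where("ID", 10180))    # numeric field
```

The result can be passed as the where clause to MakeFeatureLayer_management or Select_analysis in place of the hand-built string.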

I test the speed of a lot of tools using something like this:

from datetime import datetime as d
startTime = d.now()

# Do all the stuff


print '(Elapsed time: %s)' %(str(d.now() - startTime)[:-3])
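To compare two approaches side by side, the same datetime idea can be wrapped in a context manager so each step gets its own timing. This is only a sketch: the names timed and timings are made up here, and the sum(range(...)) calls are stand-ins for the real geoprocessing calls.

```python
from contextlib import contextmanager
from datetime import datetime


@contextmanager
def timed(label, results):
    """Time the enclosed block and store the elapsed timedelta in results[label]."""
    start = datetime.now()
    try:
        yield
    finally:
        results[label] = datetime.now() - start


timings = {}
with timed("select", timings):
    sum(range(100000))  # stand-in for arcpy.Select_analysis(...)
with timed("copy", timings):
    sum(range(100000))  # stand-in for MakeFeatureLayer + CopyFeatures

for label in sorted(timings):
    print("%s: %s" % (label, timings[label]))
```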

RhettZufelt · ‎05-28-2013

John,

Try commenting out your print statements once you get it working correctly.

I have some scripts that take about an hour to run but will take 4 or more hours if I include the print statements. They take a lot more load/time than one would think.

R_
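One way to try this without deleting or commenting out every message is to route progress output through Python's standard logging module and raise the level when speed matters. A rough sketch follows; the logger name and the process function are hypothetical, and the loop body is a placeholder for the real work. Whether it changes the runtime noticeably is something to measure in your own environment.

```python
import logging

logging.basicConfig(format="%(message)s")
log = logging.getLogger("cbsa_split")  # hypothetical logger name
log.setLevel(logging.WARNING)          # switch to logging.INFO while debugging


def process(unique_vals):
    """Loop over the values, emitting progress messages only at INFO level."""
    done = 0
    for val in unique_vals:
        log.info("Splitting CBSA %s into Feature Dataset...", val)  # skipped at WARNING
        done += 1  # placeholder for the real geoprocessing calls
    log.warning("Finished %d feature datasets.", done)
    return done


process(["10180", "10420"])  # hypothetical ID values
```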