I have a very large .gdb (15000+ shapefiles). I know that's unusually and impractially large, but it's the result of an iterative model. I need an Excel sheet with all of those shapefiles' attributes. I tried merging them but it's too many. Is there some other way?
Solved! Go to Solution.
Hi Michael,
More than 15000 feature classes in a geodatabase is indeed unusual. We could debate about how to avoid that situation in the first place, but that's not the point here. So let's focus on your question.
Let's assume that all your feature classes have the same attributes and that they are stored like this:
c:/my.gdb/fc1
c:/my.gdb/fc2
...
c:/my.gdb/fc15000
...
I would use the following code to merge attributes of all your feature classes into a csv file and then I'd try to open the csv file in Excel.
import arcpy
output_file = r'c:\my.csv'
arcpy.env.workspace = r'c:\my.gdb'
feature_classes = arcpy.ListFeatureClasses()
first_fc = feature_classes[0]
fieldnames = [f.name for f in arcpy.ListFields(first_fc) if f.type.lower() != 'geometry']
delimiter = ','
with open(output_file, 'w') as fl:
# write header
fl.write(delimiter.join(fieldnames) + "\n")
# append attributes from each feature class
for fc in feature_classes:
with arcpy.da.SearchCursor(fc, fieldnames) as sc:
for row in sc:
fl.write(delimiter.join(map(str, row)) + '\n')
del row
del sc
If your attributes include some complicated strings or datetime objects, you will be in for a bit of headache with resolving those but the concept is there. Perhaps using tab ('\t') instead of a comma as a delimiter would help with that.
I wonder how long is it all going to take. Especially the ListFeatureClasses function call can take a while with that many feature classes.
Filip.
Try using append in a loop over your files, so that it is only working with 2 files at a time, instead of 15000 all at once.
Best Regards,
Jim
Or as I mentioned in your last post, download the shape file from https://www.census.gov/geo/maps-data/data/cbf/cbf_blkgrp.html : the dbf can be saved as an excel spreadsheet
Thanks for replying - but I don't need that table. I need a new table based on my model's results, which are in a geodatabase. The geodatabase contains one shapefile per BG with new data appended to each.
Hi Michael,
More than 15000 feature classes in a geodatabase is indeed unusual. We could debate about how to avoid that situation in the first place, but that's not the point here. So let's focus on your question.
Let's assume that all your feature classes have the same attributes and that they are stored like this:
c:/my.gdb/fc1
c:/my.gdb/fc2
...
c:/my.gdb/fc15000
...
I would use the following code to merge attributes of all your feature classes into a csv file and then I'd try to open the csv file in Excel.
import arcpy
output_file = r'c:\my.csv'
arcpy.env.workspace = r'c:\my.gdb'
feature_classes = arcpy.ListFeatureClasses()
first_fc = feature_classes[0]
fieldnames = [f.name for f in arcpy.ListFields(first_fc) if f.type.lower() != 'geometry']
delimiter = ','
with open(output_file, 'w') as fl:
# write header
fl.write(delimiter.join(fieldnames) + "\n")
# append attributes from each feature class
for fc in feature_classes:
with arcpy.da.SearchCursor(fc, fieldnames) as sc:
for row in sc:
fl.write(delimiter.join(map(str, row)) + '\n')
del row
del sc
If your attributes include some complicated strings or datetime objects, you will be in for a bit of headache with resolving those but the concept is there. Perhaps using tab ('\t') instead of a comma as a delimiter would help with that.
I wonder how long is it all going to take. Especially the ListFeatureClasses function call can take a while with that many feature classes.
Filip.
This seems like exactly what I need.
Please forgive the novice question, but with what, if anything, am I to replace 'geometry' and "\n"? I have tried running this, but ArcMap just hangs.
what exactly is it hanging on? Include some print statements to see what bits of code are successfully being executed and where it hangs.
Hi Michael,
You're not supposed to replace 'geometry' or '\n' with anything. The line with 'geometry' lists names of all the columns where column type is not geometry. The '\n' adds a new line character at the end of every line you write to the output csv file.
Joseph Head is right. Some print statements will help you figure out what's going on. I recommend adding the following print statements to line 4, 6, 10, and 12, respectively:
print("Listing feature classes...") # line 4
print("Found {0} feature classes".format(len(feature_classes))) # line 6
print(field_names) #line 10
print('Writing to {0}'.format(output_file)) # line 12
#Also insert the following just before line 18 and align it with the 'with' keyword:
print(fc)
My guess is that the arcpy.ListFeatureClasses() takes a long time but without the print statements you won't know.
It is likely that the whole thing is going to take some time. One way to figure out if the process is still doing anything is to check the size of the output csv file every couple of minutes to see if it is still growing.
Filip.
I let this script run overnight, and this morning, here's what I found:
>>> import arcpy
... output_file = r'C:\Users\mlenore\Desktop\GIS_Working_Folder\Residential_model\bgs.csv'
... arcpy.env.workspace = r'C:\Users\mlenore\Desktop\GIS_Working_Folder\Residential_model\BGs_finished.gdb'
... print("Listing feature classes...")
... feature_classes = arcpy.ListFeatureClasses()
... print("Found {0} feature classes".format(len(feature_classes)))
... first_fc = feature_classes[0]
... fieldnames = [f.name for f in arcpy.ListFields(first_fc) if f.type.lower() != 'geometry']
... delimiter = ','
...
... print(field_names)
... with open(output_file, 'w') as fl:
... print('Writing to {0}'.format(output_file))
... # write header
... fl.write(delimiter.join(fieldnames) + "\n")
...
... # append attributes from each feature class
... for fc in feature_classes:
... print(fc)
... with arcpy.da.SearchCursor(fc, fieldnames) as sc:
... for row in sc:
... fl.write(delimiter.join(map(str, row)) + '\n')
... del row
... del sc
...
Listing feature classes...
Found 15727 feature classes
Runtime error
Traceback (most recent call last):
File "<string>", line 11, in <module>
NameError: name 'field_names' is not defined
A NameError means that you haven't defined a variable before calling it. In this case, you defined the list of field names on line 8 as "fieldnames" not "field_names". Also look into formatting your code correctly:
import arcpy
for blah in blah:
print("hi")