Get attributes from many shapefiles?

2917
24
Jump to solution
03-02-2017 08:40 AM
MichaelLenore1
New Contributor III

I have a very large .gdb (15000+ shapefiles). I know that's unusually and impractially large, but it's the result of an iterative model. I need an Excel sheet with all of those shapefiles' attributes. I tried merging them but it's too many. Is there some other way? 

Tags (2)
1 Solution

Accepted Solutions
FilipKrál
Occasional Contributor III

Hi Michael,

More than 15000 feature classes in a geodatabase is indeed unusual. We could debate about how to avoid that situation in the first place, but that's not the point here. So let's focus on your question.

Let's assume that all your feature classes have the same attributes and that they are stored like this:

c:/my.gdb/fc1
c:/my.gdb/fc2
...
c:/my.gdb/fc15000
...

I would use the following code to merge attributes of all your feature classes into a csv file and then I'd try to open the csv file in Excel.

import arcpy
output_file = r'c:\my.csv'
arcpy.env.workspace = r'c:\my.gdb'

feature_classes = arcpy.ListFeatureClasses()

first_fc = feature_classes[0]
fieldnames = [f.name for f in arcpy.ListFields(first_fc) if f.type.lower() != 'geometry']
delimiter = ','

with open(output_file, 'w') as fl:
    
    # write header
    fl.write(delimiter.join(fieldnames) + "\n")
    
    # append attributes from each feature class
    for fc in feature_classes:
        with arcpy.da.SearchCursor(fc, fieldnames) as sc:
            for row in sc:
                fl.write(delimiter.join(map(str, row)) + '\n')
        del row
        del sc
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

If your attributes include some complicated strings or datetime objects, you will be in for a bit of headache with resolving those but the concept is there. Perhaps using tab ('\t') instead of a comma as a delimiter would help with that.

I wonder how long is it all going to take. Especially the ListFeatureClasses function call can take a while with that many feature classes.

Filip.

View solution in original post

24 Replies
JimCousins
MVP Regular Contributor

Try using append in a loop over your files, so that it is only working with 2 files at a time, instead of 15000 all at once.

Best Regards,

Jim

0 Kudos
JoeBorgione
MVP Emeritus

Or as I mentioned in your last post, download the shape file  from https://www.census.gov/geo/maps-data/data/cbf/cbf_blkgrp.html :  the dbf can be saved as an excel spreadsheet

That should just about do it....
0 Kudos
MichaelLenore1
New Contributor III

Thanks for replying - but I don't need that table. I need a new table based on my model's results, which are in a geodatabase. The geodatabase contains one shapefile per BG with new data appended to each. 

0 Kudos
FilipKrál
Occasional Contributor III

Hi Michael,

More than 15000 feature classes in a geodatabase is indeed unusual. We could debate about how to avoid that situation in the first place, but that's not the point here. So let's focus on your question.

Let's assume that all your feature classes have the same attributes and that they are stored like this:

c:/my.gdb/fc1
c:/my.gdb/fc2
...
c:/my.gdb/fc15000
...

I would use the following code to merge attributes of all your feature classes into a csv file and then I'd try to open the csv file in Excel.

import arcpy
output_file = r'c:\my.csv'
arcpy.env.workspace = r'c:\my.gdb'

feature_classes = arcpy.ListFeatureClasses()

first_fc = feature_classes[0]
fieldnames = [f.name for f in arcpy.ListFields(first_fc) if f.type.lower() != 'geometry']
delimiter = ','

with open(output_file, 'w') as fl:
    
    # write header
    fl.write(delimiter.join(fieldnames) + "\n")
    
    # append attributes from each feature class
    for fc in feature_classes:
        with arcpy.da.SearchCursor(fc, fieldnames) as sc:
            for row in sc:
                fl.write(delimiter.join(map(str, row)) + '\n')
        del row
        del sc
‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍‍

If your attributes include some complicated strings or datetime objects, you will be in for a bit of headache with resolving those but the concept is there. Perhaps using tab ('\t') instead of a comma as a delimiter would help with that.

I wonder how long is it all going to take. Especially the ListFeatureClasses function call can take a while with that many feature classes.

Filip.

MichaelLenore1
New Contributor III

This seems like exactly what I need.

Please forgive the novice question, but with what, if anything, am I to replace 'geometry' and "\n"? I have tried running this, but ArcMap just hangs. 

0 Kudos
Lake_Worth_BeachAdmin
Occasional Contributor III

what exactly is it hanging on? Include some print statements to see what bits of code are successfully being executed and where it hangs. 

FilipKrál
Occasional Contributor III

Hi Michael,

You're not supposed to replace 'geometry' or '\n' with anything. The line with 'geometry' lists names of all the columns where column type is not geometry. The '\n' adds a new line character at the end of every line you write to the output csv file.

Joseph Head‌ is right. Some print statements will help you figure out what's going on. I recommend adding the following print statements to line 4, 6, 10, and 12, respectively:

print("Listing feature classes...") # line 4
print("Found {0} feature classes".format(len(feature_classes))) # line 6
print(field_names) #line 10
print('Writing to {0}'.format(output_file)) # line 12

#Also insert the following just before line 18 and align it with the 'with' keyword: 
print(fc)
‍‍‍‍‍‍‍‍‍

My guess is that the arcpy.ListFeatureClasses() takes a long time but without the print statements you won't know.

It is likely that the whole thing is going to take some time. One way to figure out if the process is still doing anything is to check the size of the output csv file every couple of minutes to see if it is still growing.

Filip.

MichaelLenore1
New Contributor III

I let this script run overnight, and this morning, here's what I found: 

>>> import arcpy
... output_file = r'C:\Users\mlenore\Desktop\GIS_Working_Folder\Residential_model\bgs.csv'
... arcpy.env.workspace = r'C:\Users\mlenore\Desktop\GIS_Working_Folder\Residential_model\BGs_finished.gdb'
... print("Listing feature classes...")
... feature_classes = arcpy.ListFeatureClasses()
... print("Found {0} feature classes".format(len(feature_classes)))
... first_fc = feature_classes[0]
... fieldnames = [f.name for f in arcpy.ListFields(first_fc) if f.type.lower() != 'geometry']
... delimiter = ','
...
... print(field_names)
... with open(output_file, 'w') as fl:
... print('Writing to {0}'.format(output_file))
... # write header
... fl.write(delimiter.join(fieldnames) + "\n")
...
... # append attributes from each feature class
... for fc in feature_classes:
... print(fc)
... with arcpy.da.SearchCursor(fc, fieldnames) as sc:
... for row in sc:
... fl.write(delimiter.join(map(str, row)) + '\n')
... del row
... del sc
...
Listing feature classes...
Found 15727 feature classes
Runtime error
Traceback (most recent call last):
File "<string>", line 11, in <module>
NameError: name 'field_names' is not defined

0 Kudos
JonathanQuinn
Esri Notable Contributor

A NameError means that you haven't defined a variable before calling it.  In this case, you defined the list of field names on line 8 as "fieldnames" not "field_names".  Also look into formatting your code correctly:

import arcpy

for blah in blah:
  print("hi")‍‍‍‍