go through list of shapefiles and delete all but required fields - arcpy

BrendaJameson · ‎01-28-2015

I have about 1000 shapefiles that I will be merging into one large shapefile. I am not interested in the attribute information, only the location. I was thinking I should condense the files before merging by stripping out all the fields except the required ones and then merging them all together. All the files should be polygons, but a few non-polygons may creep in, so a check for file type would probably be nice. (I think I can code this requirement on my own.)

Right now all the files are spread out in multiple directories but I have the file names and directory path listed in a txt file. I would want to copy all these files to a single location, delete the unnecessary fields and then merge. I want to do this all at once instead of each file individually. I know how to code this individually but am unsure how to accomplish this with such a large volume of data. Ideas anyone?

ver. ArcGIS 10.2.1 Advanced

XanderBakker · ‎01-28-2015

I probably would not use the Merge tool for this, especially if you are not interested in the attributes (only the geometry). You could probably do something like this. Please note that I did not run the code, so be careful...

... but if it works it will:

create an empty output featureclass (which should not be written to the folder that is being searched)
search for featureclasses of type polygon in the folder and all subfolders
use a search cursor to obtain the geometry and insert it into the output featureclass

import arcpy
import os
arcpy.env.overwriteOutput = True

# settings
basefolder = r"C:\Forum"
fc_out = r"C:\Data\myFGDB.gdb\myResultingFeatureclass"
sr = arcpy.SpatialReference(102100) # specify the correct WKID!

# create the empty output fc (outside the basefolder!)
fc_ws, fc_name = os.path.split(fc_out)
arcpy.CreateFeatureclass_management(fc_ws, fc_name, "POLYGON", spatial_reference=sr)

cnt = 0
# start a da insert cursor
with arcpy.da.InsertCursor(fc_out, ("SHAPE@")) as curs_out:

    # create a list of all polygon featureclasses in (sub)folders of the basefolder
    walk = arcpy.da.Walk(basefolder, datatype="FeatureClass", type="Polygon")

    # loop through featureclasses
    for dirpath, dirnames, filenames in walk:
        for filename in filenames:
            # construct the absolute name to input featureclass
            fc_in = os.path.join(dirpath, filename)
            # for shapefiles (but also coverages)
            desc = arcpy.Describe(dirpath)
            if hasattr(desc, "workspaceType"):
                if desc.workspaceType == "FileSystem":

                    # start search cursor and insert features into output featureclass
                    print("Processing featureclass: '{0}'".format(filename))
                    with arcpy.da.SearchCursor(fc_in, ("SHAPE@")) as curs_in:
                        for row in curs_in:
                            cnt += 1
                            if cnt % 1000 == 0:
                                print(" - Inserting output feature: {0}".format(cnt))
                            curs_out.insertRow((row[0], ))

# the with statements release both cursors, so no explicit del is needed

A simple enhancement could be to add a field to the output featureclass (text) and write the source shapefile name to the field.

There is no check to see if the coordinate system of each input matches that of the output featureclass.

Also other polygon featureclasses like coverages, if present, will be included in the result.

JérémiePedoia · ‎01-28-2015

Hi Brenda.

If I were you, I would work with lists.

One list for all your shapefiles (shps = ["path/to/1","Path/to/2",...])

And one list for the fields you do not want to delete (flds2keep = ["field1","field2",...]) Don't forget to add OID and SHAPE field.

Then, for each shapefile, copy the shapefile into one directory and build a list with the new shapefiles.

For each new shapefile, for each field, if it is not in the list, delete it.

When this is finished, merge all the new shapefiles in one.

AmyKlug · ‎01-28-2015

Iterate through each directory + shapefile in the text file (hopefully the paths are in the correct format)
Use an if statement to check the shape type
Use an if not statement to delete every field except OID and Shape
Merge all shapefiles into the first one (or create a new one and merge into that)

Unless you really want them in one directory, you don't need to copy them, but you can add a copy step.

RhettZufelt · ‎01-28-2015

Since you already have the paths/filenames in a text file, I would read each line, use os.path.join to build the full path/filename, and append it to a list. Personally, I'd make a separate script just to copy them all to a common location and then run the delete/merge script, but the copy could be included in the same script if this part might be repeated in the future.
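A minimal sketch of the copy step, using only the standard library. It assumes the text file holds one full .shp path per line (the actual layout of the file is not shown in this thread), and the function name copy_shapefiles and the "_0001" style suffix are made up for illustration. A shapefile is several files sharing one base name, so the sidecar files are copied along with the .shp.

```python
import os
import shutil


def copy_shapefiles(list_file, dest_dir):
    """Copy each shapefile listed in list_file into dest_dir.

    list_file is assumed to hold one full .shp path per line. Returns the
    new .shp paths. Sidecar files (.shx, .dbf, .prj, ...) are copied too.
    """
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    copied = []
    with open(list_file) as f:
        for n, line in enumerate(f, 1):
            shp = line.strip()
            if not shp.lower().endswith(".shp") or not os.path.isfile(shp):
                continue  # skip blank lines, non-shapefiles and missing files
            src_dir, shp_name = os.path.split(shp)
            stem = os.path.splitext(shp_name)[0]
            # the line number goes into the new name so that equal file
            # names from different folders do not overwrite each other
            new_stem = "{0}_{1:04d}".format(stem, n)
            for fname in os.listdir(src_dir or "."):
                if fname.startswith(stem + "."):
                    new_name = new_stem + fname[len(stem):]
                    shutil.copy2(os.path.join(src_dir, fname),
                                 os.path.join(dest_dir, new_name))
            copied.append(os.path.join(dest_dir, new_stem + shp_name[len(stem):]))
    return copied
```

The returned list can be kept for the later delete and merge steps, so the common folder does not have to be listed again.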

Now that they are in a common folder (and are copies, so you are not deleting fields from your only copy), first set arcpy.env.workspace = "path to new common folder", then use ListFiles to grab all the *.shp files in the new folder. http://resources.arcgis.com/en/help/main/10.2/index.html#//03q300000018000000

(It looks like you could use ListFeatureClasses http://resources.arcgis.com/en/help/main/10.2/index.html#//03q300000023000000 to make the list as well. That is probably better, as you can filter by wildcard and geometry type in one swoop, something like fcList = arcpy.ListFeatureClasses("*.shp", "Polygon"), which will make a list of just the polygon shapefiles in the current workspace. The feature type argument takes a single type, not a list.) However, in either case, you won't be able to merge different geometry types, so you can't merge points with polys. If you need both, run one iteration with ListFeatureClasses grabbing just the points, then run it again for the polygons. You will have to account for this in the merge output filename so one result doesn't clobber the other.
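One way to keep the outputs from clobbering each other is to derive the output name from the geometry type. The sketch below is plain Python under stated assumptions: plan_merges and the "merged" prefix are hypothetical names, and the shape type strings are assumed to be the ones arcpy.Describe(fc).shapeType reports, such as "Polygon" or "Point".

```python
import os
from collections import defaultdict


def plan_merges(shapefiles, out_dir, prefix="merged"):
    """Group (path, shape_type) pairs by geometry type.

    Returns {output_path: [input paths]} with one output per geometry type,
    so a point merge never overwrites a polygon merge.
    """
    groups = defaultdict(list)
    for path, shape_type in shapefiles:
        out_name = "{0}_{1}.shp".format(prefix, shape_type.lower())
        groups[os.path.join(out_dir, out_name)].append(path)
    return dict(groups)
```

Each output path and its input list could then be handed to the Merge tool in turn, for example arcpy.Merge_management(inputs, output).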

Then iterate through the list and send each shapefile to ListFields http://resources.arcgis.com/en/help/main/10.2/index.html#//018v00000012000000

Inside this loop, iterate through the fields and append each field name to a new fieldList if it is not "Shape" or "FID" (FID since these are shapefiles; for other input formats it might be OBJECTID).

Now that the field list is complete for the current shapefile, pass it to Delete Field (which will take the list directly, no need to iterate) http://resources.arcgis.com/en/help/main/10.2/index.html#//00170000004n000000 to delete them.

Then set fieldList = [] so it is empty for the next shapefile in the loop.

This will remove all but the FID and Shape fields from each shapefile.
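The field filter itself is plain Python and can be sketched without arcpy. The function name fields_to_delete and the default keep list are assumptions for illustration. In a real run the names would come from something like [f.name for f in arcpy.ListFields(shp)], and the result would go to arcpy.DeleteField_management(shp, drop_list).

```python
def fields_to_delete(field_names, keep=("FID", "Shape", "OBJECTID")):
    """Return the field names that are not in the keep list.

    The comparison ignores case, since field names may be reported as
    "Shape" or "SHAPE" depending on the data source.
    """
    keep_upper = set(name.upper() for name in keep)
    return [name for name in field_names if name.upper() not in keep_upper]
```

Building a fresh list per shapefile this way also removes the need to reset fieldList between iterations.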

You still have the list of files from ListFiles, so just pass this list directly to Merge http://resources.arcgis.com/en/help/main/10.2/index.html#//001700000055000000 with a new output filename and run it.

R_

BrendaJameson · ‎01-30-2015

This worked great once I figured out I have to create the FGDB. LOL.

XanderBakker · ‎01-31-2015

sorry, that could have been clearer....

BrendaJameson · ‎02-02-2015

How would I go about modifying the script to put the file name in a separate field? I had a script that did this before merging, but I keep getting [Errno 10054] An existing connection was forcibly closed by the remote host when it runs. I was hoping I could accomplish this by modifying your script.

Many thanks,

XanderBakker · ‎02-02-2015

I haven't tested this, but the code would change just a little:

line 8: the name of the output field that will contain the source featureclass
line 16: add the field to the output featureclass
line 42: include the path + name of the source in the tuple (output row)

import arcpy
import os
arcpy.env.overwriteOutput = True

# settings
basefolder = r"C:\Forum"
fc_out = r"C:\Data\myFGDB.gdb\myResultingFeatureclass"
fld_source = "SourceFC"
sr = arcpy.SpatialReference(102100) # specify the correct WKID!

# create the empty output fc (outside the basefolder!)
fc_ws, fc_name = os.path.split(fc_out)
arcpy.CreateFeatureclass_management(fc_ws, fc_name, "POLYGON", spatial_reference=sr)

# add the field that will contain the source:
arcpy.AddField_management(fc_out, fld_source, "TEXT", field_length=255)

cnt = 0
# start a da insert cursor
with arcpy.da.InsertCursor(fc_out, ("SHAPE@", fld_source)) as curs_out:

    # create a list of all polygon featureclasses in (sub)folders of the basefolder
    walk = arcpy.da.Walk(basefolder, datatype="FeatureClass", type="Polygon")

    # loop through featureclasses
    for dirpath, dirnames, filenames in walk:
        for filename in filenames:
            # construct the absolute name to input featureclass
            fc_in = os.path.join(dirpath, filename)
            # for shapefiles (but also coverages)
            desc = arcpy.Describe(dirpath)
            if hasattr(desc, "workspaceType"):
                if desc.workspaceType == "FileSystem":

                    # start search cursor and insert features into output featureclass
                    print("Processing featureclass: '{0}'".format(filename))
                    with arcpy.da.SearchCursor(fc_in, ("SHAPE@")) as curs_in:
                        for row in curs_in:
                            cnt += 1
                            if cnt % 1000 == 0:
                                print(" - Inserting output feature: {0}".format(cnt))
                            curs_out.insertRow((row[0], fc_in, ))

# the with statements release both cursors, so no explicit del is needed
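The script above writes the full path of each source into the field. If only the file name is wanted, a small helper could trim the path first. This is a sketch: source_label is a hypothetical name, and the 255 default simply mirrors the field_length used in the AddField call.

```python
import os


def source_label(fc_path, max_len=255):
    """Return the file name of fc_path without folder or extension.

    The result is cut to max_len characters so it fits the text field.
    """
    name = os.path.splitext(os.path.basename(fc_path))[0]
    return name[:max_len]
```

In the insertRow call, source_label(fc_in) would then take the place of fc_in.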

BrendaJameson · ‎02-02-2015

Well that was just too awesome for words. Thank you a gajillion times.