Batch merge of shapefiles

06-15-2012 01:03 AM
MiikaMäkelä1
New Contributor
I have a situation where I have a master folder. Within it, I have subfolders A to n...

Each subfolder can have shapefiles filename_A to filename_n... (but not necessarily, so every folder needs to be checked)

I want to merge all the filename_A's, filename_B's... from each subfolder into new shapefiles.

filename_A_merge, filename_B_merge... and so on.

Any ideas?

Thanks!
16 Replies
MiikaMäkelä1
New Contributor
The larger set of folders (over 2000) has file names that follow a sequence like: XXXXXX_xx_mtk.shp

MarcinGasior
Occasional Contributor III
The folder names are completely independent of the file names, unfortunately. Every subfolder within the master folder needs to be checked for the repeating file names.

Understood.
With these assumptions, the way the list of shapefiles to merge is collected has to be changed. I hope to provide a rewritten script tomorrow.
MarcinGasior
Occasional Contributor III
Here's a modified script - this time based on a Python dictionary.
A few notes about the script:
- it's independent of the subfolder names
- it assumes that the shapefiles to be merged have exactly the same name in each subfolder where they exist
- a shapefile name doesn't need to exist in every subfolder
- it assumes there is nothing in the master folder except subfolders containing the shapefiles to merge
- it works only for shapefiles in file system folders (it cannot be directly applied to Esri geodatabase feature classes)


def main():
    try:
        import arcpy, sys, traceback, os, glob
        arcpy.env.overwriteOutput = True
        masterFolder = r"C:\tmp\Shp"
        outputFolder = r"C:\tmp\Shp_merged"

        # collect a list of subfolders in the master folder
        subfolderLst = os.listdir(masterFolder)

        # declare a dictionary where the key is a shapefile name
        # ... and the value is a list of paths to the shapefiles with that name in all subfolders
        shpDict = {}

        # loop through all subfolders
        for subfolder in subfolderLst:
            # check the current subfolder and make a list of paths to each .shp file
            shpLst = glob.glob(os.path.join(masterFolder, subfolder, '*.shp'))

            # add each shapefile path to the dictionary
            for shpPath in shpLst:
                shpName = os.path.basename(shpPath)

                # if there's no dictionary key for this shapefile name, create one
                # ... with an empty list as value, then append the path
                if not shpName in shpDict:
                    shpDict[shpName] = []
                    shpDict[shpName].append(shpPath)

                # if the dictionary key already exists, just append the path to its list
                else:
                    shpDict[shpName].append(shpPath)

        # Merge the collected shapefiles into new shapefiles:
        # for each dictionary key, use its value (the list of paths) as input to the Merge tool
        for shapefile in shpDict:
            outShp = os.path.join(outputFolder, shapefile[:-4] + "_merge.shp")
            arcpy.Merge_management(shpDict[shapefile], outShp)
            print outShp + " created."

    except:
        print arcpy.GetMessages()
        # Get the traceback object
        tb = sys.exc_info()[2]
        tbinfo = traceback.format_tb(tb)[0]

        # Concatenate information about the error into a message string
        pymsg = tbinfo + "\n" + str(sys.exc_type) + ": " + str(sys.exc_value)

        # Return Python error messages for use with a script tool
        arcpy.AddError(pymsg)

        # Print Python error messages for use in Python/PythonWin
        print pymsg

if __name__ == '__main__':
    main()
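One more assumption worth spelling out (it's not covered in the notes above): the output folder should already exist, since as far as I know the Merge tool won't create it for you. A minimal sketch to make sure it's there before the merge loop runs:

import os

outputFolder = r"C:\tmp\Shp_merged"   # same example path as in the script
if not os.path.isdir(outputFolder):
    os.makedirs(outputFolder)         # create the target folder if it's missing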
MiikaMäkelä1
New Contributor
Ok, we only have one more issue remaining with this - where do I send the crate of beer? 🙂 I tried it out and it works like a charm! Thank you so much!
MiikaMäkelä1
New Contributor
I know I'm pushing my luck here, but I'm beginning to realize I have a new problem...

The merged files become too big for ArcGIS to handle. Any process (clip, project) I try to run on a huge shapefile makes the software crash. Would it be very complicated to make this script write to a file geodatabase? I expect it could handle the large files better.
MarcinGasior
Occasional Contributor III
I was afraid that large datasets could be an issue.

You can try writing the output to a file geodatabase.
Declare a geodatabase path instead of outputFolder:
e.g. outputGDB = r"C:\tmp\Testing.gdb"

Then adjust the final part with the merge:
        for shapefile in shpDict:
            outFC = os.path.join(outputGDB, shapefile[:-4]+"_merge")
            arcpy.Merge_management(shpDict[shapefile], outFC)
            print outFC + " created."
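One detail to keep in mind (a minimal sketch, assuming the geodatabase may not exist yet): create the file geodatabase before running the merge, since Merge only writes into an existing workspace.

import os
import arcpy

outputGDB = r"C:\tmp\Testing.gdb"   # example path from above
if not arcpy.Exists(outputGDB):
    # CreateFileGDB takes the parent folder and the geodatabase name separately
    arcpy.CreateFileGDB_management(os.path.dirname(outputGDB),
                                   os.path.basename(outputGDB))

Also note that geodatabase feature class names cannot start with a digit, so if the shapefiles are named like XXXXXX_xx_mtk.shp it may be safer to run the name through arcpy.ValidateTableName(shapefile[:-4] + "_merge", outputGDB) before the Merge call.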
MiikaMäkelä1
New Contributor
Hi Marcin!

I'm getting back to this thread because I have a small issue, but I think it might be a really simple thing to fix. First of all, this script for merging numerous shapefiles into a file geodatabase has been fantastic. It has been a real time saver.

The issue I now have is that my datasets contain plain .dbf files alongside the shapefiles. I would like the script to also merge those .dbf files, which don't have any geometry. Currently it does not do this. The logic is otherwise exactly the same - the files are named identically, are spread across several folders, etc.
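I imagine the collection step would need something roughly like this (just my untested sketch - collect the standalone tables too, but skip the .dbf sidecars that belong to a shapefile, since every shapefile already has its own .dbf):

import glob, os

def collectInputs(masterFolder):
    # Group shapefiles and standalone .dbf tables by file name, across all subfolders.
    # A .dbf is treated as standalone only if there is no .shp with the same base name.
    inputDict = {}
    for subfolder in os.listdir(masterFolder):
        folder = os.path.join(masterFolder, subfolder)
        shpLst = glob.glob(os.path.join(folder, '*.shp'))
        dbfLst = [dbf for dbf in glob.glob(os.path.join(folder, '*.dbf'))
                  if not os.path.exists(dbf[:-4] + '.shp')]
        for path in shpLst + dbfLst:
            inputDict.setdefault(os.path.basename(path), []).append(path)
    return inputDict

As far as I understand, arcpy.Merge_management accepts tables as well as feature classes, so the .dbf groups should merge into geodatabase tables with the same Merge call - but I may be missing something.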

Also, a nice addition would be to know how to give the script a simple UI, so I could use it in models. I can get the script into ArcToolbox, but how do I set parameters for it? It would only need two, I guess: the master folder location and the name/location of the output file geodatabase.
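For the parameters, I guess the script would read them with something like this (untested sketch on my part, assuming the two parameters are defined in the script tool's properties as a Folder and a Workspace, in that order), instead of the hard-coded paths:

import arcpy

# Parameter 0: master folder containing the subfolders (data type Folder)
# Parameter 1: output file geodatabase (data type Workspace)
masterFolder = arcpy.GetParameterAsText(0)
outputGDB = arcpy.GetParameterAsText(1)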

I would be ever grateful for any help on these!

Miika