batch merge of shape files

06-15-2012 01:03 AM
MiikaMäkelä1
New Contributor
I have a situation where I have a master folder. Within it, I have subfolders A to n...

Each subfolder can have shapefiles filename_A to filename_n... (but not necessarily, so every folder needs to be checked)

I want to merge all the filename_A's, filename_B's... from each subfolder to new shapefiles.

filename_A_merge, filename_B_merge... and so on.

Any ideas?

Thanks!
0 Kudos
1 Solution

Accepted Solutions
MarcinGasior
Regular Contributor
Here's a modified script, this time based on a Python dictionary.
A few notes about the script:
- it's independent of subfolder names
- it assumes that shapefiles to be merged have exactly the same name in each subfolder where they exist
- a given shapefile name doesn't need to exist in every subfolder
- it assumes that there are no other files in the master folder except subfolders with shapefiles to merge
- it works only for shapefiles in system folders (it cannot be directly applied to Esri geodatabase feature classes)


import arcpy, sys, traceback, os, glob

def main():
    try:
        arcpy.env.overwriteOutput = True
        masterFolder = r"C:\tmp\Shp"
        outputFolder = r"C:\tmp\Shp_merged"

        #collect a list of subfolders in master folder
        subfolderLst = os.listdir(masterFolder)

        #declare a dictionary where a key will be shapefile name
        #... and value a list of paths to shapefiles with this name in all subfolders
        shpDict = {}

        #loop through all subfolders
        for subfolder in subfolderLst:
            #check current subfolder and make a list of paths to each .shp file
            shpLst = glob.glob(os.path.join(masterFolder,subfolder,'*.shp'))

            #add each shapefile path to dictionary
            for shpPath in shpLst:
                shpName = os.path.basename(shpPath)

                #if there's no dictionary key for this shapefile name, create one with an empty list as value
                if shpName not in shpDict:
                    shpDict[shpName] = []

                #append path to the list stored under this shapefile name
                shpDict[shpName].append(shpPath)

        #Merge collected shapefiles into new shapefile
        #for each dictionary key use its value (paths list) as an input to Merge tool
        for shapefile in shpDict:
            outShp = os.path.join(outputFolder, shapefile[:-4]+"_merge.shp")
            arcpy.Merge_management(shpDict[shapefile], outShp)
            print(outShp + " created.")

    except:
        print(arcpy.GetMessages())
        # Get the traceback object
        tb = sys.exc_info()[2]
        tbinfo = traceback.format_tb(tb)[0]

        # Concatenate information together concerning the error into a
        #   message string
        pymsg = tbinfo + "\n" + str(sys.exc_info()[0]) + ": " + str(sys.exc_info()[1])

        # Return python error messages for use with a script tool
        arcpy.AddError(pymsg)

        # Print Python error messages for use in Python/PythonWin
        print(pymsg)

if __name__ == '__main__':
    main()
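
The grouping step can also be tried on its own, without arcpy. Below is a minimal sketch that uses only the Python standard library. The function name group_shapefiles_by_name is hypothetical and is not part of the script above. Unlike the script, it skips loose files in the master folder, so the "no other files" assumption is relaxed a little.

```python
import glob
import os
from collections import defaultdict


def group_shapefiles_by_name(master_folder):
    """Map each shapefile name to the paths where it occurs in direct subfolders.

    Hypothetical helper for illustration; only the first level of
    subfolders under master_folder is searched.
    """
    groups = defaultdict(list)
    for entry in sorted(os.listdir(master_folder)):
        subfolder = os.path.join(master_folder, entry)
        # Skip loose files so that only subfolders are searched.
        if not os.path.isdir(subfolder):
            continue
        for shp_path in sorted(glob.glob(os.path.join(subfolder, "*.shp"))):
            # The file name is the key; every path with that name joins its list.
            groups[os.path.basename(shp_path)].append(shp_path)
    return dict(groups)
```

Each value in the returned dictionary is a list of paths that could be passed as the input list to the Merge tool, one call per key.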


0 Kudos
16 Replies
MiikaMäkelä1
New Contributor
...just tried this manually for one file type. I went through all the folders, searching and adding to the Merge tool. It took a good half an hour. Now to wait (for quite some time) to see the merge and hope I did not skip any tiles by accident.
0 Kudos
MarcinGasior
Regular Contributor
This time I prepared a Python script:)

...
Assumption is that shapefile suffix 'n' is the same as subfolder suffix 'n' (suffixes are the same for shapefiles and subfolders)

Here is my testing directory:
[ATTACH=CONFIG]15236[/ATTACH]
And this is the script:
def main():
    try:
        import arcpy, sys, traceback, os
        arcpy.env.overwriteOutput = True
        masterFolder = r"C:\tmp\Shp"
        outputFolder = r"C:\tmp\Shp_merged"

        #collect list of subfolders
        subfolderLst = os.listdir(masterFolder)

        #collect all subfolder suffixes into list
        suffixLst = []
        for subfolder in subfolderLst:
            suffixLst.append(subfolder[-1])

        #for each suffix walk through all subfolders and collect shp names to merge
        for sfx in suffixLst:
            shpToMergeLst = [] #empty list to collect shps to merge
            for subfolder in subfolderLst:
                arcpy.env.workspace = os.path.join(masterFolder, subfolder) #build path to workspace

                wildcard = "*" + sfx +".shp" #build wildcard to restrict search in next step
                shapefilesLst = arcpy.ListFeatureClasses(wildcard) #one element list with shp name

                #add full shp path to list (if shp exists in this subfolder)
                if shapefilesLst:
                    shpToMergeLst.append(os.path.join(masterFolder,subfolder,shapefilesLst[0]))

            #merge collected shps into new folder
            if shpToMergeLst:
                outShp = os.path.join(outputFolder, os.path.basename(shpToMergeLst[0][:-4])+"_merge.shp")
                arcpy.Merge_management(shpToMergeLst, outShp)
                print(outShp + " created.")

    except:
        print(arcpy.GetMessages())
        # Get the traceback object
        tb = sys.exc_info()[2]
        tbinfo = traceback.format_tb(tb)[0]

        # Concatenate information together concerning the error into a
        #   message string
        pymsg = tbinfo + "\n" + str(sys.exc_info()[0]) + ": " + str(sys.exc_info()[1])

        # Return python error messages for use with a script tool
        arcpy.AddError(pymsg)

        # Print Python error messages for use in Python/PythonWin
        print(pymsg)

if __name__ == '__main__':
    main()


Copy/paste this script into a new script window in PythonWin or in IDLE (which is installed by default) and change the paths at the beginning.
The script contains an extended error-catching part, so let me know if you experience any problems.
0 Kudos
MiikaMäkelä1
New Contributor
Firstly I assume that 'filename' part of shapefile name is different in each subfolder (BTW, Merge tool won't allow non-unique names).


Hi, thanks again Marcin! The files actually DO have the same names. I tried a manual merge with ArcMap 10 and the Merge tool was fine with it: it successfully merged over 50 files with the same name. So that's actually the key to look for: if the name is the same, the files should be merged.
0 Kudos
MiikaMäkelä1
New Contributor
Oh, and the file names and folder names cannot be used for matching. The naming sequences are independent.
0 Kudos
MarcinGasior
Regular Contributor
At an early stage of script development the Merge tool reported some errors due to incorrect names. I thought that was the problem.

However, the script now works fine with shapefiles that have the same name:
[ATTACH=CONFIG]15237[/ATTACH]
0 Kudos
MiikaMäkelä1
New Contributor
Hi Marcin,

I'm in the middle of executing your script! It looks fantastic! I have very many files and they come from over 100 uniquely named folders (this is my "test run"; the real challenge is a database that is sorted into over 2000 folders according to map sheets).

Anyway, as this is processing, I notice that the file size keeps going up, then drops to zero again and starts climbing...

Does the process run the merge multiple times, like: 1+2 --> 1+2+3 --> 1+2+3+4 --> and so on? This would explain why it is so slow and the file size keeps going up and down. Can it be programmed so it goes: 1+2 > 12+3 > 123+4 > 1234+5 > and so on? I'm sorry, I'm too code illiterate to check, but this is what I suspect...
0 Kudos
MarcinGasior
Regular Contributor
Miika,

My code was adapted to the problem defined in the first post.
Specifically, each subfolder is distinguished by its last character (A, B, ..., n).

The script collects all those last characters, and then for each character (A, B, ..., n) looks for shapefiles whose names end with this character. When such a shapefile is found, it's added to a list to merge. When all subfolders have been checked, this list is used as the input to the Merge tool (which runs once for shapefiles with a specific suffix).

I suppose you see the output file being recreated because several of your subfolders have the same last character.
If your subfolders are distinguished by more than one character (e.g. 2, 3, 4, ... characters), you can just change the value in this line:
suffixLst.append(subfolder[-1])
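
As a rough sketch of that change, assuming the suffix length is known in advance: the helper below takes the last few characters of each folder name and keeps each suffix only once, so the merge for a given suffix is not repeated. The names collect_suffixes and suffix_length are hypothetical and do not appear in my script.

```python
def collect_suffixes(subfolder_names, suffix_length=1):
    """Return the distinct trailing suffixes of the folder names, in first-seen order.

    Hypothetical helper for illustration; suffix_length is assumed to be
    a positive integer no longer than the shortest folder name.
    """
    suffixes = []
    for name in subfolder_names:
        suffix = name[-suffix_length:]
        # Keep each suffix once so a later loop runs one merge per suffix.
        if suffix not in suffixes:
            suffixes.append(suffix)
    return suffixes
```

For example, collect_suffixes(os.listdir(masterFolder), 2) would compare the last two characters of each subfolder name instead of one.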


Moreover, this script won't work for geodatabase feature classes because it contains file-extension handling specific to shapefiles.
If you can, please provide a screenshot of your subfolders so I can adjust this script.
0 Kudos
MiikaMäkelä1
New Contributor
Assumption is that shapefile suffix 'n' is the same as subfolder suffix 'n' (suffixes are the same for shapefiles and subfolders)

Oh, and I just noticed your assumption, which doesn't really apply. The folder names are completely independent of the file names, unfortunately. Every subfolder within the master folder needs to be checked for the repeating file names.

so it should be something like:

look in folder A > select first shape file > look in folder B > is there a file with same name? no/yes > if yes, merge to first file, if no, continue to folder C...
begin from folder A, skip first shape file, select second shape file...

it does not look easy at all...
0 Kudos
MiikaMäkelä1
New Contributor
Sorry, I was not very clear in the problem definition. Here's a snapshot of what the folders look like. They are unique, with what looks like 3 different sequences running in the folder names. The length is always the same, and the end bit appears identical in all folders: XXXX_xx_tos.shp

[ATTACH=CONFIG]15274[/ATTACH]
0 Kudos