How to merge multiple PDFs from multiple folders into a single PDF using Python

07-07-2017 08:55 AM
Occasional Contributor

We are using Data Driven Pages to create multiple PDFs (each with its own MXD) based on a common dataset. Each set of PDFs is saved into a separate folder, and the same file names appear in every folder (i.e., folder1\abc.pdf, folder2\abc.pdf, folder1\def.pdf, folder2\def.pdf, etc.). We would like to combine them into one file per file name. There will be between 350 and 400 files in each folder to combine.

I'm hoping someone has a Python script for this process that they could share.

Regular Contributor

Hi George,

Yeah, that should be possible. Here's how I'd break it down:

First, create a list of all folders which may contain PDFs. Here's what that would look like for me:

# import required modules
import os, arcpy

# create a variable pointing to the root directory that contains all your various PDF folders
rootDirectory = r"W:\scratch"

# create a list of all the folders under the root directory
foldersToSearch = [x[0] for x in os.walk(rootDirectory)]
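A minimal variation, assuming the merged PDFs will be written into the root folder itself: os.walk yields the root directory as its first entry, so this sketch skips it and sorts the remaining folders to get a predictable order. `find_pdf_folders` is a hypothetical helper name, not part of arcpy.

```python
import os


def find_pdf_folders(root_directory):
    """Return every folder below root_directory (excluding root itself), sorted.

    Skipping the root keeps merged outputs saved there from being
    picked up as inputs on a later run.
    """
    folders = []
    for dirpath, _dirnames, _filenames in os.walk(root_directory):
        # os.walk yields the top directory itself first; leave it out
        if os.path.abspath(dirpath) == os.path.abspath(root_directory):
            continue
        folders.append(dirpath)
    return sorted(folders)
```

Usage would then be `foldersToSearch = find_pdf_folders(rootDirectory)`.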

Then, get a unique list of the PDF file names:

# create a list of the PDF files
pdfs = []
for f in foldersToSearch:
    pdfs += [x for x in os.listdir(f) if x.endswith(".pdf")]

# 'uniquify' the list
pdfs = list(set(pdfs))

>>>['def.pdf', 'abc.pdf']

Next I would create a Python dictionary containing the unique PDF names as the keys and a list of the full paths to each matching PDF as the corresponding values.

# create the dictionary with each PDF file name as the keys and an empty list as the value
pdfDict = dict((pdf, []) for pdf in pdfs)

# add the full paths to the pdfs that match the key name to each value list
for f in foldersToSearch:
    for item in os.listdir(f):
        if item.endswith(".pdf"):
            pdfDict[item].append(os.path.join(f, item))

Now we have this:

>>>{'def.pdf': ['W:\\scratch\\folder2\\def.pdf', 'W:\\scratch\\folder3\\def.pdf', 'W:\\scratch\\folder1\\def.pdf'], 'abc.pdf': ['W:\\scratch\\folder2\\abc.pdf', 'W:\\scratch\\folder3\\abc.pdf', 'W:\\scratch\\folder1\\abc.pdf']}
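The unique-name list and the dictionary can also be built in a single pass with collections.defaultdict. This is a sketch, not a drop-in replacement: `group_pdfs_by_name` is a hypothetical helper, it checks the extension case-insensitively (so files ending in `.PDF` are included), skips anything that isn't a regular file, and sorts each folder's listing.

```python
import os
from collections import defaultdict


def group_pdfs_by_name(folders):
    """Map each PDF file name to the full paths of every copy found."""
    pdf_dict = defaultdict(list)  # missing keys start as an empty list
    for folder in folders:
        for item in sorted(os.listdir(folder)):
            path = os.path.join(folder, item)
            # Case-insensitive extension check; ignore sub-folders
            if item.lower().endswith(".pdf") and os.path.isfile(path):
                pdf_dict[item].append(path)
    return dict(pdf_dict)
```

With the variables above, `pdfDict = group_pdfs_by_name(foldersToSearch)` would replace both the `pdfs` list and the dictionary loop.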

Now we get fancy. For each key in the dictionary, we'll create an arcpy PDF object, append each of the corresponding PDFs (full paths) to that object, and save it to the root directory:

# list each unique PDF name
for item in pdfDict.keys():
    # create the pdf object (it will save to your root directory in this example, but could be saved anywhere)
    pdfObject = arcpy.mapping.PDFDocumentCreate(os.path.join(rootDirectory, item))
    # for each corresponding PDF path, add the PDF to the PDF Object
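    for pdfPath in pdfDict[item]:
        pdfObject.appendPages(pdfPath)
    # save and close the finished PDF
    pdfObject.saveAndClose()
    # release the reference to the PDF document
    del pdfObject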
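To check the grouping before writing hundreds of files, one option is to separate the loop from the arcpy calls. In this sketch, `merge_groups` and `arcpy_merge` are hypothetical helpers; `arcpy_merge` uses `arcpy.mapping.PDFDocumentCreate` with the PDFDocument `appendPages` and `saveAndClose` methods (ArcMap only) and imports arcpy lazily, so you can pass a function that only prints the paths for a dry run.

```python
import os


def merge_groups(pdf_dict, output_dir, merge_fn):
    """Call merge_fn(output_path, input_paths) once per unique PDF name.

    Returns the list of output paths, in sorted name order.
    """
    outputs = []
    for name in sorted(pdf_dict):
        output_path = os.path.join(output_dir, name)
        # Drop the output file itself if an earlier run left it among the inputs
        inputs = [p for p in pdf_dict[name]
                  if os.path.abspath(p) != os.path.abspath(output_path)]
        merge_fn(output_path, inputs)
        outputs.append(output_path)
    return outputs


def arcpy_merge(output_path, input_paths):
    """Merge input_paths into output_path with arcpy.mapping (ArcMap)."""
    import arcpy  # imported here so the module loads without ArcGIS
    pdf_doc = arcpy.mapping.PDFDocumentCreate(output_path)
    for path in input_paths:
        pdf_doc.appendPages(path)
    pdf_doc.saveAndClose()
    del pdf_doc
```

For the real run this would be `merge_groups(pdfDict, rootDirectory, arcpy_merge)`.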

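You might need to modify this if the order in which the pages are appended to the master PDF matters: the paths are appended in the order os.walk returned the folders (folder2, folder3, folder1 in the output above), not alphabetically. Also, because os.walk includes rootDirectory itself, the merged PDFs saved there will be picked up as inputs if you run the script a second time, so either save the output elsewhere or clear it out first. This should get you going, though. Hope this helps!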
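If the page order matters, one way to control it is to sort each list of paths by its folder name against an explicit order. This sketch assumes the folder names identify the source MXDs; `order_paths` is a hypothetical helper, and the folder list in the usage line is only an example.

```python
import os


def order_paths(paths, folder_order):
    """Sort paths so their pages follow folder_order (a list of folder names).

    Folders not listed in folder_order go last, alphabetically.
    """
    rank = dict((name, i) for i, name in enumerate(folder_order))

    def sort_key(path):
        # Name of the folder that directly contains the PDF
        folder = os.path.basename(os.path.dirname(path))
        return (rank.get(folder, len(rank)), folder)

    return sorted(paths, key=sort_key)
```

Before merging, something like `for name in pdfDict: pdfDict[name] = order_paths(pdfDict[name], ["folder1", "folder2", "folder3"])` would apply the order to every group.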

Warm Regards,


Occasional Contributor

Thanks for the assistance. I'll keep you posted.
