Hi all,
I'm trying to write a Python script in an ArcGIS Pro notebook that adds the file path of the PDF document associated with each record in a table to a new field in that same table.
The issue is that the PDF documents are in different folders (layout shown below):
MAINFOLDER
    A
        STREETNAME1
            ADDRESS1.pdf
            ADDRESS2.pdf
        STREETNAME2
        STREETNAME3
    B
        STREETNAME1
            ADDRESS1.pdf
            ADDRESS2.pdf
        STREETNAME2
        STREETNAME3
    C
    ...
If all of the PDF documents had the same path, I would just use an UpdateCursor to write the path + the address field value. Since the path is different for each document, I'm not sure how to add the correct path to each record.
Currently, I've used the glob module to add all of the PDF file paths to a list, and I have a list of lists that contains the STREET_NUM and STREETNAME values for each record, which I'm using to associate each record with the correct file.
Code:
import glob
import os
import arcpy  # needed for the cursors below
os.chdir(r'M:\Service Sheets')  # raw string so the backslash is not read as an escape
servicePath = glob.glob("*/*/*.pdf")  # Here is the list of all PDF files in the directory
fc = "Parcel_Subset"
fields = ["STREET_NUM", "STREETNAME", "PDFLink"]  # PDFLink is where the output will go
LocList = []
with arcpy.da.SearchCursor(fc, ['STREET_NUM', 'STREETNAME']) as cursor:
    for row in cursor:
        LocList.append(row)  # Creates the list of (STREET_NUM, STREETNAME) rows
# Here would be the UpdateCursor, if I could get it to work
I'm not sure this is the best method, or if there's an easier way that I just don't know. This would also just add a text field with the path. If possible, it would be better to make the path a clickable link that opens the PDF document.
Any advice or solutions would be appreciated!
I'm not sure how you are tying the street number and street name to the file names in the folders, but as an idea for the first part of your question, you could use a dictionary to store all of the files and their paths. I can't really tell which attributes tie to which folder/file, so this is only an idea that you can expand on if it looks doable with your data.
import os

def recursive_walk(parentFolder):
    pdfDict = {}
    # os.walk already descends into every subfolder, so no explicit recursion is needed
    for folderName, subfolders, filenames in os.walk(parentFolder):
        for filename in filenames:
            if not filename.lower().endswith('.pdf'):
                continue  # skip anything that is not a PDF
            print(os.path.join(folderName, filename))
            pths = folderName.split(os.sep)
            name = os.path.splitext(filename)[0]  # file name without the extension
            # key: letter folder + street folder + file name; attrMatch: street folder + file name
            pdfDict[pths[-2] + pths[-1] + name] = {'attrMatch': pths[-1] + name, name: os.path.join(folderName, filename)}
    return pdfDict

pdfDict = recursive_walk(r'C:\Users\Documents\PDFs')
This provides a dictionary like:
{'AStreetname1address1': {'attrMatch': 'Streetname1address1', 'address1': 'C:\\Users\\Documents\\PDFs\\A\\Streetname1\\address1.pdf'},
'AStreetname1address2': {'attrMatch': 'Streetname1address2', 'address2': 'C:\\Users\\Documents\\PDFs\\A\\Streetname1\\address2.pdf'},
'AStreetname2address1': {'attrMatch': 'Streetname2address1', 'address1': 'C:\\Users\\Documents\\PDFs\\A\\Streetname2\\address1.pdf'},
...
'CStreetname3address1': {'attrMatch': 'Streetname3address1', 'address1': 'C:\\Users\\Documents\\PDFs\\C\\Streetname3\\address1.pdf'},
'CStreetname3address2': {'attrMatch': 'Streetname3address2', 'address2': 'C:\\Users\\Documents\\PDFs\\C\\Streetname3\\address2.pdf'}}
Then in the cursor, get the path by looking up the combo of attributes:
with arcpy.da.UpdateCursor(fc, ['STREET_NUM', 'STREETNAME', 'PDFLink']) as cursor:
    for row in cursor:
        attrConcat = f"{row[1]}{row[0]}"  # create the unique combination to match the attrMatch format in the dictionary
        for k, v in pdfDict.items():  # test each dictionary entry against the combination
            if v['attrMatch'] == attrConcat:  # get the values from the matching entry
                attr, path = v.items()
                row[2] = fr'''<a href="{path[1]}" target="_top">{attr[1]}</a>'''  # set the PDFLink to the path from the dict
                cursor.updateRow(row)  # the method is updateRow (lower camel case)
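The loop over pdfDict.items() above scans every entry for every row. As a rough alternative sketch, a flat dictionary keyed directly on the match string would let each row do a single lookup instead. The names build_pdf_lookup and find_pdf are hypothetical, and the sketch assumes that street folder + file name is unique across the lettered folders and that matching should ignore case; adjust both to fit your data.

```python
import os

def build_pdf_lookup(parent_folder):
    """Map lower-cased 'street folder + file name' to the full PDF path."""
    lookup = {}
    for folder_name, _subfolders, filenames in os.walk(parent_folder):
        street = os.path.basename(folder_name)
        for filename in filenames:
            stem, ext = os.path.splitext(filename)
            if ext.lower() != ".pdf":
                continue  # ignore non-PDF files
            # Assumption: the combination is unique; a later duplicate overwrites an earlier one
            lookup[(street + stem).lower()] = os.path.join(folder_name, filename)
    return lookup

def find_pdf(lookup, street_name, street_num):
    """Return the PDF path for one row's attributes, or None if nothing matches."""
    return lookup.get(f"{street_name}{street_num}".lower())
```

Inside the UpdateCursor loop, find_pdf(lookup, row[1], row[0]) would then replace the inner for loop, and rows that return None can simply be skipped.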
As for the second part of the question, you can format the output as HTML, which will be read as a link.
<a href="C:\Users\USER\...\Lorem Ipsum.docx" target="_top">Lorem Ipsum.docx</a>
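A small helper can keep the quoting in that anchor tag consistent. This is only a sketch: pdf_link is a hypothetical name, the file name is used as the link text, and it assumes the field is shown somewhere that renders HTML (a pop-up, for example). ntpath is used so that Windows-style paths are split the same way wherever the code runs.

```python
import html
import ntpath

def pdf_link(path, target="_top"):
    """Wrap a file path in an HTML anchor, using the file name as the link text."""
    label = ntpath.basename(path)  # handles both backslash and forward-slash separators
    href = html.escape(path, quote=True)  # escape &, <, > and quotes inside the attribute
    return f'<a href="{href}" target="{target}">{html.escape(label)}</a>'
```

The returned string is what would be written to the PDFLink field in place of the hand-built f-string.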
Thank you for the help! I had to make some edits to make this work with my data, but this was pretty much exactly what I was looking for!