Looking for some help. The title of my post isn't the greatest. What I'm trying to do is update an empty field (Pdfs) of my feature class with a full URL that's pulled from HTML of a webpage. I'm matching the ALT values of the webpage with the DXF_TEXT column in the feature class and writing the matching URL into the Pdfs column.
# existing column in the table of my feature class
DXF_TEXT
01-01A-E
01-01A-W
01-01B-E
01-01B-W
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import arcpy
base_url = "https://www.willcountysoa.com/section/TaxMaps/wheatland.htm"
resp = requests.get(base_url)
soup = BeautifulSoup(resp.text, "html.parser")
taxmapgrid_feature = r'\path\to\featureclass'
# dictionary of ALT text to the corresponding URL.
alt_to_url = {}
for a in soup.find_all("area"):
    alt = a.get("alt")
    if alt:
        alt_to_url[alt] = urljoin(base_url, a["href"])
# Match the ALT values with the DXF_TEXT column in the feature class and
# write the matching URL into the Pdfs column.
with arcpy.da.UpdateCursor(taxmapgrid_feature, ["DXF_TEXT", "Pdfs"]) as cursor:
    for row in cursor:
        row[1] = alt_to_url.get(row[0])
        print(row)
        cursor.updateRow(row)
This code runs without error and prints each row, but no values are written to the table.
Prints:
['01-24B-E', None]
['01-24B-W', None]
['02-24C-E', None]
['02-24C-W', None]
Can you add a print statement before and after line 15, as well as a print statement for alt_to_url after that loop? I'm not sure that your dictionary is actually getting populated.
Yes, before and after line 15 prints this:
print(alt)
01-01B-E.pdf
01-01B-W.pdf
01-01A-E.pdf
01-01A-W.pdf
Yes, I get a print statement for alt_to_url after that loop too:
print(alt_to_url)
{'01-01B-E.pdf': 'https://www.willcountysoa.com/section/TaxMaps/25_01-01B-E.pdf', '01-01B-W.pdf': 'https://www.willcountysoa.com/section/TaxMaps/25_01-01B-W.pdf', '01-01A-E.pdf': 'https://www.willcountysoa.com/section/TaxMaps/25_01-01A-E.pdf', '01-01A-W.pdf': 'https://www.willcountysoa.com/section/TaxMaps/25_01-01A-W.pdf'}
I wonder if it has something to do with there being the *.pdf extension in the ALT tag and no extension on the values in the DXF_TEXT column?
Yeah, maybe do your get() with like a row[0]+".pdf" instead of row[0]?
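For anyone landing here later, here is a minimal sketch of that suggestion. It runs without arcpy: the rows list is just a stand-in for what the UpdateCursor yields, and the dictionary is a two-entry sample of the printout above. The helper name url_for and the .strip() call are my own assumptions (the strip only guards against padded DXF_TEXT values, which may not apply to your data).

```python
# Sample of the dictionary printed earlier in the thread (ALT text -> URL).
alt_to_url = {
    "01-01A-E.pdf": "https://www.willcountysoa.com/section/TaxMaps/25_01-01A-E.pdf",
    "01-01A-W.pdf": "https://www.willcountysoa.com/section/TaxMaps/25_01-01A-W.pdf",
}


def url_for(dxf_text, lookup, suffix=".pdf"):
    """Hypothetical helper: return the URL for a DXF_TEXT value, or None."""
    if dxf_text is None:
        return None  # empty field in the table, nothing to match
    # The ALT keys carry a ".pdf" extension, so add it before the lookup.
    return lookup.get(dxf_text.strip() + suffix)


# Stand-in for the rows an UpdateCursor would yield: [DXF_TEXT, Pdfs].
rows = [["01-01A-E", None], ["01-01A-W", None], ["01-24B-E", None]]
for row in rows:
    row[1] = url_for(row[0], alt_to_url)
    print(row)
```

In the real script, the only change would be inside the cursor loop: row[1] = url_for(row[0], alt_to_url), followed by cursor.updateRow(row) as before. The last sample row stays None here only because the sample dictionary does not include that sheet.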