
Update table so column one stores the matching value from column two

Thursday
JaredPilbeam2
MVP Alum

Looking for some help. The title of my post isn't the greatest. What I'm trying to do is update an empty field (Pdfs) of my feature class with a full URL that's pulled from the HTML of a webpage. I'm matching the ALT values from the webpage against the DXF_TEXT column in the feature class and writing the matching URL into the Pdfs column.

# existing column in the table of my feature class
DXF_TEXT
01-01A-E
01-01A-W
01-01B-E
01-01B-W

 

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import arcpy

base_url = "https://www.willcountysoa.com/section/TaxMaps/wheatland.htm"
resp = requests.get(base_url)
soup = BeautifulSoup(resp.text, "html.parser")
taxmapgrid_feature = r'\path\to\featureclass'

# dictionary of ALT text to the corresponding URL.
alt_to_url = {}
for a in soup.find_all("area"):
    alt = a.get("alt")
    if alt:
        alt_to_url[alt] = urljoin(base_url, a["href"])

# Match the ALT values with the DXF_TEXT column in the feature class and
# write the matching URL into the Pdfs column.
with arcpy.da.UpdateCursor(taxmapgrid_feature, ["DXF_TEXT", "Pdfs"]) as cursor:
    for row in cursor:
        row[1] = alt_to_url.get(row[0])
        print(row)
        cursor.updateRow(row)

This code runs without error and prints each row, but no values are written to the table.

Prints:

['01-24B-E', None]
['01-24B-W', None]
['02-24C-E', None]
['02-24C-W', None]

 


Accepted Solutions
AlfredBaldenweck
MVP Frequent Contributor

Yeah, maybe do your get() with row[0] + ".pdf" instead of row[0]?
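For anyone landing here later, the suggestion above can be sketched roughly as follows. This is only a sketch under assumptions: the helper names `lookup_pdf_url` and `update_pdfs` are made up for illustration, and it assumes every ALT key is exactly the DXF_TEXT value plus ".pdf", as in the printed dictionary in this thread.

```python
def lookup_pdf_url(dxf_text, alt_to_url, suffix=".pdf"):
    """Return the URL stored under dxf_text + suffix, or None if there is no match.

    Hypothetical helper: assumes the ALT keys look like '01-01B-E.pdf'
    while DXF_TEXT holds '01-01B-E'.
    """
    if dxf_text is None:
        # A null DXF_TEXT cannot be concatenated with a string.
        return None
    return alt_to_url.get(dxf_text + suffix)


def update_pdfs(feature_class, alt_to_url):
    """Write matching URLs into the Pdfs field (hypothetical wrapper)."""
    # Imported here so lookup_pdf_url can be tried without ArcGIS installed.
    import arcpy

    with arcpy.da.UpdateCursor(feature_class, ["DXF_TEXT", "Pdfs"]) as cursor:
        for row in cursor:
            url = lookup_pdf_url(row[0], alt_to_url)
            if url is not None:
                row[1] = url
                cursor.updateRow(row)


# Quick check of the lookup alone, using one entry from the printed dictionary.
sample = {
    "01-01B-E.pdf": "https://www.willcountysoa.com/section/TaxMaps/25_01-01B-E.pdf",
}
print(lookup_pdf_url("01-01B-E", sample))
```

Skipping rows with no match leaves their Pdfs value untouched instead of overwriting it with None; drop that check if you want unmatched rows cleared.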

4 Replies
AlfredBaldenweck
MVP Frequent Contributor

Can you add a print statement before and after line 15, as well as a print statement for alt_to_url after that loop? I'm not sure that your dictionary is actually getting populated.

JaredPilbeam2
MVP Alum

Yes, printing before and after line 15 gives this:

print(alt)
01-01B-E.pdf
01-01B-W.pdf
01-01A-E.pdf
01-01A-W.pdf

Yes, I get a print statement for alt_to_url after that loop too:

print(alt_to_url)
{'01-01B-E.pdf': 'https://www.willcountysoa.com/section/TaxMaps/25_01-01B-E.pdf', '01-01B-W.pdf': 'https://www.willcountysoa.com/section/TaxMaps/25_01-01B-W.pdf', '01-01A-E.pdf': 'https://www.willcountysoa.com/section/TaxMaps/25_01-01A-E.pdf', '01-01A-W.pdf': 'https://www.willcountysoa.com/section/TaxMaps/25_01-01A-W.pdf'}
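A quick way to confirm whether the dictionary keys and the DXF_TEXT values line up is to list the values that have no identical key. This is a minimal sketch: `report_unmatched` is a hypothetical helper, and the sample values are copied from the post and the output above.

```python
def report_unmatched(dxf_values, alt_to_url):
    """Return the DXF_TEXT values that have no identical key in alt_to_url."""
    return [value for value in dxf_values if value not in alt_to_url]


# DXF_TEXT values from the feature class, as listed in the original post.
dxf_values = ["01-01A-E", "01-01A-W", "01-01B-E", "01-01B-W"]

# Two entries copied from the printed dictionary.
alt_to_url = {
    "01-01B-E.pdf": "https://www.willcountysoa.com/section/TaxMaps/25_01-01B-E.pdf",
    "01-01B-W.pdf": "https://www.willcountysoa.com/section/TaxMaps/25_01-01B-W.pdf",
}

# Every value comes back as unmatched because the keys carry a ".pdf" suffix.
print(report_unmatched(dxf_values, alt_to_url))
```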

 

JaredPilbeam2
MVP Alum

I wonder if it has something to do with the ALT values having a .pdf extension while the values in the DXF_TEXT column have no extension?

AlfredBaldenweck
MVP Frequent Contributor

Yeah, maybe do your get() with row[0] + ".pdf" instead of row[0]?
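Another option, not something posted in this thread, would be to drop the extension from the ALT keys once, so the plain DXF_TEXT value can be used for the lookup as-is. A minimal sketch, where `strip_extension_keys` is a hypothetical helper built on the standard library's `os.path.splitext`:

```python
import os


def strip_extension_keys(alt_to_url):
    """Return a copy of the mapping with file extensions removed from the keys."""
    return {os.path.splitext(alt)[0]: url for alt, url in alt_to_url.items()}


# One entry copied from the printed dictionary in this thread.
alt_to_url = {
    "01-01A-E.pdf": "https://www.willcountysoa.com/section/TaxMaps/25_01-01A-E.pdf",
}

by_dxf_text = strip_extension_keys(alt_to_url)
print(by_dxf_text.get("01-01A-E"))
```

With the keys normalized this way, the original `alt_to_url.get(row[0])` call in the cursor loop should work unchanged, assuming the only difference between the two columns is the extension.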
