
SearchCursor and BeautifulSoup

Question asked by jpilbeam Champion on Mar 27, 2020
Latest reply on Apr 2, 2020 by bixb0012

I'd like this script to loop through a list of URLs and use a web scraper to search each website for certain things. I have my search cursor and Beautiful Soup set up, but I'm wondering how to identify each item in the list. Can I attach an index number to each one somehow? Here's what I have so far. If I run this, it prints the URLs. I've used Beautiful Soup to find things in a single page's HTML before, but I'm not sure how to do that for each URL in a list.

from bs4 import BeautifulSoup
import urllib.request
import os, arcpy
import time

#the hosted layer with website urls in 'Website' field
fc = r''
#query the webpage and return the response to the variable 'html'
#('url' stands for one of the URLs from the 'Website' field; it is not defined yet)
html = urllib.request.urlopen(url)
#parse the downloaded page into a searchable 'soup' object
soup = BeautifulSoup(html, 'html.parser')

#use current time to detect change
t = time.ctime()

#Search Cursor
#fc field where URLs are stored
field = ["Website"]
with arcpy.da.SearchCursor(fc, field) as cursor:
    for row in cursor:
        #each row is a tuple; the URL is its first (and only) item
        print(row[0])

#count the number of '<h2>' tags in the HTML
n = len(soup.find_all('h2'))
#the text of the 21st '<h2>' tag (index 20)
atts = soup.find_all('h2')[20].text
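
One possible way to attach an index number to each URL is Python's built-in enumerate(). The following is only a minimal sketch, not a tested solution for this layer. The names fetch_html, index_urls and scrape_each are hypothetical helpers, the example.com URLs are placeholders, and the rows list stands in for what arcpy.da.SearchCursor would yield. The parsing step is passed in as a function, so the block itself runs with the standard library only. Note that rows with an empty URL are skipped before numbering, so the index counts usable URLs rather than table rows.

```python
import urllib.request


def fetch_html(url):
    """Download a page and return its raw HTML bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def index_urls(rows, field_index=0):
    """Pair each row's URL with a running index, starting at 0.

    `rows` is any iterable of row tuples, such as the ones an
    arcpy.da.SearchCursor yields. Rows with an empty URL are skipped.
    """
    urls = [row[field_index] for row in rows if row[field_index]]
    return list(enumerate(urls))


def scrape_each(indexed_urls, fetch, parse):
    """Fetch and parse every URL, storing each result under its index."""
    results = {}
    for index, url in indexed_urls:
        try:
            results[index] = parse(fetch(url))
        except OSError as err:  # urllib.error.URLError is a subclass of OSError
            results[index] = None
            print(f"{index}: could not fetch {url}: {err}")
    return results


# Stand-in rows shaped like SearchCursor output (hypothetical URLs).
rows = [("https://example.com/a",), (None,), ("https://example.com/b",)]
indexed = index_urls(rows)
# indexed is now [(0, 'https://example.com/a'), (1, 'https://example.com/b')]

# With bs4 installed, the real call might look like this (not run here):
#   scrape_each(indexed, fetch_html,
#               lambda html: BeautifulSoup(html, 'html.parser').find_all('h2'))
```

In the script above, the with arcpy.da.SearchCursor(fc, field) as cursor block would supply the rows, so index_urls(cursor) would be called inside it. Keeping the results in a dictionary keyed by index makes it possible to look up a particular site's headings later, for example results[0].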