I'd like this script to loop through a list of URLs and use a web scraper to pull certain things from each of those websites. I have my cursor set up, as well as BeautifulSoup, but how do I identify each item in the list? Can I attach an index number to each one somehow? Here's what I have; if I run it, it prints the URLs. I've used BeautifulSoup to find things in HTML before, but I'm not sure how to find things from URLs that are in a list.
from bs4 import BeautifulSoup
import urllib.request
import arcpy
import time

fc = r'https://services.arcgis.com/fGsbyIOAuxHnF97m/arcgis/rest/services/Grab_Go_School_Meals_Location_(View)/FeatureServer/0?token=fbpCfA34sTJ4rzWO2TQn_c38B4TfGWOZ6jTMeFL1m7CNKd9_odI1t_t_hL-YvvePbE3M428FRT-zW-bISRYrGdJ2CnloKrHoHAfMnbGXpJ-5-zZBU6ONK1u0hMv5D-Vy-fnRpqpQP3aiQEke8L9d9jxDVBKWPamqCa0z0ko4IZX3xpIpHPSEKpmwpcJEaK7Z_rai3IBsT5-tqfMKIxnGCwe4SZZED8bDZM9j1T55-LggpjCgpwqWODs4vpj58iMy'

# Open the feature service's REST page and parse it as HTML
html = urllib.request.urlopen(fc)
soup = BeautifulSoup(html, 'html.parser')
t = time.ctime()

field = ["Website"]
with arcpy.da.SearchCursor(fc, field) as cursor:
    for row in cursor:
        print(row)

n = len(soup.find_all('h2'))
print(n)
atts = soup.find_all('h2')[20].text
('http://www.manhattan114.org/index.php/download_file/view/2776/1/',)
('http://www2.nlsd122.org/files/district/parentsandstudents/message_from_superintendent/2019-2020/mfts_031720.pdf',)
('https://www.peotoneschools.org/UserFiles/Servers/Server_266769/File/COVID-19%20Email%203.15.20.pdf',)
('https://manteno5.org/news/what_s_new/c_o_v_i_d-19_updates',)
('https://www.joliet86.org/student-grab-and-go-meals-available/',)
('https://www.joliet86.org/student-grab-and-go-meals-available/',)
('https://www.joliet86.org/student-grab-and-go-meals-available/',)
('https://www.joliet86.org/student-grab-and-go-meals-available/',)
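One way to attach an index to each URL is Python's built-in `enumerate()`. Below is a minimal sketch: the `arcpy` cursor is swapped for a plain list of sample rows (a `SearchCursor` yields one-element tuples just like these), and `count_h2` is a hypothetical helper name for the scraping step. With the real cursor you would write `for i, row in enumerate(cursor):` instead.

```python
# Sample rows in the shape a SearchCursor yields them: one-element tuples.
rows = [
    ('http://www.manhattan114.org/index.php/download_file/view/2776/1/',),
    ('https://www.joliet86.org/student-grab-and-go-meals-available/',),
]

def count_h2(url):
    """Fetch one page and return how many <h2> tags it contains."""
    # Imported lazily so the indexing demo below runs without network access.
    import urllib.request
    from bs4 import BeautifulSoup
    html = urllib.request.urlopen(url)
    soup = BeautifulSoup(html, 'html.parser')
    return len(soup.find_all('h2'))

# enumerate() pairs each row with an index; row[0] is the URL string.
indexed = [(i, row[0]) for i, row in enumerate(rows)]
for i, url in indexed:
    print(i, url)
    # n = count_h2(url)  # uncomment to actually scrape each page
```

The key point is that the cursor row is a tuple, so `row[0]` pulls out the URL string before passing it to `urlopen`, which is likely why `urlopen` wasn't working on the rows directly.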