
SearchCursor and BeautifulSoup

Question asked by jpilbeam Champion on Mar 27, 2020
Latest reply on Apr 2, 2020 by bixb0012

I'd like this script to loop through a list of URLs and use a web scraper to search for certain things on each of those websites. I have my cursor set up, as well as Beautiful Soup, but I'm not sure how to identify each item in the list. Can I attach an index number to each one somehow? Here's what I have. If I run this, it prints the URLs. I've used Beautiful Soup to find things in HTML before, but I'm not sure how to do that for each URL in a list.

from bs4 import BeautifulSoup
import urllib
import urllib.request
import os, arcpy
import time


#the hosted layer with website urls in 'Website' field
fc = r'https://services.arcgis.com/fGsbyIOAuxHnF97m/arcgis/rest/services/Grab_Go_School_Meals_Location_(View)/FeatureServer/0?token=fbpCfA34sTJ4rzWO2TQn_c38B4TfGWOZ6jTMeFL1m7CNKd9_odI1t_t_hL-YvvePbE3M428FRT-zW-bISRYrGdJ2CnloKrHoHAfMnbGXpJ-5-zZBU6ONK1u0hMv5D-Vy-fnRpqpQP3aiQEke8L9d9jxDVBKWPamqCa0z0ko4IZX3xpIpHPSEKpmwpcJEaK7Z_rai3IBsT5-tqfMKIxnGCwe4SZZED8bDZM9j1T55-LggpjCgpwqWODs4vpj58iMy'
#query the webpage and return the html to the variable 'html'
html = urllib.request.urlopen(url)
#parse the downloaded homepage and grab all text
soup = BeautifulSoup(html, 'html.parser')

#use current time to detect change
t = time.ctime()

#Search Cursor
#fc field where URLs are stored
field = ["Website"]
with arcpy.da.SearchCursor(fc, field) as cursor:
    for row in cursor:
        print(row)

##BeautifulSoup
##count the number of '<h2>' tags in HTML
n = len(soup.find_all('h2'))
print(n)
#the text of the 21st '<h2>' tag (index 20)
atts = soup.find_all('h2')[20].text
('http://www.manhattan114.org/index.php/download_file/view/2776/1/',)
('http://www2.nlsd122.org/files/district/parentsandstudents/message_from_superintendent/2019-2020/mfts_031720.pdf',)
('https://www.peotoneschools.org/UserFiles/Servers/Server_266769/File/COVID-19%20Email%203.15.20.pdf',)
('https://manteno5.org/news/what_s_new/c_o_v_i_d-19_updates',)
('https://www.joliet86.org/student-grab-and-go-meals-available/',)
('https://www.joliet86.org/student-grab-and-go-meals-available/',)
('https://www.joliet86.org/student-grab-and-go-meals-available/',)
('https://www.joliet86.org/student-grab-and-go-meals-available/',)
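
One way the cursor and the scraper could be combined is to number the rows with `enumerate` and fetch each URL inside the loop. The sketch below is not from the original script: `scrape_rows`, `fetch_html`, and `count_h2` are hypothetical helper names, and the `<h2>` count uses the standard library's `html.parser` as a stand-in for the Beautiful Soup call so the example runs without extra packages. Rows are assumed to be one-element tuples like the ones printed above.

```python
from html.parser import HTMLParser
import urllib.request


class H2Counter(HTMLParser):
    """Counts opening <h2> tags (stdlib stand-in for soup.find_all('h2'))."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "h2":
            self.count += 1


def count_h2(html_text):
    """Return the number of <h2> tags in a string of HTML."""
    parser = H2Counter()
    parser.feed(html_text)
    return parser.count


def fetch_html(url):
    """Download a page and decode it as text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape_rows(rows, fetch=fetch_html):
    """Return (index, url, h2_count) for each row.

    rows  -- iterable of one-element tuples, as a SearchCursor on a
             single field would yield (assumption)
    fetch -- callable that takes a URL and returns HTML text
    """
    results = []
    cache = {}  # repeated URLs are downloaded only once
    for index, row in enumerate(rows):
        url = row[0]  # each row is a tuple; the URL is its first item
        if url not in cache:
            try:
                cache[url] = count_h2(fetch(url))
            except (OSError, ValueError):
                # Unreachable or malformed URL: record None and continue.
                cache[url] = None
        results.append((index, url, cache[url]))
    return results
```

With arcpy available, the cursor itself could be passed as `rows`, for example `scrape_rows(cursor)` inside the `with arcpy.da.SearchCursor(fc, field) as cursor:` block. Several of the URLs above point to PDFs, which an HTML parser will not read meaningfully, so those would likely need separate handling.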
