I'm using BeautifulSoup to scrape a list of URLs for COVID-19 information, which in turn is used to update our ArcGIS Hub page. Once in a while a URL is broken and I get a bad handshake error. I want my script to handle this and then move on. Currently there is one broken URL in the list, and it is causing the errors shown here. I tried the following exception block on these errors, to no avail.
I also tried these approaches to no avail:
while loop
except requests.exceptions.SSLError as error:
except...
print("error")
try:
    item = soup.find(string=re.compile("Grab and Go")) if soup.find(string=re.compile("Grab and Go")) else "N/A"
    print(item)
except (OpenSSL.SSL.Error, ssl.SSLError, urllib3.exceptions.MaxRetryError,
        requests.exceptions.SSLError) as error:
    print(error)
    break
Traceback (most recent call last):
File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\urllib3\contrib\pyopenssl.py", line 456, in wrap_socket
cnx.do_handshake()
File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\OpenSSL\SSL.py", line 1915, in do_handshake
self._raise_ssl_error(self._ssl, result)
File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\OpenSSL\SSL.py", line 1647, in _raise_ssl_error
_raise_current_error()
File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\OpenSSL\_util.py", line 54, in exception_from_error_queue
raise exception_type(errors)
OpenSSL.SSL.Error: [('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\urllib3\connectionpool.py", line 600, in urlopen
chunked=chunked)
File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\urllib3\connectionpool.py", line 343, in _make_request
self._validate_conn(conn)
File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\urllib3\connectionpool.py", line 839, in _validate_conn
conn.connect()
File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\urllib3\connection.py", line 344, in connect
ssl_context=context)
File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\urllib3\util\ssl_.py", line 347, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\urllib3\contrib\pyopenssl.py", line 462, in wrap_socket
raise ssl.SSLError('bad handshake: %r' % e)
ssl.SSLError: ("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')],)",)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\requests\adapters.py", line 449, in send
timeout=timeout
File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\urllib3\util\retry.py", line 399, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.cm201u.org', port=443): Max retries exceeded with url: /news_/free_breakfast___lunch_pick-up_days (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')],)",),))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "pathtofile", line 33, in <module>
response = requests.get(url, headers=headers)
File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\requests\api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\requests\api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\requests\sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\requests\sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\requests\adapters.py", line 514, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='www.cm201u.org', port=443): Max retries exceeded with url: /news_/free_breakfast___lunch_pick-up_days (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')],)",),))
At the end of the day, you want to pass on any error that requests is generating, whether it is DNS, SSL, timeout, etc., so just catch requests.exceptions.RequestException and only that.
Joshua,
OK, thanks. I tried that, but I got the same errors in the same order as before.
for url in urls:
    # download the homepage
    response = requests.get(url, headers=headers)
    # parse the downloaded homepage and grab all text
    soup = BeautifulSoup(response.text, "lxml")
    try:
        item = soup.find(string=re.compile("Grab and Go")) if soup.find(string=re.compile("Grab and Go")) else "N/A"
        print(item)
    except requests.exceptions.RequestException as error:
        print('there is a requests error')
Your requests.get call still isn't inside the try block. The try only wraps unrelated code, the soup.find lookup, which is not what raises the error.
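A minimal sketch of what that could look like, not a definitive version. The helper name fetch_page and the timeout value are assumptions for illustration and are not from the original script. The point is only that requests.get sits inside the try, so any requests failure (SSL, DNS, timeout, malformed URL) is caught and the loop can move on.

```python
import requests


def fetch_page(url, headers=None, timeout=10):
    """Return the page text, or None if requests could not fetch the URL.

    fetch_page and the timeout default are hypothetical, for illustration.
    """
    try:
        # The network call is what raises, so it must be inside the try.
        response = requests.get(url, headers=headers, timeout=timeout)
    except requests.exceptions.RequestException as error:
        # RequestException is the base class of the exceptions requests raises,
        # including requests.exceptions.SSLError from the traceback above.
        print(f"there is a requests error: {error}")
        return None
    return response.text
```

In the loop, a None result can be skipped with continue, and the BeautifulSoup parsing and soup.find lookup then run only for pages that were actually downloaded.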
Thank you. That was it!
Something else. The URLs are in a list. Is there an easy way to identify the URL that threw the error? I'm starting a new thread here: https://community.esri.com/message/933511-request-module-try-except-error
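One possible way to see which URL failed, sketched under assumptions: because the loop variable is still in scope inside the except block, it can be stored next to the error. The function name check_urls and the timeout value are hypothetical and not part of the original script.

```python
import requests


def check_urls(urls, headers=None, timeout=10):
    """Try each URL and return a list of (url, error) pairs for the failures.

    check_urls and the timeout default are hypothetical, for illustration.
    """
    failed = []
    for url in urls:
        try:
            requests.get(url, headers=headers, timeout=timeout)
        except requests.exceptions.RequestException as error:
            # url is the item currently being processed, so it identifies
            # exactly which entry in the list raised the error.
            failed.append((url, error))
    return failed
```

Printing each pair after the loop gives a short report of the broken URLs and the reason each one failed.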