MissingSchema: Invalid URL ' ': No schema supplied

JaredPilbeam2 · ‎04-13-2020

Using Python 3.6.9, I'm using the Requests library to request URLs that are read from a text file. Here are five of them as an example. The URLs are all valid.

https://secure.shippingapis.com/ShippingAPI.dll?API=Verify&XML=<AddressValidateRequest USERID="564WILLC0589"><Address><Address1>2001 Gardner Cir W</Address1><Address2></Address2><City>Aurora</City><State>IL</State><Zip5></Zip5><Zip4></Zip4></Address></AddressValidateRequest>

https://secure.shippingapis.com/ShippingAPI.dll?API=Verify&XML=<AddressValidateRequest USERID="564WILLC0589"><Address><Address1>2427 Oakfield Dr</Address1><Address2></Address2><City>Aurora</City><State>IL</State><Zip5></Zip5><Zip4></Zip4></Address></AddressValidateRequest>

https://secure.shippingapis.com/ShippingAPI.dll?API=Verify&XML=<AddressValidateRequest USERID="564WILLC0589"><Address><Address1>2451 Avalon Ct</Address1><Address2></Address2><City>Aurora</City><State>IL</State><Zip5></Zip5><Zip4></Zip4></Address></AddressValidateRequest>

https://secure.shippingapis.com/ShippingAPI.dll?API=Verify&XML=<AddressValidateRequest USERID="564WILLC0589"><Address><Address1>2516 Hillsboro Blvd</Address1><Address2></Address2><City>Aurora</City><State>IL</State><Zip5></Zip5><Zip4></Zip4></Address></AddressValidateRequest>

https://secure.shippingapis.com/ShippingAPI.dll?API=Verify&XML=<AddressValidateRequest USERID="564WILLC0589"><Address><Address1>2623 Shenandoah Ct</Address1><Address2></Address2><City>Aurora</City><State>IL</State><Zip5></Zip5><Zip4></Zip4></Address></AddressValidateRequest>

The script is meant to iterate over the list of URLs, request each one in the for loop, and print the resulting BeautifulSoup object. The printed output looks like XML, which is what I want. While debugging, I see that it loops once and prints the response for the first URL as desired, but then throws an error on the next pass through the for loop. I don't see what's wrong with the URL.

''' (1) putting the URLs one by one in the browser, (2) get the resulting XMLs,
 and (3) listing these XMLs as text in a .txt file '''


import requests
from bs4 import BeautifulSoup


txtfile = r'C:\Users\jpilbeam\USPSAPIWCHDUpdateAll.txt'

#convert text file into a list
with open (txtfile) as f:
    x = (list(map(str.strip ,f.readlines())))

    for i in x:
        #Request the URL
        response = requests.get(i)
        #see if the URL has been correctly encoded print(r.url)
        r_url = response.text

        #parse the downloaded homepage to get a beautifulsoup object
        new_xml = BeautifulSoup(r_url, features = "xml").prettify()
        print(new_xml)

Error:

>>> 
[Dbg]>>> 
<?xml version="1.0" encoding="utf-8"?>
<AddressValidateResponse>
 <Address>
  <Address2>
   2001 GARDNER CIR W
  </Address2>
  <City>
   AURORA
  </City>
  <State>
   IL
  </State>
  <Zip5>
   60503
  </Zip5>
  <Zip4>
   6213
  </Zip4>
 </Address>
</AddressValidateResponse>
Traceback (most recent call last):
  File "\\gisfile\GISstaff\Jared\Python Scripts\ArcGISPro\CallToFromUSPS_II.py", line 27, in <module>
    response = requests.get(i)
  File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\requests\api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\requests\api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\requests\sessions.py", line 519, in request
    prep = self.prepare_request(req)
  File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\requests\sessions.py", line 462, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\requests\models.py", line 313, in prepare
    self.prepare_url(url, params)
  File "C:\Program Files\ArcGIS\Pro\bin\Python\envs\arcgispro-py3\lib\site-packages\requests\models.py", line 387, in prepare_url
    raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL '': No schema supplied. Perhaps you meant http://?
>>>

JoshuaBixby · ‎04-13-2020

It is saying you have an invalid URL, an empty URL in this case. If the URLs you posted here were copied as-is from the text file, then you appear to have an extra LF or CR between them, which would be read as an empty line. After str.strip, such a line becomes the empty string '', which matches the '' in the error message. Try printing the URLs in the loop to make sure they are being read correctly before passing them to requests.
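As a rough illustration of the blank-line problem described above, here is a minimal sketch that makes empty lines visible and filters them out before anything is passed to requests. The helper name clean_urls and the example.com URLs are made up for the illustration; they are not from the original script.

```python
def clean_urls(lines):
    """Strip whitespace from each line and drop lines that end up empty."""
    stripped = (line.strip() for line in lines)
    return [url for url in stripped if url]


# Hypothetical file contents: two URLs separated by a blank line.
sample_lines = [
    "https://example.com/first\n",
    "\n",
    "https://example.com/second\n",
]

# repr() makes an empty string visible as '', which a plain print() would not.
for raw in sample_lines:
    print(repr(raw.strip()))

# Only the non-empty entries remain, so each one is safe to hand to requests.get().
urls = clean_urls(sample_lines)
print(urls)
```

The middle line prints as '', which is the same empty URL that requests complains about. In the original script, the same filtering could be applied to f.readlines() instead of removing the blank lines from the text file by hand.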

JaredPilbeam2 · ‎04-13-2020

Yes, that's exactly how they look in the text file. I just looped through the text file without passing to requests and they were read correctly. They look fine. But, good call on the spaces between the URLs. I'll take the spaces out and try again.

JaredPilbeam2 · ‎04-13-2020

That was it! I took out the white spaces in the text file and then ran it again. It now prints all the URLs.

For some reason, though, it's only writing the very last URL to file? The write function is not in the for loop, so why would it not be writing all the URLs?

#convert text file into a list
with open(txtfile) as f:
    x = (list(map(str.strip ,f.readlines())))
    for i in x:
        #Request the URL
        response = requests.get(i)
        #see if the URL has been correctly encoded print(r.url)
        r_url = response.text

        #parse the downloaded homepage to get a beautifulsoup object
        new_xml = BeautifulSoup(r_url, features = "xml").prettify()


#write new list to file in your directory
with open(r'C:\Users\jpilbeam\USPSAPIWCHDUpdateAll_II.txt', "w") as api_list:
    api_list.write(new_xml)
    api_list.close

JoshuaBixby · ‎04-13-2020

You need your write statement within the for loop. The way the code is currently written, new_xml is reassigned on every iteration, so by the time the write method is called it contains only the response for the last URL.

Maybe something like:

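import requests
from bs4 import BeautifulSoup

#paths taken from the posts above
file_in = r'C:\Users\jpilbeam\USPSAPIWCHDUpdateAll.txt'      # path to input file
file_out = r'C:\Users\jpilbeam\USPSAPIWCHDUpdateAll_II.txt'  # path to output file

with open(file_in, "r") as f_in, open(file_out, "w") as f_out:
    for line in f_in:
        url = line.strip()
        #skip blank lines so requests never sees an empty URL
        if not url:
            continue
        #Request the URL
        response = requests.get(url)
        #the XML text returned by the API
        r_url = response.text

        #parse the response text to get a prettified XML string
        new_xml = BeautifulSoup(r_url, features = "xml").prettify()

        #write this response to the output file before the next iteration
        f_out.write(new_xml)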
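
The difference between writing after the loop and writing inside it can be seen without any network calls. This is only a sketch: fake_fetch is a hypothetical stand-in for requests.get(url).text, the url-1 style names are placeholders, and io.StringIO stands in for the output file.

```python
import io


def fake_fetch(url):
    """Hypothetical stand-in for requests.get(url).text, so the sketch runs offline."""
    return "<response for {}>\n".format(url)


urls = ["url-1", "url-2", "url-3"]

# Write after the loop: new_xml is reassigned on each pass,
# so only the value from the last pass is left to write.
after_loop = io.StringIO()
for url in urls:
    new_xml = fake_fetch(url)
after_loop.write(new_xml)

# Write inside the loop: each response is written before new_xml is reassigned.
inside_loop = io.StringIO()
for url in urls:
    new_xml = fake_fetch(url)
    inside_loop.write(new_xml)

print(after_loop.getvalue().count("\n"))   # 1
print(inside_loop.getvalue().count("\n"))  # 3
```

Another option, if a single write is preferred, would be to append each response to a list inside the loop and write the joined list once afterwards.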