<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Sorting help from URL in Python Questions</title>
    <link>https://community.esri.com/t5/python-questions/sorting-help/m-p/1169323#M64435</link>
    <description>&lt;P&gt;What do you have so far? You can use BeautifulSoup to extract the URLs from the table and parse the file names into a list (or a dictionary with the file name as key and the URL as value). Then iterate over the sorted structure and download each file using requests.&lt;/P&gt;&lt;P&gt;You should be aware that sites typically have a robots.txt file that lists which directories may be crawled/scraped and which ones the site owners don't want you to touch.&amp;nbsp; You can view it by appending robots.txt to the site's root URL, like: &lt;A href="https://opendata.vancouver.ca/robots.txt" target="_blank"&gt;https://opendata.vancouver.ca/robots.txt&lt;/A&gt;.&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;Disallow: /explore/download
Disallow: /explore/dataset/*/download&amp;nbsp;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;The table you are scraping is under the /explore/dataset/* path, so I would be cautious, respectful, and aware of what you are doing.&lt;/P&gt;</description>
    <pubDate>Fri, 29 Apr 2022 14:35:09 GMT</pubDate>
    <dc:creator>Anonymous User</dc:creator>
    <dc:date>2022-04-29T14:35:09Z</dc:date>
    <item>
      <title>Sorting help</title>
      <link>https://community.esri.com/t5/python-questions/sorting-help/m-p/1169295#M64434</link>
      <description />
      <pubDate>Fri, 29 Apr 2022 15:36:47 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/sorting-help/m-p/1169295#M64434</guid>
      <dc:creator>KaleyHansen</dc:creator>
      <dc:date>2022-04-29T15:36:47Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting help from URL</title>
      <link>https://community.esri.com/t5/python-questions/sorting-help/m-p/1169323#M64435</link>
      <description>&lt;P&gt;What do you have so far? You can use BeautifulSoup to extract the URLs from the table and parse the file names into a list (or a dictionary with the file name as key and the URL as value). Then iterate over the sorted structure and download each file using requests.&lt;/P&gt;&lt;P&gt;You should be aware that sites typically have a robots.txt file that lists which directories may be crawled/scraped and which ones the site owners don't want you to touch.&amp;nbsp; You can view it by appending robots.txt to the site's root URL, like: &lt;A href="https://opendata.vancouver.ca/robots.txt" target="_blank"&gt;https://opendata.vancouver.ca/robots.txt&lt;/A&gt;.&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;Disallow: /explore/download
Disallow: /explore/dataset/*/download&amp;nbsp;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;The table you are scraping is under the /explore/dataset/* path, so I would be cautious, respectful, and aware of what you are doing.&lt;/P&gt;</description>
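If you want to check a URL against those rules from Python rather than by eye, here is a minimal sketch using the standard library's urllib.robotparser. It parses the two Disallow lines quoted above from a local list instead of fetching the live file. The "User-agent: *" line, the helper name is_allowed, and the dataset name lidar-2013 in the usage note are assumptions for illustration. As far as I know the standard-library parser matches rules by plain path prefix, so do not count on it to interpret the * wildcard in the second rule.

```python
from urllib.robotparser import RobotFileParser

# The two Disallow rules quoted in the post. The "User-agent: *" line is an
# assumption, added so the parser has a group to attach the rules to.
ROBOTS_LINES = [
    "User-agent: *",
    "Disallow: /explore/download",
    "Disallow: /explore/dataset/*/download",
]


def is_allowed(url, user_agent="*"):
    """Return True if the rules above do not forbid user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() takes an iterable of robots.txt lines, so no network call is made
    parser.parse(ROBOTS_LINES)
    return parser.can_fetch(user_agent, url)
```

Calling is_allowed('https://opendata.vancouver.ca/explore/download/') returns False because the path starts with a disallowed prefix, while a table page such as the hypothetical https://opendata.vancouver.ca/explore/dataset/lidar-2013/table/ is not matched by either rule. To check against the live file instead, RobotFileParser also offers set_url() and read().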
      <pubDate>Fri, 29 Apr 2022 14:35:09 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/sorting-help/m-p/1169323#M64435</guid>
      <dc:creator>Anonymous User</dc:creator>
      <dc:date>2022-04-29T14:35:09Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting help from URL</title>
      <link>https://community.esri.com/t5/python-questions/sorting-help/m-p/1169333#M64436</link>
      <description>&lt;P&gt;thanks&lt;/P&gt;</description>
      <pubDate>Fri, 29 Apr 2022 15:37:18 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/sorting-help/m-p/1169333#M64436</guid>
      <dc:creator>KaleyHansen</dc:creator>
      <dc:date>2022-04-29T15:37:18Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting help from URL</title>
      <link>https://community.esri.com/t5/python-questions/sorting-help/m-p/1169339#M64437</link>
      <description>&lt;P&gt;You need to create a list and append each row's value to it inside your for loop.&amp;nbsp; Once you have the list, you can sort it in place with &lt;SPAN class=""&gt;.sort(). &amp;nbsp;&lt;A href="https://www.w3schools.com/python/ref_list_sort.asp" target="_self"&gt;list sorting&lt;/A&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import csv

urlList = []
with open('C:/lidar-2013.csv', newline='') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=';')
    for row in readCSV:
        print(row[0], row[1])
        urlList.append(row[0])  # &amp;lt;- whichever value you want to append

print(urlList)
# .sort() sorts in place and returns None, so sort first, then print
urlList.sort()
print(urlList)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;Once you get that figured out, you can look at concatenation and using the requests Python package to download the files.&lt;/P&gt;</description>
      <pubDate>Fri, 29 Apr 2022 15:02:06 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/sorting-help/m-p/1169339#M64437</guid>
      <dc:creator>Anonymous User</dc:creator>
      <dc:date>2022-04-29T15:02:06Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting help from URL</title>
      <link>https://community.esri.com/t5/python-questions/sorting-help/m-p/1169386#M64439</link>
      <description>&lt;P&gt;Jeff, I have the following, but I am still having trouble downloading the first 10 .zip files.&lt;/P&gt;&lt;P&gt;file_name = []&lt;BR /&gt;url_name = []&lt;/P&gt;&lt;P&gt;with open('C:/lidar-2013.csv') as csvfile:&lt;BR /&gt;readCSV = csv.reader(csvfile, delimiter=';')&lt;BR /&gt;for row in readCSV:&lt;BR /&gt;file_name.append(row[0])&lt;BR /&gt;url_name.append(row[1])&lt;/P&gt;&lt;P&gt;print(file_name)&lt;BR /&gt;print(file_name.sort())&lt;/P&gt;&lt;P&gt;print(url_name)&lt;BR /&gt;print(url_name.sort())&lt;/P&gt;</description>
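One thing to watch in the snippet above: list.sort() sorts in place and returns None, so print(file_name.sort()) prints None, and sorting the two lists separately can lose track of which URL belongs to which file name. A minimal sketch that keeps each pair together follows. The rows and example.com URLs are made up to stand in for the two CSV columns, and the slice uses 2 where the thread wants 10.

```python
# Hypothetical rows standing in for the (file name, url) columns of the CSV.
rows = [
    ("b_tile", "https://example.com/data/b_tile.zip"),
    ("a_tile", "https://example.com/data/a_tile.zip"),
    ("c_tile", "https://example.com/data/c_tile.zip"),
]

# sorted() returns a new list. Tuples compare by their first element first,
# so each URL stays attached to its file name while sorting by name.
pairs = sorted(rows)

# Slice to keep only the first N entries (2 here, 10 in the thread).
first_n = pairs[:2]

for file_name, url in first_n:
    print(file_name, url)
```

The same idea works with two separate lists by combining them first with sorted(zip(file_name, url_name)).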
      <pubDate>Fri, 29 Apr 2022 16:16:26 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/sorting-help/m-p/1169386#M64439</guid>
      <dc:creator>KaleyHansen</dc:creator>
      <dc:date>2022-04-29T16:16:26Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting help</title>
      <link>https://community.esri.com/t5/python-questions/sorting-help/m-p/1169391#M64440</link>
      <description>&lt;P&gt;You're not telling it to download anything yet. print() just writes to the console, so you have to use the requests module to fetch the file and write it to disk.&amp;nbsp; Take a look at this &lt;A href="https://www.tutorialspoint.com/downloading-files-from-web-using-python" target="_self"&gt;tutorial&lt;/A&gt; to get started.&amp;nbsp; I can't tell what you have in the CSV or in the list, so it's hard to give any specific guidance.&amp;nbsp; Don't be afraid to google the question either: 'python download files using URL', for example.&lt;/P&gt;</description>
      <pubDate>Fri, 29 Apr 2022 16:31:03 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/sorting-help/m-p/1169391#M64440</guid>
      <dc:creator>Anonymous User</dc:creator>
      <dc:date>2022-04-29T16:31:03Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting help</title>
      <link>https://community.esri.com/t5/python-questions/sorting-help/m-p/1169564#M64448</link>
      <description>&lt;P&gt;Kaley,&lt;/P&gt;&lt;P&gt;A dictionary would probably work better for your situation and data.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import csv
import requests

# I hard coded a few values in for testing and so you can see the dictionary structure.
# urlDict = {'4830E_54570N': 'https://webtransfer.vancouver.ca/opendata/2013GeoTIFF/4830E_54570N.zip',
#            '4860E_54541N': 'https://webtransfer.vancouver.ca/opendata/2013GeoTIFF/4860E_54540N.zip',
#            ... 
#            '4830E_54573N': 'https://webtransfer.vancouver.ca/opendata/2013GeoTIFF/4830E_54570N.zip',
#            '4860E_54544N': 'https://webtransfer.vancouver.ca/opendata/2013GeoTIFF/4860E_54540N.zip'
#            }

urlDict = {}
with open('C:/lidar-2013.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=';')
    for row in readCSV:
        print(row[0], row[1])
        # assuming row[0] is the file name and row[1] is the url
        urlDict[row[0]] = row[1]

# print to check it
print(f'raw dictionary: {urlDict}')

# sort by the keys (file name)
fNameSorted = dict(sorted(urlDict.items()))

# print to check it
print(f'sorted dictionary: {fNameSorted}')

# iterate over the first ten items in the sorted dictionary and download the file.
for k, v in list(fNameSorted.items())[:10]:
    print(f'downloading: {k}')
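    r = requests.get(v, allow_redirects=True)
    # stop here if the server returned an error status instead of the file
    r.raise_for_status()
    # get the file name and extension
    fileName = v.split('/')[-1]
    # save it; the with block closes the file once the write finishes
    with open(fr'your path to the output folder\{fileName}', 'wb') as outFile:
        outFile.write(r.content)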
    print(f'downloading: {k} completed!')&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 30 Apr 2022 14:05:02 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/sorting-help/m-p/1169564#M64448</guid>
      <dc:creator>Anonymous User</dc:creator>
      <dc:date>2022-04-30T14:05:02Z</dc:date>
    </item>
    <item>
      <title>Re: Sorting help</title>
      <link>https://community.esri.com/t5/python-questions/sorting-help/m-p/1169631#M64457</link>
      <description>&lt;P&gt;Removed question is here&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.esri.com/t5/community-mvps-questions/answered-question-with-missing-question/m-p/1169617#M1501" target="_blank"&gt;Answered Question with Missing Question - Esri Community&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 02 May 2022 00:39:09 GMT</pubDate>
      <guid>https://community.esri.com/t5/python-questions/sorting-help/m-p/1169631#M64457</guid>
      <dc:creator>DanPatterson</dc:creator>
      <dc:date>2022-05-02T00:39:09Z</dc:date>
    </item>
  </channel>
</rss>

