Sorting help

04-29-2022 06:48 AM
KaleyHansen
New Contributor
 

Accepted Solutions
by Anonymous User
Not applicable

You need to create a list and append each row's value to it inside your for loop. Once you have the list, you can sort it with .sort() (see the Python documentation on list sorting).

 

import csv

urlList = []
with open('C:/lidar-2013.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=';')
    for row in readCSV:
        print(row[0],row[1])
        urlList.append(row[0]) #<- whichever value you want to append

print(urlList)
urlList.sort()  # sorts in place and returns None, so print the list itself
print(urlList)

 

Once you get that figured out, you can look at concatenation and at using the requests Python package to download the file.
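
One detail about sorting is easy to trip over: `list.sort()` reorders the list in place and returns `None`, while `sorted()` returns a new list. A minimal sketch, using made-up file names in place of the values read from the CSV:

```python
# Hypothetical file names standing in for the values read from the CSV.
url_list = ['4860E_54540N.zip', '4830E_54570N.zip', '4840E_54550N.zip']

# sorted() returns a new sorted list and leaves the original untouched.
sorted_copy = sorted(url_list)

# list.sort() reorders the list in place and returns None.
result = url_list.sort()

print(result)        # the return value of sort(), not the list
print(url_list)      # the list itself is now sorted
print(url_list[:2])  # slicing picks the first N items
```

Slicing the sorted list (for example `[:10]`) is one simple way to limit the work to the first ten entries.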

7 Replies
by Anonymous User
Not applicable

What do you have so far? You can use BeautifulSoup to extract the URLs from the table and parse the file names into a list (or a dictionary with the file name as key and the URL as value). Then iterate over the sorted structure and download each file using requests.
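
As a rough sketch of the dictionary idea, assuming the URLs have already been extracted: the example.com URLs below are placeholders, not the real links, and only the standard library is used.

```python
from posixpath import basename
from urllib.parse import urlparse

# Hypothetical URLs standing in for the links extracted from the table.
urls = [
    'https://example.com/data/4860E_54540N.zip',
    'https://example.com/data/4830E_54570N.zip',
]

# Map each file name (the last path segment) to its URL.
url_by_name = {basename(urlparse(u).path): u for u in urls}

# Iterate in file-name order; this is where a download call would go.
for name in sorted(url_by_name):
    print(name, url_by_name[name])
```

Keying on the file name keeps each URL attached to its name, so sorting the names does not lose track of which link belongs to which file.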

You should be aware that many sites have a robots.txt file that lists which directories can be crawled/scraped and which ones they don't want you to touch. You can view it by appending robots.txt to the site's root URL, like: https://opendata.vancouver.ca/robots.txt

Disallow: /explore/download
Disallow: /explore/dataset/*/download 

The table you are scraping is under the /explore/dataset/* path, so I would be cautious and respectful about what you are doing.
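
If you want to check a URL against such rules in code, Python's standard library has `urllib.robotparser`. A minimal sketch under some assumptions: the `User-agent: *` line is my addition, and only the first Disallow rule is included because, as far as I can tell, the standard-library parser matches rules as plain path prefixes and may not interpret the `*` wildcard in the second rule.

```python
from urllib.robotparser import RobotFileParser

# Rules as they might appear in robots.txt. The User-agent line is an
# assumption; only the first Disallow rule from the post is included.
rules = [
    'User-agent: *',
    'Disallow: /explore/download',
]

parser = RobotFileParser()
parser.parse(rules)

def allowed(url, agent='*'):
    """Return True if the parsed rules permit `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)

base = 'https://opendata.vancouver.ca'
print(allowed(base + '/explore/download/'))  # path falls under the Disallow rule
print(allowed(base + '/explore/'))           # path is not covered by the rule
```

In real use you would point the parser at the live file with `set_url()` and `read()` instead of hard-coding the rules.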

KaleyHansen
New Contributor

thanks

KaleyHansen
New Contributor

Jeff, I have the following, but I am still having trouble downloading the first 10 .zip files.

file_name = []
url_name = []

with open('C:/lidar-2013.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=';')
    for row in readCSV:
        file_name.append(row[0])
        url_name.append(row[1])

print(file_name)
print(file_name.sort())

print(url_name)
print(url_name.sort())

 

by Anonymous User
Not applicable

You're not telling it to download anything yet. print just writes to the console, so you have to use the requests module to get the file and write it to disk. Take a look at this tutorial to get started. I can't tell what you have in the CSV or in the list, so it's hard to give any specific guidance. Don't be afraid to Google the question either: 'python download files using URL', for example.
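
To make the difference between printing and downloading concrete, here is a minimal sketch of a download helper. It uses `urllib.request` from the standard library rather than requests, purely to keep the example dependency-free; the function name `download` and its parameters are my own, not from any tutorial.

```python
import shutil
import urllib.request
from pathlib import Path

def download(url, out_dir):
    """Save the resource at `url` into `out_dir`; return the written path."""
    # Use the last URL segment as the local file name.
    out_path = Path(out_dir) / url.split('/')[-1]
    # Stream the response body to disk in binary mode.
    with urllib.request.urlopen(url) as response, open(out_path, 'wb') as f:
        shutil.copyfileobj(response, f)
    return out_path
```

You would call it once per URL in your sorted list, passing the folder you want the .zip files saved to.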

by Anonymous User
Not applicable

Kaley,

A dictionary would probably work better for your situation and data.

import csv
import requests

# I hard coded a few values in for testing and so you can see the dictionary structure.
# urlDict = {'4830E_54570N': 'https://webtransfer.vancouver.ca/opendata/2013GeoTIFF/4830E_54570N.zip',
#            '4860E_54541N': 'https://webtransfer.vancouver.ca/opendata/2013GeoTIFF/4860E_54540N.zip',
#            ... 
#            '4830E_54573N': 'https://webtransfer.vancouver.ca/opendata/2013GeoTIFF/4830E_54570N.zip',
#            '4860E_54544N': 'https://webtransfer.vancouver.ca/opendata/2013GeoTIFF/4860E_54540N.zip'
#            }

urlDict = {}
with open('C:/lidar-2013.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=';')
    for row in readCSV:
        print(row[0], row[1])
        # assuming row[0] is the file name and row[1] is the url
        urlDict[row[0]] = row[1]

# print to check it
print(f'raw dictionary: {urlDict}')

# sort by the keys (file name)
fNameSorted = dict(sorted(urlDict.items()))

# print to check it
print(f'sorted dictionary: {fNameSorted}')

# iterate over the first ten items in the sorted dictionary and download the file.
for k, v in list(fNameSorted.items())[:10]:
    print(f'downloading: {k}')
    r = requests.get(v, allow_redirects=True)
    r.raise_for_status()  # stop if the server returned an error status
    # get the file name and extension
    fileName = v.split('/')[-1]
    # save it; the with block closes the file after writing
    with open(fr'your path to the output folder\{fileName}', 'wb') as f:
        f.write(r.content)
    print(f'downloading: {k} completed!')

 

DanPatterson
MVP Esteemed Contributor

Removed question is here

Answered Question with Missing Question - Esri Community


... sort of retired...