topic Re: Sorting help from URL in Python Questions

Sorting help

KaleyHansen — Fri, 29 Apr 2022 15:36:47 GMT

Re: Sorting help from URL

Anonymous User — Fri, 29 Apr 2022 14:35:09 GMT

What do you have so far? You can use BeautifulSoup to extract the URLs from the table and parse the file names into a list (or a dictionary with the file name as key and the URL as value). Then iterate over the sorted structure and download each file using requests.
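As a rough sketch of that extract-then-sort idea: the snippet below uses the standard library's html.parser as a stand-in for BeautifulSoup so it runs without extra packages. The sample HTML, the example.com URLs, and the ZipLinkCollector name are all made up for illustration; the real page will look different.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urlparse


class ZipLinkCollector(HTMLParser):
    """Collect links to .zip files as {file name without extension: url}."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".zip"):
            # Use the last path segment, minus ".zip", as the dictionary key.
            name = PurePosixPath(urlparse(href).path).stem
            self.links[name] = href


# Hypothetical table markup, only here to exercise the parser.
sample_html = """
<table>
  <tr><td><a href="https://example.com/data/B_tile.zip">B</a></td></tr>
  <tr><td><a href="https://example.com/data/A_tile.zip">A</a></td></tr>
  <tr><td><a href="https://example.com/about.html">About</a></td></tr>
</table>
"""

collector = ZipLinkCollector()
collector.feed(sample_html)

# Walk the links in file-name order; this is where a download would go.
for name in sorted(collector.links):
    print(name, collector.links[name])
```

With BeautifulSoup the collection step would be shorter, but the dictionary and the sorted loop would stay the same.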

You should be aware that sites have a robots.txt file that lists which directories can be crawled/scraped and which ones the site owners don't want crawled. You can view it by appending robots.txt to the site's root URL, like: https://opendata.vancouver.ca/robots.txt.

Disallow: /explore/download
Disallow: /explore/dataset/*/download

The table you are scraping is under the /explore/dataset/* path, so I would be cautious and respectful about what you request from it.
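If you want to check a URL against those rules from a script, the standard library's urllib.robotparser can do it. This is only a sketch: the "User-agent: *" line is an assumption (only the two Disallow lines are quoted above), and the rules are parsed from a list here rather than fetched from the site.

```python
from urllib.robotparser import RobotFileParser

# "User-agent: *" is assumed; the Disallow lines are the ones quoted above.
rules = [
    "User-agent: *",
    "Disallow: /explore/download",
    "Disallow: /explore/dataset/*/download",
]

parser = RobotFileParser()
parser.parse(rules)

base = "https://opendata.vancouver.ca"

# Plain prefix rule: anything starting with /explore/download is disallowed.
download_ok = parser.can_fetch("*", base + "/explore/download/")
# A path not covered by either rule is allowed.
explore_ok = parser.can_fetch("*", base + "/explore/")

print("download allowed:", download_ok)
print("explore allowed:", explore_ok)
```

One caveat: as far as I know, the standard-library parser matches rules by simple path prefix and does not expand a * in the middle of a path, so the second rule is better checked by eye.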

Re: Sorting help from URL

KaleyHansen — Fri, 29 Apr 2022 15:37:18 GMT

thanks

Re: Sorting help from URL

Anonymous User — Fri, 29 Apr 2022 15:02:06 GMT

You need to create a list and append the value you want from each row to it during your for loop. Once you have the list, you can sort it with .sort(). Note that .sort() sorts the list in place and returns None.

import csv

urlList = []
with open('C:/lidar-2013.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=';')
    for row in readCSV:
        print(row[0], row[1])
        urlList.append(row[0])  # <- whichever value you want to append

print(urlList)
urlList.sort()  # sorts in place and returns None, so print the list afterwards
print(urlList)
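A quick way to see the difference between the two sorting options, using a few made-up tile names (not read from the real CSV):

```python
# Hypothetical file names, only for illustration.
names = ["4860E_54540N", "4830E_54570N", "4840E_54550N"]

# sorted() returns a new sorted list and leaves the original untouched.
ordered = sorted(names)

# list.sort() sorts the list in place and returns None.
result = names.sort()

print(ordered)
print(result)  # None
print(names)   # now in the same order as `ordered`
```

So printing the return value of .sort() only ever shows None; print the list itself after sorting, or use sorted() if you want the result as a value.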

Once you get that figured out, you can look at concatenation and at using the requests Python package to download the files.

Re: Sorting help from URL

KaleyHansen — Fri, 29 Apr 2022 16:16:26 GMT

Jeff, I have the following, but I am still having trouble downloading the first 10 .zip files.

file_name = []
url_name = []

with open('C:/lidar-2013.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=';')
    for row in readCSV:
        file_name.append(row[0])
        url_name.append(row[1])

print(file_name)
print(file_name.sort())

print(url_name)
print(url_name.sort())

Re: Sorting help

Anonymous User — Fri, 29 Apr 2022 16:31:03 GMT

You're not telling it to download anything yet. print() just writes to the console, so you have to use the requests module to fetch each file and write it to disk. Take a look at this tutorial to get started. I can't tell what you have in the CSV or in the list, so it's hard to give any specific guidance. Don't be afraid to Google the question either: 'python download files using URL', for example.
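To show the general shape of "fetch, then write to disk", here is a minimal sketch. It uses the standard library's urllib.request instead of requests so it runs without installing anything; the download function name and its arguments are my own, not something from your script.

```python
import shutil
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse
from urllib.request import urlopen


def download(url, dest_dir):
    """Fetch `url` and save it in `dest_dir` under the URL's own file name."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # The last segment of the URL path becomes the local file name.
    target = dest_dir / PurePosixPath(urlparse(url).path).name

    # Stream the response straight into the file instead of holding it in memory.
    with urlopen(url) as response, open(target, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
    return target
```

You would call it once per URL, for example download(url, 'your output folder') inside a loop over the first ten sorted entries. With requests the pattern is the same: get the response, then write its bytes to a file opened in 'wb' mode.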

Re: Sorting help

Anonymous User — Sat, 30 Apr 2022 14:05:02 GMT

Kaley,

A dictionary would probably work better for your situation and data.

import csv
import requests

# I hard coded a few values in for testing and so you can see the dictionary structure.
# urlDict = {'4830E_54570N': 'https://webtransfer.vancouver.ca/opendata/2013GeoTIFF/4830E_54570N.zip',
#            '4860E_54541N': 'https://webtransfer.vancouver.ca/opendata/2013GeoTIFF/4860E_54540N.zip',
#            ...
#            '4830E_54573N': 'https://webtransfer.vancouver.ca/opendata/2013GeoTIFF/4830E_54570N.zip',
#            '4860E_54544N': 'https://webtransfer.vancouver.ca/opendata/2013GeoTIFF/4860E_54540N.zip'
#            }

urlDict = {}
with open('C:/lidar-2013.csv') as csvfile:
    readCSV = csv.reader(csvfile, delimiter=';')
    for row in readCSV:
        print(row[0], row[1])
        # assuming row[0] is the file name and row[1] is the url
        urlDict[row[0]] = row[1]

# print to check it
print(f'raw dictionary: {urlDict}')

# sort by the keys (file name)
fNameSorted = dict(sorted(urlDict.items()))

# print to check it
print(f'sorted dictionary: {fNameSorted}')

# iterate over the first ten items in the sorted dictionary and download the file.
for k, v in list(fNameSorted.items())[:10]:
    print(f'downloading: {k}')
    r = requests.get(v, allow_redirects=True)
    # get the file name and extension
    fileName = v.split('/')[-1]
    # save it
    with open(fr'your path to the output folder\{fileName}', 'wb') as f:
        f.write(r.content)
    print(f'downloading: {k} completed!')
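The reason a dictionary (or any structure that keeps name and URL together) helps: if you sort two separate lists, nothing guarantees that the names and URLs still line up afterwards. A small sketch with made-up rows, where the example.com URLs and tile names are purely hypothetical:

```python
from itertools import islice

# Hypothetical (file name, url) rows standing in for the CSV contents.
rows = [
    ("C_tile", "https://example.com/C_tile.zip"),
    ("A_tile", "https://example.com/A_tile.zip"),
    ("B_tile", "https://example.com/B_tile.zip"),
]

url_by_name = dict(rows)

# Sorting the items orders them by key and keeps each URL with its name.
# islice takes the first n pairs without building an intermediate slice.
first_two = list(islice(sorted(url_by_name.items()), 2))

for name, url in first_two:
    print(name, url)
```

Swap the 2 for 10 to get the first ten files in name order.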

Re: Sorting help

DanPatterson — Mon, 02 May 2022 00:39:09 GMT

Removed question is here

Answered Question with Missing Question - Esri Community