How to create a script to scrape addresses from websites

09-05-2018 09:18 AM
New Contributor


I'm currently working on a web-scraping script that takes 3 inputs (the URL, the type of element on the page to scrape, and the element name), but I am stumped on where my script is going wrong.

from selenium import webdriver
import csv
import time

'''Function to scrape data from webpage
   The function takes 3 inputs:
   1. The url to the website to be scraped
   2. Type of element on page to scrape
   3. Element name
'''
def ScrapeStore(_link,_searchElementBy,_elementName):
    with open('ScrapedAddresses.csv', 'wb') as file: #writes results to file
        writer = csv.writer(file, delimiter=',')
        driver = webdriver.Firefox() #Launches Firefox
        driver.get(link) #Get link
        time.sleep(2) #Pause
        #Selects the element search method based on _searchElementBy
        if _searchElementBy == 'c-address':
            stores = driver.find_elements_by_class_name(_elementName)
        elif _searchElementBy == 'tag_name':
            stores = driver.find_elements_by_tag_name(_elementName)
        elif _searchElementBy == 'xpath':
            stores = driver.find_elements_by_xpath(_elementName)
        elif _searchElementBy == 'id':
            stores = driver.find_elements_by_id(_elementName)
        for store in stores: #for each element on the page
            s = store.text #extract text
            if s != '': #skip blank output
                # reformat to output each address on one line
                s = s.encode('ascii', 'ignore')
                s = s.replace('\n',',')
                print (s)
                writer.writerow([str(s)]) #write to file

#Function call

I think something may be wrong with my for loop. The goal is to collect addresses and write them to a CSV file as the output.

Where do I go from here?


1 Reply
MVP Honored Contributor

Please format your code so we can see the indentation: /blogs/dan_patterson/2016/08/14/script-formatting 

Is there an error, or what else is going wrong?

Edit: you pass 'class_name' as the '_searchElementBy' value, but the first branch checks for 'c-address', so none of the conditions match and 'stores' is never assigned. Also, the parameter is named '_link', but the function calls driver.get(link), so I assume you get a NameError there.
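
To make those two fixes concrete, here is a minimal sketch of how the function could be restructured. It is not a drop-in replacement, and it makes a few assumptions that are not in your post: Python 3 (so the file is opened with 'w' and newline='' instead of 'wb'), a Selenium version that provides the generic driver.find_elements(by, value) call, and that 'c-address' was meant to be the class name rather than the search method. The names LOCATORS, clean_address, write_addresses and scrape_store are made up for this example.

```python
import csv
import time

# Maps the search-method names from the question to Selenium locator strings.
# The values are the same strings as By.CLASS_NAME, By.TAG_NAME, By.XPATH
# and By.ID in selenium.webdriver.common.by.
LOCATORS = {
    'class_name': 'class name',
    'tag_name': 'tag name',
    'xpath': 'xpath',
    'id': 'id',
}


def clean_address(text):
    """Collapse a multi-line address into one comma-separated line."""
    ascii_text = text.encode('ascii', 'ignore').decode('ascii')
    parts = [line.strip() for line in ascii_text.splitlines()]
    return ','.join(part for part in parts if part)


def write_addresses(texts, path='ScrapedAddresses.csv'):
    """Write each non-blank address as a one-column CSV row; return the rows."""
    rows = [clean_address(t) for t in texts]
    rows = [r for r in rows if r]
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f, delimiter=',')
        for row in rows:
            writer.writerow([row])
    return rows


def scrape_store(link, search_by, element_name, path='ScrapedAddresses.csv'):
    """Open `link` in Firefox and save the text of every matching element."""
    # Fail loudly on an unknown method instead of leaving the result unassigned.
    if search_by not in LOCATORS:
        raise ValueError('Unknown search method: %r' % (search_by,))
    # Imported here so the helpers above can be tried without Selenium installed.
    from selenium import webdriver
    driver = webdriver.Firefox()
    try:
        driver.get(link)  # uses the parameter name that was actually defined
        time.sleep(2)     # crude pause kept from the original script
        elements = driver.find_elements(LOCATORS[search_by], element_name)
        return write_addresses([e.text for e in elements], path)
    finally:
        driver.quit()     # close the browser even if something fails
```

Under those assumptions the call would look like scrape_store(your_url, 'class_name', 'c-address'), with your own URL in place of your_url. Keeping the text cleanup and the CSV writing in separate functions also lets you check the output format without launching a browser.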
