AnsweredAssumed Answered

How to create script to scrape addresses from websites.

Question asked by dmensah2 on Sep 5, 2018
Latest reply on Sep 5, 2018 by dkwiens

Hello!

 

I'm currently working on making a webscraping script that takes 3 inputs (url, element on page to scrape and the address), but I am stumped on where my script is going wrong.

 

from selenium import webdriver
import csv
import time

'''Function to scrape data from webpage
   The function takes 3 inputs:
   1. The url to the website to be scraped
   2. Type of element on page to scrape
   3. Element name
'''
def ScrapeStore(_link,_searchElementBy,_elementName):
with open('ScrapedAddresses.csv', 'wb') as file: #writes results to file
        writer = csv.writer(file, delimiter=',')
        driver = webdriver.Firefox() #Launches url in firefox
        driver.get(link) #Get link
        time.sleep(2) #Pause
        #Selects element search method specified in the dictionary
        if _searchElementBy == 'c-address':
            stores = driver.find_elements_by_class_name(_elementName)
elif _searchElementBy == 'tag_name':
            stores = driver.find_elements_by_tag_name(_elementName)
elif _searchElementBy == 'xpath':
            stores = driver.find_elements_by_xpath(_elementName)
elif _searchElementBy == 'id':
            stores = driver.find_elements_by_id(_elementName)
for store in stores: #for each element on the page
            s = store.text #extract text
            if s != ''# While not a blank output
                # reformat to output each address on one line
                s = s.encode('ascii', 'ignore')
                s = s.replace('\n',',')
print (s)
                writer.writerow([str(s)]) #write to file
                file.flush()
        driver.quit()

#Function call
ScrapeStore('http://dunkindonutslocationsfinder.com/Dunkin-Donuts-Locations.html/state=NY','class_name','address')

 

I think something may be wrong with my for loop. The goal is collect addresses and write them to a CSV as the output.

 

Where do I go from here?

 

Thanks!

Outcomes