Python Implementation Question

1559
13
Jump to solution
12-23-2019 02:46 PM
SamLee1
New Contributor

Hello I am trying to figure out the if there is a way to compare two excel file's dates/time?

I wrote a python script, attached below, that downloads a excel file from a website and when I run it again

it pushed the file that was downloaded first to an archived folder and the new file that gets downloads foes to where the old was was originally at and it becomes am excel to csv file in the end.

Excel File Link: https://www.gosolarcalifornia.ca.gov/equipment/documents/Grid_Support_Inverter_List_Full_Data.xlsm 

Website Link: Inverters 

I got this idea so far:import pandas as pd
#Reading two Excel Sheets

sheet1 = pd.read_excel(r'Book1.xlsx')
sheet2 = pd.read_excel(r'Book2.xlsx')

# Iterating the Columns Names of both Sheets
for i,j in zip(sheet1,sheet2):

# Creating empty lists to append the columns values
a,b =[],[]

# Iterating the columns values
for m, n in zip(sheet1,sheet2😞

# Appending values in lists
a.append(m)
b.append(n)

# Sorting the lists
a.sort()
b.sort()

# Iterating the list's values and comparing them
for m, n in zip(range(len(a)), range(len(b))):
if a != b:
print('Column name : \'{}\' and Row Number : {}'.format(i,m))

Is there a way I can do this logic:
if(file1 != file2):
   run script

else

   do nothing

It's the conditional statement that I am stuck on.

*file 1 is the excel on the website and file 2 is the download excel file on the computer

Tags (2)
0 Kudos
13 Replies
BlakeTerhune
MVP Regular Contributor

Line 4 will return a -1 if fileNameHTML is not found in indexHTML. So, line 5 checks for the presence of a <td> tag with the filename. In the event it is found (the value is not -1), it then checks for the first ">" and the first "<" in indexHTML on lines 6 and 7. Using the returned indexes, it slices indexHTML, strips the leading and trailing white space, and formats the text into a datetime object.

Yes, that's exactly right. That was the simplest way I could think (with builtin libraries) to get the text value of the tag in the HTML table. This does assume the filenames are all unique, which I think is safe. Once it finds the file name, it assumes the next tag will contain the date. I simplified this to just look for the first > (indiciating the last character of the opening tag) and the next < after that (indicating the first character of the closing tag). The value of the tag (the date) will be between those two characters. Hopefully that is clear and you can comment that section in a way that makes sense to you.

0 Kudos
BlakeTerhune
MVP Regular Contributor

Line 13 is a little dense so here it is separated to be a little more clear.

import datetime
import os
import requests
import time

indexHTML = requests.get("https://www.gosolarcalifornia.ca.gov/equipment/documents/").text
fileName = "Grid_Support_Inverter_List_Full_Data.xlsm"
fileNameHTML = '<td><a href="{0}">{0}</a></td>'.format(fileName)
fileNameHTMLIndex = indexHTML.find(fileNameHTML)
if fileNameHTMLIndex != -1:
    dateStartIndex = indexHTML.find(">", fileNameHTMLIndex+len(fileNameHTML)) + 1
    dateEndIndex = indexHTML.find("<", dateStartIndex)
    webFileDate = indexHTML[dateStartIndex:dateEndIndex].strip()
    webFileDate = datetime.datetime.strptime(webFileDate, "%Y-%m-%d %H:%M")
else:
    raise Exception("Unable to find fileName {}".format(fileName))

localFilePath = r"your\local\path\here"
localFileDate = os.path.getctime(localFilePath)
localFileDate = datetime.datetime.strptime(time.ctime(localFileDate), "%a %b %d %H:%M:%S %Y")
if webFileDate > localFileDate:
    print("File on the web is newer!")
    # download web file to replace local file
0 Kudos
Arne_Gelfert
Occasional Contributor III

Sorry... I guess I didn't read your post correctly. I think with remote files you may be back to comparing dates if unable to download it. You can use things like urllib for that.

0 Kudos
SamLee1
New Contributor
import urllib.request
file1 = r"C:\temp\Hello.xslm
if(filecmp.cmp(request_url, file1):
   run python script
else:
   do nothing
Will this work?
0 Kudos