Hello I am trying to figure out the if there is a way to compare two excel file's dates/time?
I wrote a python script, attached below, that downloads a excel file from a website and when I run it again
it pushed the file that was downloaded first to an archived folder and the new file that gets downloads foes to where the old was was originally at and it becomes am excel to csv file in the end.
Excel File Link: https://www.gosolarcalifornia.ca.gov/equipment/documents/Grid_Support_Inverter_List_Full_Data.xlsm
Website Link: Inverters
I got this idea so far:import pandas as pd
#Reading two Excel Sheets
sheet1 = pd.read_excel(r'Book1.xlsx')
sheet2 = pd.read_excel(r'Book2.xlsx')
# Iterating the Columns Names of both Sheets
for i,j in zip(sheet1,sheet2):
# Creating empty lists to append the columns values
a,b =[],[]
# Iterating the columns values
for m, n in zip(sheet1,sheet2
# Appending values in lists
a.append(m)
b.append(n)
# Sorting the lists
a.sort()
b.sort()
# Iterating the list's values and comparing them
for m, n in zip(range(len(a)), range(len(b))):
if a
print('Column name : \'{}\' and Row Number : {}'.format(i,m))
Is there a way I can do this logic:
if(file1 != file2):
run script
else
do nothing
It's the conditional statement that I am stuck on.
*file 1 is the excel on the website and file 2 is the download excel file on the computer
Solved! Go to Solution.
Line 4 will return a -1 if fileNameHTML is not found in indexHTML. So, line 5 checks for the presence of a <td> tag with the filename. In the event it is found (the value is not -1), it then checks for the first ">" and the first "<" in indexHTML on lines 6 and 7. Using the returned indexes, it slices indexHTML, strips the leading and trailing white space, and formats the text into a datetime object.
Yes, that's exactly right. That was the simplest way I could think (with builtin libraries) to get the text value of the tag in the HTML table. This does assume the filenames are all unique, which I think is safe. Once it finds the file name, it assumes the next tag will contain the date. I simplified this to just look for the first > (indiciating the last character of the opening tag) and the next < after that (indicating the first character of the closing tag). The value of the tag (the date) will be between those two characters. Hopefully that is clear and you can comment that section in a way that makes sense to you.
Line 13 is a little dense so here it is separated to be a little more clear.
import datetime
import os
import requests
import time
indexHTML = requests.get("https://www.gosolarcalifornia.ca.gov/equipment/documents/").text
fileName = "Grid_Support_Inverter_List_Full_Data.xlsm"
fileNameHTML = '<td><a href="{0}">{0}</a></td>'.format(fileName)
fileNameHTMLIndex = indexHTML.find(fileNameHTML)
if fileNameHTMLIndex != -1:
dateStartIndex = indexHTML.find(">", fileNameHTMLIndex+len(fileNameHTML)) + 1
dateEndIndex = indexHTML.find("<", dateStartIndex)
webFileDate = indexHTML[dateStartIndex:dateEndIndex].strip()
webFileDate = datetime.datetime.strptime(webFileDate, "%Y-%m-%d %H:%M")
else:
raise Exception("Unable to find fileName {}".format(fileName))
localFilePath = r"your\local\path\here"
localFileDate = os.path.getctime(localFilePath)
localFileDate = datetime.datetime.strptime(time.ctime(localFileDate), "%a %b %d %H:%M:%S %Y")
if webFileDate > localFileDate:
print("File on the web is newer!")
# download web file to replace local file
Sorry... I guess I didn't read your post correctly. I think with remote files you may be back to comparing dates if unable to download it. You can use things like urllib for that.
import
urllib.request
request_url
=
urllib.request.urlopen(
'https://www.gosolarcalifornia.ca.gov/equipment/documents/Grid_Support_Inverter_List_Full_Data.xlsm '
)
file1 = r"C:\temp\Hello.xslm
if(filecmp.cmp(request_url, file1):
run python script
else:
do nothing
Will this work?