Working with Excel 2013, python

18013
18
Jump to solution
07-18-2013 10:09 AM
PaulHuffman
Occasional Contributor III
I'm interested in trying some data manipulation and retrieval from Excel spreadsheets.  I've been trying out the xlrd, xlwt, and xlutils packages from http://www.python-excel.org/  with some success, but it seems I can't make these packages work with .xlsx files, unless I go to Excel and save them as Excel 97-03 formats, .xls.  Are there updates to these packages for Excel 2013?
Tags (2)
0 Kudos
1 Solution

Accepted Solutions
PhilMorefield
Occasional Contributor III
It's not a quick-and-easy solution to your problem, but if you've installed pywin32 and some version of Excel, you just use the COM interface to Excel.

  • Open PythonWin

  • Click "Tools" and select "COM Makepy utility"

  • Look for something like "Microsoft Excel 12.0 Object Library" and click "OK"


Now you can access Excel directly through a Python script like so:
import win32com.client xl = win32com.client.Dispatch('Excel.Application') xl.Visible = 1 wb = xl.Workbooks.Add()  datasheet = wb.Worksheets.Add() datasheet.Name = 'New name for my worksheet' datasheet.Cells(1, 1).Value = 'This is cell A1' datasheet.Cells(2, 2).Value = 'This is cell B2'  wb.SaveAs('C:\\Temp\\example.xlsx') wb.Close(SaveChanges=1) xl.Quit() del xl

The documentation for accessing Excel this way is scattered, but I've scripted the processing and creating of hundreds of spreadsheets and Excel charts this way.

View solution in original post

0 Kudos
18 Replies
MathewCoyle
Frequent Contributor
What version of python are you using? 2.7 should be able to open xlsx files fine.

Edit: xlrd 0.8.0+ support xlsx.

If you are asking about writing to xlsx you'd need to find another module currently. openpyxl seems preferred.
0 Kudos
PaulHuffman
Occasional Contributor III
I'm using Python 2.7.2. 

When I try the simple example on page 10 with an xlsx file in the same folder as the script, I get a lot of red returned:
from xlrd import open_workbook,XL_CELL_TEXT
book = open_workbook('simple.xlsx')
sheet = book.sheet_by_index(1)
cell = sheet.cell(0,0)
print cell
print cell.value
print cell.ctype==XL_CELL_TEXT
for i in range(sheet.ncols):
    print sheet.cell_type(1,i),sheet.cell_value(1,i)


Traceback (most recent call last):
  File "C:\Users\Paul\Documents\pyscripts\Office\cell_access.py", line 2, in <module>
    book = open_workbook('simple.xlsx')
  File "C:\Python27\ArcGIS10.1\lib\site-packages\xlrd\__init__.py", line 429, in open_workbook
    biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
  File "C:\Python27\ArcGIS10.1\lib\site-packages\xlrd\__init__.py", line 1545, in getbof
    bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
  File "C:\Python27\ArcGIS10.1\lib\site-packages\xlrd\__init__.py", line 1539, in bof_error
    raise XLRDError('Unsupported format, or corrupt file: ' + msg)
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found 'PK\x03\x04\x14\x00\x06\x00'

If I try the script on an xls file, I get a lot less red.  But I thought this worked for we on xls files last week.

Traceback (most recent call last):
  File "C:\Users\Paul\Documents\pyscripts\Office\cell_access.py", line 4, in <module>
    cell = sheet.cell(0,0)
  File "C:\Python27\ArcGIS10.1\lib\site-packages\xlrd\sheet.py", line 255, in cell
    self._cell_types[rowx][colx],
IndexError: list index out of range
0 Kudos
MathewCoyle
Frequent Contributor
A possible answer for the error you are getting on SO.

http://stackoverflow.com/questions/16504975/error-unsupported-format-or-corrupt-file-expected-bof-re...

Double check your xlrd version and test on a simple xlsx file.
0 Kudos
PaulHuffman
Occasional Contributor III
When I tried the simple file opening routine on page 7 it seemed d to work with xls files:
from mmap import mmap,ACCESS_READ
from xlrd import open_workbook
print open_workbook('simple.xls')
with open('simple.xls','rb') as f:
    print open_workbook(
        file_contents=mmap(f.fileno(),0,access=ACCESS_READ)
        )
aString = open('simple.xls','rb').read()
print open_workbook(file_contents=aString)


>>>
<xlrd.Book object at 0x02950390>
<xlrd.Book object at 0x029211D0>
<xlrd.Book object at 0x029C19B0>
>>>


But seemed to blow up when I tried a xlsx file:
Traceback (most recent call last):
  File "C:\Users\Paul\Documents\pyscripts\Office\open.py", line 3, in <module>
    print open_workbook('simple.xlsx')
  File "C:\Python27\ArcGIS10.1\lib\site-packages\xlrd\__init__.py", line 429, in open_workbook
    biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
  File "C:\Python27\ArcGIS10.1\lib\site-packages\xlrd\__init__.py", line 1545, in getbof
    bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
  File "C:\Python27\ArcGIS10.1\lib\site-packages\xlrd\__init__.py", line 1539, in bof_error
    raise XLRDError('Unsupported format, or corrupt file: ' + msg)
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found 'PK\x03\x04\x14\x00\x06\x00'
0 Kudos
PaulHuffman
Occasional Contributor III


Double check your xlrd version and test on a simple xlsx file.


I'm using xlrd 0.9.2.  Is there revision beyond that?
0 Kudos
PaulHuffman
Occasional Contributor III
There's another simple example at http://codingtutorials.co.uk/python-excel-xlrd-xlwt/  If I run it on xls files the print.sheet_value doesn't seem to print anything. Wait, yes it works, I just happened to hit an empty cell.  But when I try to read a xlsx file, it really blows up.

There's a example on this site to create a workbook, and that runs, creates a xls file with a bunch of numbers in it.
0 Kudos
PaulHuffman
Occasional Contributor III
People at SO say that xlrd currently works with the .xlsx extension, http://stackoverflow.com/questions/4371163/reading-xlsx-files-using-python but that may not include the Excel 2013 files.

But there is openpyxl https://bitbucket.org/ericgazoni/openpyxl/ that I will try tomorrow.
0 Kudos
PaulHuffman
Occasional Contributor III
It doesn't look promising. The openpyxl webpage http://openpyxl.readthedocs.org/en/latest/#getting-the-source mentions only Excel 2007 xlsx files. 

Plus I'm having difficulty installing openpyxl in Windows and Python 2.7.  With cmd python setup.py install, I got "No Module named setuptools" (see attached image) [ATTACH=CONFIG]26177[/ATTACH]

Then I tried to install setuptools 0.9.7 https://pypi.python.org/pypi/setuptools/0.9.7#windows but on running ez_setup.py I got another error:
Downloading https://pypi.python.org/packages/source/s/setuptools/setuptools-0.9.7.tar.gz
Extracting in c:\users\paul\appdata\local\temp\tmptm3fqf
Now working in c:\users\paul\appdata\local\temp\tmptm3fqf\setuptools-0.9.7
Installing Setuptools
Something went wrong during the installation.
See the error message above.

Traceback (most recent call last):
  File "C:\Users\Paul\Documents\pyscripts\ez_setup.py", line 264, in <module>
    sys.exit(main())
SystemExit: 2
0 Kudos
PaulHuffman
Occasional Contributor III
Today I thought I'd try adding an environmental variable %PYTHON_PATH% and append it to %PATH%, but I got the same messages from both of the steps above.
0 Kudos