Parsing an excel sheet in Python platform independent

28.06.2007 by kerim in python

My wife had some trouble with excelsheets and scripts that had to be created from these sheets. In particular she had to filter the sheet and exclude certain values. Then from the resulting set she had to extract the data of one column only. And for each row in that column she had to create 4 commands that were inserted in a batch file. This batchfile had to be saved as a unix encoded file, transfered via ftp and then was used in another process.

Now i am certainly not a VBA guy.

My first attempt was to create a csv file, read and parse that in python, store the new textfile and voila. Thats pretty simple and it works in around 95.0 % of the cases. It only has a small but annoying problem. When you tell Excel to export things and in a cell you have a return or newline, then excel will export this as a new line as well. So data will be splitted. An example. If you have a structure that looks like this:


one normal cell |another cell | another normal cell
|containing a return |

then Excel will give you the following:
one normal cell;another cell
containing a <return>;another normal cell

As you can see it creates a newline at the position of the return in the second cell. Thats bad, really bad !
Another problem of course is that she always had to export her file first.
Now to solve that i simply implemented an Excelreader that does what she wants.
First i tried it with win32 but i had a small problem here myself: I have no excel on my notebook. So i had to develop things on my office pc. That didn’t really make me happy. The alternative is to use a platform independent module that doesn’t requitre Excel (or even windows). It took me around 1 hour to find something, to implement and test the code.
I used xlrd here.
So here is part of the code without much further comment.
I guess the rest is left up to you and your own specific needs. To try it just use an ExcelSheet with for example the following values:


Head A Head B Head C Head D
A2 B2 2 D2
A3 B3 TRUE D3

import xlrd,os  
#open the file: use os.path... if the file is in the same directory as the python program !  
book = xlrd.open_workbook( os.path.join(os.path.dirname(__file__), 'Mappe1.xls'))  
#how many sheets are there ?  
print book.nsheets   
#get me the first sheet (index starts with 0 of course)  
sheet=book.sheet_by_index(0)   
#get me the number of rows  
totalRows= sheet.nrows   
#we assume the column of interest (where the data is) is 2 (meaning the third visible one)  
columnOfInterest=2  
#we assume the sheet contains a header row (row=0) and the data starts with the next row  
startDataRow=1  
print totalRows  
for currentRow in range(1,totalRows):  
     #get me the complete row  
     print sheet.row(currentRow)  
     #get the value ... datatype is determined dynamically  
     value=sheet.cell_value(currentRow,columnOfInterest)  
     print value

#or do it the easy way just get the columns of interest to iterate over.  
columnslice=sheet.col_values(columnOfInterest,startDataRow,totalRows)  
for value in columnslice:  
    print value

Want content like this in your inbox each workday? No BS, spam or tricks... just useful content:

I understand and agree to the privacy policy