# Read excel files (xlsx) with Python

If there is something that you can do with programming languages as Python is to automatize boring stuff. That’s why is interesting to make Python talk with other programs as Microsoft Excel. In this post I’m going to show how to read the values from an Excel spreadsheet and modify the values with Python.

**Step 0. What you should already have installed:**

I’m going to assume that you already have installed Python 2.7, pip, numpy and finally the matplotlib library if you want to do the optional step 4). If you don’t have all of this installed you can check this post on how to do it.

**Step 1. Install openpyxl:**

To work with excel files we require to download a library called openpyxl. This library was created to work with .xlsx files, that is, files from Excel 2007 onwards. This method will not work with .xls files. As usual, there are different ways to install libraries in python but the easiest way to install openpyxl is to go to the command prompt window and write:

`pip install openpyxl`

Press enter and the openpyxl library will be installed.

**Step 2. Create the excel file you are going to read:**

We need an Excel file to read, so open an Excel spreadsheet and enter the same data as in the following image:

In column A we write numbers from 1 to 10 and in column B we write the square of the corresponding number. Save the file now, you can use any name, but if you want to use the exact same code, save it with the name ‘python_excel_read’. Remember to keep the type as ‘Excel woorkbook’ (should be the type by default)

**Step 3. Read the cells from the excel file:**

Now we move to Python to read the values from each cell and we are to store it in two variables x and y. The code will look as this:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |
#import required libraries from openpyxl import load_workbook import numpy as np #read from excel file wb = load_workbook('python_excel_read.xlsx') sheet_1 = wb.get_sheet_by_name('Sheet1') x = np.zeros(sheet_1.max_row) y = np.zeros(sheet_1.max_row) for i in range(0,sheet_1.max_row): x[i]=sheet_1.cell(row=i+1, column=1).value y[i]=sheet_1.cell(row=i+1, column=2).value print x print y |

Copy or write this code and remember to save the Python file in the same folder than the Excel file so the program will be able to find it. Let’s start with the code now: we import *load_wordbook* from openpyxl and the numpy library. The first library contains the function we are going to use to open the excel file and the second one will be use to generate an empty vectors where we will store our values.

We use now *load_workbook(‘filename’)* to open the Excel file. If you didn’t save the Excel file with the same name as myself, remember to put the name you used with the .xlsx extension. As you are probably aware, each Excel sheet has it own name, by default is Sheet1, Sheet2, Sheet3… In this example we are using the first one so we save the Sheet name on the variable sheet_1 using *get_sheet_by_name(‘sheetname’)*.

Now we need to create empty vectors to store our values. There are different ways of doing this, I use numpy because when I read an Excel file I expect to do any kind of mathematical operation with the values. To automatically get the size of the vectors, we use max_row. Be aware that in this case I’m going to save the values from each column, that is, ‘x’ will be column A and ‘y’ will be column B. If you want to save each row, you will have to swap columns and rows.

Now we need to do a for loop to go through all the cells. As you can see, we go from 0 to the highest row. To get the values from each cell, we use *cell(row,column).value*. Remember that the first index in Python is 0, so in the first value of vector ‘x’, that is x[0] , you want to store the value from the cell in row 1 and column 1. That’s why you need to add 1 to the row index. The instruction for ‘y’ is the same but we need to set the column number to the number 2, that is, column B in the Excel file.

Finally, we print both vectors which should have the values from the Excel file.

**Step 4 (Optional). Plot the values:**

Now you can play with the data as you usually do in Python, in this example, I’m going to plot it using matplotlib. You can find the previous code with the additional code required to plot the values:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
#import required libraries from openpyxl import load_workbook import numpy as np import matplotlib.pyplot as plt #read from excel file wb = load_workbook('python_excel_read.xlsx') sheet_1 = wb.get_sheet_by_name('Sheet1') x = np.zeros(sheet_1.max_row) y = np.zeros(len(x)) for i in range(0,sheet_1.max_row): x[i]=sheet_1.cell(row=i+1, column=1).value y[i]=(sheet_1.cell(row=i+1, column=2).value) print x print y #create the plot plt.xlabel('X values') plt.ylabel('Y values') plt.plot(x,y, 'bo-', label='Values') plt.legend(loc='upper left', fontsize='small') plt.grid(True) plt.xlim(0,11) plt.ylim(-9,110) plt.title('Reading values from an Excel file') plt.show() |

If you run this program, you will get the values as in the previous step and also the following plot:

its working..but with some warning as below .

what is it?

Warning (from warnings module):

File “C:\Python27\lib\site-packages\openpyxl\worksheet\worksheet.py”, line 350

def get_highest_row(self):

UserWarning: Call to deprecated function or class get_highest_row (Use the max_row property).

Warning (from warnings module):

File “C:\Python27\lib\site-packages\openpyxl\worksheet\worksheet.py”, line 350

def get_highest_row(self):

UserWarning: Call to deprecated function or class get_highest_row (Use the max_row property).

Warning (from warnings module):

File “C:\Python27\lib\site-packages\openpyxl\worksheet\worksheet.py”, line 350

def get_highest_row(self):

UserWarning: Call to deprecated function or class get_highest_row (Use the max_row property).

These warnings come from some recent changes in openpyxl. The warnings just tell you that there is available a property that does (more or less) the same. For example, the first warning tells you that you are using get_highest_row and that now you can/should use max_row.

So to mantain compatibility for future developments of openpyxl, it is probably worth to check how the new property is used and change the lines in your program. Yet for the program itself, it is fine.

I’ve replaced using get_highest_row() by max_row. Now the program should not give you any warnings

Hai,Its works for me but I need an extra code that whenever i need to update in the Excel sheet,Then it should automatically should affect in the Plotted graph i.e dynamic in nature.

If you want to plot in real time with python, you can check this for help: http://www.toptechboy.com/tutorial/python-with-arduino-lesson-10-installing-drawnow-to-allow-live-plotting-with-matplotlib/

The problem to change a value and dynamically see the change in a plot is that it require the file to be open, and this code won’t work. One solution could be to do live plotting (as in the video) and keep saving the data on an excel file at the same time.

Hi silvinopresa

very nice Tuto,

could you please help me to change the code as follow?

my Excel has let say five columns,

i want to plot 5 graphics and save them in a folder

in each grapghic the x value will be the first column and the y value wil be 2,3,4,5

its mean

1 graphic

x = 1

y = 2

second graphic

X = 1

y = 3

then we save each plot in an folder.

thank u in advance.

germain