Read Excel files (xlsx) with Python



If there is one thing that programming languages like Python are good for, it is automating boring stuff. That’s why it is interesting to make Python talk to other programs such as Microsoft Excel. In this post I’m going to show how to read the values from an Excel spreadsheet and modify the values with Python.

Step 0. What you should already have installed:
I’m going to assume that you already have Python 2.7, pip and numpy installed, plus the matplotlib library if you want to do the optional Step 4. If you don’t have all of this installed, you can check this post on how to do it.

Step 1. Install openpyxl:
To work with Excel files we need to install a library called openpyxl. This library was created to work with .xlsx files, that is, files from Excel 2007 onwards. This method will not work with .xls files. As usual, there are different ways to install libraries in Python, but the easiest way to install openpyxl is to open a command prompt window and type:
pip install openpyxl

Press enter and the openpyxl library will be installed.

Step 2. Create the Excel file you are going to read:
We need an Excel file to read, so open an Excel spreadsheet and enter the same data as in the following image:

Excel spreadsheet with the data we are going to read with Python

In column A we write the numbers from 1 to 10, and in column B we write the square of the corresponding number. Now save the file. You can use any name, but if you want to use the exact same code, save it with the name ‘python_excel_read’. Remember to keep the type as ‘Excel Workbook’ (it should be the default type).
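If you would rather not type the data by hand, the same file can also be generated from Python. This is only a minimal sketch and not part of the original steps: it assumes openpyxl from Step 1 is installed, and it sets the sheet title to ‘Sheet1’ explicitly because openpyxl names its default sheet ‘Sheet’ rather than Excel’s ‘Sheet1’.

```python
from openpyxl import Workbook

# Create a new workbook and rename its default sheet to match Excel's default name.
wb = Workbook()
ws = wb.active
ws.title = "Sheet1"

# Column A holds 1..10, column B holds the square of each number.
for n in range(1, 11):
    ws.cell(row=n, column=1, value=n)
    ws.cell(row=n, column=2, value=n ** 2)

# Save in the current folder so the reading code can find it.
wb.save("python_excel_read.xlsx")
```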

Step 3. Read the cells from the excel file:
Now we move to Python to read the value of each cell and store the values in two variables, x and y. The code will look like this:
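The listing below is a sketch reconstructed from the description that follows, so treat the details as assumptions rather than the exact original code: the function name read_xy is made up for illustration, np.zeros is one way of creating the "empty" vectors, and the sheet is looked up with wb['Sheet1'], which is the indexing form of get_sheet_by_name(‘Sheet1’).

```python
import os

import numpy as np
from openpyxl import load_workbook


def read_xy(filename, sheetname="Sheet1"):
    """Return columns A and B of the given sheet as two numpy vectors."""
    wb = load_workbook(filename)
    sheet_1 = wb[sheetname]  # indexing form of get_sheet_by_name(sheetname)

    # One vector entry per used row in the sheet.
    x = np.zeros(sheet_1.max_row)
    y = np.zeros(sheet_1.max_row)

    # Python indices start at 0, Excel rows and columns start at 1.
    for i in range(sheet_1.max_row):
        x[i] = sheet_1.cell(row=i + 1, column=1).value  # column A
        y[i] = sheet_1.cell(row=i + 1, column=2).value  # column B
    return x, y


FILENAME = "python_excel_read.xlsx"
if os.path.exists(FILENAME):  # only runs once the file from Step 2 exists
    x, y = read_xy(FILENAME)
    print(x)
    print(y)
```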

Copy or write this code, and remember to save the Python file in the same folder as the Excel file so that the program is able to find it. Let’s start with the code now: we import load_workbook from openpyxl, and the numpy library. The first contains the function we are going to use to open the Excel file, and the second will be used to generate the empty vectors where we will store our values.

We now use load_workbook(‘filename’) to open the Excel file. If you didn’t save the Excel file with the same name as I did, remember to put the name you used, including the .xlsx extension. As you are probably aware, each Excel sheet has its own name; by default they are Sheet1, Sheet2, Sheet3… In this example we are using the first one, so we store that sheet in the variable sheet_1 using get_sheet_by_name(‘sheetname’).
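If you are not sure what the sheets in a workbook are called, you can ask openpyxl for the list of names. The sketch below uses an in-memory workbook as a stand-in for one opened with load_workbook, so the sheet names here are assumptions for the example. It also uses wb['Sheet1'] to pick the sheet: in recent openpyxl releases get_sheet_by_name is marked as deprecated, and indexing the workbook by name is the suggested replacement.

```python
from openpyxl import Workbook

# In-memory stand-in for a workbook opened with load_workbook(...).
wb = Workbook()
wb.active.title = "Sheet1"
wb.create_sheet("Sheet2")

print(wb.sheetnames)    # names of all sheets, in order
sheet_1 = wb["Sheet1"]  # look a sheet up by its name
print(sheet_1.title)

# Asking for a sheet that does not exist raises KeyError.
try:
    wb["Sheet9"]
except KeyError as err:
    print("No such sheet:", err)
```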

Now we need to create empty vectors to store our values. There are different ways of doing this; I use numpy because when I read an Excel file I usually expect to do some kind of mathematical operation with the values. To automatically get the size of the vectors, we use max_row. Be aware that in this case I’m going to save the values from each column, that is, ‘x’ will be column A and ‘y’ will be column B. If you want to save each row instead, you will have to swap columns and rows.

Now we need a for loop to go through all the cells. As you can see, we go from 0 up to the number of rows. To get the value of each cell, we use cell(row,column).value. Remember that the first index in Python is 0, while Excel rows and columns start at 1, so in the first element of vector ‘x’, that is x[0], you want to store the value from the cell in row 1 and column 1. That’s why you need to add 1 to the row index. The instruction for ‘y’ is the same, but we need to set the column number to 2, that is, column B in the Excel file.
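To see the index offset in isolation, here is a small sketch that fills an in-memory sheet (a stand-in for the file from Step 2, so no file is needed) and reads column A back. It also shows what happens if you forget the +1: openpyxl refuses row 0 with a ValueError.

```python
import numpy as np
from openpyxl import Workbook

# In-memory stand-in for the sheet from Step 2 (numbers and their squares).
wb = Workbook()
ws = wb.active
for n in range(1, 11):
    ws.append([n, n ** 2])

x = np.zeros(ws.max_row)
for i in range(ws.max_row):
    # i runs 0..9, the Excel rows run 1..10, hence i + 1.
    x[i] = ws.cell(row=i + 1, column=1).value

print(x[0], ws["A1"].value)  # both refer to the first cell of column A

# Forgetting the offset asks for row 0, which openpyxl rejects.
try:
    ws.cell(row=0, column=1)
except ValueError as err:
    print("Invalid index:", err)
```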

Finally, we print both vectors which should have the values from the Excel file.

Step 4 (Optional). Plot the values:
Now you can play with the data as you usually do in Python. In this example, I’m going to plot it using matplotlib. The previous code, together with the additional code required to plot the values, looks like this:
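What follows is a minimal sketch of the plotting part only, under a few assumptions: x and y are generated with numpy as a stand-in for the vectors read in Step 3, the axis labels and the output name ‘excel_values.png’ are made up for illustration, and the Agg backend is selected so the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render to a file so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Stand-in for the x and y vectors read from the Excel file in Step 3.
x = np.arange(1, 11, dtype=float)
y = x ** 2

fig, ax = plt.subplots()
ax.plot(x, y, "o-")
ax.set_xlabel("x (column A)")
ax.set_ylabel("y (column B)")
ax.set_title("Values from the Excel file")
fig.savefig("excel_values.png")  # hypothetical output name
# With an interactive backend, plt.show() would open a window instead.
```

In your own script, replace the stand-in vectors with the x and y you read from the spreadsheet.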

If you run this program, you will get the values as in the previous step and also the following plot:

Values from the Excel file plotted with matplotlib