Reading Excel Files with Pandas | Generated by AI
Prerequisites
To read Excel files in Python, you need the pandas
library and an Excel engine. Install pandas if not already installed:
pip install pandas
For modern Excel formats (.xlsx), install openpyxl
:
pip install openpyxl
For .xls files, use xlrd
:
pip install xlrd
Basic Usage
Pandas provides the pd.read_excel()
function to load Excel data into a DataFrame.
- Import pandas:
import pandas as pd
- Read the Excel file (specify the file path):
df = pd.read_excel('path/to/your/file.xlsx')
- This loads the first sheet by default.
- The result is a DataFrame containing the Excel data.
- Inspect the data:
print(df.head()) # View first 5 rows print(df.info()) # Summary of columns and data types
Advanced Options
- Specify a sheet: Use the
sheet_name
parameter (default is 0 for the first sheet):df = pd.read_excel('file.xlsx', sheet_name='Sheet2') # By name df = pd.read_excel('file.xlsx', sheet_name=1) # By index (0-based)
- Read multiple sheets: Pass a list or
None
to load all sheets as a dictionary:all_sheets = pd.read_excel('file.xlsx', sheet_name=None) # Dict of DataFrames keyed by sheet name
- Specify data range: Use
usecols
for columns (e.g.,'A:C'
for columns A to C) andskiprows
to skip initial rows:df = pd.read_excel('file.xlsx', usecols='A,C,E', skiprows=2)
- Handle headers: If the header is not in row 0, specify with
header
:df = pd.read_excel('file.xlsx', header=1) # Header in row 1 (0-based index)
- Set index: Use
index_col
to set a column as the index:df = pd.read_excel('file.xlsx', index_col='Column Name')
Example
Suppose you have an Excel file sales.xlsx
with a sheet named “Data” containing sales data.
import pandas as pd
# Read the specific sheet
df = pd.read_excel('sales.xlsx', sheet_name='Data')
# Display the DataFrame
print(df)
# Save to CSV if needed
df.to_csv('sales.csv', index=False)
For more details, refer to the pandas documentation on read_excel. If you encounter errors, check the Excel file format and ensure the required packages are installed.