Ora

How to read multiple sheets in Excel using Python?

Published in Excel Data Processing 6 mins read

To read multiple sheets from an Excel file in Python efficiently, the pandas library is your most powerful tool. It allows you to load data from one or more sheets directly into DataFrame objects, which are perfect for data analysis and manipulation.

How to Read Multiple Sheets in Excel Using Python?

You can read multiple sheets from an Excel file in Python primarily using the pandas library's read_excel() function. By configuring its sheet_name parameter, you can effortlessly load all sheets or specific ones into a dictionary of DataFrames.

Utilizing pandas.read_excel() for Multiple Sheets

The pandas.read_excel() function is designed to read Excel files (.xlsx, .xls, .xlsm, .xlsb, .odf, .ods, .odt formats) into a DataFrame. When you need to read multiple sheets, the key is the sheet_name parameter.

When you configure sheet_name to read multiple sheets, read_excel() returns a dictionary of DataFrame objects. In this dictionary, the keys represent the sheet names, and the values are the corresponding DataFrame objects, each containing the data from that specific sheet.

Prerequisites

Before you begin, ensure you have pandas and an Excel engine (like openpyxl or xlrd) installed:

pip install pandas openpyxl xlrd

openpyxl is recommended for .xlsx files, and xlrd for older .xls files.

1. Reading All Sheets from an Excel File

To read every sheet within an Excel workbook, set the sheet_name parameter to None. This will return a dictionary where each key is a sheet name (string) and its value is the DataFrame for that sheet.

import pandas as pd

excel_file_path = 'your_data.xlsx' # Replace with your Excel file path

try:
    # Read all sheets into a dictionary of DataFrames
    all_sheets_data = pd.read_excel(excel_file_path, sheet_name=None)

    print(f"Successfully read {len(all_sheets_data)} sheets.")

    # Accessing individual sheets
    for sheet_name, df in all_sheets_data.items():
        print(f"\n--- Sheet: {sheet_name} ---")
        print(f"Shape: {df.shape}")
        print(df.head()) # Display the first few rows of each sheet

except FileNotFoundError:
    print(f"Error: The file '{excel_file_path}' was not found.")
except Exception as e:
    print(f"An error occurred: {e}")

Key Insight: This method provides a convenient way to load your entire workbook structure into memory, allowing easy access to data from any sheet by its name.

2. Reading Specific Sheets by Name

If you only need certain sheets, you can pass a list of sheet names (as strings) to the sheet_name parameter. The function will return a dictionary containing only the specified sheets.

import pandas as pd

excel_file_path = 'your_data.xlsx' # Replace with your Excel file path
specific_sheets_to_read = ['Sheet1', 'SalesData', 'Summary'] # List of sheet names you want

try:
    # Read specific sheets by name
    selected_sheets_data = pd.read_excel(excel_file_path, sheet_name=specific_sheets_to_read)

    print(f"Successfully read {len(selected_sheets_data)} specified sheets.")

    # Accessing data from a specific sheet
    if 'SalesData' in selected_sheets_data:
        sales_df = selected_sheets_data['SalesData']
        print("\n--- SalesData Sheet ---")
        print(sales_df.describe()) # Display descriptive statistics for 'SalesData'
    else:
        print("\n'SalesData' was not found or selected.")

except FileNotFoundError:
    print(f"Error: The file '{excel_file_path}' was not found.")
except Exception as e:
    print(f"An error occurred: {e}")

3. Reading Specific Sheets by Index

Alternatively, you can specify sheets using their zero-based index. For example, 0 refers to the first sheet, 1 to the second, and so on. Pass a list of integers to the sheet_name parameter.

import pandas as pd

excel_file_path = 'your_data.xlsx' # Replace with your Excel file path
specific_sheet_indices = [0, 2] # Read the first and third sheets

try:
    # Read specific sheets by index
    indexed_sheets_data = pd.read_excel(excel_file_path, sheet_name=specific_sheet_indices)

    print(f"Successfully read {len(indexed_sheets_data)} sheets by index.")

    # Note: When reading by index, the dictionary keys will still be the *names* of the sheets.
    for sheet_name, df in indexed_sheets_data.items():
        print(f"\n--- Sheet (index-based selection): {sheet_name} ---")
        print(df.head(2))

except FileNotFoundError:
    print(f"Error: The file '{excel_file_path}' was not found.")
except Exception as e:
    print(f"An error occurred: {e}")

Summary of sheet_name Parameter Options

sheet_name Value Description Return Type Example
None Reads all sheets in the workbook. dict of {'sheet_name': DataFrame} pd.read_excel(file, sheet_name=None)
str Reads a single sheet specified by its name. DataFrame pd.read_excel(file, sheet_name='Sheet1')
int Reads a single sheet specified by its zero-based index. DataFrame pd.read_excel(file, sheet_name=0)
list of str Reads multiple sheets specified by their names. dict of {'sheet_name': DataFrame} pd.read_excel(file, sheet_name=['Sheet1', 'Sheet2'])
list of int Reads multiple sheets specified by their zero-based indices. dict of {'sheet_name': DataFrame} pd.read_excel(file, sheet_name=[0, 2])
list of str/int Reads multiple sheets specified by a mix of names and indices. dict of {'sheet_name': DataFrame} pd.read_excel(file, sheet_name=['Sheet1', 2])

For more details on read_excel() and its parameters, refer to the Pandas read_excel documentation.

Practical Considerations and Best Practices

  • Memory Usage: When reading many large sheets, the all_sheets_data dictionary can consume significant memory. Consider processing sheets one by one or only loading necessary sheets if memory is a concern.
  • Data Cleaning: After loading, each DataFrame might require individual data cleaning, type conversion, or manipulation, as sheet structures can vary.
  • Error Handling: Always include try-except blocks to gracefully handle FileNotFoundError or other potential issues during file operations.
  • Performance: For very large Excel files, specifying the engine parameter (e.g., engine='openpyxl') can sometimes offer performance benefits or be necessary for certain file types.

Alternative: Reading with openpyxl (More Granular Control)

While pandas is excellent for direct DataFrame creation, libraries like openpyxl offer lower-level control over Excel files. You can use openpyxl to load a workbook and then iterate through its sheets, reading data cell by cell or row by row, which can be useful for complex parsing or when you don't need the full DataFrame structure immediately.

from openpyxl import load_workbook

excel_file_path = 'your_data.xlsx' # Replace with your Excel file path

try:
    workbook = load_workbook(excel_file_path)
    print(f"Available sheets: {workbook.sheetnames}")

    for sheet_name in workbook.sheetnames:
        sheet = workbook[sheet_name]
        print(f"\n--- Sheet: {sheet_name} (using openpyxl) ---")
        # Example: Print first 3 rows of data
        for row_index, row in enumerate(sheet.iter_rows(min_row=1, max_row=3)):
            row_values = [cell.value for cell in row]
            print(f"Row {row_index + 1}: {row_values}")

except FileNotFoundError:
    print(f"Error: The file '{excel_file_path}' was not found.")
except Exception as e:
    print(f"An error occurred: {e}")

This approach gives you fine-grained control but requires more manual effort to convert the raw data into a structured format like a Pandas DataFrame if that's your end goal. For general data loading and analysis, pandas.read_excel() remains the most straightforward and powerful option.