How to read CSV in Python?

Python offers two main ways to read CSV files: the built-in csv module for row-by-row processing, and the pandas library for more complex data handling and analysis.

Using Python's Built-in csv Module

Python's csv module is part of the standard library and provides robust functionality for parsing CSV (Comma-Separated Values) files. It works well for reading data row by row, especially when you need fine-grained control or are working with simpler datasets.

1. Reading CSV Files with csv.reader

The csv.reader object iterates over lines in the CSV file, returning each line as a list of strings. This method is suitable when you need to process data sequentially by row and column index.

Steps to read a CSV file using csv.reader:

  1. Import the csv library: Begin by importing the necessary module.
    import csv
  2. Open the CSV file: Use a with open() statement to open your CSV file. This ensures the file is closed automatically when the block exits, even if an error occurs. Pass newline='' so that the csv module handles line endings itself; without it, line breaks embedded in quoted fields may be read incorrectly.
    with open('your_file.csv', 'r', newline='') as file:
        # File operations go here
  3. Create a csv.reader object: Pass the opened file object to csv.reader().
    csvreader = csv.reader(file)
  4. Extract the header (field names): Typically, the first row contains the column headers. You can read this row separately using next().
    header = next(csvreader)
    print(f"Header: {header}")
  5. Extract the data rows (records): Iterate through the csvreader object to access each subsequent row. Each row will be returned as a list of strings.
    rows = []
    for row in csvreader:
        rows.append(row)
    print(f"Data Rows: {rows}")
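
The effect of newline='' in step 2 is easiest to see with a quoted field that contains a line break. The following is a minimal sketch, not part of the original steps; the file name notes.csv and its contents are hypothetical, and a temporary directory is used so nothing is left on disk.

```python
import csv
import os
import tempfile

# Hypothetical file name and data, used only for this demonstration.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "notes.csv")

    # Write a row whose second field contains a line break.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Item ID", "Note"])
        writer.writerow(["A002", "Spiral\nbound"])

    # Read it back; newline="" leaves line-ending handling to the csv module.
    with open(path, "r", newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))

# The file spans three physical lines but holds two logical rows.
print(len(rows))
print(repr(rows[1][1]))
```

The reader treats the quoted field as a single value, so the row count reflects logical records rather than physical lines in the file.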

Example:

Let's assume you have a file named inventory.csv with the following content:

Item ID,Name,Quantity,Price
A001,Pen,150,1.25
A002,Notebook,80,3.50
A003,Eraser,200,0.75

import csv

def read_csv_with_reader(filename):
    header = []
    data_rows = []
    try:
        with open(filename, 'r', newline='', encoding='utf-8') as file:
            csvreader = csv.reader(file)
            header = next(csvreader)  # Reads the first row as the header
            for row in csvreader:
                data_rows.append(row) # Appends subsequent rows as data
        print(f"Header: {header}")
        print("\nData Rows:")
        for row in data_rows:
            print(row)
    except FileNotFoundError:
        print(f"Error: The file '{filename}' was not found.")
    except Exception as e:
        print(f"An error occurred: {e}")

# (Optional) Create a dummy CSV file for demonstration
with open('inventory.csv', 'w', newline='', encoding='utf-8') as f:
    f.write("Item ID,Name,Quantity,Price\n")
    f.write("A001,Pen,150,1.25\n")
    f.write("A002,Notebook,80,3.50\n")
    f.write("A003,Eraser,200,0.75\n")

read_csv_with_reader('inventory.csv')
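
Because csv.reader returns every field as a string, numeric columns need an explicit conversion before any arithmetic. The sketch below is one possible way to do that, assuming the same inventory data; it is held in an in-memory io.StringIO object so the example is self-contained, and the variable names are illustrative.

```python
import csv
import io

# Same sample data as inventory.csv, kept in memory for a self-contained sketch.
sample = io.StringIO(
    "Item ID,Name,Quantity,Price\n"
    "A001,Pen,150,1.25\n"
    "A002,Notebook,80,3.50\n"
    "A003,Eraser,200,0.75\n"
)

csvreader = csv.reader(sample)
header = next(csvreader)  # Skip the header row

# Look up column positions from the header instead of hard-coding them.
qty_idx = header.index("Quantity")
price_idx = header.index("Price")

total_value = 0.0
for row in csvreader:
    quantity = int(row[qty_idx])    # "150" -> 150
    price = float(row[price_idx])   # "1.25" -> 1.25
    total_value += quantity * price

print(f"Total inventory value: {total_value}")
```

In real files a field may be empty or malformed, so the int() and float() calls would typically be wrapped in a try/except ValueError block.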

2. Reading CSV Files with csv.DictReader

csv.DictReader is a more convenient option when your CSV file has a header row that you want to use as keys for your data. It reads each row as a dictionary, where column headers are keys and the row's values are their corresponding values.

Example using csv.DictReader:

Using the same inventory.csv file:

import csv

def read_csv_with_dictreader(filename):
    data_dicts = []
    try:
        with open(filename, 'r', newline='', encoding='utf-8') as file:
            csv_dict_reader = csv.DictReader(file)
            # The fieldnames (header) are automatically detected and available
            print(f"Header (DictReader): {csv_dict_reader.fieldnames}")
            for row_dict in csv_dict_reader:
                data_dicts.append(row_dict)
        print("\nData as Dictionaries:")
        for row_dict in data_dicts:
            print(row_dict)
            # You can access data by column name, e.g.:
            # print(f"Item: {row_dict['Name']}, Quantity: {row_dict['Quantity']}")
    except FileNotFoundError:
        print(f"Error: The file '{filename}' was not found.")
    except Exception as e:
        print(f"An error occurred: {e}")

read_csv_with_dictreader('inventory.csv')

Key Advantages of csv.DictReader:

  • Readability: Access data using intuitive column names (e.g., row_dict['Name']) rather than less descriptive numeric indices (e.g., row[1]).
  • Maintainability: Your code is more robust to changes in column order, as long as the column names remain consistent.
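
The maintainability point can be illustrated with a short sketch. It assumes two in-memory versions of the inventory data with the columns in a different order; the helper name total_quantity is hypothetical.

```python
import csv
import io

def total_quantity(csv_text):
    """Sum the Quantity column, looking it up by name rather than position."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(int(row["Quantity"]) for row in reader)

# The same two records, with the columns arranged differently.
original = (
    "Item ID,Name,Quantity,Price\n"
    "A001,Pen,150,1.25\n"
    "A002,Notebook,80,3.50\n"
)
reordered = (
    "Name,Price,Quantity,Item ID\n"
    "Pen,1.25,150,A001\n"
    "Notebook,3.50,80,A002\n"
)

print(total_quantity(original))
print(total_quantity(reordered))
```

Both calls give the same total because the lookup depends only on the column name. The equivalent csv.reader code using row[2] would read the wrong column from the reordered data.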

Comparison: csv.reader vs. csv.DictReader

Feature         | csv.reader                                | csv.DictReader
Row Format      | List of strings                           | Dictionary (keys from header, values from row)
Access Data By  | Numeric index (e.g., row[0])              | Column name (e.g., row_dict['Column Name'])
Header Handling | Manual extraction with next(reader)       | Automatic (header becomes keys), accessible via .fieldnames
Best For        | Simple positional data, memory efficiency | Data with clear headers, easier data manipulation

For more comprehensive details on the csv module, refer to the official Python documentation.

When to Use Pandas for CSV Files

For larger datasets, advanced data manipulation, or integration with data analysis workflows, the pandas library is the de facto standard in Python. Its DataFrame data structure simplifies reading, cleaning, transforming, and analyzing tabular data.

To use pandas, you first need to install it: pip install pandas.

Example using Pandas:

import pandas as pd

def read_csv_with_pandas(filename):
    try:
        # Read the CSV file directly into a DataFrame
        df = pd.read_csv(filename)

        print("\nDataFrame Info:")
        df.info()

        print("\nFirst 5 Rows of the DataFrame:")
        print(df.head())

        # Accessing data (e.g., selecting a column)
        print(f"\nNames of items: {df['Name'].tolist()}")

    except FileNotFoundError:
        print(f"Error: The file '{filename}' was not found.")
    except pd.errors.EmptyDataError:
        # Raised by read_csv when the file has no data to parse
        print(f"Error: The file '{filename}' is empty.")
    except Exception as e:
        print(f"An error occurred: {e}")

read_csv_with_pandas('inventory.csv')

Key Advantages of Pandas for CSV Reading:

  • Efficiency: read_csv uses a parser implemented in C by default, and its chunksize parameter lets you process files that are too large to load at once.
  • DataFrames: Provides a powerful, intuitive tabular data structure, making data manipulation and analysis much easier.
  • Built-in features: Offers automatic data type inference, robust handling of missing values, and a vast array of functions for data cleaning, transformation, and statistical analysis.
  • Flexibility: Supports numerous parameters for handling diverse CSV formats (custom separators, encoding, specific missing value indicators, etc.).
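
As a rough illustration of the flexibility point, the sketch below reads semicolon-separated data in which the word "missing" marks an absent value. The sample text and the marker are assumptions made for this example; sep and na_values are standard read_csv parameters.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated data; "missing" marks an absent value.
raw = (
    "Item ID;Name;Quantity;Price\n"
    "A001;Pen;150;1.25\n"
    "A002;Notebook;missing;3.50\n"
)

df = pd.read_csv(
    io.StringIO(raw),
    sep=";",                 # Custom field separator
    na_values=["missing"],   # Treat this marker as a missing value
)

# Column types are inferred while the data is read.
print(df.dtypes)

missing_quantity = int(df["Quantity"].isna().sum())
total_price = float(df["Price"].sum())
print(f"Rows with no quantity: {missing_quantity}")
print(f"Sum of prices: {total_price}")
```

When reading from a file path rather than an in-memory buffer, the encoding parameter can be passed in the same call.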

For comprehensive documentation on the pandas library, visit the official pandas website.