Reading and Writing CSV Files

Reading and Writing CSV Files | Generated by AI

Home 2025.10

Introduction to Python’s `csv` Module

The csv module is part of Python’s standard library and is used for reading and writing CSV (Comma-Separated Values) files. It handles parsing rows and columns efficiently. To use it, first import the module: import csv. You’ll work with file objects, typically opened in read mode ('r') or write mode ('w').

Key components:

Reader: Parses CSV data from a file (e.g., csv.reader() for row-based access).
Writer: Outputs data to a CSV file (e.g., csv.writer()).
CSV files are treated as sequences of rows, where each row is a list of strings (columns).

For security and ease, always handle files with with statements to ensure proper closing.

Basic Reading a CSV File

To read a CSV:

import csv

with open('file.csv', 'r') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)  # Each 'row' is a list of columns

This reads the file row by row. You can access specific columns by index (e.g., row[0] for the first column).
For headers, read the first row separately: headers = next(reader).

Comparing Two CSV Files: Rows and Columns

To compare two CSVs (e.g., file1.csv and file2.csv), load them into structures like lists of lists (rows), then compare. Assumptions: both CSVs have the same structure (same number of columns/rows). Comparisons can check for exact matches, differences, or specific logic (e.g., matching on a key column).

Example 1: Comparing Rows (Entire Rows)

Use dictionaries to store rows (if they have a unique ID column) or lists for direct comparison.

import csv

def compare_rows(file1, file2, key_column=0):
    # Read file1 into a dict (using key_column as key, whole row as value)
    data1 = {}
    with open(file1, 'r') as f1:
        reader1 = csv.reader(f1)
        headers1 = next(reader1, None)  # Skip header if present
        for row in reader1:
            data1[row[key_column]] = row  # e.g., key on first column

    # Read file2 similarly
    data2 = {}
    with open(file2, 'r') as f2:
        reader2 = csv.reader(f2)
        headers2 = next(reader2, None)
        for row in reader2:
            data2[row[key_column]] = row

    # Compare
    common_keys = set(data1.keys()) & set(data2.keys())
    differing_rows = []
    for key in common_keys:
        if data1[key] != data2[key]:
            differing_rows.append((key, data1[key], data2[key]))
    
    return differing_rows  # List of (key, row_from_file1, row_from_file2)

# Usage
differences = compare_rows('file1.csv', 'file2.csv', key_column=0)  # Key on column 0
print("Differing rows:", differences)

How it works: Converts CSVs to dictionaries keyed by a column (e.g., ID). Compares matching rows directly. Adjust key_column to specify which column to key on.
Variations: For row-by-row comparison without keys, iterate both readers simultaneously (if same order/length).

Example 2: Comparing Columns

Compare specific columns across the entire file (e.g., check if column 1 values are identical in both files).

import csv

def compare_columns(file1, file2, col_index=0):
    # Extract column data as lists
    col1 = []
    with open(file1, 'r') as f1:
        reader1 = csv.reader(f1)
        next(reader1, None)  # Skip header if needed
        for row in reader1:
            if len(row) > col_index:
                col1.append(row[col_index])

    col2 = []
    with open(file2, 'r') as f2:
        reader2 = csv.reader(f2)
        next(reader2, None)
        for row in reader2:
            if len(row) > col_index:
                col2.append(row[col_index])

    # Compare columns
    are_equal = col1 == col2
    differences = []
    for i, (val1, val2) in enumerate(zip(col1, col2)):
        if val1 != val2:
            differences.append((i, val1, val2))
    
    return are_equal, differences  # Bool for match, list of (row_index, val1, val2)

# Usage
equal, diffs = compare_columns('file1.csv', 'file2.csv', col_index=1)  # Compare column 1
print("Columns equal?", equal)
print("Differences (row, val1, val2):", diffs)

How it works: Extracts the specified column into lists, then compares element-wise. Assumes CSVs have the same number of rows; handle mismatches if needed.
Variations: Use pandas for more advanced column operations (e.g., df1['column'] == df2['column']), but stick to csv for pure standard library.

Tips and Edge Cases

Headers: If CSVs have headers, skip them with next(reader).
Data Types: CSV reads as strings; convert to int/float if needed (e.g., int(row[0])).
Large Files: For big CSVs, use DictReader for keyed access: csv.DictReader(file) returns rows as OrderedDicts.
Delimiter: Use delimiter=';' in reader/writer if not comma-separated.
Errors: Wrap in try-except for file issues; validate column lengths to avoid index errors.
Alternatives: For complex comparisons, consider libraries like pandas (df1.equals(df2) for dataframes).

For full docs, check Python’s official csv module guide. If you have specific CSV files or comparison logic, provide more details for tailored code!

Back

x-ai/grok-code-fast-1

Donate