Reading and Writing CSV Files | Generated by AI

2025.10

Introduction to Python’s csv Module

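The csv module is part of Python’s standard library and reads and writes CSV (Comma-Separated Values) files. It takes care of delimiters, quoting, and fields that contain commas or line breaks, so you do not have to split lines by hand. To use it, import the module with import csv. You’ll work with file objects, typically opened in read mode ('r') or write mode ('w'); the csv documentation recommends also passing newline='' to open() so the module can handle line endings itself.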

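Key components: csv.reader and csv.writer work with rows as lists, while csv.DictReader and csv.DictWriter work with rows as dictionaries keyed by column name.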

Always open files in a with statement so they are closed properly, even if an error occurs while reading or writing.
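
The write side can be sketched as follows. This is a minimal example, assuming a hypothetical helper named write_rows, an illustrative file name, and made-up data; it writes a header and two rows with csv.writer and then reads them back.

```python
import csv
import os
import tempfile

def write_rows(path, header, rows):
    """Write a header row followed by data rows to a CSV file."""
    # newline='' lets the csv module control line endings itself
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(header)  # A single row
        writer.writerows(rows)   # Any number of rows

# Usage with a throwaway file (name and data are illustrative)
path = os.path.join(tempfile.mkdtemp(), 'people.csv')
write_rows(path, ['id', 'name'], [[1, 'Ada'], [2, 'Grace, Jr.']])

with open(path, 'r', newline='', encoding='utf-8') as f:
    written = list(csv.reader(f))
print(written)
```

Two things are worth noticing: the field containing a comma is quoted automatically on write, and every value comes back as a string when read.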

Basic Reading of a CSV File

To read a CSV:

import csv

with open('file.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)  # Each 'row' is a list of strings, one per column
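
When the file has a header row, csv.DictReader can be more convenient because each row becomes a dictionary keyed by column name. The sketch below uses io.StringIO with made-up sample data as a stand-in for an open file.

```python
import csv
import io

# In-memory sample data standing in for an open file (illustrative only)
sample = io.StringIO("id,name\n1,Ada\n2,Grace\n")

reader = csv.DictReader(sample)  # The first row supplies the field names
rows = list(reader)

print(reader.fieldnames)   # Column names taken from the header
print(rows[0]['name'])     # Access a value by column name instead of index
```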

Comparing Two CSV Files: Rows and Columns

To compare two CSV files (e.g., file1.csv and file2.csv), load each into a structure such as a list of lists (one inner list per row), then compare the structures. The examples below assume both files have the same structure: a header row, the same number of columns, and the same number of rows. A comparison can check for exact matches, list differences, or apply specific logic such as matching rows on a key column.
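
Since the comparisons rely on that assumption, it can help to check it first. The following is a rough sketch, assuming a hypothetical helper named same_shape that takes two open text streams and checks only the header row and the total row count.

```python
import csv
import io

def same_shape(stream1, stream2):
    """Return True if both CSV streams have equal headers and row counts."""
    rows1 = list(csv.reader(stream1))
    rows2 = list(csv.reader(stream2))
    if not rows1 or not rows2:
        # An empty file only matches another empty file
        return rows1 == rows2
    return rows1[0] == rows2[0] and len(rows1) == len(rows2)

# Usage with in-memory sample data (illustrative only)
a = "id,name\n1,Ada\n"
b = "id,name\n1,Ida\n"
c = "id,name,age\n1,Ada,36\n"

shape_ab = same_shape(io.StringIO(a), io.StringIO(b))
shape_ac = same_shape(io.StringIO(a), io.StringIO(c))
print(shape_ab, shape_ac)
```

The helper loads both files fully into memory, which is fine for small files but may not suit very large ones.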

Example 1: Comparing Rows (Entire Rows)

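If each row has a unique ID column, store the rows in a dictionary keyed by that ID so the row order in the two files does not matter; otherwise, compare the rows as lists in file order. The function below takes the dictionary approach and reports rows whose key appears in both files but whose contents differ.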

import csv

def compare_rows(file1, file2, key_column=0):
    # Read file1 into a dict (key_column value as key, whole row as value)
    data1 = {}
    with open(file1, 'r', newline='') as f1:
        reader1 = csv.reader(f1)
        next(reader1, None)  # Skip the first row (assumed to be a header)
        for row in reader1:
            if len(row) > key_column:  # Ignore blank or short rows
                data1[row[key_column]] = row  # A repeated key overwrites the earlier row

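    # Read file2 the same way
    data2 = {}
    with open(file2, 'r', newline='') as f2:
        reader2 = csv.reader(f2)
        next(reader2, None)  # Skip the first row (assumed to be a header)
        for row in reader2:
            if len(row) > key_column:  # Ignore blank or short rows
                data2[row[key_column]] = row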

    # Compare only keys present in both files; keys found in just one
    # file are not reported here
    common_keys = data1.keys() & data2.keys()
    differing_rows = []
    for key in sorted(common_keys):  # Sorted for a stable output order
        if data1[key] != data2[key]:
            differing_rows.append((key, data1[key], data2[key]))

    return differing_rows  # List of (key, row_from_file1, row_from_file2)

# Usage
differences = compare_rows('file1.csv', 'file2.csv', key_column=0)  # Key on column 0
print("Differing rows:", differences)
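
The function above only looks at keys found in both files. A complementary check, sketched here with hypothetical helpers load_keyed and key_differences that work on open text streams, lists the keys that appear in just one of the two files.

```python
import csv
import io

def load_keyed(stream, key_column=0):
    """Map each row's key_column value to the full row, skipping the header."""
    reader = csv.reader(stream)
    next(reader, None)  # Assumes the first row is a header
    return {row[key_column]: row for row in reader if len(row) > key_column}

def key_differences(stream1, stream2, key_column=0):
    """Return (keys only in stream1, keys only in stream2), each sorted."""
    data1 = load_keyed(stream1, key_column)
    data2 = load_keyed(stream2, key_column)
    only_in_1 = sorted(data1.keys() - data2.keys())
    only_in_2 = sorted(data2.keys() - data1.keys())
    return only_in_1, only_in_2

# Usage with in-memory sample data (illustrative only)
f1 = io.StringIO("id,name\n1,Ada\n2,Grace\n")
f2 = io.StringIO("id,name\n2,Grace\n3,Linus\n")
only_1, only_2 = key_differences(f1, f2)
print("Only in first:", only_1)
print("Only in second:", only_2)
```

Keys are compared as strings, so '10' sorts before '2'; convert them to integers first if numeric order matters.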

Example 2: Comparing Columns

Compare a specific column across both files, position by position (e.g., check whether the values at column index 1, the second column, are identical in both files).

import csv

def compare_columns(file1, file2, col_index=0):
    # Extract column data as lists
    col1 = []
    with open(file1, 'r', newline='') as f1:
        reader1 = csv.reader(f1)
        next(reader1, None)  # Skip the first row (assumed to be a header)
        for row in reader1:
            if len(row) > col_index:  # Rows too short to have this column are skipped
                col1.append(row[col_index])

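    col2 = []
    with open(file2, 'r', newline='') as f2:
        reader2 = csv.reader(f2)
        next(reader2, None)  # Skip the first row (assumed to be a header)
        for row in reader2:
            if len(row) > col_index:  # Rows too short to have this column are skipped
                col2.append(row[col_index])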

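    # Compare columns
    are_equal = col1 == col2  # Also False when the columns differ in length
    differences = []
    # zip stops at the shorter column, so extra trailing values are not listed
    for i, (val1, val2) in enumerate(zip(col1, col2)):
        if val1 != val2:
            differences.append((i, val1, val2))

    # Bool for match, list of (index, val1, val2); index counts kept data rows from 0
    return are_equal, differences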

# Usage
equal, diffs = compare_columns('file1.csv', 'file2.csv', col_index=1)  # Compare column index 1 (the second column)
print("Columns equal?", equal)
print("Differences (row, val1, val2):", diffs)
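
Because zip stops at the shorter column, values past the end of the shorter file never show up in the list of differences. One possible variation, sketched here with hypothetical helpers read_column and column_diffs that work on open text streams, uses itertools.zip_longest and a None placeholder for missing values.

```python
import csv
import io
from itertools import zip_longest

MISSING = None  # Placeholder for a value that is absent (an assumption of this sketch)

def read_column(stream, col_index):
    """Return one column as a list, skipping the header row."""
    reader = csv.reader(stream)
    next(reader, None)  # Assumes the first row is a header
    # Keep a placeholder for short rows so row positions stay aligned
    return [row[col_index] if len(row) > col_index else MISSING for row in reader]

def column_diffs(stream1, stream2, col_index=0):
    """Return (row_index, val1, val2) for every position where the values differ."""
    col1 = read_column(stream1, col_index)
    col2 = read_column(stream2, col_index)
    pairs = zip_longest(col1, col2, fillvalue=MISSING)
    return [(i, v1, v2) for i, (v1, v2) in enumerate(pairs) if v1 != v2]

# Usage with in-memory sample data (illustrative only)
a = io.StringIO("id,name\n1,Ada\n2,Grace\n3,Linus\n")
b = io.StringIO("id,name\n1,Ada\n2,Grase\n")
diffs = column_diffs(a, b, col_index=1)
print(diffs)
```

Here the second file is one row shorter, so the last entry pairs the extra value from the first file with the None placeholder.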

Tips and Edge Cases

For full details, see the csv module page in Python’s official documentation.



x-ai/grok-code-fast-1
