Reading and Writing CSV Files | Generated by AI
Introduction to Python’s csv
Module
The csv
module is part of Python’s standard library and is used for reading and writing CSV (Comma-Separated Values) files. It handles parsing rows and columns efficiently. To use it, first import the module: import csv
. You’ll work with file objects, typically opened in read mode ('r'
) or write mode ('w'
).
Key components:
- Reader: Parses CSV data from a file (e.g.,
csv.reader()
for row-based access). - Writer: Outputs data to a CSV file (e.g.,
csv.writer()
). - CSV files are treated as sequences of rows, where each row is a list of strings (columns).
For security and ease, always handle files with with
statements to ensure proper closing.
Basic Reading a CSV File
To read a CSV:
import csv
with open('file.csv', 'r') as file:
reader = csv.reader(file)
for row in reader:
print(row) # Each 'row' is a list of columns
- This reads the file row by row. You can access specific columns by index (e.g.,
row[0]
for the first column). - For headers, read the first row separately:
headers = next(reader)
.
Comparing Two CSV Files: Rows and Columns
To compare two CSVs (e.g., file1.csv
and file2.csv
), load them into structures like lists of lists (rows), then compare. Assumptions: both CSVs have the same structure (same number of columns/rows). Comparisons can check for exact matches, differences, or specific logic (e.g., matching on a key column).
Example 1: Comparing Rows (Entire Rows)
Use dictionaries to store rows (if they have a unique ID column) or lists for direct comparison.
import csv
def compare_rows(file1, file2, key_column=0):
# Read file1 into a dict (using key_column as key, whole row as value)
data1 = {}
with open(file1, 'r') as f1:
reader1 = csv.reader(f1)
headers1 = next(reader1, None) # Skip header if present
for row in reader1:
data1[row[key_column]] = row # e.g., key on first column
# Read file2 similarly
data2 = {}
with open(file2, 'r') as f2:
reader2 = csv.reader(f2)
headers2 = next(reader2, None)
for row in reader2:
data2[row[key_column]] = row
# Compare
common_keys = set(data1.keys()) & set(data2.keys())
differing_rows = []
for key in common_keys:
if data1[key] != data2[key]:
differing_rows.append((key, data1[key], data2[key]))
return differing_rows # List of (key, row_from_file1, row_from_file2)
# Usage
differences = compare_rows('file1.csv', 'file2.csv', key_column=0) # Key on column 0
print("Differing rows:", differences)
- How it works: Converts CSVs to dictionaries keyed by a column (e.g., ID). Compares matching rows directly. Adjust
key_column
to specify which column to key on. - Variations: For row-by-row comparison without keys, iterate both readers simultaneously (if same order/length).
Example 2: Comparing Columns
Compare specific columns across the entire file (e.g., check if column 1 values are identical in both files).
import csv
def compare_columns(file1, file2, col_index=0):
# Extract column data as lists
col1 = []
with open(file1, 'r') as f1:
reader1 = csv.reader(f1)
next(reader1, None) # Skip header if needed
for row in reader1:
if len(row) > col_index:
col1.append(row[col_index])
col2 = []
with open(file2, 'r') as f2:
reader2 = csv.reader(f2)
next(reader2, None)
for row in reader2:
if len(row) > col_index:
col2.append(row[col_index])
# Compare columns
are_equal = col1 == col2
differences = []
for i, (val1, val2) in enumerate(zip(col1, col2)):
if val1 != val2:
differences.append((i, val1, val2))
return are_equal, differences # Bool for match, list of (row_index, val1, val2)
# Usage
equal, diffs = compare_columns('file1.csv', 'file2.csv', col_index=1) # Compare column 1
print("Columns equal?", equal)
print("Differences (row, val1, val2):", diffs)
- How it works: Extracts the specified column into lists, then compares element-wise. Assumes CSVs have the same number of rows; handle mismatches if needed.
- Variations: Use
pandas
for more advanced column operations (e.g.,df1['column'] == df2['column']
), but stick tocsv
for pure standard library.
Tips and Edge Cases
- Headers: If CSVs have headers, skip them with
next(reader)
. - Data Types: CSV reads as strings; convert to int/float if needed (e.g.,
int(row[0])
). - Large Files: For big CSVs, use DictReader for keyed access:
csv.DictReader(file)
returns rows as OrderedDicts. - Delimiter: Use
delimiter=';'
in reader/writer if not comma-separated. - Errors: Wrap in try-except for file issues; validate column lengths to avoid index errors.
- Alternatives: For complex comparisons, consider libraries like
pandas
(df1.equals(df2)
for dataframes).
For full docs, check Python’s official csv module guide. If you have specific CSV files or comparison logic, provide more details for tailored code!