SEC-EDGAR 10-K Download Summary | Generated by AI


SEC-EDGAR Dataset — Session Summary

Dataset: kapilrao/SEC-EDGAR on HuggingFace
Source: SEC EDGAR filings (10-K, 10-Q, 144, 20-F, 3, 4, 5, 8-K, S-1, S-8)
Total available: 2,551 shards, 274.8 GB


What we downloaded:

File format (parquet schema):

Corrupt shard:
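One cheap way to spot a damaged shard, assuming the damage is a truncated or empty download, is to check the Parquet magic bytes. Parquet files begin and end with the 4-byte marker `PAR1`, so a file cut off mid-transfer usually fails the check on its tail. The function name below is hypothetical and is not taken from the scripts in this summary; it is a minimal sketch, not a full validation.

```python
from pathlib import Path

PARQUET_MAGIC = b"PAR1"  # Parquet files start and end with these 4 bytes


def looks_like_complete_parquet(path: Path) -> bool:
    """Cheap integrity check: magic bytes at both ends of the file.

    Catches truncated or empty downloads. It does not validate the
    footer metadata or the row groups, so a pass is not a guarantee.
    """
    size = path.stat().st_size
    # Smallest possible file: header magic + 4-byte footer length + footer magic.
    if size < 12:
        return False
    with path.open("rb") as fh:
        head = fh.read(4)
        fh.seek(-4, 2)  # position 4 bytes before end of file
        tail = fh.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

A file that fails this check is a candidate for re-download; a file that passes may still need a real read with a Parquet library to be trusted.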


Scripts created & pushed:

Script | Purpose | Commit
scripts/download/download_sec_edgar.py | Download shards with size cap, skip cached, resume | 378bcdb
scripts/download/view_sec_edgar.py | View/list/search samples from parquet files | 93138ec + 70f83b5
scripts/download/fix_corrupt_shard.sh | Re-download the broken shard (useless now) | d153ec5
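The "size cap, skip cached, resume" behaviour described for the download script can be sketched as a small planning step. This is an illustration under assumptions, not the code in `download_sec_edgar.py`: the `Shard` type, the `plan_downloads` name, and the choice of decimal gigabytes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    name: str        # e.g. a parquet path inside the dataset repo
    size_bytes: int


def plan_downloads(shards, cached_names, target_gb):
    """Pick shards to fetch, in order, until the size cap is reached.

    Cached shards count toward the cap but are not fetched again, so
    re-running after an interruption resumes where the last run stopped.
    """
    budget = int(target_gb * 1e9)  # decimal GB (assumption)
    used = 0
    to_fetch = []
    for shard in shards:
        if used + shard.size_bytes > budget:
            break  # the next shard would exceed the cap
        used += shard.size_bytes
        if shard.name not in cached_names:
            to_fetch.append(shard)
    return to_fetch
```

Counting cached shards against the budget keeps the total on disk at or below the cap across repeated runs, which is one reasonable reading of "size cap" here.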

Viewer commands:

python3.11 scripts/download/view_sec_edgar.py --list
python3.11 scripts/download/view_sec_edgar.py --sample
python3.11 scripts/download/view_sec_edgar.py --sample -n 3 --text-only --chars 500
python3.11 scripts/download/view_sec_edgar.py --search "risk factors"
python3.11 scripts/download/view_sec_edgar.py --file 10-K/<shard>.parquet --head 5
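For readers who want to see how these flags fit together, here is a minimal `argparse` sketch that accepts the same command lines. Only the flag names come from the commands above; the defaults, help strings, and the `build_parser` name are assumptions, and the real `view_sec_edgar.py` may be structured differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser accepting the viewer flags shown above (defaults are guesses)."""
    parser = argparse.ArgumentParser(
        description="Inspect downloaded SEC-EDGAR parquet shards"
    )
    parser.add_argument("--list", action="store_true",
                        help="list downloaded parquet files")
    parser.add_argument("--sample", action="store_true",
                        help="print sample rows")
    parser.add_argument("-n", type=int, default=1,
                        help="number of samples to print")
    parser.add_argument("--text-only", action="store_true",
                        help="print only the filing text")
    parser.add_argument("--chars", type=int, default=None,
                        help="truncate text to this many characters")
    parser.add_argument("--search", metavar="QUERY",
                        help="search filings for a phrase")
    parser.add_argument("--file", metavar="PATH",
                        help="inspect a single parquet file")
    parser.add_argument("--head", type=int, default=None,
                        help="number of rows to show from --file")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```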

To get more data: increase --target-gb (274.8 GB available across all filing types).
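As a rough guide when choosing a `--target-gb` value, the totals above imply an average shard size of about 107.7 MB (274.8 GB over 2,551 shards). The sketch below turns that into an estimated shard count for a given cap. It assumes shards are roughly uniform in size, which may not hold across filing types, and `estimate_shards` is a hypothetical helper.

```python
TOTAL_GB = 274.8      # total dataset size reported above
TOTAL_SHARDS = 2551   # total shard count reported above

AVG_SHARD_GB = TOTAL_GB / TOTAL_SHARDS  # about 0.1077 GB per shard


def estimate_shards(target_gb: float) -> int:
    """Estimate how many average-sized shards fit under a size cap."""
    if target_gb >= TOTAL_GB:
        return TOTAL_SHARDS  # the cap covers the whole dataset
    return int(target_gb / AVG_SHARD_GB)


if __name__ == "__main__":
    for cap in (10, 50, 100):
        print(f"--target-gb {cap}: about {estimate_shards(cap)} shards")
```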

