FineWeb Dataset Parquet File Count | Generated by AI

2025.11

The FineWeb dataset (the 15T+ token English version at HuggingFaceFW/fineweb) contains approximately 35,000 to 50,000 .parquet files in total.

Breakdown

Your script downloads shards 11 through 20 (10 files) from the CC-MAIN-2013-20 dump. Even that very small dump contains far more than 10 files: many hundreds of shards in total.

If you want the exact count, you can compute it programmatically with the Hugging Face Hub API (e.g., using huggingface_hub.list_repo_files), but the rough numbers above are what people generally use when downloading or processing the full dataset.
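As a minimal sketch of that programmatic count: the snippet below lists the repository's files with huggingface_hub.list_repo_files and tallies the .parquet files per directory. The helper names count_parquet_by_dump and fetch_repo_files are hypothetical, and the path layout data/<dump>/<shard>.parquet is an assumption about how the repository is organized, so check it against the data tree before relying on the grouping.

```python
from collections import Counter


def count_parquet_by_dump(paths):
    """Count .parquet files per parent directory.

    Assumption: shard paths look like 'data/<dump>/<shard>.parquet',
    so the parent directory identifies the dump.
    """
    counts = Counter()
    for path in paths:
        if not path.endswith(".parquet"):
            continue  # skip README, .gitattributes, and other non-data files
        parent = path.rsplit("/", 1)[0] if "/" in path else ""
        counts[parent] += 1
    return counts


def fetch_repo_files(repo_id="HuggingFaceFW/fineweb"):
    """Return every file path in the dataset repo (needs network access)."""
    # Imported here so the counting helper works without the package installed.
    from huggingface_hub import list_repo_files

    return list_repo_files(repo_id, repo_type="dataset")


def print_summary(counts):
    """Print the total and the per-directory counts."""
    print("total parquet files:", sum(counts.values()))
    for directory, n in sorted(counts.items()):
        print(f"{directory}: {n}")
```

Calling print_summary(count_parquet_by_dump(fetch_repo_files())) prints the total and a per-directory breakdown. The total covers every .parquet file in the repository, so any subsets stored outside the main data directory are counted too; filter on the directory prefix if you only want the full dumps. Listing a repository of this size may take a while.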




x-ai/grok-4.1-fast
