Download Speed During Whisper Model Re-download | Generated by AI
Question: How do I understand the download speed shown in the Whisper large-v3 model re-download progress on an RTX 4070?
Answer:
What’s Actually Happening
What you’re seeing is not GPU transcription speed — it’s a model file re-download. Here’s why:
```
UserWarning: /home/lzw/.cache/whisper/large-v3.pt exists,
but the SHA256 checksum does not match; re-downloading the file
```
The previously cached large-v3.pt file is corrupted or incomplete, so Whisper is re-downloading it from the internet. The 6.42 MiB/s is your internet download speed, not GPU speed.
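Whisper verifies the cached checkpoint by hashing it and comparing against an expected SHA-256 value (which is tied to the model's download URL; the exact value is not reproduced here). A minimal sketch of that integrity check, using only the standard library and a small throwaway file for the demo:

```python
import hashlib
from pathlib import Path

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so a multi-GB checkpoint never has to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demo on a tiny file; for the real check you would point this at
# ~/.cache/whisper/large-v3.pt and compare against Whisper's expected hash.
demo = Path("demo.bin")
demo.write_bytes(b"hello whisper")
print(sha256_of(str(demo)))
```

If the computed digest differs from the expected one, the cache is stale or truncated and a re-download is the correct response, which is exactly what the warning above describes.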
Breaking Down the Progress Bar
```
38%|████████████████ | 1.09G/2.88G [01:45<04:57, 6.42MiB/s]
```
| Field | Value | Meaning |
|---|---|---|
| Progress | 38% | 38% of the file downloaded |
| Bytes | 1.09G / 2.88G | 1.09 GiB downloaded out of 2.88 GiB total |
| Elapsed | 01:45 | 1 min 45 s elapsed |
| Remaining | 04:57 | ~5 minutes estimated to go |
| Speed | 6.42 MiB/s | ~6.42 MiB/s (≈6.7 MB/s) internet download speed |
The large-v3 checkpoint is 2.88 GiB (about 3.1 GB); that is the full set of model weights.
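The table's figures are internally consistent, which is a quick way to confirm you are reading the bar correctly: remaining bytes divided by the current rate should roughly match the displayed ETA. A small arithmetic check (the modest discrepancy comes from tqdm rounding the displayed values):

```python
# Figures taken from the progress bar above; units are binary (GiB / MiB).
total_gib = 2.88
done_gib = 1.09
speed_mib_s = 6.42

remaining_mib = (total_gib - done_gib) * 1024   # GiB -> MiB
eta_s = remaining_mib / speed_mib_s
print(f"~{eta_s:.0f} s, i.e. about {eta_s / 60:.1f} min")
```

This lands near the 04:57 (297 s) shown in the bar.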
What to Do After Download Completes
Once the download finishes, Whisper will start actual transcription on your RTX 4070, which is well suited to this task. Community benchmarks (see the Whisper discussion linked below) rate the RTX 4070 among the best price-to-performance GPUs for Whisper transcription.
You can expect roughly 15–30x real-time speed with the large-v3 model on an RTX 4070 (e.g., a 60-minute audio file transcribed in 2–4 minutes).
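The "x real-time" convention converts directly into wall-clock time: divide the audio duration by the real-time factor. Illustrative arithmetic only, using the rough 15–30x range quoted above rather than a measured benchmark:

```python
# Convert a real-time factor (RTF) into expected transcription wall time.
audio_minutes = 60
for rtf in (15, 30):
    print(f"{rtf}x real-time -> {audio_minutes / rtf:.1f} min of transcription")
```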
How to Avoid This Problem Next Time
The checksum mismatch usually means the previous download was interrupted. To prevent re-downloads:
- Let the download fully complete without interruption.
- Or pre-download the model manually and verify integrity:
```bash
# Check that the cached file is complete after download
ls -lh ~/.cache/whisper/large-v3.pt
# Expect ~2.88 GiB; a much smaller file indicates a truncated download
```
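The size check above can be automated. The helper below is hypothetical (not part of Whisper's API), and note that a size check only catches truncation; Whisper's own SHA-256 comparison also catches corruption that leaves the size intact:

```python
import os

# ~2.88 GiB, per the progress bar above.
EXPECTED_BYTES = int(2.88 * 1024**3)

def looks_complete(path: str, tolerance: float = 0.01) -> bool:
    """Return True if the file exists and its size is within 1% of the expected size."""
    if not os.path.exists(path):
        return False
    return abs(os.path.getsize(path) - EXPECTED_BYTES) / EXPECTED_BYTES < tolerance

print(looks_complete(os.path.expanduser("~/.cache/whisper/large-v3.pt")))
```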
Bonus: Use faster-whisper for Better Speed
If you want significantly faster transcription on your RTX 4070, consider switching to faster-whisper:
```bash
pip install faster-whisper
```
Faster-whisper reimplements Whisper inference on top of CTranslate2. It downloads models from the Hugging Face Hub automatically when you load one by name, and it supports float16 (and int8) computation for better GPU throughput.
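A minimal usage sketch following the faster-whisper README. It assumes a CUDA-capable GPU, the package installed, and an `audio.mp3` file in the working directory; the model is fetched from the Hugging Face Hub on first use:

```python
from faster_whisper import WhisperModel

# float16 halves memory traffic versus float32 on the GPU.
model = WhisperModel("large-v3", device="cuda", compute_type="float16")

segments, info = model.transcribe("audio.mp3")
print(f"Detected language: {info.language}")
for seg in segments:
    print(f"[{seg.start:.2f} -> {seg.end:.2f}] {seg.text}")
```

Note that `transcribe` returns a lazy generator, so transcription actually runs as you iterate over `segments`.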
References:
- Performance benchmark of different GPUs · openai/whisper · Discussion #918
- GitHub - SYSTRAN/faster-whisper