SPGISpeech Training Status Update | Generated by AI

2026.07

Here’s the full status of the SPGISpeech financial audio model training:

Latest Commit

a503fb5: just a .gitignore update (adds spgispeech/checkpoints/ to the ignore list). The actual training code lives in commit 2167e5c.

What Was Built

Scripts (both committed):

Architecture: a custom pyarrow-backed SPGISpeechDataset. It does not use the Hugging Face datasets Audio feature, which avoids the torchcodec dependency hell.

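Training config: whisper-small (244M params), effective batch size 32 (per-device batch 16 × 2 gradient-accumulation steps, matching the launch command below), fp16, gradient checkpointing, evaluation and checkpoint saving every 500 steps, WER as the eval metric, TensorBoard logging.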
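For reference, WER is the word-level edit distance (substitutions + deletions + insertions) between the hypothesis and the reference, divided by the number of reference words. Below is a minimal pure-Python sketch, for illustration only; the training script may compute WER with a library such as `evaluate` or `jiwer` instead. The function names are hypothetical. Corpus-level WER pools edits and reference words across all utterances rather than averaging per-utterance scores.

```python
def word_edits(reference: str, hypothesis: str) -> int:
    """Minimum substitutions + deletions + insertions turning reference words into hypothesis words."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion: reference word dropped
                cur[j - 1] + 1,          # insertion: extra hypothesis word
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            )
        prev = cur
    return prev[-1]


def corpus_wer(references: list[str], hypotheses: list[str]) -> float:
    """Total word edits divided by total reference words across the corpus."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must pair up")
    edits = sum(word_edits(r, h) for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    if words == 0:
        raise ValueError("references contain no words")
    return edits / words
```

SPGISpeech transcripts are fully formatted (casing, punctuation), so the resulting number depends on whether text is normalized (for example lowercased, with punctuation stripped) before scoring.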

Where Training Actually Stands

The training was started but killed after 10 steps. From the TensorBoard log:

Step   Loss     Grad Norm   LR    Epoch
   5   2.1592   46.0        0.0   0.0005
  10   2.3147   76.0        0.0   0.0010
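The logged epoch fraction also gives a rough idea of how long the full run is. The arithmetic below estimates steps per epoch from the last logged row (step 10 at epoch 0.0010). From that it derives the total steps for 3 epochs, the number of eval/save points at a 500-step cadence, and the per-step time implied by the ~24-30 h estimate. It assumes the effective batch of 32 and treats the logged epoch value as exact, although the log rounds it, so these are ballpark figures only.

```python
# Values from the TensorBoard log and the training config above.
logged_step, logged_epoch = 10, 0.0010
effective_batch = 16 * 2          # per-device batch x gradient-accumulation steps
epochs, eval_save_every = 3, 500

steps_per_epoch = round(logged_step / logged_epoch)      # optimizer steps per epoch
samples_per_epoch = steps_per_epoch * effective_batch    # training examples per epoch
total_steps = steps_per_epoch * epochs
eval_save_points = total_steps // eval_save_every

# Seconds per optimizer step implied by a 24-30 h wall-clock estimate.
sec_per_step = [hours * 3600 / total_steps for hours in (24, 30)]

print(steps_per_epoch, samples_per_epoch, total_steps, eval_save_points)  # -> 10000 320000 30000 60
print([round(s, 2) for s in sec_per_step])                                # -> [2.88, 3.6]
```

In other words, the 24-30 h estimate corresponds to roughly 2.9-3.6 s per optimizer step, about 60 evaluations and checkpoint saves over the run.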

What you have vs. what you need to run it:

spgispeech/
├── train_whisper.py       # committed
├── train_cmd.sh           # committed
├── explore.py             # committed
├── data/S/                # 12 parquet files, 42 GB, fully present
├── .venv/                 # exists, deps installed
└── checkpoints/runs/      # 1 aborted tensorboard run (10 steps)
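Before launching a run that takes about a day, a quick pre-flight check can confirm that this layout is still intact. Below is a minimal standard-library sketch. The script names and the count of 12 parquet files come from the tree above. The function names `preflight` and `data_size_gb`, and the specific checks, are illustrative assumptions. Sizes are reported in decimal GB (10**9 bytes), and the "42 GB" above may have been measured either way.

```python
from pathlib import Path


def preflight(root: str, expected_parquet: int = 12) -> list[str]:
    """Return a list of problems found under the project root (empty list means ready)."""
    base = Path(root)
    problems = []
    for name in ("train_whisper.py", "train_cmd.sh", "explore.py"):
        if not (base / name).is_file():
            problems.append(f"missing script: {name}")
    if not (base / ".venv" / "bin" / "activate").is_file():
        problems.append("missing virtualenv: .venv/bin/activate")
    shards = sorted((base / "data" / "S").glob("*.parquet"))
    if len(shards) != expected_parquet:
        problems.append(
            f"expected {expected_parquet} parquet files in data/S, found {len(shards)}")
    return problems


def data_size_gb(root: str) -> float:
    """Total size of the parquet shards in data/S, in GB (10**9 bytes)."""
    shards = (Path(root) / "data" / "S").glob("*.parquet")
    return sum(p.stat().st_size for p in shards) / 1e9
```

Usage would be something like `print(preflight("/mnt/data/zz/spgispeech"), data_size_gb("/mnt/data/zz/spgispeech"))`, run before opening the tmux session.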

To Resume

cd /mnt/data/zz/spgispeech && source .venv/bin/activate

# In tmux:
tmux new-session -s whisper \; send-keys 'cd /mnt/data/zz/spgispeech && source .venv/bin/activate && python3 train_whisper.py --model small --batch-size 16 --grad-accum 2 --lr 1e-5 --epochs 3 --save-steps 500 --eval-steps 500 --logging-steps 50 --eval-samples 500 --test-samples 2000' Enter
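The aborted run stopped at step 10, before the first 500-step save, so there is no checkpoint to resume from and the command above starts from scratch. If the long run is interrupted later, the Hugging Face Trainer (assuming that is what `train_whisper.py` uses) writes `checkpoint-<step>` subdirectories under its output directory, which appears to be `checkpoints/` given the `runs/` folder there. The sketch below picks the newest one. The helper name `latest_checkpoint` is hypothetical. This document does not confirm that the script forwards such a path to `Trainer.train(resume_from_checkpoint=...)`. `transformers.trainer_utils.get_last_checkpoint` serves a similar purpose.

```python
import re
from pathlib import Path
from typing import Optional

_CKPT = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir: str) -> Optional[Path]:
    """Return the checkpoint-<step> subdirectory with the highest step, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    best_step, best_path = -1, None
    for child in root.iterdir():
        m = _CKPT.match(child.name)
        # Compare steps numerically: "checkpoint-1000" must beat "checkpoint-500".
        if child.is_dir() and m and int(m.group(1)) > best_step:
            best_step, best_path = int(m.group(1)), child
    return best_path
```

Given the tree above, which only lists `checkpoints/runs/`, calling `latest_checkpoint("/mnt/data/zz/spgispeech/checkpoints")` would presumably return `None` today.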

The dataset, venv, and code are all ready; the run just needs to be launched and left running (roughly 24-30 h for 3 epochs). Want me to kick it off?

