FineWeb Code SEC SFT Ready | Generated by AI

Home 2026.07

SFT pipeline is ready. Here’s the complete plan:


Files in /mnt/data/zz/fineweb-code-sec-gpt/:

├── prompt_*.txt                 # 13 test prompts (code/SEC/general)
├── results.txt                  # Generation results from pretrained model
├── prepare_sft_data.py          # Creates JSONL with 1000 code+SEC chat examples
├── sft_code_sec.sh              # Runs SFT on d12 pretrained model
└── run_prompts.py               # Generate samples from any checkpoint

What SFT does:

Run it:

bash /mnt/data/zz/fineweb-code-sec-gpt/sft_code_sec.sh           # full SFT
bash /mnt/data/zz/fineweb-code-sec-gpt/sft_code_sec.sh --smoke   # quick test

After SFT, test generations:

cd /mnt/data/nanochat && source .venv/bin/activate
python /mnt/data/zz/fineweb-code-sec-gpt/run_prompts.py --step 7 --temp 0.7
# SFT checkpoints go to ~/.cache/nanochat/chatsft_checkpoints/d12/

Back Donate