Model Performance Analysis at Validation 1.42 BPB | Generated by AI

2026.07

Done. Results saved to /mnt/data/zz/fineweb-code-sec-gpt/results.txt. Quick summary:

Observations — d12 (step 50k, val_bpb 1.418)

| Domain | Quality |
| --- | --- |
| Code (quicksort continuation) | Good: continued quicksort correctly and generated sort_2 and sort_n variants with realistic code structure |
| Code (React DataFetcher) | Decent: reproduced the component faithfully and generated a fetch(url) function that reuses the same pattern |
| Code (Go/merge sort/SQL) | Weaker: regressed to prompt-echoing, hallucinated column names (price_delivered, customer_id_delivered), and produced no real Go code |
| SEC (revenue, risk) | Reads plausible: continues SEC text with a coherent financial tone and generates an Item 1B header |
| SEC (IFRS/GAAP) | Falls apart: devolves into number repetition (15,200, 172, 200) and word salad |
| General (water cycle) | Good continuation: correctly adds “Water cycle is always the most common” with coherent reasoning about evaporation states |
| General (compound interest) | Weak: computes $10,000 / 30 = 75%, which is nonsense arithmetic |
| General (autonomic, photosynthesis) | Hallucinates: invents “photometry phototechnique” and names “pitty” as a branch of the nervous system |
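The number-repetition failure noted for the IFRS/GAAP prompt could be flagged automatically rather than by eye. The following is a minimal sketch, not the method used to produce the observations above: the helper name `repeated_ngram_fraction`, the whitespace tokenization, and the sample strings are all assumptions made for illustration.

```python
from collections import Counter


def repeated_ngram_fraction(text: str, n: int = 2) -> float:
    """Fraction of n-grams in `text` that occur more than once.

    Hypothetical helper: splits on whitespace (an assumption, not the
    model's tokenizer). Values near 1.0 suggest degenerate looping.
    """
    tokens = text.split()
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Count every occurrence of an n-gram that appears at least twice.
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)


# Made-up strings for illustration, not actual model output.
looping = "15,200 172 200 15,200 172 200 15,200 172 200"
varied = "revenue increased due to higher product sales"

loop_score = repeated_ngram_fraction(looping)
varied_score = repeated_ngram_fraction(varied)
```

A score like this only catches surface-level loops; it would not detect the arithmetic or factual errors listed in the other rows.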

Conclusion: At 1.42 val_bpb, the model has learned code syntax structure (indentation, function definitions, React hooks) and SEC document tone, but lacks factual knowledge and arithmetic reasoning. This is typical for a 286M-parameter model, whose memorization capacity is limited. The code continuation tasks work best, quicksort and React in particular, because they rely on pattern-matching over syntax rather than on factual recall.
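For readers unfamiliar with the val_bpb metric, bits per byte is commonly obtained by converting the summed cross-entropy loss from nats to bits and dividing by the number of UTF-8 bytes in the evaluated text. The sketch below assumes that common definition; the source does not say how val_bpb was computed for this run, and the function name and the token and byte counts are hypothetical.

```python
import math


def bits_per_byte(mean_loss_nats: float, n_tokens: int, n_bytes: int) -> float:
    """Convert a mean per-token cross-entropy (in nats) to bits per byte.

    Assumes the common definition: total nats / ln(2) gives total bits,
    which is then normalized by the byte length of the evaluated text.
    """
    if n_bytes <= 0:
        raise ValueError("n_bytes must be positive")
    total_bits = mean_loss_nats * n_tokens / math.log(2)
    return total_bits / n_bytes


# Hypothetical numbers: 1,000 tokens covering 4,000 bytes of text.
example_bpb = bits_per_byte(mean_loss_nats=3.9, n_tokens=1_000, n_bytes=4_000)
```

Because the metric is normalized by bytes rather than tokens, it stays comparable across models that use different tokenizers.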

