Normal Behavior for Small AI Models

Normal Behavior for Small AI Models | Generated by AI

Home 2026.06

Yes, this is normal for a 124M-parameter model. Here’s why:

Small model: 124M parameters is tiny by today’s standards. A GPT-2-sized 124M model can produce grammatically correct, SEC-style text, but it has very limited ability to stay coherent over long generations.
Topic drift: The prompt is about net losses and healthcare, but the model drifts to “e-commerce” and “operating divisions.” It is generating statistically plausible SEC-filing language without really “understanding” what the prompt is about.
Repetition: “e-commerce and e-commerce” repeated many times is a classic degeneration pattern in small language models. They have too little capacity to keep track of what they’ve already said, so a phrase that has appeared once tends to get repeated again and again.
Training data: The model was trained on all SEC EDGAR filings (not just healthcare), so its grasp of filing language is broad but shallow.
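To check whether a generation has degenerated into loops like “e-commerce and e-commerce,” one simple signal is the fraction of word n-grams that occur more than once. The helper below is a hypothetical sketch (`repeated_ngram_fraction` is not from any library); it lowercases and splits on whitespace only, so punctuation is not normalized and the number should be treated as a rough indicator.

```python
from collections import Counter


def repeated_ngram_fraction(text: str, n: int = 3) -> float:
    """Fraction of word n-grams in `text` that occur more than once.

    Hypothetical helper: lowercases and splits on whitespace only,
    so punctuation stays attached to words.
    """
    words = text.lower().split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:  # text shorter than n words
        return 0.0
    counts = Counter(ngrams)
    # Count every occurrence of an n-gram that appears at least twice.
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)


looping = "e-commerce and e-commerce and e-commerce and e-commerce"
fluent = "The company reported a net loss for the year"
print(repeated_ngram_fraction(looping))  # 1.0: every trigram repeats
print(repeated_ngram_fraction(fluent))   # 0.0: no trigram repeats
```

Values near 1.0 indicate the output is stuck in a loop. Computing this on the same prompts across checkpoints or model sizes is one way to see whether a change actually reduces repetition.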

What you’re seeing is the realistic ceiling of a 124M model. To get better output you’d need:

Larger model: 350M+ parameters would likely maintain topic coherence much better
More training data/longer training: The model was trained for only 47k steps
Fine-tuning on a specific domain: e.g., only healthcare SEC filings
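Whether 47k steps is “enough” depends on how many tokens each step consumed, which isn’t stated here. A minimal sketch, assuming a standard data-parallel setup with gradient accumulation: the batch settings in the usage lines are hypothetical placeholders to be replaced with the real training config, and the ~20 tokens-per-parameter figure is the compute-optimal rule of thumb from the Chinchilla scaling study, not a hard threshold.

```python
def tokens_seen(steps: int, micro_batch: int, block_size: int,
                grad_accum: int = 1, world_size: int = 1) -> int:
    """Total training tokens processed.

    Each optimizer step runs `grad_accum` forward/backward passes of
    `micro_batch` sequences of `block_size` tokens on each of
    `world_size` devices.
    """
    return steps * micro_batch * block_size * grad_accum * world_size


def tokens_per_param(n_tokens: int, n_params: int) -> float:
    """Training tokens per model parameter."""
    return n_tokens / n_params


# Hypothetical configuration: substitute the values actually used in training.
n_tokens = tokens_seen(steps=47_000, micro_batch=8, block_size=1024, grad_accum=4)
ratio = tokens_per_param(n_tokens, n_params=124_000_000)
print(f"{n_tokens / 1e9:.2f}B tokens, {ratio:.1f} tokens/param "
      f"(compute-optimal rule of thumb: ~20)")
```

If the ratio comes out well below ~20, longer training or more data is a cheap first step before scaling up the model.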

The model is working correctly; it’s just at its capacity limit. For a demo or proof of concept it’s fine, but for production you’d want at least a 1B to 3B parameter model.
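The sizes mentioned above line up with the published GPT-2 family. As a sketch, assuming the standard GPT-2 layout (token embedding tied to the output head, 50,257-token vocabulary, 1,024-token learned position embeddings, 4× MLP expansion, biases and LayerNorms included), the parameter count follows from depth and width alone:

```python
def gpt2_param_count(n_layer: int, d_model: int,
                     vocab_size: int = 50257, n_ctx: int = 1024) -> int:
    """Parameter count of a GPT-2-style decoder with tied input/output embeddings."""
    d = d_model
    attn = (d * 3 * d + 3 * d) + (d * d + d)     # fused QKV projection + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)  # expand to 4d, then project back to d
    layer_norms = 2 * (2 * d)                    # two LayerNorms (weight + bias) per block
    per_block = attn + mlp + layer_norms         # = 12*d^2 + 13*d
    embeddings = vocab_size * d + n_ctx * d      # token + position embeddings
    final_ln = 2 * d                             # LayerNorm before the (tied) output head
    return n_layer * per_block + embeddings + final_ln


GPT2_CONFIGS = {
    "small": (12, 768),
    "medium": (24, 1024),
    "large": (36, 1280),
    "xl": (48, 1600),
}

for name, (n_layer, d_model) in GPT2_CONFIGS.items():
    print(f"{name:>6}: {gpt2_param_count(n_layer, d_model):,}")
# small: 124,439,808   medium: 354,823,168
# large: 774,030,080   xl: 1,557,611,200
```

Because each block costs roughly 12·d² parameters, widening the model grows it much faster than adding layers: going from 124M to about 1.5B takes both four times the depth and roughly twice the width.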
