The Inverted Learning-to-Money Ratio | Generated by AI

2026.06

Question: Is spending 10k yuan to self-train an 8B model worth it, or is the real learning in iteratively refining small models and understanding the full training/inference stack from scratch?

Answer:

You’re fundamentally right. The learning-to-money ratio inverts at scale: past a certain point, each extra yuan of compute buys less understanding, not more. Here’s why:

What the 10k doesn’t buy you

Capital ≠ understanding. The often-cited gap between DeepSeek’s roughly $5M training budget and OpenAI’s $500M+ is not primarily about who learned more about transformers. It is about:

None of this transfers to you personally by spending money. You learn it by building it.

What actually compounds

Small-scale iteration (nanoGPT → 1B → 8B range) teaches you:

Spending 10k on MI300X compute teaches you infrastructure. Building GPT-2 760M on MI300X teaches you transformers. The MI300X gave you the infrastructure to learn; the iteration gave you the knowledge.
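One concrete way to "own every variable" at small scale is to do the sizing arithmetic by hand before renting any compute. The sketch below is illustrative only: the function names are hypothetical, the configuration (24 layers, width 1536, GPT-2 vocabulary and context length) is an assumption chosen because it lands near 760M parameters, and the 6 × parameters × tokens cost is a common rule of thumb rather than a measurement. The token budget is likewise a made-up example.

```python
def transformer_params(n_layer: int, d_model: int, vocab_size: int, n_ctx: int) -> int:
    """Approximate parameter count of a GPT-2-style decoder.

    Biases and LayerNorm weights are ignored, and the output head is
    assumed to share weights with the token embedding.
    """
    attn = 4 * d_model * d_model   # Q, K, V and output projections
    mlp = 8 * d_model * d_model    # two linear layers with a 4x hidden width
    embeddings = (vocab_size + n_ctx) * d_model  # token + learned position embeddings
    return n_layer * (attn + mlp) + embeddings


def training_flops(n_params: int, n_tokens: int) -> int:
    """Rule-of-thumb training cost: roughly 6 FLOPs per parameter per token."""
    return 6 * n_params * n_tokens


# Assumed configuration (not taken from the post) that lands near 760M parameters.
params = transformer_params(n_layer=24, d_model=1536, vocab_size=50257, n_ctx=1024)

# Hypothetical token budget, used only to show the scaling.
flops = training_flops(params, n_tokens=10_000_000_000)

print(f"parameters: {params / 1e6:.0f}M")
print(f"training FLOPs for 10B tokens: {flops:.2e}")
```

Changing n_layer or d_model and watching how the parameter count and compute estimate move is the kind of cheap iteration that builds intuition before any money is spent.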

What OpenAI/Anthropic actually assess

You nailed it. They interview on:

These aren’t solved by capital. They’re solved by repeated small-scale iteration where you own every variable.

The 10k wasn’t wasted, but

It had value only because you iterated alongside it. You didn’t just throw compute at a standard pipeline—you were learning nanoGPT, exploring DeepSeek MoE, building inference code. The compute was the friction reducer for the real work: understanding.

If you’d just run torchtrain --config gpt2_8b.yaml and waited, you’d have a checkpoint and nothing else. Because you were deep in the code, the compute amplified your learning.

Better allocation going forward

For your trajectory (transformer internals → agent architectures → model training):

You’ve already extracted the key insight: the learning is in the craft, not the capital. Your 400 blog posts and 9,600 notes come from understanding, not from GPU-hours spent. Keep that ratio.

