Scaling LLM Usage to Billions of Tokens
Question: How can someone consume around 10 billion tokens in a year using LLMs, and who are known examples of people doing this (like Huang Dongxu from TiDB)?
Answer:
If you actually want to reach billions of tokens per year, casual chatting won’t get you there. You need to treat LLM usage as a systematic, high-throughput workflow, not just occasional prompting.
Let’s break it down realistically.
1. What “10 billion tokens/year” actually means
10B tokens/year ≈
- ~27 million tokens/day
- ~1.1 million tokens/hour (24h continuous)
That’s not human-scale interaction anymore. That’s pipeline + automation scale.
So if you’re imagining “just prompting more,” that won’t work. You need machines prompting machines.
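If you want to sanity-check those numbers yourself, here is the arithmetic in plain Python (no API involved):

```python
# Back-of-envelope: what 10B tokens/year means at smaller time scales.
TOKENS_PER_YEAR = 10_000_000_000

per_day = TOKENS_PER_YEAR / 365      # ~27.4M tokens/day
per_hour = per_day / 24              # ~1.14M tokens/hour
per_second = per_hour / 3600         # ~317 tokens/second, around the clock

print(f"{per_day:,.0f}/day, {per_hour:,.0f}/hour, {per_second:,.0f}/sec")
```

~317 tokens every second of the year is the intuition to keep: no human types or reads at that rate.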
2. The only 3 ways people reach that scale
(A) Programmatic usage (most important)
Instead of:
- You → prompt → read → prompt
You do:
- Code → generates prompts → sends to LLM → processes outputs → loops
Typical patterns:
- Batch processing datasets
- Auto-evaluation loops
- Synthetic data generation
- Multi-agent simulations
Example:
```python
for item in dataset:
    response = llm(prompt_template(item))  # one LLM call per record
    store(response)                        # persist for later filtering/scoring
```
This alone can generate millions of tokens per hour.
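To make that loop concrete, here is a minimal runnable sketch, assuming the official OpenAI Python SDK with an API key in the environment; the dataset, prompt template, and model name are placeholders you would swap for your own:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical inputs; substitute your own dataset and template.
dataset = ["What is a B-tree?", "Explain MVCC.", "What is Raft?"]
prompt_template = "Answer concisely, then list three follow-up questions:\n\n{}".format

total_tokens = 0
for item in dataset:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any chat model works
        messages=[{"role": "user", "content": prompt_template(item)}],
    )
    total_tokens += resp.usage.total_tokens
    print(resp.choices[0].message.content[:80])

print(f"consumed {total_tokens} tokens in one pass over the dataset")
```

Point this at a real dataset (a CSV, a crawl, a log dump) instead of three toy strings and the counter climbs fast.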
(B) Iterative pipelines (LLM calling LLM)
Heavy users don’t just call once—they chain:
- Draft → critique → improve → summarize → evaluate
Or:
- Generator model
- Critic model
- Refiner model
This multiplies token usage by 5–20x per task.
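A minimal sketch of that generator → critic → refiner chain, again assuming the OpenAI Python SDK; the roles are expressed as system prompts and the task is a made-up example:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def call(role_instructions: str, content: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "system", "content": role_instructions},
                  {"role": "user", "content": content}],
    )
    return resp.choices[0].message.content

task = "Explain why write amplification matters in LSM-tree storage engines."

draft = call("You are a generator. Write a first draft.", task)
critique = call("You are a critic. List concrete flaws in this draft.", draft)
improved = call("You are a refiner. Rewrite the draft, fixing these flaws:\n" + critique,
                draft)
summary = call("Summarize in three bullet points.", improved)
# Four calls instead of one, and each call carries the previous output as
# input, so token usage grows multiplicatively, not linearly.
```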
(C) Long-context workflows
Using:
- 32k / 128k / 1M context windows
Activities:
- Feed entire books/codebases
- Repeated re-analysis
- Memory-based agents
One run can consume hundreds of thousands to millions of tokens.
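As a sketch of a single long-context run, assuming the OpenAI Python SDK and a hypothetical `whole_codebase.txt` dump as input:

```python
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

# Hypothetical target: any large text file or concatenated codebase dump.
source = Path("whole_codebase.txt").read_text(encoding="utf-8")

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use a long-context model for big inputs
    messages=[{"role": "user",
               "content": "Map the module structure and flag risky code:\n\n" + source}],
)
print(resp.choices[0].message.content)
print("tokens this run:", resp.usage.total_tokens)  # easily 100k+ for a big file
```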
3. Practical ways YOU can reach high token usage
If your goal is intentional (learning / building), here are concrete strategies:
1. Build a “thinking loop system”
Instead of asking once:
- Ask → get answer
- Ask model to critique
- Ask model to improve
- Ask model to compress
- Repeat
This turns 1 prompt into 10–50 prompts.
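Here is what such a loop can look like as code, assuming the OpenAI Python SDK; the question and the round count are arbitrary:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

answer = llm("How should I design indexes for a multi-tenant SaaS database?")
for _ in range(3):  # each round adds 3 calls: critique, improve, compress
    critique = llm("Critique this answer. Be specific:\n" + answer)
    answer = llm(f"Improve the answer using the critique.\n\nAnswer:\n{answer}\n\nCritique:\n{critique}")
    answer = llm("Compress this to its essentials without losing content:\n" + answer)
# 1 question became 10 calls, each one carrying the previous output as input.
```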
2. Use LLM for everything you read
You mentioned:
prompting + reading + iterating
Upgrade it:
- Paste articles/books/code
Ask:
- summarize
- extract structure
- generate questions
- debate both sides
You rack up tokens on both sides:
- input tokens (the large pasted material)
- output tokens (the generated analysis)
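A sketch of that reading workflow, assuming the OpenAI Python SDK and a hypothetical `article.txt`; note how the full input is re-sent for every task:

```python
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

article = Path("article.txt").read_text(encoding="utf-8")  # hypothetical input

# The same large input is re-sent for every task, so input tokens
# dominate: 4 tasks x full article = 4x the reading cost.
tasks = [
    "Summarize this in 10 bullet points:",
    "Extract the argument structure (claims, evidence, assumptions):",
    "Generate 20 exam-style questions about this:",
    "Debate both sides of the central claim:",
]
for task in tasks:
    print(llm(task + "\n\n" + article))
```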
3. Synthetic data generation
This is what serious builders do:
Generate:
- Q&A pairs
- training data
- edge cases
Then:
- filter
- score
- regenerate
This is exactly how people burn billions of tokens.
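A toy version of that generate → score → filter loop, assuming the OpenAI Python SDK; the topic, target count, and score threshold are arbitrary choices:

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

topic = "distributed transactions"  # hypothetical domain
kept, attempts = [], 0
while len(kept) < 50 and attempts < 200:
    attempts += 1
    qa = llm(f"Write one hard Q&A pair about {topic} as JSON with keys 'q' and 'a'. JSON only.")
    score = llm("Rate this Q&A pair 1-10 for difficulty and correctness. Reply with the number only:\n" + qa)
    try:
        if int(score.strip()) >= 7:  # filter: keep only high-scoring pairs
            kept.append(json.loads(qa))
    except ValueError:  # unparseable score or JSON: discard and regenerate
        continue
print(f"kept {len(kept)} pairs after {attempts} generations, 2 LLM calls each")
```

Every rejected pair still cost tokens, and rejection rates are high, which is precisely why this pattern burns so much.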
4. Continuous background jobs
Run scripts like:
- “Generate 10k problems per day”
- “Simulate 1k conversations”
- “Rewrite dataset 5 times”
This is how you cross from:
- millions → billions
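A sketch of a budgeted background job, assuming the OpenAI Python SDK; the daily budget and the prompt are placeholders:

```python
import time
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

DAILY_TOKEN_BUDGET = 2_000_000  # ~0.73B/year if it runs every day

def one_job() -> int:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user",
                   "content": "Generate 20 practice problems on SQL query optimization."}],
    )
    return resp.usage.total_tokens

spent = 0
while spent < DAILY_TOKEN_BUDGET:
    spent += one_job()
    time.sleep(1)  # crude rate limiting; a real job would back off on errors
print(f"budget reached: {spent} tokens today")
# Schedule this with cron (e.g. `0 3 * * *`) and it runs unattended.
```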
5. Multi-agent systems
Create agents that:
- Debate
- Plan
- Execute
- Reflect
Each step = multiple LLM calls.
This explodes token usage naturally.
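A minimal two-agent debate plus judge, assuming the OpenAI Python SDK; the personas, topic, and round count are made up:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

def agent(persona: str, message: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "system", "content": persona},
                  {"role": "user", "content": message}],
    )
    return resp.choices[0].message.content

topic = "Should we shard this database now or wait for 10x traffic?"
position = agent("You argue FOR acting now.", topic)
for _ in range(3):  # 3 debate rounds = 6 calls, plus the opener and the judge
    rebuttal = agent("You argue AGAINST acting now. Rebut:", position)
    position = agent("You argue FOR acting now. Rebut:", rebuttal)
verdict = agent("You are a neutral judge. Summarize and decide.",
                f"FOR: {position}\nAGAINST: {rebuttal}")
print(verdict)
```

Eight calls for one question, and each rebuttal carries the opponent's full text as input.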
4. People known for extreme LLM usage
Huang Dongxu
- Co-founder and CTO of PingCAP, the company behind TiDB
- Known in engineering circles for deep technical exploration and a heavy experimentation mindset
- Represents the “builder who learns by doing at scale” archetype
Andrej Karpathy
- Talks about “tokens as the new compute” and treating LLMs as programmable systems
- Encourages high-volume experimentation
Sam Altman
- Has mentioned that some users consume massive token volumes
- Internally, power users are researchers and startups building pipelines
Emad Mostaque
- Advocates for large-scale generation and synthetic data loops
5. Reality check (important)
If you’re doing:
manual prompting + reading
You might reach:
- 1M–10M tokens/year (already heavy)
To hit:
- 1B+ tokens → requires automation
- 10B+ tokens → requires infrastructure
So the real shift is:
from “user” → “system builder”
6. A better mental model
Think like this:
- Tokens = fuel
- Prompts = programs
- LLM = CPU
If you want to burn fuel → you need programs running continuously
7. A practical path for you (based on your background)
Given you already:
- do ML
- build systems
You should try:
- Build a dataset generator
- Add an evaluator loop and an improvement loop
- Run it daily
That alone can push you into the 100M–1B tokens/year range.
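For a rough sense of scale, here is a back-of-envelope budget for such a daily pipeline; all three numbers are assumptions you would tune:

```python
# Rough accounting for a daily generator + evaluator + improvement loop.
items_per_day = 500            # hypothetical throughput
calls_per_item = 3             # generate, evaluate, improve
tokens_per_call = 1_500        # prompt + completion, a typical mid-size call

per_day = items_per_day * calls_per_item * tokens_per_call
per_year = per_day * 365
print(f"{per_day:,} tokens/day -> {per_year:,} tokens/year")
# 500 * 3 * 1500 = 2.25M/day, ~820M/year: inside the 100M-1B range above.
```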
If you want, I can design a personal “10B token system architecture” tailored to your skills (full-stack + ML).