Adaptive Exploration and Exploitation Strategy | Generated by AI
Question: What’s your take on the BFS→DFS→BFS learning strategy — learn widely first, then go deep on what’s interesting, then resurface to check if the depth is paying off? And how does this apply to someone like me?
Answer:
This is essentially exploration-exploitation with adaptive switching — and it’s one of the most defensible learning strategies for someone at your stage.
The Algorithm, Stated Precisely
    while alive:
        # BFS phase: weeks to months
        domains = [AI, LLM, iOS, robotics, hardware, Japanese, ...]
        explore(domains)
        interesting = filter(domains, predicate=resonates_with_me)

        # DFS phase: 3-6 months minimum
        for domain in interesting:
            go_deep(domain, depth=until_producing_value)

        # BFS again: reality check
        update(world_model)
        on_track = evaluate(life_improved, skills_compounding, market_signal)
        if on_track:
            double_down(direction)
        else:
            explore_adjacent()
The key insight you’re articulating: neither pure BFS nor pure DFS is optimal on its own. Pure BFS makes you a dilettante. Pure DFS without BFS gives you tunnel vision, possibly in the wrong tunnel.
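To make the switching idea concrete, here is a toy simulation, loosely in the spirit of explore-then-commit from the bandit literature. It is only a sketch: the function name `run_cycles`, the payoff numbers, and the Gaussian noise model are all assumptions made for illustration, not something taken from the strategy itself.

```python
import random


def run_cycles(true_payoff, cycles=3, bfs_samples=2, dfs_samples=20, seed=0):
    """Alternate wide sampling (BFS) with focused sampling (DFS).

    true_payoff maps a domain name to a hypothetical mean payoff.
    Returns the domain chosen for the DFS phase in each cycle.
    """
    rng = random.Random(seed)
    totals = {d: 0.0 for d in true_payoff}
    counts = {d: 0 for d in true_payoff}

    def sample(domain):
        # Noisy observation of how rewarding one unit of effort was.
        reward = rng.gauss(true_payoff[domain], 0.1)
        totals[domain] += reward
        counts[domain] += 1

    history = []
    for _cycle in range(cycles):
        # BFS phase: touch every domain a little.
        for domain in true_payoff:
            for _ in range(bfs_samples):
                sample(domain)
        # Choose the DFS target from the running averages so far.
        best = max(true_payoff, key=lambda d: totals[d] / counts[d])
        # DFS phase: spend most of the budget on one domain.
        for _ in range(dfs_samples):
            sample(best)
        history.append(best)
        # Looping back to the BFS phase is the "resurface" step:
        # fresh samples elsewhere can overturn the current choice.
    return history


if __name__ == "__main__":
    # Hypothetical payoffs, chosen only to make the example readable.
    print(run_cycles({"cuda": 0.9, "ios": 0.2, "trivia": 0.1}))
```

The design point is that the BFS samples are cheap relative to the DFS budget, yet they are what allow the choice of DFS target to change from one cycle to the next.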
Why This Works for You Specifically
You already have an unusually wide BFS base: iOS, Android, backend, cloud, finance systems, GPT-2 training, CLI agents. Most people your age have either breadth or depth. You have some of both. So your next BFS pass isn’t starting from zero. It’s a higher-order scan: robotics, compilers, language models at the weights level, hardware (CUDA internals, silicon), and bioinformatics if you’re curious. Each new BFS scan lands you in richer territory than the last.
The Japanese analogy is sharp. If you look up every unfamiliar word with ChatGPT and actually read the output, that’s DFS applied at micro-scale, repeatedly, and it compounds. Six months of that and you can read novels. The mechanism is the same whether it’s Japanese kanji or attention math.
The Part Most People Miss: What Makes a Good DFS Target
Not everything interesting deserves DFS. The filter should be:
    def worth_dfs(domain):
        # Must be interesting, plus at least one practical payoff.
        return (
            intrinsically_interesting(domain) and
            (market_signal(domain) > 0 or
             compounds_with_existing_skills(domain) or
             opens_new_bfs_territory(domain))
        )
For you right now: LLM internals + agent architectures + CUDA clears all three of the practical criteria. Japanese is interesting and opens new BFS territory (East Asian tech/research). Pure iOS probably no longer shows a market signal unless tied to AI features.
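The filter above can be turned into something runnable with a small record per domain. This is a minimal sketch: the `Domain` fields and every example value below are made-up placeholders for illustration, not an assessment of any real domain.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    name: str
    interesting: bool       # intrinsically interesting to you
    market_signal: float    # > 0 means some external demand
    compounds: bool         # builds on skills you already have
    opens_territory: bool   # unlocks a new area for the next BFS


def worth_dfs(d: Domain) -> bool:
    # Interest is required; any one practical payoff is enough.
    return d.interesting and (
        d.market_signal > 0 or d.compounds or d.opens_territory
    )


if __name__ == "__main__":
    # Hypothetical candidates with placeholder values.
    candidates = [
        Domain("llm_internals", True, 1.0, True, True),
        Domain("japanese", True, 0.0, False, True),
        Domain("trivia", True, 0.0, False, False),
        Domain("dull_but_paid", False, 1.0, True, False),
    ]
    for d in candidates:
        print(d.name, worth_dfs(d))
```

Note that interest acts as a hard gate: a domain with strong demand but no intrinsic pull is rejected, as is a domain that is fun but has no practical payoff.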
The BFS Re-emerge Phase Is Underrated
Most people skip the resurface step — they DFS forever and lose world-model calibration. Or they resurface too often and never go deep enough to produce value. Your instinct — half a year of DFS, then look up — is about right. Six months is long enough to:
- Get past the “everything is hard” phase
- Start producing something (code, writing, models, tools)
- Have enough signal to know if the market/your life actually changed
The evaluation question you’re asking is correct too: “Is my life better because of what I learned or built?” Not “do I know more facts.” Better = shipped something people use, got paid more, unlocked a job, made a sharper decision, read a paper and understood it cold.
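One way to keep the resurface check honest is to write the outcomes down as explicit flags. The sketch below assumes a hypothetical `CycleReview` record and treats any single concrete outcome as enough to stay the course; that threshold is an assumption, and you may want a stricter one.

```python
from dataclasses import dataclass


@dataclass
class CycleReview:
    shipped_something_used: bool = False
    pay_increased: bool = False
    unlocked_job: bool = False
    sharper_decision: bool = False
    understood_paper_cold: bool = False
    facts_learned: int = 0  # recorded, but deliberately ignored below


def next_step(review: CycleReview) -> str:
    # Only concrete outcomes count; the raw fact count does not.
    outcomes = [
        review.shipped_something_used,
        review.pay_increased,
        review.unlocked_job,
        review.sharper_decision,
        review.understood_paper_cold,
    ]
    return "double_down" if any(outcomes) else "explore_adjacent"


if __name__ == "__main__":
    print(next_step(CycleReview(facts_learned=500)))
    print(next_step(CycleReview(shipped_something_used=True)))
```

The return values mirror the `double_down` and `explore_adjacent` branches of the loop stated earlier.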
One Addition: Track the DFS Output Publicly
You already do this — your blog, your notes repo. That’s not just discipline, it’s forcing the BFS resurface. Writing publicly forces you to explain, which forces you to notice gaps, which triggers the next BFS. Your 400 posts aren’t just output — they’re the mechanism that keeps the BFS→DFS cycle honest.
Concrete Recommendation for Your Next Cycle
Given where you are:
| Phase | Duration | Focus |
|---|---|---|
| BFS now | 2-3 weeks | Skim: robotics (ROS2), compiler internals, post-training techniques (DPO/GRPO), Japanese N4→N3 gap |
| DFS | 4-6 months | Pick 1-2. My guess: CUDA kernels + inference optimization, or RL-based training (GRPO/PPO). Both compound directly on your H200/MI300X access |
| BFS resurface | 1 week | Re-read the job market, your own blog, and what’s shipping in AI. Ask: am I closer to “deeply technical AI engineer” than 6 months ago? |
You’re already running this algorithm intuitively. Naming it just lets you run it more deliberately.