Adaptive Exploration and Exploitation Strategy | Generated by AI

2026.05

Question: What’s your take on the BFS→DFS→BFS learning strategy — learn widely first, then go deep on what’s interesting, then resurface to check if the depth is paying off? And how does this apply to someone like me?

Answer:

This is essentially exploration-exploitation with adaptive switching — and it’s one of the most defensible learning strategies for someone at your stage.

The Algorithm, Stated Precisely

# Pseudocode: each helper stands in for a judgment call, not a real API.
while alive:
    # BFS phase: weeks to months
    domains = ["AI", "LLM", "iOS", "robotics", "hardware", "Japanese", ...]
    explore(domains)
    interesting = [d for d in domains if resonates_with_me(d)]

    # DFS phase: 3-6 months minimum
    for domain in interesting:
        go_deep(domain, depth=until_producing_value)

    # BFS again: reality check
    world_model = update(world_model)
    on_track = evaluate(life_improved, skills_compounding, market_signal)
    if on_track:
        double_down(direction)
    else:
        explore_adjacent()

The key insight you’re articulating: neither pure BFS nor pure DFS is optimal. Pure BFS = dilettante. Pure DFS without BFS = tunnel vision, and possibly the wrong tunnel.
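A toy simulation makes the “neither pure BFS nor pure DFS” claim concrete. Everything in it is an assumption invented for illustration:

- three hypothetical domains with made-up payoff rates;
- a fixed time budget;
- a depth threshold below which a domain pays nothing;
- a short scan that reveals each domain's payoff rate perfectly.

It also leaves out the resurface step.

```python
# Toy model with invented numbers: each domain has a hidden payoff rate that only
# starts paying once you have spent THRESHOLD units of time on it (depth matters).
PAYOFF = {"a": 1.0, "b": 3.0, "c": 2.0}  # hypothetical domains and payoff rates
THRESHOLD = 12  # time needed before a domain produces any value (assumption)
BUDGET = 30     # total time available (assumption)


def value_of(domain: str, time_spent: float) -> float:
    """Payoff from one domain: nothing until the depth threshold is crossed."""
    return max(0.0, time_spent - THRESHOLD) * PAYOFF[domain]


def pure_bfs() -> float:
    # Spread the whole budget evenly; with these numbers no domain crosses the threshold.
    share = BUDGET / len(PAYOFF)
    return sum(value_of(d, share) for d in PAYOFF)


def pure_dfs(first: str = "a") -> float:
    # Commit the whole budget to whichever domain came first, good or not.
    return value_of(first, BUDGET)


def bfs_then_dfs(scan_cost: float = 1.0) -> float:
    # Briefly scan every domain (assumed to reveal its true payoff rate),
    # then spend the remaining budget going deep on the best one.
    best = max(PAYOFF, key=PAYOFF.get)
    remaining = BUDGET - scan_cost * len(PAYOFF)
    return value_of(best, remaining)


for name, strategy in [("pure BFS", pure_bfs), ("pure DFS", pure_dfs),
                       ("BFS -> DFS", bfs_then_dfs)]:
    print(f"{name:>10}: {strategy():.1f}")
```

Under these numbers:

- Pure BFS earns 0.0, because no domain crosses the threshold.
- Pure DFS earns 18.0.
- A short scan followed by a deep dive earns 45.0.

Pure DFS only wins when the first tunnel happens to be the best one; starting on "b" would yield 54.0. The scan is insurance against the wrong tunnel. Its perfect-information assumption is the most generous part of the model.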

Why This Works for You Specifically

You already have an unusually wide BFS base: iOS, Android, backend, cloud, finance systems, GPT-2 training, CLI agents. Most people your age have either breadth OR depth. You have both at a surface level. So your next BFS pass doesn’t start from zero; it’s a higher-order scan of robotics, compilers, language models at the weights level, hardware (CUDA internals, silicon), and bioinformatics if you’re curious. Each new BFS scan lands you in richer territory than the last.

The Japanese analogy is sharp. Looking up every word in ChatGPT and actually reading the output is DFS applied at micro-scale, over and over, and it compounds. Six months of that and you can read novels. The mechanism is the same whether it’s Japanese kanji or attention math.

The Part Most People Miss: What Makes a Good DFS Target

Not everything interesting deserves DFS. The filter should be:

def worth_dfs(domain):
    # Interest is required; then at least one payoff condition must hold.
    return (
        intrinsically_interesting(domain) and
        (market_signal(domain) > 0 or
         compounds_with_existing_skills(domain) or
         opens_new_bfs_territory(domain))
    )

For you right now: LLM internals + agent architectures + CUDA clears all three. Japanese clears the first and third (opens East Asian tech/research). Pure iOS probably doesn’t clear the second anymore unless tied to AI features.
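The filter can also be sketched as runnable code. This assumes each predicate can be scored by hand as a boolean, or as a number in the case of market signal. The `Domain` class, its fields, and the three sample domains are hypothetical. They are chosen to exercise the AND/OR structure, not to rank real fields.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    # Hypothetical fields standing in for the predicates used by worth_dfs.
    name: str
    intrinsically_interesting: bool
    market_signal: float  # > 0 means the market currently rewards this skill
    compounds_with_existing_skills: bool
    opens_new_bfs_territory: bool


def worth_dfs(domain: Domain) -> bool:
    # Interest is a hard gate; after that, any one of the three payoffs is enough.
    return domain.intrinsically_interesting and (
        domain.market_signal > 0
        or domain.compounds_with_existing_skills
        or domain.opens_new_bfs_territory
    )


candidates = [
    Domain("interesting, isolated", True, 0.0, False, False),
    Domain("lucrative, boring", False, 1.0, True, False),
    Domain("interesting, compounding", True, 0.0, True, False),
]
for d in candidates:
    print(d.name, "->", worth_dfs(d))
```

The "lucrative, boring" domain fails despite a positive market signal, because interest gates everything else. The "interesting, compounding" domain passes on a single payoff condition.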

The BFS Re-emerge Phase Is Underrated

Most people skip the resurface step — they DFS forever and lose world-model calibration. Or they resurface too often and never go deep enough to produce value. Your instinct — half a year of DFS, then look up — is about right. Six months is long enough to:

The evaluation question you’re asking is correct too: “Is my life better because of what I learned or built?” Not “do I know more facts.” Better = shipped something people use, got paid more, unlocked a job, made a sharper decision, read a paper and understood it cold.
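The resurface check can be sketched the same way. The sketch rests on two assumptions. First, the outcomes listed above are recorded as honest yes/no answers at the end of a cycle. Second, any single outcome counts as being on track; a stricter version might require two. The `CycleOutcomes` class and its field names are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class CycleOutcomes:
    # Hypothetical yes/no answers filled in at the resurface step,
    # one per concrete "is my life better?" signal.
    shipped_something_people_use: bool = False
    got_paid_more: bool = False
    unlocked_a_job: bool = False
    made_a_sharper_decision: bool = False
    read_a_paper_and_understood_it_cold: bool = False


def on_track(outcomes: CycleOutcomes) -> bool:
    # Assumption: any one concrete improvement is enough to double down.
    return any(getattr(outcomes, f.name) for f in fields(outcomes))


review = CycleOutcomes(read_a_paper_and_understood_it_cold=True)
print("double down" if on_track(review) else "explore adjacent")
```

"Knowing more facts" deliberately has no field, which encodes the point that volume of knowledge is not the metric.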

One Addition: Track the DFS Output Publicly

You already do this with your blog and your notes repo. That’s not just discipline; it forces the BFS resurface. Writing publicly forces you to explain, which makes you notice gaps, which triggers the next BFS. Your 400 posts aren’t just output; they’re the mechanism that keeps the BFS→DFS cycle honest.

Concrete Recommendation for Your Next Cycle

Given where you are:

| Phase | Duration | Focus |
|---|---|---|
| BFS now | 2-3 weeks | Skim: robotics (ROS2), compiler internals, post-training techniques (DPO/GRPO), the Japanese N4→N3 gap |
| DFS | 4-6 months | Pick 1-2. My guess: CUDA kernels + inference optimization, or RL-based training (GRPO/PPO); both compound directly on your H200/MI300X access |
| BFS resurface | 1 week | Re-read the job market, your own blog, and what’s shipping in AI. Am I closer to “deeply technical AI engineer” than 6 months ago? |
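The table's durations translate into calendar dates with a little arithmetic. This sketch makes three assumptions for illustration: 30-day months, the low and high end of each range, and 1 May 2026 as an arbitrary start date.

```python
from datetime import date, timedelta

DAYS_PER_MONTH = 30  # rough approximation; calendar months run 28 to 31 days

# (phase, min_days, max_days), transcribed from the phase table
PHASES = [
    ("BFS now", 2 * 7, 3 * 7),
    ("DFS", 4 * DAYS_PER_MONTH, 6 * DAYS_PER_MONTH),
    ("BFS resurface", 7, 7),
]


def cycle_length_days() -> tuple[int, int]:
    """Shortest and longest full cycle, in days."""
    return sum(p[1] for p in PHASES), sum(p[2] for p in PHASES)


def resurface_window(start: date) -> tuple[date, date]:
    """Earliest and latest day the resurface week begins for a cycle starting at `start`."""
    before = PHASES[:-1]  # every phase that precedes the resurface week
    earliest = start + timedelta(days=sum(p[1] for p in before))
    latest = start + timedelta(days=sum(p[2] for p in before))
    return earliest, latest


print(cycle_length_days())
print(resurface_window(date(2026, 5, 1)))
```

Under those assumptions a full cycle runs 141 to 208 days, roughly 20 to 30 weeks. Starting on 1 May 2026, the resurface week would begin between 12 September and 18 November 2026.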

You’re already running this algorithm intuitively. Naming it just lets you run it more deliberately.

