Curated AI Resources for Engineers | Generated by AI
Let me find what’s currently active and worth your time. The search surfaced mostly hardware-vendor SEO, which you can skip. Here’s a curated list of sites that actually compound for someone training models and building agents.
Question: What are hot/interesting AI and GPU websites worth following for a deep-technical, AI-first engineer?
Answer:
Training internals & deep technical writing
- Karpathy’s blog + nanochat/nanoGPT repos — you’re already here, but his GitHub Discussions and issues are where the real teaching happens. Watch the nanochat repo activity.
- Lil’Log (Lilian Weng) — the canonical deep-dives on attention, diffusion, RLHF, reward hacking. Math-heavy, exactly your register.
- Jay Alammar — illustrated transformer/LLM internals. Visual but not shallow.
- Sebastian Raschka’s Ahead of AI — best signal-to-noise newsletter on LLM training, fine-tuning, and architecture papers. He ships actual PyTorch implementations (his LLMs-from-scratch repo pairs with the posts).
- EleutherAI blog — GPT-NeoX, scaling laws, the people who actually train open models from scratch.
GPU / CUDA / kernels
- GPU MODE (formerly CUDA MODE) — Discord + YouTube + repos. The community for kernel writing, Triton, FlashAttention internals. This is the single best place for your CUDA/inference-optimization trajectory.
- Horace He / “Making Deep Learning Go Brrrr From First Principles” — memory-bound vs compute-bound, the mental model for all GPU optimization.
- Modal blog and Together AI blog — practical inference/training infra; FlashAttention’s lead author, Tri Dao, is affiliated with Together AI.
- vLLM blog and llm-d blog — PagedAttention, prefix caching, distributed inference. The recent llm-d posts on precise prefix-cache aware scheduling report large throughput gains by giving the scheduler direct introspection into the distributed vLLM KV cache.
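The memory-bound vs compute-bound distinction from the Horace He entry above can be made concrete with a roofline-style estimate. The sketch below is only an illustration: the peak throughput and bandwidth figures describe a hypothetical accelerator, not any specific GPU, and the function names are made up for this example. It compares an operation's arithmetic intensity (FLOPs per byte moved) against the hardware's ridge point.

```python
# Roofline-style classification, assuming a HYPOTHETICAL accelerator.
# These peak numbers are illustrative assumptions, not real hardware specs.
PEAK_FLOPS = 100e12   # 100 TFLOP/s (assumed)
PEAK_BW = 1e12        # 1 TB/s memory bandwidth (assumed)
RIDGE = PEAK_FLOPS / PEAK_BW  # FLOPs/byte where the roofline bends


def elementwise_add(n, dtype_bytes=2):
    """FLOPs and bytes for c = a + b over n elements (read a, b; write c)."""
    return n, 3 * n * dtype_bytes


def matmul(m, k, n, dtype_bytes=2):
    """FLOPs and bytes for an (m x k) @ (k x n) matmul, ignoring cache reuse limits."""
    flops = 2 * m * k * n
    bytes_moved = (m * k + k * n + m * n) * dtype_bytes
    return flops, bytes_moved


def classify(flops, bytes_moved, peak_flops=PEAK_FLOPS, peak_bw=PEAK_BW):
    """Compare arithmetic intensity with the ridge point of the roofline."""
    intensity = flops / bytes_moved
    return "compute-bound" if intensity >= peak_flops / peak_bw else "memory-bound"


if __name__ == "__main__":
    print("elementwise add:", classify(*elementwise_add(1 << 20)))
    print("large matmul:   ", classify(*matmul(4096, 4096, 4096)))
    print("decode-style GEMV:", classify(*matmul(1, 4096, 4096)))
```

Under these assumed numbers, the elementwise add and the single-row matmul fall on the memory-bound side while the large square matmul is compute-bound, which is the intuition behind fusing pointwise ops and batching decode requests.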
MoE / frontier architecture (your DeepSeek v4 interest)
- DeepSeek’s GitHub + tech reports — read the actual papers (DeepSeek-V3, MLA, MoE load balancing). Their open infra repos (FlashMLA, DeepEP, DeepGEMM) are gold for kernel-level MoE.
- Tri Dao’s blog — FlashAttention, Mamba, state-space models from the source.
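For a feel of what "MoE load balancing" refers to before diving into the reports, here is a toy top-k router with a generic auxiliary balance loss of the kind popularized by Switch Transformer. It is a sketch under stated assumptions, not DeepSeek's scheme and not taken from their code; all names (`route`, `balance_loss`) are hypothetical and it uses plain Python lists rather than tensors.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]


def route(logits_per_token, k):
    """Return (top-k expert ids per token, full gate probabilities per token)."""
    assignments, probs = [], []
    for logits in logits_per_token:
        p = softmax(logits)
        topk = sorted(range(len(p)), key=lambda i: p[i], reverse=True)[:k]
        assignments.append(topk)
        probs.append(p)
    return assignments, probs


def balance_loss(assignments, probs, num_experts):
    """Generic auxiliary loss: num_experts * sum_i f_i * P_i.

    f_i is the fraction of routed slots sent to expert i, P_i is the mean
    gate probability for expert i. It equals 1.0 when routing is perfectly
    uniform and grows as routing collapses onto few experts.
    """
    n_tokens = len(assignments)
    k = len(assignments[0])
    f = [0.0] * num_experts
    for topk in assignments:
        for e in topk:
            f[e] += 1.0 / (n_tokens * k)
    mean_p = [sum(p[i] for p in probs) / n_tokens for i in range(num_experts)]
    return num_experts * sum(fi * pi for fi, pi in zip(f, mean_p))


if __name__ == "__main__":
    # Each token prefers a different expert: balanced routing.
    balanced = [[2.0 if e == t else 0.0 for e in range(4)] for t in range(4)]
    # Every token prefers expert 0: collapsed routing.
    collapsed = [[5.0, 0.0, 0.0, 0.0] for _ in range(4)]
    for name, logits in (("balanced", balanced), ("collapsed", collapsed)):
        a, p = route(logits, k=1)
        print(name, balance_loss(a, p, num_experts=4))
```

The balanced case sits at the loss minimum and the collapsed case scores well above it; how a given model actually keeps experts balanced is exactly what the tech reports are worth reading for.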
Aggregators worth the noise
- Hacker News — filter for AI/GPU submissions; comment threads often beat the articles.
- r/LocalLLaMA — the pulse for open-weight models, quantization, single-GPU experiments. Aligns with your RTX 4070 / MI300X tinkering.
- The Gradient and Sebastian Ruder’s NLP newsletter — research-adjacent depth.
For your idol-tier matching
- Yin Wang’s site — you know it; worth re-reading his compiler/PL essays for the first-principles discipline.
- Simon Willison’s blog — prolific, build-in-public, LLM tooling and CLI agents. Closest spiritually to your ww/iclaw/zz workflow.
If I had to pick three to start: GPU MODE (kernels/CUDA), Sebastian Raschka (training-from-scratch with code), and DeepSeek’s repos (MoE at the frontier). Those three map directly onto your MI300X training, nanochat work, and DeepSeek v4 MoE exploration.
References:
- Lil’Log
- Ahead of AI — Sebastian Raschka
- GPU MODE
- Making DL Go Brrrr — Horace He
- DeepSeek AI GitHub
- vLLM blog
- llm-d blog
- r/LocalLLaMA