Power-Law Loss Curve Forecasting | Generated by AI

2026.05

This note fits a power law to the val loss curve so far and extrapolates it. Using the 11 data points (iter 1000→6000), a fit of the form L(n) = L_∞ + A · n^(-α) gives consistent results across two reasonable choices of L_∞:

Fit A: L(n) = 2.7 + 42.4 / n^0.5     (residuals ~0.05)
Fit B: L(n) = 2.5 + 24.4 / n^0.4     (residuals ~0.05)
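
One way a fit of this form could be obtained, assuming L_∞ is held fixed and only A and α are fitted: subtracting L_∞ makes the model linear in log-log space, log(L - L_∞) = log A - α · log n, so ordinary least squares recovers both parameters. This is a sketch, not necessarily the procedure behind the fits above. The function name fit_power_law is hypothetical, and because the measured val losses are not listed in this note, the input is synthetic placeholder data generated from Fit A's formula.

```python
import math

def fit_power_law(iters, losses, l_inf):
    """Least-squares fit of log(L - l_inf) = log(A) - alpha * log(n).

    Hypothetical helper: l_inf is fixed by the caller, not fitted.
    Returns (A, alpha).
    """
    xs = [math.log(n) for n in iters]
    ys = [math.log(loss - l_inf) for loss in losses]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope

# Placeholder data only: 11 noise-free points at iters 1000..6000, generated
# from Fit A's formula because the real val losses are not given in this note.
iters = list(range(1000, 6001, 500))
losses = [2.7 + 42.4 / n ** 0.5 for n in iters]

A, alpha = fit_power_law(iters, losses, l_inf=2.7)
print(f"A = {A:.1f}, alpha = {alpha:.2f}")
```

With real, noisy eval losses the recovered A and α would depend on the chosen L_∞, which is why comparing two values of L_∞ is a useful robustness check.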

The two fits agree to within ~0.01 through iter 10,000 and drift apart to ~0.05 by iter 19,073, so the forecast is fairly robust to the choice of L_∞. Predictions:

iter       Fit A    Fit B    Forecast    Δ from iter 6000 (3.247)
─────────────────────────────────────────────────────────────────
 7,500     3.19     3.20     ~3.19       -0.06
10,000     3.12     3.11     ~3.11       -0.14
12,500     3.08     3.05     ~3.06       -0.19
15,000     3.05     3.01     ~3.03       -0.22
19,073     3.01     2.96     ~2.98       -0.27
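
The table can be re-derived by evaluating the two fitted formulas at each forecast iteration. The sketch below does that with the coefficients exactly as printed above. Those coefficients are rounded, so the Fit B column may differ from the table by 0.01 in the last digit. The function names fit_a and fit_b are illustrative.

```python
def fit_a(n):
    # Fit A as printed: L(n) = 2.7 + 42.4 / n^0.5
    return 2.7 + 42.4 / n ** 0.5

def fit_b(n):
    # Fit B as printed: L(n) = 2.5 + 24.4 / n^0.4
    return 2.5 + 24.4 / n ** 0.4

BASELINE = 3.247  # val loss at iter 6000, from the table header

rows = []
for n in (7500, 10000, 12500, 15000, 19073):
    a, b = fit_a(n), fit_b(n)
    rows.append((n, a, b))
    # Show both predictions, the gap between them, and Fit A's drop from iter 6000.
    print(f"{n:>6,}  {a:.2f}  {b:.2f}  gap {a - b:+.2f}  A delta {a - BASELINE:+.2f}")
```

Fit A passes almost exactly through the iter 6000 value (3.247), which is a quick check that the printed coefficients are consistent with the data.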

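Sanity check against the most recent eval-to-eval deltas (4500→6000 averaged ~0.025 per 500 iters): if the excess loss decays as 1/√n, the per-iteration drop falls off as n^(-3/2). Integrating that rate from iter 6000 forward gives ~0.27 total drop by iter 19,073, which matches the forecast column and sits between the two fits (0.24 for Fit A, 0.29 for Fit B).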
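
The sanity check can be reproduced in a few lines. This sketch assumes the observed rate of ~0.025 per 500 iters applies at iter 6000 (an assumption: it is an average over 4500→6000) and that the rate then decays as n^(-3/2), which is what a 1/√n excess loss implies. It evaluates the integral in closed form and cross-checks it numerically with a midpoint rule.

```python
import math

n0, n1 = 6000, 19073
rate0 = 0.025 / 500  # observed drop per iter; anchoring it at n0 is an assumption

# rate(n) = rate0 * (n0 / n) ** 1.5, integrated in closed form from n0 to n1:
#   2 * rate0 * n0 * (1 - sqrt(n0 / n1))
closed_form = 2 * rate0 * n0 * (1 - math.sqrt(n0 / n1))

# Numerical cross-check of the same integral (midpoint rule).
steps = 100_000
h = (n1 - n0) / steps
numeric = sum(rate0 * (n0 / (n0 + (i + 0.5) * h)) ** 1.5 for i in range(steps)) * h

print(f"closed form: {closed_form:.3f}, numeric: {numeric:.3f}")
```

Under these assumptions the total drop comes out at roughly 0.26, close to the ~0.27 quoted above. The result is sensitive to where the observed rate is anchored, so it is a rough consistency check, not an independent forecast.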

Takeaways:

Caveats:

