feat(ml_debug): expand nanochat evidence, add lec4 diagnostics file

nanochat_deepwiki_llm_pretraining_2026.md rewritten with content from dev/LOG.md and deepwiki sections 3/12/13:
- 14 labelled findings with direct quotes and empirical numbers
- Dataset >> architecture (27% gain, 5 failed attempts before ClimbMix)
- Scale-dependent HP sensitivity (d12 HPs hurt d20)
- Multi-axis validation (steps/wall-clock/FLOPs)
- Negative results: MoE/SwiGLU/MTP all failed at this scale
- MFU monitoring, batch size Bopt∝D^0.383, WD∝1/width² tables
- FP8 reality: 1.38x micro → 1.17x full → 5% capability-matched
- Python GC 500ms overhead, torch.compile recompile gotcha

karpathy_nn_zero_to_hero_lec4_diagnostics.md: new evidence file
- Activation saturation check (tanh >0.97)
- Gradient distribution check per-layer
- Grad:data ratio (target ~1e-3)
- Update-to-data ratio tracker with full plotting code
- Incremental improvement log from notebook
2026-06-27 01:00:14 +08:00 · 2026-03-10 05:38:33 +08:00
parent ced4edc200
commit c9c53f8e7f
3 changed files with 289 additions and 48 deletions
@@ -159,7 +159,7 @@ for conf, pred, true, idx in errors[:10]:
 # Inspect the actual inputs for these indices. Pattern = systematic bug.
 ```

-**Update-to-data ratio check** [Karpathy nn-zero-to-hero Lec 4]
+**Update-to-data ratio check** [Karpathy nn-zero-to-hero Lec 4; evidence: karpathy_nn_zero_to_hero_lec4_diagnostics.md]
 ```python
 # Track during training: how large are updates relative to parameter magnitudes?
 # Target: ~1e-3 (log10 ~ -3). Much higher = LR too large. Much lower = LR too small.