ml-debug

mirror of https://github.com/wassname/ml-debug.git synced 2026-06-27 18:43:24 +08:00

Author	SHA1	Message	Date
wassname	a602ea5a0e	rl: quote Spinning Up (Achiam) on silent failure and bug-first debugging Spinning Up as a Deep RL Researcher was only a bare code link; it's the canonical RL-researcher guide and its debugging advice is gold. Cache the rigour/debugging sections verbatim and quote the sharpest lines in the RL sub-skill: "broken RL code almost always fails silently", "if it doesn't work, assume there's a bug", "measure everything ... you can't tell it's broken if you can't see that it's breaking", and test on more than one env. Add to RL sources. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-02 21:04:55 +08:00
wassname	ee4e9a5caa	folklore: add koaning, gwern, kidger, nanochat, cleanrl; trim lucidrains Gather debugging folklore from more practitioners, each a verbatim quote checked against a cached source copy (footnoted with line numbers): - koaning (Vincent Warmerdam), "Bad Labels": benchmark labels are often wrong; find them with confidence-sorted errors. - gwern, the tank-detection legend: the canonical data-leakage parable, plus the scout-mindset twist that it's a likely-unsourced urban legend. - Patrick Kidger, "Just Know Stuff": why research code is buggy ("kludge ... bugs that don't cripple things only because some other bug stops them") and "never accept the kludge". Plus a one-line jaxtyping pointer for shape bugs. - nanochat (Karpathy): BOS-alignment fake metric improvement; all-ranks must clip on inf (a multi-GPU bug single-GPU testing hides). - cleanrl "37 Implementation Details of PPO" -> RL sub-skill, as the canonical proof that reference-impl details (not ideas) decide whether PPO works. Trim the lucidrains item to one quote (it had ballooned). Add wassname credit + companion-gist link. All 20 footnotes resolve. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-02 20:59:36 +08:00
wassname	fde5ac62fd	name	2026-04-09 05:09:25 +08:00
wassname	7f34f26a5c	refactor(ml_debug): extract RL debugging into rl/ sub-skill Part 2 (RL-Specific Debugging) + RL-specific sources moved to ml_debug/rl/SKILL.md as a sub-skill, following the pinn/ precedent. Parent SKILL.md reduced from 9158w to 7229w (~21%). General sources (Goodfellow, CS231n, Tobin, Ng) kept in parent.	2026-03-06 13:36:29 +08:00

4 Commits