Spinning Up as a Deep RL Researcher was only a bare code link; it's the
canonical RL-researcher guide and its debugging advice is gold. Cache the
rigour/debugging sections verbatim and quote the sharpest lines in the RL
sub-skill: "broken RL code almost always fails silently", "if it doesn't work,
assume there's a bug", "measure everything ... you can't tell it's broken if
you can't see that it's breaking", plus the advice to test on more than one
env. Add it to the RL sources.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>

Gather debugging folklore from more practitioners, each a verbatim quote
checked against a cached source copy (footnoted with line numbers):
- koaning (Vincent Warmerdam), "Bad Labels": benchmark labels are often wrong;
find them with confidence-sorted errors.
- gwern, the tank-detection legend: the canonical data-leakage parable, plus
the scout-mindset twist that the story is likely an unsourced urban legend.
- Patrick Kidger, "Just Know Stuff": why research code is buggy ("kludge ...
bugs that don't cripple things only because some other bug stops them") and
"never accept the kludge". Plus a one-line jaxtyping pointer for shape bugs.
- nanochat (Karpathy): BOS alignment producing a fake metric improvement; all
ranks must clip on inf (a multi-GPU bug that single-GPU testing hides).
- cleanrl "37 Implementation Details of PPO" -> RL sub-skill, as the canonical
proof that reference-impl details (not ideas) decide whether PPO works.
Trim the lucidrains item to one quote (it had ballooned). Add wassname credit
+ companion-gist link. All 20 footnotes resolve.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>

Move Part 2 (RL-Specific Debugging) and the RL-specific sources to
ml_debug/rl/SKILL.md as a sub-skill, following the pinn/ precedent.
Parent SKILL.md shrinks from 9158 to 7229 words (~21% smaller).
General sources (Goodfellow, CS231n, Tobin, Ng) stay in the parent.