folklore: add koaning, gwern, kidger, nanochat, cleanrl; trim lucidrains

Gather debugging folklore from more practitioners. Each item is a verbatim
quote checked against a cached source copy (footnoted with line numbers):
- koaning (Vincent Warmerdam), "Bad Labels": benchmark labels are often wrong;
  find them with confidence-sorted errors.
- gwern, the tank-detection legend: the canonical data-leakage parable, plus
  the scout-mindset twist that it's a likely-unsourced urban legend.
- Patrick Kidger, "Just Know Stuff": why research code is buggy ("kludge ...
  bugs that don't cripple things only because some other bug stops them") and
  "never accept the kludge". Plus a one-line jaxtyping pointer for shape bugs.
- nanochat (Karpathy): a fake metric improvement from BOS alignment; all ranks
  must clip on inf (a multi-GPU bug that single-GPU testing hides).
- cleanrl "37 Implementation Details of PPO" -> RL sub-skill, as the canonical
  proof that reference-impl details (not ideas) decide whether PPO works.
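
A minimal sketch of the "confidence-sorted errors" idea from the koaning item,
assuming you already have per-class predicted probabilities and the given
labels. The function name and the toy data are hypothetical, not taken from the
"Bad Labels" source: it keeps the rows where the model disagrees with the label
and ranks them by the model's confidence, so the most suspicious labels come
first for manual review.

```python
def confidence_sorted_errors(probs, labels):
    """Return (index, predicted_class, confidence) for every row where the
    model disagrees with the given label, most confident disagreement first.

    Hypothetical helper for illustration only.
    """
    suspects = []
    for i, (row, label) in enumerate(zip(probs, labels)):
        # Predicted class = index of the highest probability in the row.
        pred = max(range(len(row)), key=row.__getitem__)
        if pred != label:
            suspects.append((i, pred, row[pred]))
    # A confident disagreement is more likely a bad label than a model error.
    return sorted(suspects, key=lambda s: s[2], reverse=True)


# Toy data (made up): four examples, two classes.
probs = [
    [0.90, 0.10],
    [0.05, 0.95],
    [0.55, 0.45],
    [0.98, 0.02],
]
labels = [0, 0, 1, 1]

suspects = confidence_sorted_errors(probs, labels)
for index, pred, conf in suspects:
    print(f"row {index}: labelled {labels[index]}, model says {pred} ({conf:.2f})")
```

A disagreement is only a hint, not a verdict: the top of the list is where to
start reading examples by hand.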

Trim the lucidrains item to one quote (it had ballooned). Add wassname credit
+ companion-gist link. All 20 footnotes resolve.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
wassname
2026-06-02 20:59:36 +08:00
parent 9911ac83c5
commit ee4e9a5caa
6 changed files with 134 additions and 6 deletions
@@ -108,6 +108,8 @@ The allure of writing from scratch is real but the self-correction mechanisms in
2. Use reference impl as source of reliable components, work to the same API
3. Have one eye on reference while you write your own -- copy hyperparameters, discounting code, termination handling
The canonical demonstration is "The 37 Implementation Details of PPO" [Huang et al. 2022]: reproducing PPO meant matching 37 separate details (13 core, 9 Atari, 9 continuous-control, 5 LSTM, 1 MultiDiscrete), each linked to its exact line of code, "which is not done in academic papers." If your PPO underperforms a reference, you are probably missing some of these, not lacking a better idea.
References: spinning-up (OpenAI), stable-baselines3, cleanrl (single-file per algo), OpenSpiel (multi-agent).
**10. Don't over-interpret noise.** [Schulman 2017, Henderson 2018, Irpan 2018]
@@ -149,6 +151,7 @@ Sometimes (rarely) you don't. Schulman:
- Alex Irpan, "Deep Reinforcement Learning Doesn't Work Yet" (2018): https://www.alexirpan.com/2018/02/14/rl-hard.html
- McCandlish & Kaplan, "An Empirical Model of Large-Batch Training" (2018): https://arxiv.org/abs/1812.06162
- Slav Ivanov, "37 Reasons why your Neural Network is not working" (2017): https://blog.slavv.com/37-reasons-why-your-neural-network-is-not-working-4020854bd607
- Huang et al., "The 37 Implementation Details of PPO" (ICLR Blog Track 2022): https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/ ([cache](../docs/evidence/cleanrl_37_ppo_details.md))
### Reference implementations
- OpenAI Spinning Up: https://github.com/openai/spinningup