Mirror of https://github.com/wassname/ml-debug.git (synced 2026-06-27 16:15:57 +08:00)
folklore: add koaning, gwern, kidger, nanochat, cleanrl; trim lucidrains
Gather debugging folklore from more practitioners, each a verbatim quote
checked against a cached source copy (footnoted with line numbers):
- koaning (Vincent Warmerdam), "Bad Labels": benchmark labels are often wrong;
find them with confidence-sorted errors.
- gwern, the tank-detection legend: the canonical data-leakage parable, plus
the scout-mindset twist that it's a likely-unsourced urban legend.
- Patrick Kidger, "Just Know Stuff": why research code is buggy ("kludge ...
bugs that don't cripple things only because some other bug stops them") and
"never accept the kludge". Plus a one-line jaxtyping pointer for shape bugs.
- nanochat (Karpathy): BOS-alignment fake metric improvement; all-ranks must
clip on inf (a multi-GPU bug single-GPU testing hides).
- cleanrl "37 Implementation Details of PPO" -> RL sub-skill, as the canonical
proof that reference-impl details (not ideas) decide whether PPO works.
Trim the lucidrains item to one quote (it had ballooned). Add wassname credit
+ companion-gist link. All 20 footnotes resolve.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
@@ -108,6 +108,8 @@ The allure of writing from scratch is real but the self-correction mechanisms in
2. Use reference impl as source of reliable components, work to the same API
3. Have one eye on reference while you write your own -- copy hyperparameters, discounting code, termination handling
The canonical demonstration is "The 37 Implementation Details of PPO" [Huang et al. 2022]: reproducing PPO meant matching 37 separate details (13 core, 9 Atari, 9 continuous-control, 5 LSTM, 1 MultiDiscrete), each linked to its exact line of code, "which is not done in academic papers." If your PPO underperforms a reference, you are probably missing some of these, not lacking a better idea.
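
A minimal sketch of what two such details can look like in code, assuming the rollout is stored as plain Python lists. Generalized Advantage Estimation and advantage normalization are, as far as I recall, among the core details the post catalogues; the function names, argument layout, and the use of a population standard deviation are illustrative assumptions here, not cleanrl's implementation.

```python
from statistics import fmean, pstdev


def gae_advantages(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout (hypothetical helper).

    dones[t] is True when the episode ended after step t, so neither the
    bootstrapped value nor the running advantage may cross that boundary.
    """
    advantages = [0.0] * len(rewards)
    next_value, next_adv = last_value, 0.0
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error, with the bootstrap masked out at episode ends.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Exponentially weighted sum of TD errors, reset at episode ends.
        next_adv = delta + gamma * lam * not_done * next_adv
        advantages[t] = next_adv
        next_value = values[t]
    return advantages


def normalize(advantages, eps=1e-8):
    """Shift and scale advantages to roughly zero mean and unit variance."""
    mean = fmean(advantages)
    std = pstdev(advantages)  # assumption: population std; libraries differ
    return [(a - mean) / (std + eps) for a in advantages]


# Step 1 ends an episode, so step 1 must not see step 2's reward.
adv = gae_advantages(
    rewards=[1.0, 1.0, 1.0],
    values=[0.0, 0.0, 0.0],
    dones=[False, True, False],
    last_value=0.0,
    gamma=0.5,
    lam=1.0,
)
norm_adv = normalize(adv)
```

Dropping the `not_done` mask still produces plausible-looking numbers, which is why a slip of this kind tends to show up as "underperforms the reference" rather than as a crash.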
References: spinning-up (OpenAI), stable-baselines3, cleanrl (single-file per algo), OpenSpiel (multi-agent).
**10. Don't over-interpret noise.** [Schulman 2017, Henderson 2018, Irpan 2018]
@@ -149,6 +151,7 @@ Sometimes (rarely) you don't. Schulman:
- Alex Irpan, "Deep Reinforcement Learning Doesn't Work Yet" (2018): https://www.alexirpan.com/2018/02/14/rl-hard.html
- McCandlish & Kaplan, "An Empirical Model of Large-Batch Training" (2018): https://arxiv.org/abs/1812.06162
- Slav Ivanov, "37 Reasons why your Neural Network is not working" (2017): https://blog.slavv.com/37-reasons-why-your-neural-network-is-not-working-4020854bd607
- Huang et al., "The 37 Implementation Details of PPO" (ICLR Blog Track 2022): https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/ ([cache](../docs/evidence/cleanrl_37_ppo_details.md))
### Reference implementations
- OpenAI Spinning Up: https://github.com/openai/spinningup