Gather debugging folklore from more practitioners, each a verbatim quote
checked against a cached source copy (footnoted with line numbers):
- koaning (Vincent Warmerdam), "Bad Labels": benchmark labels are often wrong;
find them with confidence-sorted errors.
- gwern, the tank-detection legend: the canonical data-leakage parable, plus
the scout-mindset twist that it's a likely-unsourced urban legend.
- Patrick Kidger, "Just Know Stuff": why research code is buggy ("kludge ...
bugs that don't cripple things only because some other bug stops them") and
"never accept the kludge". Plus a one-line jaxtyping pointer for shape bugs.
- nanochat (Karpathy): BOS-alignment fake metric improvement; all-ranks must
clip on inf (a multi-GPU bug single-GPU testing hides).
- cleanrl "37 Implementation Details of PPO" -> RL sub-skill, as the canonical
proof that reference-impl details (not ideas) decide whether PPO works.
Trim the lucidrains item to one quote (it had ballooned). Add wassname credit
+ companion-gist link. All 20 footnotes resolve.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Phil Wang's x-transformers is the canonical "the fix is in the code, not the
paper" catalogue. Add a folklore item on the most debugging-relevant trick:
QK / cosine-sim normalization to stop attention logits overflowing (the usual
cause of transformer loss spikes/divergence), plus the BLOOM/YaLM
post-embedding LayerNorm. Two verbatim lucidrains quotes, footnoted to the repo
+ a cached README copy with line numbers. Doubles as the modern concrete
example for the read-a-working-implementation section.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Add 3 new evidence files from modern open-source sources:
- karpathy_recipe_training_nn_2019.md: Karpathy's training recipe blog post
- nanochat_deepwiki_llm_pretraining_2026.md: 320+ HP sweeps for GPT-2-scale pretraining
- sanh_simple_considerations_hf_2021.md: HuggingFace NLP debugging notes
Add update-to-data ratio diagnostic to refs/diagnostics.md (target ~1e-3).
Add LLM pretraining gap note to SKILL.md intro linking the new sources.
Add tanh saturation % to logging checklist.
Deep research to uplift LLMs for ML debugging, opinionated by source
selection. Distilled from Schulman, Jones, Rahtz, Goodfellow, CS231n,
FSDL, and more. Includes runnable diagnostic scripts and LLM-specific
anti-patterns.
Author: wassname (https://github.com/wassname)