- SKILL.md: 3 new entries (exploration-over-exploitation + nuisance HPs,
test-set contamination, loss-spikes-mean-bad-data-pocket) and an Ng
100-misclassified-examples quote under inspect-the-data
- refs/llm_judges.md: position/verbosity/self-preference biases (Zheng,
Wang 66/80 flip, Panickssery) + mitigation checklist from verdict docs
- Lones pitfalls linked as the exhaustive 36-item do/don't checklist
- 6 new frozen evidence files; Hamel evals link in further reading
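
For the position-bias item above, one common mitigation is to ask the judge twice with the answers swapped and accept only a verdict that survives the swap. The sketch below assumes a hypothetical `judge(first, second)` callable that returns "first", "second", or "tie"; neither that interface nor `debiased_verdict` comes from refs/llm_judges.md, whose checklist is not reproduced in this commit.

```python
def debiased_verdict(judge, answer_a, answer_b):
    """Query the judge in both orders; return a winner only if it is stable.

    `judge(first, second)` is a hypothetical callable that returns
    "first", "second", or "tie" depending on which slot it prefers.
    """
    forward = judge(answer_a, answer_b)
    backward = judge(answer_b, answer_a)
    # Map positional verdicts back to answer identities.
    fwd_winner = {"first": "A", "second": "B", "tie": "tie"}[forward]
    bwd_winner = {"first": "B", "second": "A", "tie": "tie"}[backward]
    return fwd_winner if fwd_winner == bwd_winner else "inconsistent"


# Toy judges for illustration only.
always_first = lambda first, second: "first"  # pure position bias
by_length = lambda first, second: "first" if len(first) >= len(second) else "second"

biased_result = debiased_verdict(always_first, "short", "a longer answer")
stable_result = debiased_verdict(by_length, "short", "a longer answer")
```

A judge that always picks the first slot is flagged as "inconsistent". The length-based toy judge is stable under the swap, which shows that this check catches position bias only; verbosity bias needs a separate control.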
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
NaN poisoning: inject NaN wherever information must not come from (future timesteps, test rows, labels), run the real pipeline, and assert that the outputs that should be unaffected (e.g. past timesteps) stay finite. Documents false negatives (pandas skipna, nanmean silently drop the NaN) and false positives (softmax rows, batch stats legitimately mix inputs). Backprop-to-input is the gradient-based dual for checks inside the model; the supporting quote is already frozen in docs/evidence/karpathy_recipe_training_nn_2019.md.
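
A minimal NumPy sketch of the check described above, assuming a 1-D time series and toy rolling-mean pipelines. The names `causal_mean`, `centered_mean`, `centered_nanmean`, and `leaks_future` are illustrative and are not taken from the repo.

```python
import numpy as np


def causal_mean(x, window=3):
    """Trailing mean: out[t] reads only x[t-window+1 .. t]."""
    return np.array([x[max(0, t - window + 1): t + 1].mean() for t in range(len(x))])


def centered_mean(x, window=3):
    """Leaky variant: out[t] also reads x[t+1 .. t+window//2]."""
    half = window // 2
    return np.array([x[max(0, t - half): t + half + 1].mean() for t in range(len(x))])


def centered_nanmean(x, window=3):
    """Same leak, but nanmean skips the poison, so the check cannot see it."""
    half = window // 2
    return np.array([np.nanmean(x[max(0, t - half): t + half + 1]) for t in range(len(x))])


def leaks_future(pipeline, x, cutoff):
    """Poison x[cutoff:] with NaN; report whether any output before cutoff went non-finite."""
    poisoned = np.array(x, dtype=float)
    poisoned[cutoff:] = np.nan
    out = pipeline(poisoned)
    return not np.isfinite(out[:cutoff]).all()


x = np.arange(10.0)
causal_leak = leaks_future(causal_mean, x, cutoff=6)         # no leak reported
centered_leak = leaks_future(centered_mean, x, cutoff=6)     # leak reported
masked_leak = leaks_future(centered_nanmean, x, cutoff=6)    # false negative
```

The third call is the false-negative case: the pipeline still reads the future, but a NaN-skipping reduction drops the poison before it can reach the output.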
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Reorder around what's durable, per wassname's curation:
- human-written intro up top; rename to "wassname's ML Debugging Folklore"
- mindset first: calibrate -> mental models -> Part 1 general tricks (kept;
they're well-founded) -> read a working implementation when stuck
- a Folklore section built from verbatim, source-checked quotes (Jones,
Rahtz, Karpathy, Schulman, Henderson, Irpan, CS231n, Slavv, Goodfellow),
each footnoted to the canonical URL + the cached copy with line numbers
- LLM-agent babysitting (debugging loop, triage menu, anti-patterns) moved to
the bottom where it belongs; triage reframed as a menu, not a flowchart
- deeper one-off tricks split to refs/ (loss_surface, metric_stuck, sweeps),
scrubbed of private tooling (wandb/just/SI/personal scripts)
Quote integrity: every quote independently verified by fresh-eyes subagents
against the cached sources; fixed a reformatted Schulman slide, a truncated
Jones sentence, a reversed-order Rahtz stitch, a falsely-quoted Slavv phrase,
and the 3e-4 line (now the real tweet, framed as the joke Karpathy confirmed
it was, not gospel). lr_scheduler anti-pattern nuanced (warmup/cyclic matter).
Remove superseded SKILL2.md draft.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Add 3 new evidence files from modern open-source sources:
- karpathy_recipe_training_nn_2019.md: Karpathy's training recipe blog post
- nanochat_deepwiki_llm_pretraining_2026.md: 320+ HP sweeps for GPT-2-scale pretraining
- sanh_simple_considerations_hf_2021.md: HuggingFace NLP debugging notes
Add update-to-data ratio diagnostic to refs/diagnostics.md (target ~1e-3).
Add LLM pretraining gap note to SKILL.md intro linking the new sources.
Add tanh saturation % to logging checklist.
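
A rough sketch of the update-to-data ratio diagnostic, assuming plain SGD so that the update is `-lr * grad`, and measuring both sides with the L2 norm. The function name and the random tensors are illustrative; the actual snippet in refs/diagnostics.md may differ.

```python
import numpy as np


def update_to_data_ratio(param, update, eps=1e-12):
    """Norm of the applied update divided by the norm of the parameter itself."""
    return float(np.linalg.norm(update) / (np.linalg.norm(param) + eps))


rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))     # stand-in weight matrix
grad = rng.normal(size=(64, 64))  # stand-in gradient of similar scale
lr = 1e-3

# Plain SGD step; with Adam or momentum, pass the optimizer's actual step instead.
ratio = update_to_data_ratio(W, -lr * grad)
```

Logged per layer, a ratio well above the ~1e-3 target suggests the learning rate is too high for that layer, and one well below it suggests the layer is barely moving.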
- Fix stale Part 2 cross-references to link to rl/SKILL.md
- Add McCandlish + Slavv back to parent Sources (cited in Part 7)
- Add back-links from refs/ files to parent SKILL.md
Moved 6.1 (static analysis grep patterns) and 6.2 (diagnostic code
snippets) to refs/static_analysis.md and refs/diagnostics.md, respectively.
Triage tree (6.3) stays in main with references to the ref files.
ml_debug/SKILL.md reduced from 7229 to 5093 words (~30% shorter).