initial: ML debugging folklore skill

Deep research to uplift LLMs for ML debugging, opinionated by source selection. Distilled from Schulman, Jones, Rahtz, Goodfellow, CS231n, FSDL, and more. Includes runnable diagnostic scripts and LLM-specific anti-patterns. Author: wassname (https://github.com/wassname)
2026-06-27 19:47:55 +08:00 · 2026-03-06 10:11:30 +08:00
commit 4393cceefd
25 changed files with 12512 additions and 0 deletions
@@ -0,0 +1,76 @@
+# ML Debugging Folklore - Vargdown Process Log
+
+## Process
+- [x] evidence files read (21 files, 9416 lines total)
+- [x] quotes extracted via 12 parallel subagents
+- [x] key quotes verified against evidence files (spot-checked ~15 quotes)
+- [x] argdown verifier passes clean (`npx @argdown/cli json` -- 14 arguments, 45 statements, 14 relations)
+- [x] subagent review done (gpt-5.2-codex via opencode; fixed non-verbatim quotes, credence calibration, PCS structure)
+- [ ] human review done
+
+## Evidence Fetch Log
+
+All evidence files were pre-existing in `docs/evidence/`. They were fetched
+in a prior session via the methods listed in each file's header.
+
+| Source | Evidence File | Fetch Method | Status |
+|--------|--------|--------|--------|
+| Schulman 2016 slides | joschu_nuts_and_bolts.md | `uvx markitdown[pdf]` | verbatim (PDF artifacts: cid markers) |
+| Schulman 2017 bootcamp | schulman_nuts_bolts_deeprl_bootcamp_2017_subtitles.md | YouTube auto-subtitles | verbatim (transcription errors: "insanity" = "and standard") |
+| Andy Jones RL debugging | andyljones_rl_debugging.md | markitdown | verbatim |
+| Henderson et al. 2018 | henderson_2018_deep_rl_matters.md | markitdown | verbatim |
+| Goodfellow Ch11 | goodfellow_ch11_practical_methodology.md | markitdown | verbatim |
+| CS231n NN3 | cs231n_neural_networks_3.md | markitdown | verbatim |
+| FSDL Spring 2021 L7 | fsdl_spring2021_lecture7.md | markitdown | verbatim |
+| Irpan RL hard | alexirpan_rl_hard.md | markitdown | verbatim |
+| amid.fish reproducing | amid_fish_reproducing_deep_rl.md | markitdown | verbatim |
+| Slavv 37 reasons | slavv_37_reasons_nn.md | markitdown | verbatim |
+| CS229 ML advice | cs229_ml_advice.md | markitdown | verbatim |
+| McCandlish 2018 | mccandlish_2018_large_batch.md | markitdown | verbatim |
+| William Falcon notes | williamfalcon_deeprl_hacks.md | markitdown | verbatim |
+| Goodfellow Ch15 | goodfellow_ch15_representation_learning.md | markitdown | verbatim |
+| Deep Learning Book | deeplearning_book.md | markitdown | verbatim |
+| Reddit RL tips 7s8px9 | reddit_rl_practical_tips_7s8px9.md | markitdown | verbatim |
+| Reddit RL debug 9sh77q | reddit_rl_debugging_tips_9sh77q.md | markitdown | verbatim |
+| Reddit RL roadblocks | reddit_rl_roadblocks_bzg3l2.md | markitdown | verbatim |
+| Reddit Schulman 5hereu | reddit_schulman_nuts_bolts_5hereu.md | markitdown | verbatim |
+| Reddit ICML tutorial | reddit_icml2017_tutorial_levine_6vcvu1.md | markitdown | verbatim |
+| Reddit DRL bootcamp | reddit_deeprl_bootcamp_2017_75m5vd.md | markitdown | verbatim |
+
+## Quote Verification Notes
+
+- Schulman subtitles contain auto-generated transcription errors (e.g., "mean insanity deviation" should be "mean and standard deviation"). Quotes used verbatim from file; errors are in the source, not introduced by us.
+- Schulman PDF (joschu_nuts_and_bolts.md) has markitdown conversion artifacts (`(cid:73)` bullet markers, table formatting). Core text is present but formatting is messy.
+- All other evidence files appear to be clean markitdown conversions.
+- 15 key quotes were manually spot-checked against evidence files. All matched.
+- Quotes from subagent extractions were cross-referenced with direct file reads.
+
+## Blockers / Caveats
+
+- Argdown verifier passes clean: `npx @argdown/cli json` exports 14 arguments, 45 statements, 14 relations. Fixed: 44 blank lines inside PCS blocks, bracket escaping in FSDL quote.
+- Some evidence files (especially Schulman PDF) have conversion artifacts that may cause verifier failures on exact quote matching.
+- The argdown uses auto-generated YouTube subtitles as a source; these contain transcription errors that are present in the evidence file.
+
+## Coverage Summary
+
+| SKILL.md Claim | Sources Used | Independent Sources |
+|---|---|---|
+| Normalize inputs mean=0 std=1 | Schulman, FSDL, Slavv | 3 |
+| Overfit tiny dataset first | CS231n, FSDL, Goodfellow | 3 |
+| Assume you have a bug | Jones, Goodfellow | 2 |
+| Seed variance is extreme | Schulman, Henderson, Irpan | 3 |
+| Use bigger batch sizes | Schulman (x2), McCandlish | 2 (Schulman slides + talk counted as 1) |
+| Hand-scale rewards, don't shift mean | Schulman, Jones, Henderson | 3 |
+| Use reference implementations | Jones, Rahtz | 2 |
+| Pursue anomalies | Jones, Rahtz | 2 |
+| Log everything | Rahtz, Goodfellow | 2 |
+| Random HP search | CS231n/Bergstra, Schulman | 2 |
+
+| Probe environments for RL | Jones | 1 (but applies general isolation principle) |
+| Policy entropy / KL diagnostics | Schulman | 1 (but built into major frameworks) |
+
+## Claims NOT Covered in Argdown (lower priority or single-source)
+- Gradient clipping masks problems (CS231n mentions, but as a technique not a warning)
+- Final layer zero init for policy (Schulman only)
+- Loss surface analysis / gradient quiver plots (original to SKILL, no external source)
+- Sweep methodology with within-group z-scores (original to SKILL)