# ML Debugging Folklore - Vargdown Process Log

## Process

- [x] evidence files read (21 files, 9416 lines total)
- [x] quotes extracted via 12 parallel subagents
- [x] key quotes verified against evidence files (spot-checked ~15 quotes)
- [x] argdown verifier passes clean (`npx @argdown/cli json` -- 14 arguments, 45 statements, 14 relations)
- [x] subagent review done (gpt-5.2-codex via opencode; fixed non-verbatim quotes, credence calibration, PCS structure)
- [ ] human review done

## Evidence Fetch Log

All evidence files were pre-existing in `docs/evidence/`. They were fetched in a prior session via the methods listed in each file's header.

| Source | Evidence File | Fetch Method | Status |
|--------|--------|--------|--------|
| Schulman 2016 slides | joschu_nuts_and_bolts.md | `uvx markitdown[pdf]` | verbatim (PDF artifacts: cid markers) |
| Schulman 2017 bootcamp | schulman_nuts_bolts_deeprl_bootcamp_2017_subtitles.md | YouTube auto-subtitles | verbatim (transcription errors: "insanity" = "and standard") |
| Andy Jones RL debugging | andyljones_rl_debugging.md | markitdown | verbatim |
| Henderson et al. 2018 | henderson_2018_deep_rl_matters.md | markitdown | verbatim |
| Goodfellow Ch11 | goodfellow_ch11_practical_methodology.md | markitdown | verbatim |
| CS231n NN3 | cs231n_neural_networks_3.md | markitdown | verbatim |
| FSDL Spring 2021 L7 | fsdl_spring2021_lecture7.md | markitdown | verbatim |
| Irpan RL hard | alexirpan_rl_hard.md | markitdown | verbatim |
| amid.fish reproducing | amid_fish_reproducing_deep_rl.md | markitdown | verbatim |
| Slavv 37 reasons | slavv_37_reasons_nn.md | markitdown | verbatim |
| CS229 ML advice | cs229_ml_advice.md | markitdown | verbatim |
| McCandlish 2018 | mccandlish_2018_large_batch.md | markitdown | verbatim |
| William Falcon notes | williamfalcon_deeprl_hacks.md | markitdown | verbatim |
| Goodfellow Ch15 | goodfellow_ch15_representation_learning.md | markitdown | verbatim |
| Deep Learning Book | deeplearning_book.md | markitdown | verbatim |
| Reddit RL tips 7s8px9 | reddit_rl_practical_tips_7s8px9.md | markitdown | verbatim |
| Reddit RL debug 9sh77q | reddit_rl_debugging_tips_9sh77q.md | markitdown | verbatim |
| Reddit RL roadblocks | reddit_rl_roadblocks_bzg3l2.md | markitdown | verbatim |
| Reddit Schulman 5hereu | reddit_schulman_nuts_bolts_5hereu.md | markitdown | verbatim |
| Reddit ICML tutorial | reddit_icml2017_tutorial_levine_6vcvu1.md | markitdown | verbatim |
| Reddit DRL bootcamp | reddit_deeprl_bootcamp_2017_75m5vd.md | markitdown | verbatim |

## Quote Verification Notes

- Schulman subtitles contain auto-generated transcription errors (e.g., "mean insanity deviation" should be "mean and standard deviation"). Quotes used verbatim from file; errors are in the source, not introduced by us.
- Schulman PDF (joschu_nuts_and_bolts.md) has markitdown conversion artifacts (`(cid:73)` bullet markers, table formatting). Core text is present but formatting is messy.
- All other evidence files appear to be clean markitdown conversions.
- 15 key quotes were manually spot-checked against evidence files. All matched.
- Quotes from subagent extractions were cross-referenced with direct file reads.

## Blockers / Caveats

- Argdown verifier passes clean: `npx @argdown/cli json` exports 14 arguments, 45 statements, 14 relations. Fixed: 44 blank lines inside PCS blocks, bracket escaping in FSDL quote.
- Some evidence files (especially Schulman PDF) have conversion artifacts that may cause verifier failures on exact quote matching.
- The argdown uses auto-generated YouTube subtitles as a source; these contain transcription errors that are present in the evidence file.

## Coverage Summary

| SKILL.md Claim | Sources Used | Independent Sources |
|---|---|---|
| Normalize inputs mean=0 std=1 | Schulman, FSDL, Slavv | 3 |
| Overfit tiny dataset first | CS231n, FSDL, Goodfellow | 3 |
| Assume you have a bug | Jones, Goodfellow | 2 |
| Seed variance is extreme | Schulman, Henderson, Irpan | 3 |
| Use bigger batch sizes | Schulman (x2), McCandlish | 2 (Schulman slides + talk counted as 1) |
| Hand-scale rewards, don't shift mean | Schulman, Jones, Henderson | 3 |
| Use reference implementations | Jones, Rahtz | 2 |
| Pursue anomalies | Jones, Rahtz | 2 |
| Log everything | Rahtz, Goodfellow | 2 |
| Random HP search | CS231n/Bergstra, Schulman | 2 |
| Probe environments for RL | Jones | 1 (but applies general isolation principle) |
| Policy entropy / KL diagnostics | Schulman | 1 (but built into major frameworks) |

## Claims NOT Covered in Argdown (lower priority or single-source)

- Gradient clipping masks problems (CS231n mentions, but as a technique not a warning)
- Final layer zero init for policy (Schulman only)
- Loss surface analysis / gradient quiver plots (original to SKILL, no external source)
- Sweep methodology with within-group z-scores (original to SKILL)
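
## Sketch: verbatim quote check

The exact-quote matching referred to under Quote Verification Notes and Blockers / Caveats can be illustrated with a minimal sketch. This is not the argdown verifier's implementation, and the spot-checks logged above were done manually. The helper names `normalize_ws` and `quote_is_verbatim` are hypothetical, and treating differences in whitespace and line wrapping as insignificant is an assumption. The sample string reuses only the transcription error quoted in this log.

```python
import re


def normalize_ws(text: str) -> str:
    """Collapse every run of whitespace to one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def quote_is_verbatim(quote: str, evidence_text: str) -> bool:
    """Return True if `quote` occurs in `evidence_text`.

    Assumption: only spacing and line wrapping are ignored; any change
    in wording, punctuation, or case counts as a mismatch.
    """
    needle = normalize_ws(quote)
    if not needle:
        # An empty quote would trivially match, so reject it explicitly.
        return False
    return needle in normalize_ws(evidence_text)


# Illustrative evidence text: the subtitle error from this log, wrapped
# across two lines the way a subtitle file might break it.
evidence = "mean insanity\ndeviation"

# The quote as it stands in the evidence file (error included) matches.
print(quote_is_verbatim("mean insanity deviation", evidence))

# The silently corrected wording does not match the evidence file.
print(quote_is_verbatim("mean and standard deviation", evidence))
```

A check of this shape accepts the uncorrected subtitle text and rejects the corrected wording, which is why quotes are kept verbatim from the evidence files even where the source contains transcription errors.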