mirror of
https://github.com/wassname/ml-debug.git
synced 2026-06-27 18:24:28 +08:00
4393cceefd
Deep research to uplift LLMs for ML debugging, opinionated by source selection. Distilled from Schulman, Jones, Rahtz, Goodfellow, CS231n, FSDL, and more. Includes runnable diagnostic scripts and LLM-specific anti-patterns. Author: wassname (https://github.com/wassname)
4.8 KiB
ML Debugging Folklore - Vargdown Process Log
Process
- evidence files read (21 files, 9416 lines total)
- quotes extracted via 12 parallel subagents
- key quotes verified against evidence files (spot-checked ~15 quotes)
- argdown verifier passes clean (`npx @argdown/cli json`: 14 arguments, 45 statements, 14 relations)
- subagent review done (gpt-5.2-codex via opencode; fixed non-verbatim quotes, credence calibration, PCS structure)
- human review done
Evidence Fetch Log
All evidence files were pre-existing in docs/evidence/. They were fetched
in a prior session via the methods listed in each file's header.
| Source | Evidence File | Fetch Method | Status |
|---|---|---|---|
| Schulman 2016 slides | joschu_nuts_and_bolts.md | uvx markitdown[pdf] | verbatim (PDF artifacts: cid markers) |
| Schulman 2017 bootcamp | schulman_nuts_bolts_deeprl_bootcamp_2017_subtitles.md | YouTube auto-subtitles | verbatim (transcription errors: "insanity" = "and standard") |
| Andy Jones RL debugging | andyljones_rl_debugging.md | markitdown | verbatim |
| Henderson et al. 2018 | henderson_2018_deep_rl_matters.md | markitdown | verbatim |
| Goodfellow Ch11 | goodfellow_ch11_practical_methodology.md | markitdown | verbatim |
| CS231n NN3 | cs231n_neural_networks_3.md | markitdown | verbatim |
| FSDL Spring 2021 L7 | fsdl_spring2021_lecture7.md | markitdown | verbatim |
| Irpan RL hard | alexirpan_rl_hard.md | markitdown | verbatim |
| amid.fish reproducing | amid_fish_reproducing_deep_rl.md | markitdown | verbatim |
| Slavv 37 reasons | slavv_37_reasons_nn.md | markitdown | verbatim |
| CS229 ML advice | cs229_ml_advice.md | markitdown | verbatim |
| McCandlish 2018 | mccandlish_2018_large_batch.md | markitdown | verbatim |
| William Falcon notes | williamfalcon_deeprl_hacks.md | markitdown | verbatim |
| Goodfellow Ch15 | goodfellow_ch15_representation_learning.md | markitdown | verbatim |
| Deep Learning Book | deeplearning_book.md | markitdown | verbatim |
| Reddit RL tips 7s8px9 | reddit_rl_practical_tips_7s8px9.md | markitdown | verbatim |
| Reddit RL debug 9sh77q | reddit_rl_debugging_tips_9sh77q.md | markitdown | verbatim |
| Reddit RL roadblocks | reddit_rl_roadblocks_bzg3l2.md | markitdown | verbatim |
| Reddit Schulman 5hereu | reddit_schulman_nuts_bolts_5hereu.md | markitdown | verbatim |
| Reddit ICML tutorial | reddit_icml2017_tutorial_levine_6vcvu1.md | markitdown | verbatim |
| Reddit DRL bootcamp | reddit_deeprl_bootcamp_2017_75m5vd.md | markitdown | verbatim |
Quote Verification Notes
- Schulman subtitles contain auto-generated transcription errors (e.g., "mean insanity deviation" should be "mean and standard deviation"). Quotes used verbatim from file; errors are in the source, not introduced by us.
- Schulman PDF (joschu_nuts_and_bolts.md) has markitdown conversion artifacts (`(cid:73)` bullet markers, table formatting). Core text is present but formatting is messy.
- All other evidence files appear to be clean markitdown conversions.
- 15 key quotes were manually spot-checked against evidence files. All matched.
- Quotes from subagent extractions were cross-referenced with direct file reads.
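The spot-check described above can be approximated in a few lines. This is a minimal sketch, not the script used for this log: the helper names `normalize` and `quote_in_evidence` are hypothetical, and collapsing whitespace before comparing is an assumption made so that line wrapping in the markitdown output does not cause false mismatches.

```python
import re


def normalize(text: str) -> str:
    """Collapse runs of whitespace so line wrapping does not break matching."""
    return re.sub(r"\s+", " ", text).strip()


def quote_in_evidence(quote: str, evidence_text: str) -> bool:
    """Return True if the quote appears verbatim (modulo whitespace) in the evidence."""
    return normalize(quote) in normalize(evidence_text)


# Illustrative string only, not an actual line from the evidence file.
# It contains the transcription error noted above, wrapped across two lines.
evidence = "compute the mean insanity\ndeviation of your observations"

# The verbatim (erroneous) phrase matches; the corrected phrase does not,
# which is why quotes are kept exactly as they appear in the source.
print(quote_in_evidence("mean insanity deviation", evidence))
print(quote_in_evidence("mean and standard deviation", evidence))
```

In practice `evidence` would be read from a file under docs/evidence/; the check deliberately does no fuzzy matching, so a corrected quote fails just as a paraphrased one would.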
Blockers / Caveats
- Argdown verifier passes clean: `npx @argdown/cli json` exports 14 arguments, 45 statements, 14 relations. Fixed: 44 blank lines inside PCS blocks, bracket escaping in FSDL quote.
- Some evidence files (especially Schulman PDF) have conversion artifacts that may cause verifier failures on exact quote matching.
- The argdown uses auto-generated YouTube subtitles as a source; these contain transcription errors that are present in the evidence file.
Coverage Summary
| SKILL.md Claim | Sources Used | Independent Sources |
|---|---|---|
| Normalize inputs mean=0 std=1 | Schulman, FSDL, Slavv | 3 |
| Overfit tiny dataset first | CS231n, FSDL, Goodfellow | 3 |
| Assume you have a bug | Jones, Goodfellow | 2 |
| Seed variance is extreme | Schulman, Henderson, Irpan | 3 |
| Use bigger batch sizes | Schulman (x2), McCandlish | 2 (Schulman slides + talk counted as 1) |
| Hand-scale rewards, don't shift mean | Schulman, Jones, Henderson | 3 |
| Use reference implementations | Jones, Rahtz | 2 |
| Pursue anomalies | Jones, Rahtz | 2 |
| Log everything | Rahtz, Goodfellow | 2 |
| Random HP search | CS231n/Bergstra, Schulman | 2 |
| Probe environments for RL | Jones | 1 (but applies general isolation principle) |
| Policy entropy / KL diagnostics | Schulman | 1 (but built into major frameworks) |
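
The "Independent Sources" column appears to count distinct authors, not distinct files, which is why the Schulman slides and talk count once. A minimal sketch of that rule, assuming a hand-written mapping from source label to author (the labels, the `SOURCE_AUTHOR` table, and the `count_independent` helper are hypothetical and not part of the repository):

```python
# Hypothetical mapping from a source label to the author behind it.
SOURCE_AUTHOR = {
    "Schulman slides": "Schulman",
    "Schulman talk": "Schulman",
    "McCandlish": "McCandlish",
}


def count_independent(sources: list[str]) -> int:
    """Count distinct authors so two works by one author count once."""
    return len({SOURCE_AUTHOR[s] for s in sources})


# The "Use bigger batch sizes" row: three works, two independent authors.
print(count_independent(["Schulman slides", "Schulman talk", "McCandlish"]))
```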
Claims NOT Covered in Argdown (lower priority or single-source)
- Gradient clipping masks problems (CS231n mentions, but as a technique not a warning)
- Final layer zero init for policy (Schulman only)
- Loss surface analysis / gradient quiver plots (original to SKILL, no external source)
- Sweep methodology with within-group z-scores (original to SKILL)