Files
ml-debug/docs/ml_debug_folklore_log.md
T
wassname 4393cceefd initial: ML debugging folklore skill
Deep research to uplift LLMs for ML debugging, opinionated by source
selection. Distilled from Schulman, Jones, Rahtz, Goodfellow, CS231n,
FSDL, and more. Includes runnable diagnostic scripts and LLM-specific
anti-patterns.

Author: wassname (https://github.com/wassname)
2026-03-06 10:11:30 +08:00

4.8 KiB

ML Debugging Folklore - Vargdown Process Log

Process

  • evidence files read (21 files, 9416 lines total)
  • quotes extracted via 12 parallel subagents
  • key quotes verified against evidence files (spot-checked ~15 quotes)
  • argdown verifier passes clean (npx @argdown/cli json -- 14 arguments, 45 statements, 14 relations)
  • subagent review done (gpt-5.2-codex via opencode; fixed non-verbatim quotes, credence calibration, PCS structure)
  • human review done

Evidence Fetch Log

All evidence files were pre-existing in docs/evidence/. They were fetched in a prior session via the methods listed in each file's header.

Source Evidence File Fetch Method Status
Schulman 2016 slides joschu_nuts_and_bolts.md uvx markitdown[pdf] verbatim (PDF artifacts: cid markers)
Schulman 2017 bootcamp schulman_nuts_bolts_deeprl_bootcamp_2017_subtitles.md YouTube auto-subtitles verbatim (transcription errors: "insanity" = "and standard")
Andy Jones RL debugging andyljones_rl_debugging.md markitdown verbatim
Henderson et al. 2018 henderson_2018_deep_rl_matters.md markitdown verbatim
Goodfellow Ch11 goodfellow_ch11_practical_methodology.md markitdown verbatim
CS231n NN3 cs231n_neural_networks_3.md markitdown verbatim
FSDL Spring 2021 L7 fsdl_spring2021_lecture7.md markitdown verbatim
Irpan RL hard alexirpan_rl_hard.md markitdown verbatim
amid.fish reproducing amid_fish_reproducing_deep_rl.md markitdown verbatim
Slavv 37 reasons slavv_37_reasons_nn.md markitdown verbatim
CS229 ML advice cs229_ml_advice.md markitdown verbatim
McCandlish 2018 mccandlish_2018_large_batch.md markitdown verbatim
William Falcon notes williamfalcon_deeprl_hacks.md markitdown verbatim
Goodfellow Ch15 goodfellow_ch15_representation_learning.md markitdown verbatim
Deep Learning Book deeplearning_book.md markitdown verbatim
Reddit RL tips 7s8px9 reddit_rl_practical_tips_7s8px9.md markitdown verbatim
Reddit RL debug 9sh77q reddit_rl_debugging_tips_9sh77q.md markitdown verbatim
Reddit RL roadblocks reddit_rl_roadblocks_bzg3l2.md markitdown verbatim
Reddit Schulman 5hereu reddit_schulman_nuts_bolts_5hereu.md markitdown verbatim
Reddit ICML tutorial reddit_icml2017_tutorial_levine_6vcvu1.md markitdown verbatim
Reddit DRL bootcamp reddit_deeprl_bootcamp_2017_75m5vd.md markitdown verbatim

Quote Verification Notes

  • Schulman subtitles contain auto-generated transcription errors (e.g., "mean insanity deviation" should be "mean and standard deviation"). Quotes used verbatim from file; errors are in the source, not introduced by us.
  • Schulman PDF (joschu_nuts_and_bolts.md) has markitdown conversion artifacts ((cid:73) bullet markers, table formatting). Core text is present but formatting is messy.
  • All other evidence files appear to be clean markitdown conversions.
  • 15 key quotes were manually spot-checked against evidence files. All matched.
  • Quotes from subagent extractions were cross-referenced with direct file reads.

Blockers / Caveats

  • Argdown verifier passes clean: npx @argdown/cli json exports 14 arguments, 45 statements, 14 relations. Fixed: 44 blank lines inside PCS blocks, bracket escaping in FSDL quote.
  • Some evidence files (especially Schulman PDF) have conversion artifacts that may cause verifier failures on exact quote matching.
  • The argdown uses auto-generated YouTube subtitles as a source; these contain transcription errors that are present in the evidence file.

Coverage Summary

SKILL.md Claim Sources Used Independent Sources
Normalize inputs mean=0 std=1 Schulman, FSDL, Slavv 3
Overfit tiny dataset first CS231n, FSDL, Goodfellow 3
Assume you have a bug Jones, Goodfellow 2
Seed variance is extreme Schulman, Henderson, Irpan 3
Use bigger batch sizes Schulman (x2), McCandlish 2 (Schulman slides + talk counted as 1)
Hand-scale rewards, don't shift mean Schulman, Jones, Henderson 3
Use reference implementations Jones, Rahtz 2
Pursue anomalies Jones, Rahtz 2
Log everything Rahtz, Goodfellow 2
Random HP search CS231n/Bergstra, Schulman 2

| Probe environments for RL | Jones | 1 (but applies general isolation principle) | | Policy entropy / KL diagnostics | Schulman | 1 (but built into major frameworks) |

Claims NOT Covered in Argdown (lower priority or single-source)

  • Gradient clipping masks problems (CS231n mentions, but as a technique not a warning)
  • Final layer zero init for policy (Schulman only)
  • Loss surface analysis / gradient quiver plots (original to SKILL, no external source)
  • Sweep methodology with within-group z-scores (original to SKILL)