wassname bb1a6bc61c feat(ml_debug): lead with judgment gates over the symptom-lookup encyclopedia
The skill was thorough but failed to instill debugging taste: an agent would
pattern-match a symptom-table row to a fix and ship a guess, because the
behaviour-changing material sat 550 lines down. Promote three gates to the top:

- "Before you debug: calibrate" -- you're likely OOD in research code; the
  failure mode is overconfidence/impatience; the tables are a menu to widen the
  search, never lookup-and-apply.
- "The debugging loop (judgment, not a checklist)" -- collect clues, hold a few
  competing hypotheses scaled to the problem, sanity-check with the
  likely/subtle/null triplet (shared vocab with research-journal), run the
  cheapest discriminating observation, then act.
- "When stuck, read a working implementation" -- promoted from a buried Part 7.3
  one-liner; extract the algorithm-done-right, the engineering tricks the paper
  omits, and proven hyperparams; rank candidates by trust signal.

Collapse duplicated advice to pointers; de-bold Part 6.4 (8 bolded openers -> a
plain list). Net +10 lines, bold markers 112 -> 100.

Verified by a blind comprehension-by-inference quiz (5 cold-reader models, OLD
vs NEW): NEW 9.6/10 vs OLD 6.8/10, with the gap localized to the two added
sections (read-working-impl 5/5 vs 0/5; tables-as-menu 2.0 vs 1.6) while
untouched sections tied -- ruling out a "reads nicer" halo.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-01 10:11:36 +08:00
2026-04-09 05:09:25 +08:00

ML Debugging Folklore

Practitioner knowledge for debugging ML systems, curated and synthesized by wassname. The selection is opinionated: I picked sources I trust (Schulman, Goodfellow, CS231n, ...) and had an LLM extract the material most relevant to debugging ML systems.

Use as a Claude skill

/skills add https://github.com/wassname/ml_debug

Or paste SKILL.md into your system prompt / context when debugging.
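
If you are assembling the prompt in code rather than pasting by hand, the paste step might look like the sketch below. It assumes a local copy of SKILL.md; the function name `build_system_prompt` and the `base_prompt` parameter are hypothetical and not part of this repo.

```python
from pathlib import Path


def build_system_prompt(skill_path: str, base_prompt: str = "") -> str:
    """Return base_prompt followed by the full text of the skill file.

    Hypothetical helper for illustration: it only concatenates text and
    makes no assumption about which LLM client consumes the result.
    """
    skill_text = Path(skill_path).read_text(encoding="utf-8").strip()
    base = base_prompt.strip()
    if not base:
        # No base prompt supplied: the skill text is the whole system prompt.
        return skill_text
    # Separate the two parts with a blank line so they stay distinct sections.
    return f"{base}\n\n{skill_text}"
```

Calling `build_system_prompt("SKILL.md", "You are helping debug an ML training run.")` returns a single string that can be passed as the system prompt to whichever client you use.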

What's here

  • SKILL.md -- the main artifact. Load into an LLM agent's context as a debugging skill. Parts 1-5 are reference knowledge; Part 6 is a runnable triage protocol (grep patterns, diagnostic snippets, decision tree); Part 7 is debugging mental models and practitioner priors.

  • docs/evidence/ -- frozen local copies of source material (blog posts, talks, papers, reddit threads). Claims in SKILL.md link back to exact quotes here.
