- Promote the general (non-RL-specific) Spinning Up lessons up to the main folklore: "broken code fails silently", "you can't tell it's broken if you can't see that it's breaking", and test on more than one setup. - Add gwern's "Unseeing" to the data theme: you can't read what you actually wrote, hence fresh eyes / a fresh-eyes subagent. - New "Research taste (adjacent to debugging)" section with verbatim quotes, each cached: Neel Nanda (your research is false by default; excitement is evidence of bullshit; read your data), Ulisse Mini (understand the system to shrink the search space), John Wentworth (gears-level models are capital investments vs cheap black boxes). All quotes verbatim from cached sources; 25/25 footnotes resolve. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
wassname's ML Debugging Folklore
In an attempt to upskill the ML debugging on AI coding assistants (and humans), I've collected high quality sources on ML debugging and the mindset and the "taste". When I started ML I went searching for discussions on best practices, and started a few discussions of my own and they helped me a lot, I hope they can help others. This intro is human written, and the below is AI written with human guidance.
Practitioner knowledge for debugging ML systems, curated and synthesized by wassname. Opinionated by source selection -- I picked sources I trust (Schulman, Goodfellow, CS231n, ...) and had an LLM extract the most relevant information for debugging ML systems.
Use as a Claude skill
/skills add https://github.com/wassname/ml_debug
Or paste SKILL.md into your system prompt / context when debugging.
What's here
-
SKILL.md -- the main artifact. Load into an LLM agent's context as a debugging skill. Leads with the mindset (calibrate, mental models, general debugging tricks, and reading a working implementation when stuck), then a folklore section of sourced quotes, then an LLM-agent playbook (debugging loop, triage menu, anti-patterns). Deeper one-off tricks (loss-surface analysis, stuck-metric diagnosis, sweep reliability) live in refs/.
-
docs/evidence/ -- frozen local copies of source material (blog posts, talks, papers, reddit threads). Claims in SKILL.md link back to exact quotes here.