mirror of https://github.com/wassname/ml_debug.git synced 2026-06-27 01:00:14 +08:00


wassname 9911ac83c5 folklore: add lucidrains transformer-stability item (QK-norm, post-emb LN)

Phil Wang's x-transformers is the canonical "the fix is in the code, not the
paper" catalogue. Add a folklore item on the most debugging-relevant trick:
QK / cosine-sim normalization to stop attention logits overflowing (the usual
cause of transformer loss spikes/divergence), plus the BLOOM/YaLM
post-embedding LayerNorm. Two verbatim lucidrains quotes, footnoted to the repo
+ a cached README copy with line numbers. Doubles as the modern concrete
example for the read-a-working-implementation section.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>

2026-06-02 20:49:15 +08:00

docs

folklore: add lucidrains transformer-stability item (QK-norm, post-emb LN)

2026-06-02 20:49:15 +08:00

pinn

name

2026-04-09 05:09:25 +08:00

refs

restructure: folklore-first, quote-verified, with wassname intro

2026-06-02 20:46:25 +08:00

name

2026-04-09 05:09:25 +08:00

.gitignore

chore: fix .gitignore (dlbooks path, *_log.md pattern)

2026-03-06 12:22:22 +08:00

README.md

restructure: folklore-first, quote-verified, with wassname intro

2026-06-02 20:46:25 +08:00

SKILL.md

folklore: add lucidrains transformer-stability item (QK-norm, post-emb LN)

2026-06-02 20:49:15 +08:00

README.md

wassname's ML Debugging Folklore

To upskill AI coding assistants (and humans) at ML debugging, I've collected high-quality sources on ML debugging, the mindset, and the "taste". When I started in ML, I went searching for discussions of best practices and started a few of my own. They helped me a lot, and I hope they can help others. This intro is human-written; everything below is AI-written with human guidance.

Practitioner knowledge for debugging ML systems, curated and synthesized by wassname. Opinionated by source selection -- I picked sources I trust (Schulman, Goodfellow, CS231n, ...) and had an LLM extract the most relevant information for debugging ML systems.

Use as a Claude skill

/skills add https://github.com/wassname/ml_debug

Or paste SKILL.md into your system prompt / context when debugging.
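For the paste-into-context route, a minimal Python sketch is below. It assumes SKILL.md has been downloaded into the working directory. The helper names `load_skill` and `build_system_prompt` are hypothetical and not part of this repo, and how the resulting string reaches your model depends on your own client.

```python
from pathlib import Path


def load_skill(path: str = "SKILL.md") -> str:
    """Read the skill file from disk (the path is an assumption: a local copy of SKILL.md)."""
    return Path(path).read_text(encoding="utf-8")


def build_system_prompt(skill_text: str, base_prompt: str = "") -> str:
    """Join an optional base prompt and the skill text, separated by one blank line.

    Empty or whitespace-only parts are dropped, so the skill text can also be
    used as the whole system prompt.
    """
    parts = [p.strip() for p in (base_prompt, skill_text) if p.strip()]
    return "\n\n".join(parts)
```

A call such as `build_system_prompt(load_skill(), "You are helping debug a training run.")` returns a single string to pass as the system prompt. The base prompt goes first here so the skill text is the last thing the model reads before the conversation; that ordering is a design choice, not a requirement.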

What's here

SKILL.md -- the main artifact. Load into an LLM agent's context as a debugging skill. Leads with the mindset (calibrate, mental models, general debugging tricks, and reading a working implementation when stuck), then a folklore section of sourced quotes, then an LLM-agent playbook (debugging loop, triage menu, anti-patterns). Deeper one-off tricks (loss-surface analysis, stuck-metric diagnosis, sweep reliability) live in refs/.
docs/evidence/ -- frozen local copies of source material (blog posts, talks, papers, reddit threads). Claims in SKILL.md link back to exact quotes here.