ml-debug

mirror of https://github.com/wassname/ml-debug.git synced 2026-06-27 17:16:20 +08:00

Files

wassname 9911ac83c5 folklore: add lucidrains transformer-stability item (QK-norm, post-emb LN)

Phil Wang's x-transformers is the canonical "the fix is in the code, not the
paper" catalogue. Add a folklore item on the most debugging-relevant trick:
QK / cosine-sim normalization to stop attention logits overflowing (the usual
cause of transformer loss spikes/divergence), plus the BLOOM/YaLM
post-embedding LayerNorm. Two verbatim lucidrains quotes, footnoted to the repo
+ a cached README copy with line numbers. Doubles as the modern concrete
example for the read-a-working-implementation section.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>

2026-06-02 20:49:15 +08:00
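
The QK / cosine-sim normalization named in the commit message above can be illustrated with a minimal NumPy sketch. It is not code from x-transformers or from this repo. The function names, the fixed `scale=10.0`, and the inflated toy inputs are assumptions chosen for illustration. The point it shows: once queries and keys are L2-normalized, every attention logit is a cosine similarity times `scale`, so its magnitude cannot exceed `scale` however large the activations grow.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit L2 norm (eps guards against zero rows)."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def plain_logits(q, k):
    """Standard scaled dot-product logits: q @ k^T / sqrt(d)."""
    d = q.shape[-1]
    return q @ k.T / np.sqrt(d)


def qk_norm_logits(q, k, scale=10.0):
    """Cosine-similarity logits: normalize q and k, then apply a scale.

    `scale` is a hypothetical fixed temperature for this sketch; real
    implementations may learn it instead.
    """
    return scale * (l2_normalize(q) @ l2_normalize(k).T)


def softmax(x):
    """Numerically stable softmax over the last axis."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


# Toy queries/keys with deliberately inflated magnitude, standing in for
# activations that have drifted large during training.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 64)) * 100.0
k = rng.normal(size=(4, 64)) * 100.0

plain = plain_logits(q, k)    # grows with the norms of q and k
normed = qk_norm_logits(q, k) # bounded: |logit| <= scale

print("max |plain logit| :", np.abs(plain).max())
print("max |normed logit|:", np.abs(normed).max())
print("attention rows sum to 1:", np.allclose(softmax(normed).sum(axis=-1), 1.0))
```

The bound follows from the Cauchy-Schwarz inequality on unit vectors, which is why the trick removes the overflow path rather than merely making it less likely. The choice of scale trades off how sharp the attention distribution can become against how large the logits are allowed to get.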

evidence                     folklore: add lucidrains transformer-stability item (QK-norm, post-emb LN)   2026-06-02 20:49:15 +08:00
ml_debug_folklore.argdown    initial: ML debugging folklore skill                                         2026-03-06 10:11:30 +08:00