Files
ml-debug/docs
wassname 9911ac83c5 folklore: add lucidrains transformer-stability item (QK-norm, post-emb LN)
Phil Wang's x-transformers is the canonical "the fix is in the code, not the
paper" catalogue. Add a folklore item on the most debugging-relevant trick:
QK / cosine-sim normalization to stop attention logits overflowing (the usual
cause of transformer loss spikes/divergence), plus the BLOOM/YaLM
post-embedding LayerNorm. Two verbatim lucidrains quotes, footnoted to the repo
+ a cached README copy with line numbers. Doubles as the modern concrete
example for the read-a-working-implementation section.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-02 20:49:15 +08:00
..