folklore: add lucidrains transformer-stability item (QK-norm, post-emb LN)

Phil Wang's (lucidrains) x-transformers is the canonical "the fix is in the
code, not the paper" catalogue. Add a folklore item on the most
debugging-relevant trick: QK (cosine-sim) normalization, which stops attention
logits from overflowing (the usual cause of transformer loss spikes and
divergence), plus the BLOOM/YaLM post-embedding LayerNorm. The item carries two
verbatim lucidrains quotes, footnoted to the repo and to a cached README copy
with line numbers. It doubles as the modern concrete example for the
read-a-working-implementation section.
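
For reviewers unfamiliar with the trick, a minimal sketch of the idea follows.
It is not code from x-transformers: the function names and the fixed scale of
10.0 are assumptions for illustration. It L2-normalizes queries and keys so
each logit is a scaled cosine similarity, bounded by the scale no matter how
large the activations grow.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]


def attention_logits(queries, keys, qk_norm=False, scale=None):
    """Return pre-softmax attention logits as a list of rows (one per query)."""
    dim = len(queries[0])
    if qk_norm:
        # Cosine-sim attention: unit-norm q and k, then a fixed temperature.
        queries = [l2_normalize(q) for q in queries]
        keys = [l2_normalize(k) for k in keys]
        scale = 10.0 if scale is None else scale  # assumed value, illustrative
    else:
        # Standard scaled dot-product attention.
        scale = dim ** -0.5 if scale is None else scale
    return [
        [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        for q in queries
    ]


# Deliberately huge activations, as seen shortly before a loss spike.
q = [[1e3, -2e3, 5e2, 4e3]]
k = [[2e3, -1e3, 1e3, 3e3], [-1e3, 2e3, -5e2, -4e3]]

plain = attention_logits(q, k)                 # grows with |q| * |k|
normed = attention_logits(q, k, qk_norm=True)  # each entry lies in [-10, 10]
```

With these inputs the plain logits reach the millions, which is the regime
where a low-precision softmax overflows, while the normalized logits cannot
exceed the scale in magnitude.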

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
wassname
2026-06-02 20:49:15 +08:00
parent 38ec634ff3
commit 9911ac83c5
2 changed files with 2711 additions and 0 deletions
File diff suppressed because it is too large