ml-debug

wassname/ml-debug

mirror of https://github.com/wassname/ml-debug.git synced 2026-06-27 17:49:08 +08:00

Commit Graph

Author	SHA1	Message	Date
wassname	9911ac83c5	folklore: add lucidrains transformer-stability item (QK-norm, post-emb LN)	2026-06-02 20:49:15 +08:00

Phil Wang's x-transformers is the canonical "the fix is in the code, not the
paper" catalogue. Add a folklore item on the most debugging-relevant trick:
QK / cosine-sim normalization to stop attention logits overflowing (the usual
cause of transformer loss spikes/divergence), plus the BLOOM/YaLM
post-embedding LayerNorm. Two verbatim lucidrains quotes, footnoted to the repo
+ a cached README copy with line numbers. Doubles as the modern concrete
example for the read-a-working-implementation section.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>


1 Commit