mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 19:31:11 +08:00
1fb49a3325
Header reprint fixes the variable-width misread trap (20+ unlabeled cols, gn adjacent to lr).

Records the anticipated Piggyback 'why not learn the routing mask' critique (answer: no-cheat withholds the per-rollout label a learned mask needs) and LoRA rank-deficiency as mild support for the low-rank hack subspace.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>