mirror of https://github.com/wassname/evil_MoE.git
synced 2026-06-27 22:22:21 +08:00
a1ef566bac
200 steps / G=16 / 1536 tok / n=10 (paper) vs 60 steps / G=8 / 512 tok / n=1 (ours). Framed as a fast-preset directional surrogate within the resource budget.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
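The size of the "fast preset" reduction can be estimated by multiplying the settings. The sketch below is hypothetical. It assumes `G` is the number of completions sampled per prompt, `tok` is the maximum tokens per completion, and `n` is the number of independent runs. It also assumes prompts per step are equal in both presets, so that factor cancels in the ratio. The `Preset` class and `token_budget` helper exist only for illustration and are not taken from the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    # Hypothetical reading of the commit's shorthand (an assumption, not stated in the source)
    steps: int       # optimizer steps
    group_size: int  # "G": completions sampled per prompt (assumed)
    max_tokens: int  # "tok": max tokens per completion (assumed)
    runs: int        # "n": independent runs / seeds (assumed)

    def token_budget(self) -> int:
        """Upper bound on generated tokens per prompt slot.

        Ignores prompts per step (assumed equal across presets) and early stopping.
        """
        return self.steps * self.group_size * self.max_tokens * self.runs


paper = Preset(steps=200, group_size=16, max_tokens=1536, runs=10)
ours = Preset(steps=60, group_size=8, max_tokens=512, runs=1)

print(paper.token_budget())                          # 49152000
print(ours.token_budget())                           # 245760
print(paper.token_budget() // ours.token_budget())   # 200
```

Under these assumptions the fast preset generates roughly 1/200 of the paper's token budget. This is why its results are treated as directional rather than as a replication.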