mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 18:43:00 +08:00
65a05c365c
Job 97 (gentle preset: lr=1e-3, adam 0.9/0.99, beta=0) ran vanilla-200 without collapse (lp_s in [-0.47, -0.29] through step 200, deploy hack 0.375).
The step-88 collapse in Fig longrun comes from the job-85 hot preset; jobs 84 and 85 use mismatched optimizers.
Mark the figure for regeneration from the matched beta=1e-5 pair (jobs 100/101).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>