mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 17:30:41 +08:00
025debae6b
Previously the 2x2 train row plotted per-step hack_s (a noisy n=28 train batch, knob-on), while the deploy row plotted the smoother n=64 eval (knob-off). The two rows used different estimators, so any gap between them confounded estimator noise with knob state.

Now, at each eval step, route arms ALSO run the SAME n=64 eval with the quarantine ACTIVE (knob-on, i.e. the training policy) and log it as hk_on/slv_on. vanilla/erase reuse the deploy eval, because they have no quarantine and knob-on == knob-off. plot_dynamics prefers hk_on for the train series, so the two rows of the 2x2 now differ ONLY in knob state.

Also: the plot parses --teacher-off-step from argv and, in the 2x2, shades the teacher-ON region [0, toff] and draws a dashed line at the cut. The stashed long-run route2 jobs (92: KL, 94: teacher-off) pick up the knob-on eval automatically at runtime.

Smoke test green: route2 hk_on is present and logged, and both plot parse paths work.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
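A minimal sketch of the mechanism this commit describes, assuming a Python codebase. Only hk_on, slv_on, hack_s, the arm names vanilla/erase/route2 and the --teacher-off-step flag come from the commit. The record layout (a list of dicts with a "step" key), the deploy metric keys "hk"/"slv", and every function name below are hypothetical; this is not the repo's actual plot_dynamics code.

```python
import argparse
from typing import Callable, Optional

# Hypothetical: arms with no quarantine, where knob-on and knob-off are the same policy.
NO_QUARANTINE_ARMS = {"vanilla", "erase"}


def knob_on_metrics(arm: str, deploy: dict, eval_knob_on: Callable[[], dict]) -> dict:
    """Return knob-on eval metrics (hk_on, slv_on) for one eval step.

    Assumes `deploy` holds the knob-off n=64 eval as {"hk": ..., "slv": ...}
    and `eval_knob_on` reruns that same n=64 eval with the quarantine active.
    """
    if arm in NO_QUARANTINE_ARMS:
        # Nothing to toggle, so reuse the deploy eval instead of rerunning it.
        return {"hk_on": deploy["hk"], "slv_on": deploy["slv"]}
    out = eval_knob_on()
    return {"hk_on": out["hk"], "slv_on": out["slv"]}


def train_series(records: list[dict]) -> tuple[str, list[tuple[int, float]]]:
    """Pick ONE estimator for the 2x2 train row: hk_on if any record logged it, else hack_s.

    The key is chosen once per series, not per record, so a single curve
    never mixes the knob-on n=64 eval with the noisy per-step estimate.
    """
    key = "hk_on" if any("hk_on" in r for r in records) else "hack_s"
    return key, [(r["step"], r[key]) for r in records if key in r]


def parse_teacher_off_step(argv: list[str]) -> Optional[int]:
    """Read --teacher-off-step from argv and ignore every other flag."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--teacher-off-step", type=int, default=None)
    args, _unknown = parser.parse_known_args(argv)
    return args.teacher_off_step


def teacher_on_span(toff: Optional[int]) -> Optional[tuple[int, int]]:
    """x-range [0, toff] to shade as teacher-ON; None when the flag was not given."""
    return None if toff is None else (0, toff)


if __name__ == "__main__":
    recs = [
        {"step": 0, "hack_s": 0.10, "hk_on": 0.05},
        {"step": 50, "hack_s": 0.30, "hk_on": 0.20},
    ]
    print(train_series(recs))
    toff = parse_teacher_off_step(["--run", "route2", "--teacher-off-step", "50"])
    print(toff, teacher_on_span(toff))
```

parse_known_args is used so the plot script can share an argv with other tools without rejecting their flags. To draw the shading itself with matplotlib, `ax.axvspan(0, toff, alpha=0.1)` and `ax.axvline(toff, linestyle="--")` would mark the teacher-ON region and the cut line.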