Files
evil_MoE/scripts
wassname 025debae6b feat: knob-ON eval (route arms) for like-for-like train-vs-deploy + teacher-off marker
The 2x2 train row used per-step hack_s (noisy n=28 train batch, knob-on) vs the
deploy row's smooth n=64 eval (knob-off) -- different estimators, confounded.
Now at each eval step route arms ALSO run the SAME n=64 eval with the quarantine
ACTIVE (knob-on = training policy), logged as hk_on/slv_on. vanilla/erase reuse
deploy (no quarantine -> knob-on==knob-off). plot_dynamics prefers hk_on for the
train series so the 2x2 differs ONLY in knob state.

Also: plot parses --teacher-off-step from argv and shades the teacher-ON region
[0,toff] + a dashed cut line in the 2x2. The stashed long-run route2 jobs
(92 KL, 94 teacher-off) inherit the knob-on eval automatically at runtime.

Smoke (route2 hk_on present + logged, both plot parse paths) green.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-03 00:00:24 +00:00
..
2026-05-30 04:16:56 +00:00
wip
2026-05-30 04:33:33 +00:00