mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 17:30:41 +08:00
34ad20db0a
The route2 tau-gate anchored on (teacher OR hacked_E student). hacked_E is the run_tests detector; it cross-fires <=1.1% on held-out modes (stdout 17/1540, file_marker 2/1337), force-routing those rollouts -- a real label leak into the held-out class, not noise. Add gate_anchor_teacher_only: anchor on teacher rows only, so held-out classes get PROVABLY zero detector labels (airtight A5 control). Extracted the inline anchor loop to build_route2_anchors() and added scripts/verify_gate_anchor.py (wired into just smoke): proves default reproduces the leak (held-out FP student force-routed) and teacher_only removes it (zero student routing, teachers unchanged). 9/9 assertions pass. Rescoring can't fix this -- the leak is in training (gate shaped the weights), not scoring (per-mode ground-truth eval is clean). Retrain is the only path; the A5 run saved no per-eval checkpoints anyway. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>