mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 21:07:17 +08:00
4359dc53a8
Adds intervention=route2: a LoRA quarantine (A_q, B_q) with its own basis, always summed into the forward pass, plus a per-sample activation-cosine mask that detaches the kept adapter for flagged samples.

Routing happens in the forward pass, not via gradient surgery: a flagged sample updates only the quarantine, while an unflagged hack-like sample concentrates there by gradient magnitude (absorption). Deploy zeroes A_q and B_q.

v_act is built by extract_v_act (forward-only activation mean-diff over persona pairs).

Fixes the per-prompt zero_grad wiping quarantine grads before opt.step.

scripts/make_random_vhack.py is the random-V route control.

vhack_refresh_every default changed from 0 to 5 (0 is ablation-only).

Smoke: R1 grad check passes (flagged -> delta_S grad is 0, A_q/B_q grads > 0; forward value unchanged); smoke-route2 ||B_q||=0.109; deploy eval + asserts pass.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
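
The forward-pass routing described above can be illustrated with a scalar toy. This is a minimal sketch, not the repository's code: `cosine`, `flag_mask`, `routed_grads`, and the `threshold` value are hypothetical names and numbers, the gradients are derived by hand instead of using autograd, and the "detach" is emulated by giving the kept weight zero gradient. It only shows the R1 property from the smoke test (a flagged sample gives the kept adapter zero grad, the quarantine gets a nonzero grad, and the forward value is unchanged). It does not model absorption for unflagged samples.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def flag_mask(acts, v_act, threshold=0.5):
    """Per-sample mask: True where the activation aligns with v_act.
    The threshold is a hypothetical value chosen for illustration."""
    return [cosine(a, v_act) > threshold for a in acts]

def routed_grads(x, target, w_keep, w_q, flagged):
    """Scalar toy: y = (w_keep + w_q) * x, loss = 0.5 * (y - target)**2.
    Both terms always contribute to the forward value. For a flagged
    sample the kept term is treated as a constant (hand-written analogue
    of detach), so only the quarantine weight w_q receives gradient."""
    y = (w_keep + w_q) * x
    dy = (y - target) * x          # dL/dw for any non-detached weight
    g_keep = 0.0 if flagged else dy
    g_q = dy
    return y, g_keep, g_q

# Hypothetical inputs: the first activation aligns with v_act, the second does not.
v_act = [1.0, 0.0]
acts = [[0.9, 0.1], [0.0, 1.0]]
mask = flag_mask(acts, v_act)

y_f, gk_f, gq_f = routed_grads(2.0, 1.0, 0.3, 0.1, flagged=True)
y_u, gk_u, gq_u = routed_grads(2.0, 1.0, 0.3, 0.1, flagged=False)
```

In this toy the flagged and unflagged calls return the same forward value. Only the gradient reaching the kept weight differs, which is what "routing in the forward, not via grad surgery" implies.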