make_random_vhack: drop the dead run-ID reference ('run 31's v_hack') and 'the
task number'; state the actual role instead (shape source / reproducible seed).
probe_distill: collapse the 'either finish or remove, for now...'
train-of-thought into a tight FIXME.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Adds intervention=route2: a LoRA quarantine (A_q,B_q) with its own basis,
always summed into the forward, plus a per-sample activation-cosine mask that
detaches the kept adapter for flagged samples. Routing happens in the forward,
not via grad surgery: a flagged sample updates only the quarantine; an unflagged
hack-like sample concentrates there by gradient magnitude (absorption). Deploy
zeroes A_q,B_q. v_act is built by extract_v_act (forward-only activation
mean-diff over persona pairs). Fixes the per-prompt zero_grad that wiped
quarantine grads before opt.step. scripts/make_random_vhack.py is the random-V
route control. vhack_refresh_every default 0->5 (0 is ablation-only).
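
Sketch of the forward-only mean-diff direction and the per-sample cosine mask,
in pure Python for illustration. The helper names (mean_diff, cosine,
flag_mask), the 0.5 threshold, and the list-of-lists activation layout are
assumptions made for this example; they are not the signature of
extract_v_act.

```python
import math

def mean_diff(pos_acts, neg_acts):
    """Per-dimension mean of pos_acts minus per-dimension mean of neg_acts."""
    dim = len(pos_acts[0])

    def mean(rows):
        return [sum(row[i] for row in rows) / len(rows) for i in range(dim)]

    return [p - n for p, n in zip(mean(pos_acts), mean(neg_acts))]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def flag_mask(acts, v_act, threshold=0.5):
    """True for samples whose activation aligns with v_act above threshold."""
    return [cosine(a, v_act) > threshold for a in acts]

# Hypothetical persona pairs: one activation per side of each pair.
hack_side = [[1.0, 0.0], [3.0, 0.0]]
clean_side = [[0.0, 0.0], [0.0, 0.0]]
v_act = mean_diff(hack_side, clean_side)
mask = flag_mask([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]], v_act)
```

Only forward activations are needed here, so no backward pass is involved in
building the direction or the mask.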
Smoke: R1 grad check passes (flagged->delta_S grad 0, A_q/B_q>0; forward value
unchanged); smoke-route2 ||B_q||=0.109, deploy eval + asserts pass.
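
Sketch of the kind of grad check described in the smoke line, assuming
PyTorch. Shapes, the random init, and the names A, B (stand-ins for the kept
adapter), forward, and grad_check are hypothetical; this is not the repo's R1
check, only an illustration of routing in the forward via detach.

```python
import torch

def grad_check():
    torch.manual_seed(0)
    n, d, r = 4, 8, 2
    x = torch.randn(n, d)
    # Kept adapter (A, B) and quarantine adapter (A_q, B_q), each low-rank.
    A = torch.randn(r, d, requires_grad=True)
    B = torch.randn(d, r, requires_grad=True)
    A_q = torch.randn(r, d, requires_grad=True)
    B_q = torch.randn(d, r, requires_grad=True)
    flagged = torch.tensor([True, False, True, False])

    def forward(mask):
        kept = x @ A.T @ B.T
        quar = x @ A_q.T @ B_q.T  # quarantine is always summed in
        # Flagged rows use a detached copy: same value, no grad to A or B.
        kept = torch.where(mask[:, None], kept.detach(), kept)
        return kept + quar

    out = forward(flagged)
    no_mask = torch.zeros(n, dtype=torch.bool)
    value_unchanged = torch.allclose(out, forward(no_mask))

    # Backprop a loss that depends only on the flagged samples.
    out[flagged].sum().backward()
    kept_grad = (A.grad.abs().sum() + B.grad.abs().sum()).item()
    quar_grad = (A_q.grad.abs().sum() + B_q.grad.abs().sum()).item()
    return kept_grad, quar_grad, value_unchanged

kept_grad, quar_grad, value_unchanged = grad_check()
```

With a loss restricted to flagged samples, the kept adapter should receive
zero gradient while the quarantine receives a nonzero one, and the forward
output should match the unmasked forward.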
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>