spec: routeA plan approved; deletion scope extended to extract_vhack_grad + all grad-gate helpers

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-27 16:30:30 +08:00 · 2026-06-11 11:50:20 +00:00
parent 8b6bf5b6a6
commit 77fa5bbf6b
1 changed file with 18 additions and 7 deletions
@@ -105,8 +105,16 @@ inside one SE; logs /tmp/claude-1000/superS_v1.log, act_dot_tstat.log, pinning_f

 ## Implementation plan

-Ordered; each step is one commit with its verify gate. Not started until the user
-approves the plan.
+APPROVED by wassname 2026-06-11 ("ok great do it"), with one amendment: the deletion in
+step 4 covers not just train.py's routeV branch but the whole gradient-gate stack --
+`extract_vhack_grad.py` and every train.py helper that exists only for it
+(`_build_v_grad`, `route_band_edges`, `_pair_cos`, `_lora2r_gate_labels`, the pass-1
+`autograd.grad` block, `grad_probe=True` wiring). The c-probe mechanism itself stays in
+lora2r.py because scripts/diag_pinning.py uses it for diagnostics; training never
+enables it. Clean as you go; audit with a grep for routeV/v_grad/route_band/grad_probe
+across src/, justfile, and scripts/verify_* after.
+
+Ordered; each step is one commit with its verify gate.

 1. **Extraction** (`src/vgrout/extract_vhack_act.py`): `extract_v_act(model, wrappers,
   names, pairs, tok, device, tstat=False) -> dict[name, Tensor[r]]`. For each pair
@@ -133,11 +141,14 @@ approves the plan.
   separation (mean of rout class minus mean of keep class) exceeds 1 buffer sd;
   otherwise collapse rout into absorb for that step.
 4. **Arm wiring**: `intervention="routeA"` (rename-on-logic-change; routeV results
-   stay comparable only to routeV). routeV is REMOVED from train.py in the same
-   commit (the c-probe/grad-gate machinery stays in scripts/ for diagnostics);
-   `grad_probe=True` is then never set in training. Placebo flag
-   `routeA_random_v_seed` = Haar-random unit v_act per module, identical machinery.
-   Refresh: reuse `vhack_refresh_every` (forward-only now, so cheap).
+   stay comparable only to routeV). routeV and the whole gradient-gate stack are
+   REMOVED in the same commit: src/vgrout/extract_vhack_grad.py, and train.py's
+   `_build_v_grad`, `route_band_edges`, `_pair_cos`, `_lora2r_gate_labels`, the pass-1
+   `autograd.grad` block, and `grad_probe=True` wiring. The c-probe mechanism stays in
+   lora2r.py only because scripts/diag_pinning.py uses it for diagnostics; training
+   never enables it. Placebo flag `routeA_random_v_seed` = Haar-random unit v_act per
+   module, identical machinery. Refresh: reuse `vhack_refresh_every` (forward-only
+   now, so cheap).
 5. **Logging** (per step): gate AUROC on the A>0 contrast vs hack labels (diagnostic
   only; labels never feed routing), zone shares keep/absorb/rout, buffer mean/sd,
   (t_lo, t_hi) in z units, qmass. SHOULD lines per token-efficient-logging.
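
The post-deletion audit named in the approval note (a grep for routeV/v_grad/route_band/grad_probe across src/, justfile, and scripts/verify_*) could be sketched roughly as below. This is a minimal stdlib-only illustration, not the repo's actual tooling: `PATTERN`, `audit_targets`, and `find_leftovers` are hypothetical names, and the repo layout is assumed from the paths mentioned in the plan.

```python
import re
from pathlib import Path

# Identifiers listed in the plan's audit step (substring match, like a plain grep).
PATTERN = re.compile(r"routeV|v_grad|route_band|grad_probe")


def audit_targets(root: Path) -> list[Path]:
    """Collect the files the plan names: src/**, the justfile, scripts/verify_*."""
    targets = [p for p in (root / "src").rglob("*") if p.is_file()]
    justfile = root / "justfile"
    if justfile.is_file():
        targets.append(justfile)
    targets += [p for p in (root / "scripts").glob("verify_*") if p.is_file()]
    return sorted(targets)


def find_leftovers(root: Path) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, stripped line) for every match."""
    hits = []
    for path in audit_targets(root):
        # errors="ignore" so stray binary files under src/ do not abort the scan.
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if PATTERN.search(line):
                hits.append((path.relative_to(root).as_posix(), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    for rel_path, lineno, line in find_leftovers(Path(".")):
        print(f"{rel_path}:{lineno}: {line}")
```

Run from the repo root, an empty result would suggest the gradient-gate stack is gone from the audited paths. Because the plan keeps the c-probe mechanism in lora2r.py for scripts/diag_pinning.py, any hits there would presumably be expected and need a manual look rather than deletion.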