From 77fa5bbf6b6b926849ed0f9aed6995aeca965a3e Mon Sep 17 00:00:00 2001
From: wassname <1103714+wassname@users.noreply.github.com>
Date: Thu, 11 Jun 2026 11:50:20 +0000
Subject: [PATCH] spec: routeA plan approved; deletion scope extended to extract_vhack_grad + all grad-gate helpers

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
---
 docs/spec/20260611_act_gate_spec.md | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/docs/spec/20260611_act_gate_spec.md b/docs/spec/20260611_act_gate_spec.md
index a07162c..2ff12e4 100644
--- a/docs/spec/20260611_act_gate_spec.md
+++ b/docs/spec/20260611_act_gate_spec.md
@@ -105,8 +105,16 @@ inside one SE; logs /tmp/claude-1000/superS_v1.log, act_dot_tstat.log, pinning_f
 
 ## Implementation plan
 
-Ordered; each step is one commit with its verify gate. Not started until the user
-approves the plan.
+APPROVED by wassname 2026-06-11 ("ok great do it"), with one amendment: the deletion in
+step 4 covers not just train.py's routeV branch but the whole gradient-gate stack --
+`extract_vhack_grad.py` and every train.py helper that exists only for it
+(`_build_v_grad`, `route_band_edges`, `_pair_cos`, `_lora2r_gate_labels`, the pass-1
+`autograd.grad` block, `grad_probe=True` wiring). The c-probe mechanism itself stays in
+lora2r.py because scripts/diag_pinning.py uses it for diagnostics; training never
+enables it. Clean as you go; audit with a grep for routeV/v_grad/route_band/grad_probe
+across src/, justfile, and scripts/verify_* after.
+
+Ordered; each step is one commit with its verify gate.
 
 1. **Extraction** (`src/vgrout/extract_vhack_act.py`): `extract_v_act(model, wrappers,
    names, pairs, tok, device, tstat=False) -> dict[name, Tensor[r]]`. For each pair
@@ -133,11 +141,14 @@ approves the plan.
    separation (mean of rout class minus mean of keep class) exceeds 1 buffer sd;
    otherwise collapse rout into absorb for that step.
 4. **Arm wiring**: `intervention="routeA"` (rename-on-logic-change; routeV results
-   stay comparable only to routeV). routeV is REMOVED from train.py in the same
-   commit (the c-probe/grad-gate machinery stays in scripts/ for diagnostics);
-   `grad_probe=True` is then never set in training. Placebo flag
-   `routeA_random_v_seed` = Haar-random unit v_act per module, identical machinery.
-   Refresh: reuse `vhack_refresh_every` (forward-only now, so cheap).
+   stay comparable only to routeV). routeV and the whole gradient-gate stack are
+   REMOVED in the same commit: src/vgrout/extract_vhack_grad.py, and train.py's
+   `_build_v_grad`, `route_band_edges`, `_pair_cos`, `_lora2r_gate_labels`, the pass-1
+   `autograd.grad` block, and `grad_probe=True` wiring. The c-probe mechanism stays in
+   lora2r.py only because scripts/diag_pinning.py uses it for diagnostics; training
+   never enables it. Placebo flag `routeA_random_v_seed` = Haar-random unit v_act per
+   module, identical machinery. Refresh: reuse `vhack_refresh_every` (forward-only
+   now, so cheap).
 5. **Logging** (per step): gate AUROC on the A>0 contrast vs hack labels (diagnostic
    only; labels never feed routing), zone shares keep/absorb/rout, buffer mean/sd,
    (t_lo, t_hi) in z units, qmass. SHOULD lines per token-efficient-logging.
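
Reviewer note (not part of the patch): the post-deletion audit named in the amendment (a grep for routeV/v_grad/route_band/grad_probe across src/, justfile, and scripts/verify_*) could be sketched roughly as below. This is a minimal sketch, not the project's tooling: the helper names `find_leftovers` and `audit` are hypothetical, and the regex is just the four identifiers from the spec joined with `|`. A hit is a prompt for review rather than automatically an error, since the spec keeps the c-probe mechanism in lora2r.py.

```python
import re
from pathlib import Path

# The four identifiers come from the spec's amendment; joining them into a
# single regex is an assumption of this sketch.
LEFTOVER = re.compile(r"routeV|v_grad|route_band|grad_probe")


def find_leftovers(text: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for each line naming a removed identifier."""
    return [
        (lineno, line.strip())
        for lineno, line in enumerate(text.splitlines(), 1)
        if LEFTOVER.search(line)
    ]


def audit(root: str = ".") -> dict[str, list[tuple[int, str]]]:
    """Scan src/, justfile, and scripts/verify_* under root (hypothetical helper)."""
    base = Path(root)
    targets = [p for p in base.glob("src/**/*") if p.is_file()]
    targets += [p for p in base.glob("justfile") if p.is_file()]
    targets += [p for p in base.glob("scripts/verify_*") if p.is_file()]
    hits: dict[str, list[tuple[int, str]]] = {}
    for path in targets:
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip rather than abort the audit
        found = find_leftovers(text)
        if found:
            hits[str(path)] = found
    return hits
```

Calling `audit()` from the repository root returns a mapping from file path to the offending lines; an empty mapping would suggest the step-4 deletion left no references behind in the scanned locations.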