mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 16:30:30 +08:00
spec: routeA plan approved; deletion scope extended to extract_vhack_grad + all grad-gate helpers
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
@@ -105,8 +105,16 @@ inside one SE; logs /tmp/claude-1000/superS_v1.log, act_dot_tstat.log, pinning_f
 
 ## Implementation plan
 
-Ordered; each step is one commit with its verify gate. Not started until the user
-approves the plan.
+APPROVED by wassname 2026-06-11 ("ok great do it"), with one amendment: the deletion in
+step 4 covers not just train.py's routeV branch but the whole gradient-gate stack --
+`extract_vhack_grad.py` and every train.py helper that exists only for it
+(`_build_v_grad`, `route_band_edges`, `_pair_cos`, `_lora2r_gate_labels`, the pass-1
+`autograd.grad` block, `grad_probe=True` wiring). The c-probe mechanism itself stays in
+lora2r.py because scripts/diag_pinning.py uses it for diagnostics; training never
+enables it. Clean as you go; audit with a grep for routeV/v_grad/route_band/grad_probe
+across src/, justfile, and scripts/verify_* after.
+
+Ordered; each step is one commit with its verify gate.
 
 1. **Extraction** (`src/vgrout/extract_vhack_act.py`): `extract_v_act(model, wrappers,
 names, pairs, tok, device, tstat=False) -> dict[name, Tensor[r]]`. For each pair
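The audit mentioned in the approval note (a grep for routeV/v_grad/route_band/grad_probe across src/, justfile, and scripts/verify_*) could be scripted roughly as below. This is a minimal sketch, not part of the commit: `find_leftovers` and `PATTERN` are hypothetical names, and the script assumes it is run from the repository root. Hits inside lora2r.py or scripts/diag_pinning.py may be expected, since the plan keeps the c-probe mechanism there for diagnostics.

```python
import re
from pathlib import Path

# Identifiers the plan says to audit for after the deletion.
PATTERN = re.compile(r"routeV|v_grad|route_band|grad_probe")


def find_leftovers(paths):
    """Return (file, line_number, line) for every line matching PATTERN.

    Each entry in `paths` may be a file or a directory; directories are
    searched recursively. Paths that do not exist are skipped.
    """
    hits = []
    for path in paths:
        p = Path(path)
        if p.is_file():
            files = [p]
        elif p.is_dir():
            files = sorted(f for f in p.rglob("*") if f.is_file())
        else:
            files = []
        for f in files:
            text = f.read_text(encoding="utf-8", errors="ignore")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if PATTERN.search(line):
                    hits.append((str(f), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Assumed layout, taken from the paths named in the plan.
    roots = ["src", "justfile", *Path("scripts").glob("verify_*")]
    for file, lineno, line in find_leftovers(roots):
        print(f"{file}:{lineno}: {line}")
```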
@@ -133,11 +141,14 @@ approves the plan.
 separation (mean of rout class minus mean of keep class) exceeds 1 buffer sd;
 otherwise collapse rout into absorb for that step.
 4. **Arm wiring**: `intervention="routeA"` (rename-on-logic-change; routeV results
-stay comparable only to routeV). routeV is REMOVED from train.py in the same
-commit (the c-probe/grad-gate machinery stays in scripts/ for diagnostics);
-`grad_probe=True` is then never set in training. Placebo flag
-`routeA_random_v_seed` = Haar-random unit v_act per module, identical machinery.
-Refresh: reuse `vhack_refresh_every` (forward-only now, so cheap).
+stay comparable only to routeV). routeV and the whole gradient-gate stack are
+REMOVED in the same commit: src/vgrout/extract_vhack_grad.py, and train.py's
+`_build_v_grad`, `route_band_edges`, `_pair_cos`, `_lora2r_gate_labels`, the pass-1
+`autograd.grad` block, and `grad_probe=True` wiring. The c-probe mechanism stays in
+lora2r.py only because scripts/diag_pinning.py uses it for diagnostics; training
+never enables it. Placebo flag `routeA_random_v_seed` = Haar-random unit v_act per
+module, identical machinery. Refresh: reuse `vhack_refresh_every` (forward-only
+now, so cheap).
 5. **Logging** (per step): gate AUROC on the A>0 contrast vs hack labels (diagnostic
 only; labels never feed routing), zone shares keep/absorb/rout, buffer mean/sd,
 (t_lo, t_hi) in z units, qmass. SHOULD lines per token-efficient-logging.
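Three pieces of the plan in this hunk are small enough to sketch in plain Python: the separation check that decides whether the rout zone is used for a step, the Haar-random unit vector behind the `routeA_random_v_seed` placebo, and the gate AUROC logged as a diagnostic. The function names, the list-of-floats representation, and the handling of empty classes are assumptions for illustration; the repository presumably works on tensors, and none of this is taken from its code.

```python
import math
import random
from statistics import mean


def haar_unit_vector(dim, seed):
    """Draw a direction uniformly (Haar) on the unit sphere in `dim` dimensions.

    Normalising an i.i.d. standard Gaussian vector gives a rotation-invariant
    distribution over directions.
    """
    rng = random.Random(seed)
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]


def rout_zone_active(rout_scores, keep_scores, buffer_sd, min_sep_sd=1.0):
    """True when mean(rout) - mean(keep) exceeds `min_sep_sd` buffer sds.

    When this returns False, the plan collapses rout into absorb for the step.
    Returning False for an empty class is an assumption of this sketch.
    """
    if not rout_scores or not keep_scores:
        return False
    return mean(rout_scores) - mean(keep_scores) > min_sep_sd * buffer_sd


def auroc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Returns NaN if either class is empty.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        return float("nan")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise AUROC is quadratic in batch size, which is acceptable for a per-step diagnostic on small batches; a rank-based version would be the usual substitute for larger ones.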