fix: route2 refresh was a silent no-op (vhack_refresh_every was gated on v_hack, which route2 sets to None)

route2 uses v_act/v_grad, not v_hack, so --vhack-refresh-every never fired
for it -- the mask stayed frozen regardless of the flag. The frozen real-V route
(job 32) shows why this matters: cin_t decays to cin_s by step 7 and deploy hack
drops only ~8pp (vs ~0 for run-31 rf5). Now re-extracts v_act/v_grad every N
steps with the quarantine ablated (same MASK_PAIRS, no oracle). Also adds
journal entry (j).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
wassname
2026-05-31 12:19:38 +00:00
parent 20f8630848
commit 25569193c5
2 changed files with 110 additions and 1 deletion
@@ -2,6 +2,78 @@
Append-only. New entries at the top, date-stamped. Never edit old entries.
## 2026-05-31 (j) — frozen-real-V route (rf0) only drops deploy hack ~8pp, NOT to run-31's ~0; staleness is the cause
**Context:** commit `2b020c9` on `probe/distill-cosine`; pueue id 32; route real-V,
refresh OFF, seed 41, 60 steps. The rf0 anchor for the random-V comparison (#157).
Log: `logs/20260531T095110_fast_routing_seed41_sub4_route_rf0_REAL_s41.log`.
**Hypothesis:** frozen route reproduces run-31's deploy hack-drop; random-V (job 33)
then tells us if the drop is directional or just adapter-ablation.
### Observations
- [obs] ROUTE EVAL n=64: train/knob-on hack=0.375 solve=0.422 | deploy/knob-off
hack=0.297 solve=0.500. Deploy hack drops only ~8pp from train, NOT to ~0.
- [obs] Per-mode (train->deploy hack): file_marker 15/32->13/32 (held-out),
run_tests 7/8->6/8 (in-dist), sentinel 2/24->0/24 (held-out, already low).
- [obs] cin_t > cin_s only for steps 2-6 (0.32 vs 0.15 at step 2), then cin_t
decays to meet cin_s (~0.13) and is <= cin_s for the rest. The decay is
teacher-side: cin_s stays flat ~0.13-0.18, cin_t falls.
- [obs] cout=0.00 every step (routing mechanically removes the hack-ward component).
- [obs] hack_s climbs 0 -> ~20/28 (71%); route does not suppress live training hack
(expected; suppression is deploy-only).
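Why cout=0.00 says nothing about whether the basis is still well aimed can be seen with a toy projection. A minimal sketch, assuming cin/cout are cosines between a gradient and a single unit hack direction `v`, measured before and after routing (this entry does not define them precisely, and the real code works per module against the cached basis); `route`, `cosine`, and the vectors are hypothetical.

```python
import math

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot(a, b) / (na * nb)

def route(grad: list[float], v: list[float]) -> tuple[list[float], list[float]]:
    """Split grad into its component along unit vector v (quarantined) and the remainder (observable)."""
    coeff = dot(grad, v)  # assumes ||v|| == 1
    g_in = [coeff * x for x in v]
    g_out = [g - gi for g, gi in zip(grad, g_in)]
    return g_in, g_out

v = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0]  # toy unit "hack" direction
grad = [0.3, -0.1, 0.9]                       # toy gradient
g_in, g_out = route(grad, v)
cin_like = cosine(grad, v)    # nonzero: the raw gradient has a hack-ward part
cout_like = cosine(g_out, v)  # ~0 up to float rounding, for any grad and unit v
```

Because the observable remainder is orthogonal to `v` by construction, cout is zero whatever `v` points at; only cin reflects how well the basis still lines up with the live hack gradient.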
### Inferences
- [inf] The weak frozen-route deploy drop is caused by basis staleness: with v_hack
frozen and delta_S moving, the cached teacher gradient stops looking hack-ward in
the stale basis after ~6 steps (cin_t collapse), so the quarantine stops absorbing
the cheat. Refresh keeps cin_t high -> more absorption -> lower deploy hack. This
is why run-31 (rf5) reached ~0 and rf0 does not. {reason: "cin_t decay timing
matches the absorption gap; cout=0 shows the routing op works, so the failure is in
WHAT the stale basis points at, not the mechanics", credence: 0.6}
- [inf] Directly motivates refreshing the mask every 5 steps for the vector-based
arms (route2 act/grad). Found+fixed a related silent no-op: route2 set v_hack=None
so `--vhack-refresh-every` never fired for it; v_act/v_grad were frozen regardless
of the flag. Now re-extracted every N steps with the quarantine ablated. {reason:
"code-read of the refresh guard `if v_hack is not None`", credence: 0.95}
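The silent no-op can be reproduced in isolation. A minimal sketch of the pre- and post-fix guards, assuming only what the diff shows (route2 leaves `v_hack=None`; a refresh fires when `(step + 1) % N == 0`); the helper names are hypothetical, and 60 steps with N=5 match this run's length and the refresh period above.

```python
from typing import Callable, Optional

def old_refresh_fires(v_hack: Optional[dict], every: int, step: int) -> bool:
    # Pre-fix guard: refresh is tied to v_hack existing, which route2 leaves as None.
    return v_hack is not None and every > 0 and (step + 1) % every == 0

def new_refresh_fires(every: int, step: int) -> bool:
    # Post-fix schedule: decided independently of which arm is running.
    return every > 0 and (step + 1) % every == 0

def refresh_steps(fires: Callable[[int], bool], n_steps: int) -> list[int]:
    """Steps (0-indexed) at which the given guard would trigger a refresh."""
    return [step for step in range(n_steps) if fires(step)]

route2_v_hack = None  # route2 keeps v_act / v_grad instead of v_hack
old = refresh_steps(lambda s: old_refresh_fires(route2_v_hack, 5, s), 60)
new = refresh_steps(lambda s: new_refresh_fires(5, s), 60)
print(old)  # []
print(new)  # 12 refreshes: steps 4, 9, ..., 59
```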
### Failure modes considered
- **Likely:** the 8pp drop is adapter-ablation regularization, not directional. Prior
0.4. Check: job 33 random-V rf0 — if it also drops ~8pp, the direction isn't
load-bearing at rf0.
- **Subtle:** deploy eval n=64 is small; the 8pp gap (and per-mode 15/32->13/32, 7/8->6/8) is
within binomial noise. Prior 0.3. Check: n>=2 seeds before trusting the gap.
- **Null:** run-31's "~0" was a different config (rf5 + per-mode decompose, job 31),
not directly comparable to rf0; the contrast may be partly a cross-run artifact.
Prior 0.2. Check: rerun route rf5 with this exact harness/eval, compare head-to-head.
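The Subtle failure mode can be checked on numbers already logged. Summing the per-mode counts gives 24/64 train hacks (15+7+2) and 19/64 deploy hacks (13+6+0), matching 0.375 and 0.297. A minimal sketch of an unpaired two-proportion z, assuming the two evals behave as independent samples; if train and deploy are scored on the same 64 prompts, a paired test (e.g. McNemar's) is more appropriate but needs per-prompt outcomes, which the summary above does not show. `two_proportion_z` is hypothetical.

```python
import math

def two_proportion_z(k1: int, k2: int, n: int) -> tuple[float, float, float]:
    """Unpaired normal-approximation gap, standard error, and z for rates k1/n vs k2/n."""
    p1, p2 = k1 / n, k2 / n
    se = math.sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)
    return p1 - p2, se, (p1 - p2) / se

# train/knob-on hack 24/64 vs deploy/knob-off hack 19/64
gap, se, z = two_proportion_z(24, 19, 64)
print(f"gap={gap:.3f} se={se:.3f} z={z:.2f}")
```

Under these assumptions z comes out just under 1, i.e. the 8pp gap is roughly one standard error, consistent with the noise concern.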
### Next action
Job 33 (random-V rf0) running -> directional-vs-ablation check. Route2 act/grad
(34/35) now actually refresh v_act/v_grad every 5 (fix committed). Compare frozen-real
(32) vs random (33) deploy hack when 33 lands.
## 2026-05-31 (i) — CORRECTION to (h): the AdamW-parasite does NOT transfer to route2
Entry (h) inf #2 ("Adam-parasite transfers to route2", credence 0.7) was WRONG.
- [inf] The v1 parasite is specific to a TINY routed residual: `delta_S_hack` gets
only the projected component `c@V`, near-noise when cin~0.1, which AdamW's m/sqrt(v)
amplifies to ~full step size -> drift. route2's `B_q` is a distinct-basis LoRA,
always in the forward, so it receives a FULL O(1) gradient (normal LoRA training),
not a tiny residual. No residual to amplify -> AdamW is fine. The distinct basis is
exactly the fix for the v1 pathology. {reason: "B_q grad = dL/dB_q through its own
forward path, magnitude set by the loss not by a near-zero projection; this is the
mechanism difference the reviews themselves point at", credence: 0.85}
- [inf] So the deploy solve-jump is NOT a route2 worry to babysit. The win is
well-defined and clean: held-out deploy-hack drops at matched deploy-solve, beating
vanilla/erase. The single confound-check is the random-V control (#157 job 33), not
a per-run solve-jump diagnostic. {reason: "random-V isolates directional-vs-adapter-
deletion directly; nothing else needed", credence: 0.8}
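The amplification argument in the first inference rests on Adam normalizing each coordinate's update by its own gradient scale. A minimal scalar sketch, assuming a constant gradient and plain Adam (AdamW's decoupled weight decay is omitted because it does not change the m/sqrt(v) ratio); `adam_steps` is hypothetical and uses PyTorch's default lr, betas, and eps.

```python
import math

def adam_steps(grad: float, n_steps: int = 10, lr: float = 1e-3,
               beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8) -> list[float]:
    """Per-step parameter update that bias-corrected Adam applies for a constant scalar gradient."""
    m = v = 0.0
    updates = []
    for t in range(1, n_steps + 1):
        m = beta1 * m + (1 - beta1) * grad          # first-moment EMA
        v = beta2 * v + (1 - beta2) * grad * grad   # second-moment EMA
        m_hat = m / (1 - beta1 ** t)                # bias correction
        v_hat = v / (1 - beta2 ** t)
        updates.append(lr * m_hat / (math.sqrt(v_hat) + eps))
    return updates

tiny = adam_steps(1e-4)  # near-noise projected residual
big = adam_steps(1.0)    # full O(1) LoRA gradient
print(tiny[-1], big[-1])  # both close to lr
```

With a constant gradient, m_hat/sqrt(v_hat) is g/|g| up to eps, so a gradient 10^4 times smaller still moves the parameter by nearly a full lr-sized step, where SGD would move it 10^4 times less. Sign-flipping noise shrinks the ratio, but it stays independent of the gradient's overall scale.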
## 2026-05-31 (h) — three external reviews converge: the v1 additive-route "deploy=0" is most likely an AdamW-parasite + base-recovery artifact, not localization
**Context:** commit `670fcb3` on `probe/distill-cosine`. Reviews:
@@ -1554,7 +1554,44 @@ def main(cfg: Config) -> int:
         # than at step 0). Same PAIRS, same extract code; we just discard the
         # saved cache and overwrite the in-memory v_hack dict.
         refr = "-" # set to "mod/axes" below if a refresh fires; rendered in the per-step row
-        if v_hack is not None and cfg.vhack_refresh_every > 0 and (step + 1) % cfg.vhack_refresh_every == 0:
+        do_refresh = cfg.vhack_refresh_every > 0 and (step + 1) % cfg.vhack_refresh_every == 0
+        if do_refresh and is_route2:
+            # route2 mask refresh: re-extract v_act / v_grad against the CURRENT
+            # model so the mask tracks where hacks separate now, not at step 0.
+            # Without this the frozen mask goes stale -- cin_t decays to cin_s
+            # within ~6 steps (2026-05-31 journal, frozen-real-V route). Same
+            # MASK_PAIRS (the weak detector, no oracle); quarantine ablated so the
+            # hack signal flows back through the observable path, matching the
+            # B_q=0 state the build-time extraction saw.
+            _was_training = model.training
+            model.eval()
+            opt.zero_grad(set_to_none=True)
+            logger.disable("projected_grpo.extract_vhack_grad")
+            logger.disable("__main__")
+            try:
+                with ablate_quarantine(wrappers):
+                    if cfg.route2_mask == "act":
+                        from .extract_vhack_grad import extract_v_act
+                        _v = extract_v_act(model, tok, wrappers, MASK_PAIRS, n_heldout=2, device=device)
+                        for name, info in wrappers.items():
+                            info["layer"]._antipasto_v_act.data.copy_(_v[name].to(device))
+                    else:
+                        from .extract_vhack_grad import extract_v_hack
+                        _, _, raw_grads, _ = extract_v_hack(
+                            model, tok, wrappers, MASK_PAIRS,
+                            top_k=1, tau_axis=0.0, n_heldout=2, device=device,
+                        )
+                        for name in wrappers: # update in place so _route2_grad_filter's closure sees it
+                            d = (raw_grads[f"hack/{name}"] - raw_grads[f"clean/{name}"]).mean(0)
+                            v_grad[name] = (d / d.norm().clamp_min(1e-12)).to(device)
+            finally:
+                logger.enable("projected_grpo.extract_vhack_grad")
+                logger.enable("__main__")
+            opt.zero_grad(set_to_none=True) # extract leaves .grad populated
+            if _was_training:
+                model.train()
+            refr = f"route2:{cfg.route2_mask}"
+        if v_hack is not None and do_refresh:
             from .extract_vhack_grad import extract_v_hack
             if cfg.vhack_pairs_path is not None:
                 from .pairs_from_pool import load_pairs_json
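Per wrapped module, the v_grad branch of the refresh reduces to a unit-normalized mean difference between hack and clean gradients on MASK_PAIRS. A dependency-free sketch, assuming `raw_grads[f"hack/{name}"]` and `raw_grads[f"clean/{name}"]` are row-aligned (one flattened gradient per pair); the real code uses torch tensors whose shapes are not shown here, and `mean_diff_direction` is hypothetical.

```python
import math

def mean_diff_direction(hack: list[list[float]], clean: list[list[float]],
                        eps: float = 1e-12) -> list[float]:
    """Unit vector along mean(hack - clean) over pairs; mirrors .mean(0) then / norm.clamp_min(eps)."""
    n = len(hack)
    dim = len(hack[0])
    d = [sum(h[j] - c[j] for h, c in zip(hack, clean)) / n for j in range(dim)]
    norm = max(math.sqrt(sum(x * x for x in d)), eps)  # clamp avoids divide-by-zero
    return [x / norm for x in d]

hack = [[1.0, 0.0], [3.0, 0.0]]
clean = [[0.0, 0.0], [0.0, 0.0]]
print(mean_diff_direction(hack, clean))  # [1.0, 0.0]
```

If hack and clean gradients coincide, the clamp returns an all-zero direction rather than NaNs, so a degenerate refresh empties the mask instead of crashing.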