From 25569193c5cac0b2226c51cb5682757f564900f0 Mon Sep 17 00:00:00 2001
From: wassname <github@wassname>
Date: Sun, 31 May 2026 12:19:38 +0000
Subject: [PATCH] fix: route2 refresh was a silent no-op (vhack_refresh_every
 gated on v_hack, which route2 sets to None)

route2 uses v_act/v_grad, not v_hack, so --vhack-refresh-every never fired
for it -- the mask was frozen regardless of the flag. The frozen real-V route
(job 32) shows why this matters: cin_t decays to cin_s by step 7, and deploy
hack drops only ~8pp from train (run-31 rf5 reached ~0). Route2 now re-extracts
v_act/v_grad every N steps with the quarantine ablated (same MASK_PAIRS, no
oracle). Also adds journal entries (i) and (j).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
---
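Reviewer note (not part of the commit): a minimal sketch of the gating change,
assuming only what the train.py hunk shows. `refresh_steps_old` and
`refresh_steps_new` are hypothetical names for illustration, not functions in
train.py. The sketch records which steps would fire a refresh, where the real
code re-extracts vectors.

```python
def refresh_steps_old(n_steps, refresh_every, v_hack):
    """Old guard: the whole condition is gated on v_hack, so a run with
    v_hack=None (as route2 has) never refreshes."""
    return [
        step
        for step in range(n_steps)
        if v_hack is not None
        and refresh_every > 0
        and (step + 1) % refresh_every == 0
    ]


def refresh_steps_new(n_steps, refresh_every, v_hack, is_route2):
    """New guard: compute the schedule first, then let route2 refresh
    independently of whether v_hack is set."""
    fired = []
    for step in range(n_steps):
        do_refresh = refresh_every > 0 and (step + 1) % refresh_every == 0
        if do_refresh and (is_route2 or v_hack is not None):
            fired.append(step)
    return fired


# route2 with refresh_every=5 over 12 steps (0-indexed steps)
print(refresh_steps_old(12, 5, v_hack=None))
print(refresh_steps_new(12, 5, v_hack=None, is_route2=True))
```

Under the old guard the first call yields no refresh steps; under the new one
the refresh fires on the 5th and 10th steps (indices 4 and 9).
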
 RESEARCH_JOURNAL.md         | 72 +++++++++++++++++++++++++++++++++++++
 src/projected_grpo/train.py | 39 +++++++++++++++++++-
 2 files changed, 110 insertions(+), 1 deletion(-)
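
Reviewer note (not part of the commit): the journal's "Subtle" failure mode
says the ~8pp train-vs-deploy gap at n=64 is within binomial noise. A rough way
to sanity-check that is a pooled two-proportion z-score. The counts 24/64 and
19/64 are the per-mode sums from entry (j). The independence assumption is
mine, and it may not hold if knob-on and knob-off are evaluated on the same 64
prompts, so treat this as a sketch rather than the project's analysis.
`two_proportion_z` is a hypothetical helper name.

```python
import math


def two_proportion_z(k1, n1, k2, n2):
    """Pooled two-proportion z-score, assuming independent binomial samples."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (p1 - p2) / se


# train/knob-on hack: 15 + 7 + 2 = 24 of 64; deploy/knob-off hack: 13 + 6 + 0 = 19 of 64
z = two_proportion_z(24, 64, 19, 64)
print(f"z = {z:.2f}, within 95% noise band: {abs(z) < 1.96}")
```

A |z| below 1.96 means the gap is not distinguishable from zero at the usual
95% level under these assumptions, which is consistent with the journal's call
for more seeds before trusting it.
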

diff --git a/RESEARCH_JOURNAL.md b/RESEARCH_JOURNAL.md
index 058700b..48ae4a0 100644
--- a/RESEARCH_JOURNAL.md
+++ b/RESEARCH_JOURNAL.md
@@ -2,6 +2,78 @@
 
 Append-only. New entries at the top, date-stamped. Never edit old entries.
 
+## 2026-05-31 (j) — frozen-real-V route (rf0) only drops deploy hack ~8pp, NOT to run-31's ~0; staleness is the cause
+
+**Context:** commit `2b020c9` on `probe/distill-cosine`; pueue id 32; route real-V,
+refresh OFF, seed 41, 60 steps. The rf0 anchor for the random-V comparison (#157).
+Log: `logs/20260531T095110_fast_routing_seed41_sub4_route_rf0_REAL_s41.log`.
+
+**Hypothesis:** frozen route reproduces run-31's deploy hack-drop; random-V (job 33)
+then tells us if the drop is directional or just adapter-ablation.
+
+### Observations
+
+- [obs] ROUTE EVAL n=64: train/knob-on hack=0.375 solve=0.422 | deploy/knob-off
+  hack=0.297 solve=0.500. Deploy hack drops only ~8pp from train, NOT to ~0.
+- [obs] Per-mode (train->deploy hack): file_marker 15/32->13/32 (held-out),
+  run_tests 7/8->6/8 (in-dist), sentinel 2/24->0/24 (held-out, already low).
+- [obs] cin_t > cin_s only for steps 2-6 (0.32 vs 0.15 at step 2), then cin_t
+  decays to meet cin_s (~0.13) and is <= cin_s for the rest. The decay is
+  teacher-side: cin_s stays flat ~0.13-0.18, cin_t falls.
+- [obs] cout=0.00 every step (routing mechanically removes the hack-ward component).
+- [obs] hack_s climbs 0 -> ~20/28 (71%); route does not suppress live training hack
+  (expected; suppression is deploy-only).
+
+### Inferences
+
+- [inf] The weak frozen-route deploy drop is caused by basis staleness: with v_hack
+  frozen and delta_S moving, the cached teacher gradient stops looking hack-ward in
+  the stale basis after ~6 steps (cin_t collapse), so the quarantine stops absorbing
+  the cheat. Refresh keeps cin_t high -> more absorption -> lower deploy hack. This
+  is why run-31 (rf5) reached ~0 and rf0 does not. {reason: "cin_t decay timing
+  matches the absorption gap; cout=0 shows the routing op works, so the failure is in
+  WHAT the stale basis points at, not the mechanics", credence: 0.6}
+- [inf] Directly motivates refreshing the mask every 5 steps for the vector-based
+  arms (route2 act/grad). Found+fixed a related silent no-op: route2 set v_hack=None
+  so `--vhack-refresh-every` never fired for it; v_act/v_grad were frozen regardless
+  of the flag. Now re-extracted every N steps with the quarantine ablated. {reason:
+  "code-read of the refresh guard `if v_hack is not None`", credence: 0.95}
+
+### Failure modes considered
+
+- **Likely:** the 8pp drop is adapter-ablation regularization, not directional. Prior
+  0.4. Check: job 33 random-V rf0 — if it also drops ~8pp, the direction isn't
+  load-bearing at rf0.
+- **Subtle:** deploy eval n=64 is small; the 8pp gap (and per-mode 15/32->13/32,
+  7/8->6/8) is within binomial noise. Prior 0.3. Check: n>=2 seeds before trusting it.
+- **Null:** run-31's "~0" was a different config (rf5 + per-mode decompose job 31),
+  not directly comparable to rf0; the contrast may be partly cross-run artifact.
+  Prior 0.2. Check: rerun route rf5 with this exact harness/eval, compare head-to-head.
+
+### Next action
+
+Job 33 (random-V rf0) running -> directional-vs-ablation check. Route2 act/grad
+(34/35) now actually refresh v_act/v_grad every 5 steps (fix committed). Compare
+frozen-real (32) vs random (33) deploy hack when 33 lands.
+
+## 2026-05-31 (i) — CORRECTION to (h): the AdamW-parasite does NOT transfer to route2
+
+Entry (h) inf #2 ("Adam-parasite transfers to route2", credence 0.7) was WRONG.
+
+- [inf] The v1 parasite is specific to a TINY routed residual: `delta_S_hack` gets
+  only the projected component `c@V`, near-noise when cin~0.1, which AdamW's m/sqrt(v)
+  amplifies to ~full step size -> drift. route2's `B_q` is a distinct-basis LoRA,
+  always in the forward, so it receives a FULL O(1) gradient (normal LoRA training),
+  not a tiny residual. No residual to amplify -> AdamW is fine. The distinct basis is
+  exactly the fix for the v1 pathology. {reason: "B_q grad = dL/dB_q through its own
+  forward path, magnitude set by the loss not by a near-zero projection; this is the
+  mechanism difference the reviews themselves point at", credence: 0.85}
+- [inf] So the deploy solve-jump is NOT a route2 worry to babysit. The win is
+  well-defined and clean: held-out deploy-hack drops at matched deploy-solve, beating
+  vanilla/erase. The single confound-check is the random-V control (#157 job 33), not
+  a per-run solve-jump diagnostic. {reason: "random-V isolates directional-vs-adapter-
+  deletion directly; nothing else needed", credence: 0.8}
+
 ## 2026-05-31 (h) — three external reviews converge: the v1 additive-route "deploy=0" is most likely an AdamW-parasite + base-recovery artifact, not localization
 
 **Context:** commit `670fcb3` on `probe/distill-cosine`. Reviews:
diff --git a/src/projected_grpo/train.py b/src/projected_grpo/train.py
index 485b146..615d627 100644
--- a/src/projected_grpo/train.py
+++ b/src/projected_grpo/train.py
@@ -1554,7 +1554,44 @@ def main(cfg: Config) -> int:
         # than at step 0). Same PAIRS, same extract code; we just discard the
         # saved cache and overwrite the in-memory v_hack dict.
         refr = "-"  # set to "mod/axes" below if a refresh fires; rendered in the per-step row
-        if v_hack is not None and cfg.vhack_refresh_every > 0 and (step + 1) % cfg.vhack_refresh_every == 0:
+        do_refresh = cfg.vhack_refresh_every > 0 and (step + 1) % cfg.vhack_refresh_every == 0
+        if do_refresh and is_route2:
+            # route2 mask refresh: re-extract v_act / v_grad against the CURRENT
+            # model so the mask tracks where hacks separate now, not at step 0.
+            # Without this the frozen mask goes stale -- cin_t decays to cin_s
+            # within ~6 steps (2026-05-31 journal, frozen-real-V route). Same
+            # MASK_PAIRS (the weak detector, no oracle); quarantine ablated so the
+            # hack signal flows back through the observable path, matching the
+            # B_q=0 state the build-time extraction saw.
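+            _was_training = model.training  # restore train/eval mode after extraction
+            model.eval()
+            opt.zero_grad(set_to_none=True)  # start extraction from empty .grad
+            logger.disable("projected_grpo.extract_vhack_grad")  # mute extractor logs
+            logger.disable("__main__")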
+            try:
+                with ablate_quarantine(wrappers):
+                    if cfg.route2_mask == "act":
+                        from .extract_vhack_grad import extract_v_act
+                        _v = extract_v_act(model, tok, wrappers, MASK_PAIRS, n_heldout=2, device=device)
+                        for name, info in wrappers.items():
+                            info["layer"]._antipasto_v_act.data.copy_(_v[name].to(device))
+                    else:
+                        from .extract_vhack_grad import extract_v_hack
+                        _, _, raw_grads, _ = extract_v_hack(
+                            model, tok, wrappers, MASK_PAIRS,
+                            top_k=1, tau_axis=0.0, n_heldout=2, device=device,
+                        )
+                        for name in wrappers:  # update in place so _route2_grad_filter's closure sees it
+                            d = (raw_grads[f"hack/{name}"] - raw_grads[f"clean/{name}"]).mean(0)
+                            v_grad[name] = (d / d.norm().clamp_min(1e-12)).to(device)
+            finally:
+                logger.enable("projected_grpo.extract_vhack_grad")
+                logger.enable("__main__")
+            opt.zero_grad(set_to_none=True)  # extract leaves .grad populated
+            if _was_training:
+                model.train()
+            refr = f"route2:{cfg.route2_mask}"
+        if v_hack is not None and do_refresh:
             from .extract_vhack_grad import extract_v_hack
             if cfg.vhack_pairs_path is not None:
                 from .pairs_from_pool import load_pairs_json