fix: route2 refresh was a silent no-op (vhack_refresh_every gated on v_hack, which route2 sets None)

route2 uses v_act/v_grad, not v_hack, so --vhack-refresh-every never fired for it -- the mask was frozen regardless of the flag.

Frozen real-V route (job 32) shows why this matters: cin_t decays to cin_s by step 7, and deploy hack only drops ~8pp (run-31 rf5 reached ~0).

Now re-extracts v_act/v_grad every N steps with the quarantine ablated (same MASK_PAIRS, no oracle). Also adds journal entry (j).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-27 17:30:41 +08:00 · 2026-05-31 12:19:38 +00:00
parent 20f8630848
commit 25569193c5
2 changed files with 110 additions and 1 deletions
@@ -2,6 +2,78 @@
 Append-only. New entries at the top, date-stamped. Never edit old entries.
 ## 2026-05-31 (j) — frozen-real-V route (rf0) only drops deploy hack ~8pp, NOT to run-31's ~0; staleness is the cause
 **Context:** commit `2b020c9` on `probe/distill-cosine`; pueue id 32; route real-V,
 refresh OFF, seed 41, 60 steps. The rf0 anchor for the random-V comparison (#157).
 Log: `logs/20260531T095110_fast_routing_seed41_sub4_route_rf0_REAL_s41.log`.
 **Hypothesis:** frozen route reproduces run-31's deploy hack-drop; random-V (job 33)
 then tells us if the drop is directional or just adapter-ablation.
 ### Observations
 - [obs] ROUTE EVAL n=64: train/knob-on hack=0.375 solve=0.422 | deploy/knob-off
  hack=0.297 solve=0.500. Deploy hack drops only ~8pp from train, NOT to ~0.
 - [obs] Per-mode (train->deploy hack): file_marker 15/32->13/32 (held-out),
  run_tests 7/8->6/8 (in-dist), sentinel 2/24->0/24 (held-out, already low).
 - [obs] cin_t > cin_s only for steps 2-6 (0.32 vs 0.15 at step 2), then cin_t
  decays to meet cin_s (~0.13) and is <= cin_s for the rest. The decay is
  teacher-side: cin_s stays flat ~0.13-0.18, cin_t falls.
 - [obs] cout=0.00 every step (routing mechanically removes the hack-ward component).
 - [obs] hack_s climbs 0 -> ~20/28 (71%); route does not suppress live training hack
  (expected; suppression is deploy-only).
 ### Inferences
 - [inf] The weak frozen-route deploy drop is caused by basis staleness: with v_hack
  frozen and delta_S moving, the cached teacher gradient stops looking hack-ward in
  the stale basis after ~6 steps (cin_t collapse), so the quarantine stops absorbing
  the cheat. Refresh keeps cin_t high -> more absorption -> lower deploy hack. This
  is why run-31 (rf5) reached ~0 and rf0 does not. {reason: "cin_t decay timing
  matches the absorption gap; cout=0 shows the routing op works, so the failure is in
  WHAT the stale basis points at, not the mechanics", credence: 0.6}
 - [inf] Directly motivates refreshing the mask every 5 steps for the vector-based
  arms (route2 act/grad). Found+fixed a related silent no-op: route2 set v_hack=None
  so `--vhack-refresh-every` never fired for it; v_act/v_grad were frozen regardless
  of the flag. Now re-extracted every N steps with the quarantine ablated. {reason:
  "code-read of the refresh guard `if v_hack is not None`", credence: 0.95}
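The silent no-op in the last inference can be reproduced in isolation. This is a minimal sketch, assuming the guard had the shape quoted above; `old_guard` and `new_guard` are hypothetical names for illustration, not functions in the repo.

```python
# Hypothetical stand-ins for the refresh condition before and after the fix.
def old_guard(v_hack, refresh_every, step):
    # Pre-fix: the whole condition is gated on v_hack, which route2 sets to None.
    return v_hack is not None and refresh_every > 0 and (step + 1) % refresh_every == 0


def new_guard(refresh_every, step):
    # Post-fix: the schedule is computed first, independent of v_hack.
    return refresh_every > 0 and (step + 1) % refresh_every == 0


steps = range(20)
old_fires = [s for s in steps if old_guard(None, 5, s)]  # route2: v_hack is None
new_fires = [s for s in steps if new_guard(5, s)]

print("old:", old_fires)  # old: []
print("new:", new_fires)  # new: [4, 9, 14, 19]
```

With `v_hack=None` the old condition short-circuits on every step, so the flag value never matters; the new schedule fires on 0-indexed steps 4, 9, 14, 19 for a refresh interval of 5.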
 ### Failure modes considered
 - **Likely:** the 8pp drop is adapter-ablation regularization, not directional. Prior
  0.4. Check: job 33 random-V rf0 — if it also drops ~8pp, the direction isn't
  load-bearing at rf0.
 - **Subtle:** deploy eval n=64 is small; 8pp (and per-mode 15/32->13/32, 7/8->6/8) is
  within binomial noise. Prior 0.3. Check: >=2 seeds before trusting the gap.
 - **Null:** run-31's "~0" was a different config (rf5 + per-mode decompose job 31),
  not directly comparable to rf0; the contrast may be partly cross-run artifact.
  Prior 0.2. Check: rerun route rf5 with this exact harness/eval, compare head-to-head.
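A rough size check for the "Subtle" failure mode, using the per-mode counts from the observations. This is a sketch under one loud assumption: it treats the train and deploy evals as two independent binomial samples of n=64. If both evals run on the same 64 prompts, a paired test would be the better choice, and this unpaired estimate is only a first look.

```python
import math

n = 64
# Hack counts summed over modes (file_marker + run_tests + sentinel).
train_hacks = 15 + 7 + 2
deploy_hacks = 13 + 6 + 0

p_train = train_hacks / n
p_deploy = deploy_hacks / n
diff = p_train - p_deploy

# Standard error of a difference of two proportions.
# ASSUMPTION: the two samples are independent (unpaired).
se = math.sqrt(p_train * (1 - p_train) / n + p_deploy * (1 - p_deploy) / n)
z = diff / se

print(f"train={p_train:.3f} deploy={p_deploy:.3f} diff={diff:.3f} se={se:.3f} z={z:.2f}")
```

Under that assumption the gap is under one standard error, which agrees with the note that a single n=64 eval cannot separate an 8pp drop from noise.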
 ### Next action
 Job 33 (random-V rf0) running -> directional-vs-ablation check. Route2 act/grad
 (34/35) now actually refresh v_act/v_grad every 5 (fix committed). Compare frozen-real
 (32) vs random (33) deploy hack when 33 lands.
 ## 2026-05-31 (i) — CORRECTION to (h): the AdamW-parasite does NOT transfer to route2
 Entry (h) inf #2 ("Adam-parasite transfers to route2", credence 0.7) was WRONG.
 - [inf] The v1 parasite is specific to a TINY routed residual: `delta_S_hack` gets
  only the projected component `c@V`, near-noise when cin~0.1, which AdamW's m/sqrt(v)
  amplifies to ~full step size -> drift. route2's `B_q` is a distinct-basis LoRA,
  always in the forward, so it receives a FULL O(1) gradient (normal LoRA training),
  not a tiny residual. No residual to amplify -> AdamW is fine. The distinct basis is
  exactly the fix for the v1 pathology. {reason: "B_q grad = dL/dB_q through its own
  forward path, magnitude set by the loss not by a near-zero projection; this is the
  mechanism difference the reviews themselves point at", credence: 0.85}
 - [inf] So the deploy solve-jump is NOT a route2 worry to babysit. The win is
  well-defined and clean: held-out deploy-hack drops at matched deploy-solve, beating
  vanilla/erase. The single confound-check is the random-V control (#157 job 33), not
  a per-run solve-jump diagnostic. {reason: "random-V isolates directional-vs-adapter-
  deletion directly; nothing else needed", credence: 0.8}
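The amplification claim in the first inference rests on Adam's update being roughly invariant to gradient scale. A minimal scalar sketch, assuming a constant-sign gradient and omitting AdamW's weight decay; `adam_step_sizes` is a hypothetical helper, not repo code. Real near-noise gradients flip sign, which shrinks m relative to sqrt(v), so this only shows that a small gradient magnitude does not by itself shrink the step.

```python
import math


def adam_step_sizes(grad, n_steps=50, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Per-step update magnitudes of scalar Adam under a constant gradient."""
    m = v = 0.0
    sizes = []
    for t in range(1, n_steps + 1):
        m = b1 * m + (1 - b1) * grad
        v = b2 * v + (1 - b2) * grad * grad
        m_hat = m / (1 - b1 ** t)  # bias-corrected first moment
        v_hat = v / (1 - b2 ** t)  # bias-corrected second moment
        sizes.append(lr * abs(m_hat) / (math.sqrt(v_hat) + eps))
    return sizes


tiny = adam_step_sizes(1e-6)[-1]  # gradient six orders of magnitude smaller
full = adam_step_sizes(1.0)[-1]   # O(1) gradient

print(f"tiny-grad step={tiny:.3e} full-grad step={full:.3e} ratio={tiny / full:.3f}")
```

Both gradients produce a step close to `lr`, which is the mechanism the entry points at: a tiny routed residual still moves the parameter at nearly full step size.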
 ## 2026-05-31 (h) — three external reviews converge: the v1 additive-route "deploy=0" is most likely an AdamW-parasite + base-recovery artifact, not localization
 **Context:** commit `670fcb3` on `probe/distill-cosine`. Reviews:
@@ -1554,7 +1554,44 @@ def main(cfg: Config) -> int:
        # than at step 0). Same PAIRS, same extract code; we just discard the
        # saved cache and overwrite the in-memory v_hack dict.
        refr = "-"  # set to "mod/axes" below if a refresh fires; rendered in the per-step row
-        if v_hack is not None and cfg.vhack_refresh_every > 0 and (step + 1) % cfg.vhack_refresh_every == 0:
+        do_refresh = cfg.vhack_refresh_every > 0 and (step + 1) % cfg.vhack_refresh_every == 0
        if do_refresh and is_route2:
            # route2 mask refresh: re-extract v_act / v_grad against the CURRENT
            # model so the mask tracks where hacks separate now, not at step 0.
            # Without this the frozen mask goes stale -- cin_t decays to cin_s
            # within ~6 steps (2026-05-31 journal, frozen-real-V route). Same
            # MASK_PAIRS (the weak detector, no oracle); quarantine ablated so the
            # hack signal flows back through the observable path, matching the
            # B_q=0 state the build-time extraction saw.
            _was_training = model.training
            model.eval()
            opt.zero_grad(set_to_none=True)
            logger.disable("projected_grpo.extract_vhack_grad")
            logger.disable("__main__")
            try:
                with ablate_quarantine(wrappers):
                    if cfg.route2_mask == "act":
                        from .extract_vhack_grad import extract_v_act
                        _v = extract_v_act(model, tok, wrappers, MASK_PAIRS, n_heldout=2, device=device)
                        for name, info in wrappers.items():
                            info["layer"]._antipasto_v_act.data.copy_(_v[name].to(device))
                    else:
                        from .extract_vhack_grad import extract_v_hack
                        _, _, raw_grads, _ = extract_v_hack(
                            model, tok, wrappers, MASK_PAIRS,
                            top_k=1, tau_axis=0.0, n_heldout=2, device=device,
                        )
                        for name in wrappers:  # update in place so _route2_grad_filter's closure sees it
                            d = (raw_grads[f"hack/{name}"] - raw_grads[f"clean/{name}"]).mean(0)
                            v_grad[name] = (d / d.norm().clamp_min(1e-12)).to(device)
            finally:
                logger.enable("projected_grpo.extract_vhack_grad")
                logger.enable("__main__")
            opt.zero_grad(set_to_none=True)  # extract leaves .grad populated
            if _was_training:
                model.train()
            refr = f"route2:{cfg.route2_mask}"
        if v_hack is not None and do_refresh:
            from .extract_vhack_grad import extract_v_hack
            if cfg.vhack_pairs_path is not None:
                from .pairs_from_pool import load_pairs_json