From 25569193c5cac0b2226c51cb5682757f564900f0 Mon Sep 17 00:00:00 2001
From: wassname
Date: Sun, 31 May 2026 12:19:38 +0000
Subject: [PATCH] fix: route2 refresh was a silent no-op (vhack_refresh_every gated on v_hack, which route2 sets None)

route2 uses v_act/v_grad, not v_hack, so --vhack-refresh-every never fired
for it -- the mask was frozen regardless of the flag. Frozen real-V route
(job 32) shows why this matters: cin_t decays to cin_s by step 7, deploy
hack only drops ~8pp (vs run-31 rf5 ~0). Now re-extracts v_act/v_grad every
N steps with the quarantine ablated (same MASK_PAIRS, no oracle).

+ journal entry (j).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
---
 RESEARCH_JOURNAL.md         | 72 +++++++++++++++++++++++++++++++++++++
 src/projected_grpo/train.py | 39 +++++++++++++++++++-
 2 files changed, 110 insertions(+), 1 deletion(-)

diff --git a/RESEARCH_JOURNAL.md b/RESEARCH_JOURNAL.md
index 058700b..48ae4a0 100644
--- a/RESEARCH_JOURNAL.md
+++ b/RESEARCH_JOURNAL.md
@@ -2,6 +2,78 @@

 Append-only. New entries at the top, date-stamped. Never edit old entries.

+## 2026-05-31 (j) — frozen-real-V route (rf0) only drops deploy hack ~8pp, NOT to run-31's ~0; staleness is the cause
+
+**Context:** commit `2b020c9` on `probe/distill-cosine`; pueue id 32; route real-V,
+refresh OFF, seed 41, 60 steps. The rf0 anchor for the random-V comparison (#157).
+Log: `logs/20260531T095110_fast_routing_seed41_sub4_route_rf0_REAL_s41.log`.
+
+**Hypothesis:** frozen route reproduces run-31's deploy hack-drop; random-V (job 33)
+then tells us if the drop is directional or just adapter-ablation.
+
+### Observations
+
+- [obs] ROUTE EVAL n=64: train/knob-on hack=0.375 solve=0.422 | deploy/knob-off
+  hack=0.297 solve=0.500. Deploy hack drops only ~8pp from train, NOT to ~0.
+- [obs] Per-mode (train->deploy hack): file_marker 15/32->13/32 (held-out),
+  run_tests 7/8->6/8 (in-dist), sentinel 2/24->0/24 (held-out, already low).
+- [obs] cin_t > cin_s only for steps 2-6 (0.32 vs 0.15 at step 2), then cin_t
+  decays to meet cin_s (~0.13) and is <= cin_s for the rest. The decay is
+  teacher-side: cin_s stays flat ~0.13-0.18, cin_t falls.
+- [obs] cout=0.00 every step (routing mechanically removes the hack-ward component).
+- [obs] hack_s climbs 0 -> ~20/28 (71%); route does not suppress live training hack
+  (expected; suppression is deploy-only).
+
+### Inferences
+
+- [inf] The weak frozen-route deploy drop is caused by basis staleness: with v_hack
+  frozen and delta_S moving, the cached teacher gradient stops looking hack-ward in
+  the stale basis after ~6 steps (cin_t collapse), so the quarantine stops absorbing
+  the cheat. Refresh keeps cin_t high -> more absorption -> lower deploy hack. This
+  is why run-31 (rf5) reached ~0 and rf0 does not. {reason: "cin_t decay timing
+  matches the absorption gap; cout=0 shows the routing op works, so the failure is in
+  WHAT the stale basis points at, not the mechanics", credence: 0.6}
+- [inf] Directly motivates refreshing the mask every 5 steps for the vector-based
+  arms (route2 act/grad). Found+fixed a related silent no-op: route2 set v_hack=None
+  so `--vhack-refresh-every` never fired for it; v_act/v_grad were frozen regardless
+  of the flag. Now re-extracted every N steps with the quarantine ablated. {reason:
+  "code-read of the refresh guard `if v_hack is not None`", credence: 0.95}
+
+### Failure modes considered
+
+- **Likely:** the 8pp drop is adapter-ablation regularization, not directional. Prior
+  0.4. Check: job 33 random-V rf0 — if it also drops ~8pp, the direction isn't
+  load-bearing at rf0.
+- **Subtle:** deploy eval n=64 is small; 8pp (and per-mode 15->13, 7/8->6/8) within
+  binomial noise. Prior 0.3. Check: n>=2 seeds before trusting the gap.
+- **Null:** run-31's "~0" was a different config (rf5 + per-mode decompose job 31),
+  not directly comparable to rf0; the contrast may be partly cross-run artifact.
+  Prior 0.2. Check: rerun route rf5 with this exact harness/eval, compare head-to-head.
+
+### Next action
+
+Job 33 (random-V rf0) running -> directional-vs-ablation check. Route2 act/grad
+(34/35) now actually refresh v_act/v_grad every 5 (fix committed). Compare frozen-real
+(32) vs random (33) deploy hack when 33 lands.
+
+## 2026-05-31 (i) — CORRECTION to (h): the AdamW-parasite does NOT transfer to route2
+
+Entry (h) inf #2 ("Adam-parasite transfers to route2", credence 0.7) was WRONG.
+
+- [inf] The v1 parasite is specific to a TINY routed residual: `delta_S_hack` gets
+  only the projected component `c@V`, near-noise when cin~0.1, which AdamW's m/sqrt(v)
+  amplifies to ~full step size -> drift. route2's `B_q` is a distinct-basis LoRA,
+  always in the forward, so it receives a FULL O(1) gradient (normal LoRA training),
+  not a tiny residual. No residual to amplify -> AdamW is fine. The distinct basis is
+  exactly the fix for the v1 pathology. {reason: "B_q grad = dL/dB_q through its own
+  forward path, magnitude set by the loss not by a near-zero projection; this is the
+  mechanism difference the reviews themselves point at", credence: 0.85}
+- [inf] So the deploy solve-jump is NOT a route2 worry to babysit. The win is
+  well-defined and clean: held-out deploy-hack drops at matched deploy-solve, beating
+  vanilla/erase. The single confound-check is the random-V control (#157 job 33), not
+  a per-run solve-jump diagnostic. {reason: "random-V isolates directional-vs-adapter-
+  deletion directly; nothing else needed", credence: 0.8}
+
 ## 2026-05-31 (h) — three external reviews converge: the v1 additive-route "deploy=0" is most likely an AdamW-parasite + base-recovery artifact, not localization

 **Context:** commit `670fcb3` on `probe/distill-cosine`. Reviews:
Reviews: diff --git a/src/projected_grpo/train.py b/src/projected_grpo/train.py index 485b146..615d627 100644 --- a/src/projected_grpo/train.py +++ b/src/projected_grpo/train.py @@ -1554,7 +1554,44 @@ def main(cfg: Config) -> int: # than at step 0). Same PAIRS, same extract code; we just discard the # saved cache and overwrite the in-memory v_hack dict. refr = "-" # set to "mod/axes" below if a refresh fires; rendered in the per-step row - if v_hack is not None and cfg.vhack_refresh_every > 0 and (step + 1) % cfg.vhack_refresh_every == 0: + do_refresh = cfg.vhack_refresh_every > 0 and (step + 1) % cfg.vhack_refresh_every == 0 + if do_refresh and is_route2: + # route2 mask refresh: re-extract v_act / v_grad against the CURRENT + # model so the mask tracks where hacks separate now, not at step 0. + # Without this the frozen mask goes stale -- cin_t decays to cin_s + # within ~6 steps (2026-05-31 journal, frozen-real-V route). Same + # MASK_PAIRS (the weak detector, no oracle); quarantine ablated so the + # hack signal flows back through the observable path, matching the + # B_q=0 state the build-time extraction saw. 
+            _was_training = model.training
+            model.eval()
+            opt.zero_grad(set_to_none=True)
+            logger.disable("projected_grpo.extract_vhack_grad")
+            logger.disable("__main__")
+            try:
+                with ablate_quarantine(wrappers):
+                    if cfg.route2_mask == "act":
+                        from .extract_vhack_grad import extract_v_act
+                        _v = extract_v_act(model, tok, wrappers, MASK_PAIRS, n_heldout=2, device=device)
+                        for name, info in wrappers.items():
+                            info["layer"]._antipasto_v_act.data.copy_(_v[name].to(device))
+                    else:
+                        from .extract_vhack_grad import extract_v_hack
+                        _, _, raw_grads, _ = extract_v_hack(
+                            model, tok, wrappers, MASK_PAIRS,
+                            top_k=1, tau_axis=0.0, n_heldout=2, device=device,
+                        )
+                        for name in wrappers:  # update in place so _route2_grad_filter's closure sees it
+                            d = (raw_grads[f"hack/{name}"] - raw_grads[f"clean/{name}"]).mean(0)
+                            v_grad[name] = (d / d.norm().clamp_min(1e-12)).to(device)
+            finally:
+                logger.enable("projected_grpo.extract_vhack_grad")
+                logger.enable("__main__")
+                opt.zero_grad(set_to_none=True)  # extract leaves .grad populated
+                if _was_training:
+                    model.train()
+            refr = f"route2:{cfg.route2_mask}"
+        if v_hack is not None and do_refresh:
             from .extract_vhack_grad import extract_v_hack
             if cfg.vhack_pairs_path is not None:
                 from .pairs_from_pool import load_pairs_json