mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 17:30:41 +08:00
fix: route2 refresh was a silent no-op (vhack_refresh_every gated on v_hack, which route2 sets None)
route2 uses v_act/v_grad, not v_hack, so --vhack-refresh-every never fired for it -- the mask was frozen regardless of the flag. Frozen real-V route (job 32) shows why this matters: cin_t decays to cin_s by step 7, deploy hack only drops ~8pp (vs run-31 rf5 ~0). Now re-extracts v_act/v_grad every N steps with the quarantine ablated (same MASK_PAIRS, no oracle). + journal entry (j).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
@@ -2,6 +2,78 @@
 
 Append-only. New entries at the top, date-stamped. Never edit old entries.
 
+## 2026-05-31 (j) -- frozen-real-V route (rf0) only drops deploy hack ~8pp, NOT to run-31's ~0; staleness is the cause
+
+**Context:** commit `2b020c9` on `probe/distill-cosine`; pueue id 32; route real-V,
+refresh OFF, seed 41, 60 steps. The rf0 anchor for the random-V comparison (#157).
+Log: `logs/20260531T095110_fast_routing_seed41_sub4_route_rf0_REAL_s41.log`.
+
+**Hypothesis:** frozen route reproduces run-31's deploy hack-drop; random-V (job 33)
+then tells us if the drop is directional or just adapter-ablation.
+
+### Observations
+
+- [obs] ROUTE EVAL n=64: train/knob-on hack=0.375 solve=0.422 | deploy/knob-off
+  hack=0.297 solve=0.500. Deploy hack drops only ~8pp from train, NOT to ~0.
+- [obs] Per-mode (train->deploy hack): file_marker 15/32->13/32 (held-out),
+  run_tests 7/8->6/8 (in-dist), sentinel 2/24->0/24 (held-out, already low).
+- [obs] cin_t > cin_s only for steps 2-6 (0.32 vs 0.15 at step 2), then cin_t
+  decays to meet cin_s (~0.13) and is <= cin_s for the rest. The decay is
+  teacher-side: cin_s stays flat ~0.13-0.18, cin_t falls.
+- [obs] cout=0.00 every step (routing mechanically removes the hack-ward component).
+- [obs] hack_s climbs 0 -> ~20/28 (71%); route does not suppress live training hack
+  (expected; suppression is deploy-only).
+
+### Inferences
+
+- [inf] The weak frozen-route deploy drop is caused by basis staleness: with v_hack
+  frozen and delta_S moving, the cached teacher gradient stops looking hack-ward in
+  the stale basis after ~6 steps (cin_t collapse), so the quarantine stops absorbing
+  the cheat. Refresh keeps cin_t high -> more absorption -> lower deploy hack. This
+  is why run-31 (rf5) reached ~0 and rf0 does not. {reason: "cin_t decay timing
+  matches the absorption gap; cout=0 shows the routing op works, so the failure is in
+  WHAT the stale basis points at, not the mechanics", credence: 0.6}
+- [inf] Directly motivates refreshing the mask every 5 steps for the vector-based
+  arms (route2 act/grad). Found+fixed a related silent no-op: route2 set v_hack=None
+  so `--vhack-refresh-every` never fired for it; v_act/v_grad were frozen regardless
+  of the flag. Now re-extracted every N steps with the quarantine ablated. {reason:
+  "code-read of the refresh guard `if v_hack is not None`", credence: 0.95}
+
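The silent no-op described in the second inference can be reproduced in a few lines. This is a toy model rather than the repository's code: `old_guard` and `new_guard` are hypothetical helper names, and the only things taken from the commit are the two guard expressions and the fact that route2 leaves `v_hack` as `None`.

```python
def old_guard(v_hack, refresh_every, step):
    """Guard before the fix: it requires v_hack, so a run with v_hack=None never refreshes."""
    return v_hack is not None and refresh_every > 0 and (step + 1) % refresh_every == 0


def new_guard(is_route2, v_hack, refresh_every, step):
    """Guard after the fix: compute the schedule once, then let each arm check its own state."""
    do_refresh = refresh_every > 0 and (step + 1) % refresh_every == 0
    return do_refresh and (is_route2 or v_hack is not None)


# A route2-style run: v_hack is None, refresh requested every 5 steps, 20 steps in total.
old_steps = [s for s in range(20) if old_guard(None, 5, s)]
new_steps = [s for s in range(20) if new_guard(True, None, 5, s)]

print("old guard fires at:", old_steps)
print("new guard fires at:", new_steps)
```

Under the old guard the list is empty whatever value the flag takes, which is why the frozen mask went unnoticed. Under the new guard the refresh fires on the 0-based steps where `step + 1` is a multiple of the interval.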
+### Failure modes considered
+
+- **Likely:** the 8pp drop is adapter-ablation regularization, not directional. Prior
+  0.4. Check: job 33 random-V rf0 -- if it also drops ~8pp, the direction isn't
+  load-bearing at rf0.
+- **Subtle:** deploy eval n=64 is small; 8pp (and per-mode 15->13, 7/8->6/8) within
+  binomial noise. Prior 0.3. Check: n>=2 seeds before trusting the gap.
+- **Null:** run-31's "~0" was a different config (rf5 + per-mode decompose job 31),
+  not directly comparable to rf0; the contrast may be partly cross-run artifact.
+  Prior 0.2. Check: rerun route rf5 with this exact harness/eval, compare head-to-head.
+
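The "Subtle" failure mode can be given a rough number. The sketch below runs an unpooled two-proportion z-test on the counts implied by the per-mode rows (train 15+7+2 = 24/64, deploy 13+6+0 = 19/64). It assumes the two evals behave like independent samples, which the journal does not state; if the same 64 prompts are scored under both knob settings, a paired test would fit better. `two_proportion_z` is a hypothetical helper name.

```python
import math


def two_proportion_z(k1, n1, k2, n2):
    """Return (difference, standard error, z) for two independent binomial proportions."""
    p1, p2 = k1 / n1, k2 / n2
    # Unpooled standard error of the difference p1 - p2.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, se, diff / se


# Train/knob-on hack count vs deploy/knob-off hack count, both out of n=64.
diff, se, z = two_proportion_z(24, 64, 19, 64)
print(f"diff={diff:.3f} se={se:.3f} z={z:.2f}")
```

The gap of about 8pp is roughly one standard error wide (z below 1), so on this reading a single seed at n=64 cannot separate the drop from sampling noise. That is consistent with the entry's own check of running more seeds before trusting the gap.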
+### Next action
+
+Job 33 (random-V rf0) running -> directional-vs-ablation check. Route2 act/grad
+(34/35) now actually refresh v_act/v_grad every 5 (fix committed). Compare frozen-real
+(32) vs random (33) deploy hack when 33 lands.
+
+## 2026-05-31 (i) -- CORRECTION to (h): the AdamW-parasite does NOT transfer to route2
+
+Entry (h) inf #2 ("Adam-parasite transfers to route2", credence 0.7) was WRONG.
+
+- [inf] The v1 parasite is specific to a TINY routed residual: `delta_S_hack` gets
+  only the projected component `c@V`, near-noise when cin~0.1, which AdamW's m/sqrt(v)
+  amplifies to ~full step size -> drift. route2's `B_q` is a distinct-basis LoRA,
+  always in the forward, so it receives a FULL O(1) gradient (normal LoRA training),
+  not a tiny residual. No residual to amplify -> AdamW is fine. The distinct basis is
+  exactly the fix for the v1 pathology. {reason: "B_q grad = dL/dB_q through its own
+  forward path, magnitude set by the loss not by a near-zero projection; this is the
+  mechanism difference the reviews themselves point at", credence: 0.85}
+- [inf] So the deploy solve-jump is NOT a route2 worry to babysit. The win is
+  well-defined and clean: held-out deploy-hack drops at matched deploy-solve, beating
+  vanilla/erase. The single confound-check is the random-V control (#157 job 33), not
+  a per-run solve-jump diagnostic. {reason: "random-V isolates directional-vs-adapter-
+  deletion directly; nothing else needed", credence: 0.8}
+
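The amplification claim in the first inference follows from Adam's update rule: the step is `lr * m_hat / (sqrt(v_hat) + eps)`, which is nearly independent of gradient scale once the gradient is well above `eps`. The scalar sketch below illustrates that with a constant gradient. It is a generic Adam update with common default hyperparameters, not the project's optimizer configuration, and it omits AdamW's decoupled weight decay because that term does not depend on the gradient. `adam_updates` is a hypothetical helper name.

```python
import math


def adam_updates(grad, n_steps=3, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return the parameter updates Adam makes for a constant scalar gradient."""
    m = v = 0.0
    updates = []
    for t in range(1, n_steps + 1):
        m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
        v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
        m_hat = m / (1 - beta1 ** t)                # bias correction
        v_hat = v / (1 - beta2 ** t)
        updates.append(lr * m_hat / (math.sqrt(v_hat) + eps))
    return updates


tiny = adam_updates(1e-6)  # near-noise residual gradient
full = adam_updates(1.0)   # ordinary O(1) gradient

for t, f in zip(tiny, full):
    print(f"tiny step={t:.3e}  full step={f:.3e}  ratio={t / f:.3f}")
```

A gradient six orders of magnitude smaller still produces a step close to `lr`. This is the mechanism the entry attributes to the v1 routed residual, and it is why a parameter that receives a full-magnitude gradient is not affected in the same way.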
 ## 2026-05-31 (h) -- three external reviews converge: the v1 additive-route "deploy=0" is most likely an AdamW-parasite + base-recovery artifact, not localization
 
 **Context:** commit `670fcb3` on `probe/distill-cosine`. Reviews:
@@ -1554,7 +1554,44 @@ def main(cfg: Config) -> int:
         # than at step 0). Same PAIRS, same extract code; we just discard the
         # saved cache and overwrite the in-memory v_hack dict.
         refr = "-"  # set to "mod/axes" below if a refresh fires; rendered in the per-step row
-        if v_hack is not None and cfg.vhack_refresh_every > 0 and (step + 1) % cfg.vhack_refresh_every == 0:
+        do_refresh = cfg.vhack_refresh_every > 0 and (step + 1) % cfg.vhack_refresh_every == 0
+        if do_refresh and is_route2:
+            # route2 mask refresh: re-extract v_act / v_grad against the CURRENT
+            # model so the mask tracks where hacks separate now, not at step 0.
+            # Without this the frozen mask goes stale -- cin_t decays to cin_s
+            # within ~6 steps (2026-05-31 journal, frozen-real-V route). Same
+            # MASK_PAIRS (the weak detector, no oracle); quarantine ablated so the
+            # hack signal flows back through the observable path, matching the
+            # B_q=0 state the build-time extraction saw.
+            _was_training = model.training
+            model.eval()
+            opt.zero_grad(set_to_none=True)
+            logger.disable("projected_grpo.extract_vhack_grad")
+            logger.disable("__main__")
+            try:
+                with ablate_quarantine(wrappers):
+                    if cfg.route2_mask == "act":
+                        from .extract_vhack_grad import extract_v_act
+                        _v = extract_v_act(model, tok, wrappers, MASK_PAIRS, n_heldout=2, device=device)
+                        for name, info in wrappers.items():
+                            info["layer"]._antipasto_v_act.data.copy_(_v[name].to(device))
+                    else:
+                        from .extract_vhack_grad import extract_v_hack
+                        _, _, raw_grads, _ = extract_v_hack(
+                            model, tok, wrappers, MASK_PAIRS,
+                            top_k=1, tau_axis=0.0, n_heldout=2, device=device,
+                        )
+                        for name in wrappers:  # update in place so _route2_grad_filter's closure sees it
+                            d = (raw_grads[f"hack/{name}"] - raw_grads[f"clean/{name}"]).mean(0)
+                            v_grad[name] = (d / d.norm().clamp_min(1e-12)).to(device)
+            finally:
+                logger.enable("projected_grpo.extract_vhack_grad")
+                logger.enable("__main__")
+                opt.zero_grad(set_to_none=True)  # extract leaves .grad populated
+                if _was_training:
+                    model.train()
+            refr = f"route2:{cfg.route2_mask}"
+        if v_hack is not None and do_refresh:
             from .extract_vhack_grad import extract_v_hack
             if cfg.vhack_pairs_path is not None:
                 from .pairs_from_pool import load_pairs_json
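The `v_grad` update in the hunk above is a mean-difference direction: average the hack-minus-clean gradient rows, then scale to unit length with a floor on the norm. Below is a dependency-free sketch of that computation. It assumes each entry of `raw_grads` is a stack of per-pair rows, which the `.mean(0)` call suggests but the diff does not show. `mean_difference_direction` is a hypothetical name, and plain lists stand in for tensors.

```python
import math


def mean_difference_direction(hack_rows, clean_rows, min_norm=1e-12):
    """Unit vector along mean(hack - clean); the norm floor avoids division by zero."""
    n = len(hack_rows)
    dim = len(hack_rows[0])
    # Column-wise mean of the paired differences (the analogue of `.mean(0)`).
    d = [sum(h[j] - c[j] for h, c in zip(hack_rows, clean_rows)) / n for j in range(dim)]
    # Analogue of `d.norm().clamp_min(1e-12)`.
    norm = max(math.sqrt(sum(x * x for x in d)), min_norm)
    return [x / norm for x in d]


# Two pairs that differ only along the first axis.
direction = mean_difference_direction([[1.0, 2.0], [3.0, 2.0]], [[0.0, 2.0], [0.0, 2.0]])
# Identical hack and clean rows: the floor yields a zero vector instead of NaN.
degenerate = mean_difference_direction([[1.0, 1.0]], [[1.0, 1.0]])

print(direction, degenerate)
```

The floor matters on a refresh where hack and clean gradients happen to coincide for a module: the direction collapses to zero rather than propagating NaN into the filter that consumes it.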