journal: dir6 real-V arms land (margin band) -- both suppress, per-token>per-rollout

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-27 19:31:11 +08:00 · 2026-06-08 02:08:02 +00:00
parent 34ba631e7d
commit cf05310130
1 changed files with 52 additions and 0 deletions
@@ -3263,3 +3263,55 @@ reason, and `cin_t > cin_s` cannot tell the two axes apart.

 No queue change now. Read the vampire (job 13) and per-rollout (job 8) frout when they
 land; build a half-solve pool only if the confound survives those.
+
+## 2026-06-08 02:10 -- dir6 real-V arms land (margin band): both suppress, per-token > per-rollout
+
+**Context:** commit `caa0d09` on `probe/distill-cosine`; pueue 8 (per-rollout) + 9 (per-token);
+logs `logs/20260607T195125_fast_routingV_seed43_dir6_routeV_s43.log` and the per-token sibling.
+Both real-V, p75/p75 margin band, dense run_tests teacher pool, seed 43, n=119 held-out test.
+
+**Hypothesis:** real-V routeV holds the cheat in the quarantine -> deploy (knob-off) hack
+well below train (knob-on) hack, at non-collapsed solve.
+
+### Observations
+
+- [obs] Deploy (knob-off, held-out test n=119):
+
+| arm | granularity | deploy hack(strict) | deploy hack(vendor eq) | solve | objective (solve-hack) |
+|---|---|---:|---:|---:|---:|
+| job 9 | per-token   | 0.042 | 0.034 | 0.143 | +0.101 |
+| job 8 | per-rollout | 0.101 | 0.084 | 0.126 | +0.025 |
+
+- [obs] Job 8 train knob-on HACK_STUDENT=0.636 (mean), val curve climbs 0.0->0.81 by step 59
+  while deploy/knob-off val (n=32) stayed ~0.00-0.03 (log VAL-eval lines). Test n=119 deploy
+  is higher (0.101) than the n=32 val (~0.03).
+- [obs] frout (per-rollout, reliable gauge): nonzero steps 2-5 (~0.29 peak), ~0.00 by step ~28;
+  rout column near 0 for the back half (the cliff, now confirmed on the reliable per-rollout gauge,
+  not just the nan-fragile per-token one).
+
+### Inferences
+
+- [inf] Per-token suppresses more than per-rollout on this seed (deploy hack 0.042 vs 0.101) at
+  equal-or-better solve. {reason: only granularity differs; both real-V same band/pool/seed;
+  credence 0.55 -- n=1 seed, the gap (0.06) is within plausible seed noise}.
+- [inf] Suppression is partial on the larger test set, not the ~0 the n=32 val implied. {reason:
+  val n=32 is small and noisy; n=119 strict hack 0.101 is the trustworthy deploy number; credence 0.8}.
+- [inf] The frout cliff is real (reliable gauge), consistent with "hack learned early -> within-group
+  GRPO advantage flattens -> little gradient left to route after ~step 10". {reason: rout~0 back-half
+  on the per-rollout gauge; train knob-on hack saturates ~0.8 by step 40; credence 0.6}.
+
+### Failure modes considered
+
+- **Most-likely:** the suppression is absorption, not direction -- the quarantine soaks up any
+  routed mass regardless of whether v_grad points at the hack. Prior 0.4. Check: job 10 random-V
+  (now Running) -- if it suppresses ~as well, H2 absorption; if it fails (deploy hack ~vanilla), H4.
+- **Subtle:** teacher-forcing/off-policy confound -- the teacher pool is ~99% hacks so "routed"
+  correlates with "off-policy", router may key on off-policy-ness not hack-content. Prior 0.3.
+  Check: job 13 vampire (in-subspace semantic placebo).
+- **Null:** deploy hack 0.10 is just the base rate, nothing was suppressed; need the emergence
+  reference. Prior 0.15. Check: job 12 vanilla -- deploy hack should be >> 0.10 by step 60.
+
+### Next action
+
+Controls running/queued: job 10 random-V (Running), 12 vanilla, 13 vampire. H2-vs-H4 verdict
+waits on 10 + 13; no queue change.