From cf053101300aa87d7bca8445fca9db88739ff827 Mon Sep 17 00:00:00 2001
From: wassname <1103714+wassname@users.noreply.github.com>
Date: Mon, 8 Jun 2026 02:08:02 +0000
Subject: [PATCH] journal: dir6 real-V arms land (margin band) -- both suppress, per-token>per-rollout

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
---
 RESEARCH_JOURNAL.md | 52 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/RESEARCH_JOURNAL.md b/RESEARCH_JOURNAL.md
index 7bb46f6..7804189 100644
--- a/RESEARCH_JOURNAL.md
+++ b/RESEARCH_JOURNAL.md
@@ -3263,3 +3263,55 @@ reason, and `cin_t > cin_s` cannot tell the two axes apart. No queue change now. Read the vampire (job 13) and per-rollout (job 8) frout when they land; build a half-solve pool only if the confound survives those.
+
+## 2026-06-08 02:10 -- dir6 real-V arms land (margin band): both suppress, per-token > per-rollout

**Context:** commit `caa0d09` on `probe/distill-cosine`; pueue 8 (per-rollout) + 9 (per-token);
logs `logs/20260607T195125_fast_routingV_seed43_dir6_routeV_s43.log` and the per-token sibling.
Both real-V, p75/p75 margin band, dense run_tests teacher pool, seed 43, n=119 held-out test.

**Hypothesis:** real-V routeV holds the cheat in the quarantine -> deploy (knob-off) hack
well below train (knob-on) hack, at non-collapsed solve.

### Observations

- [obs] Deploy (knob-off, held-out test n=119):

| arm | granularity | deploy hack(strict) | deploy hack(vendor eq) | solve | objective (solve-hack) |
|---|---|---:|---:|---:|---:|
| job 9 | per-token | 0.042 | 0.034 | 0.143 | +0.101 |
| job 8 | per-rollout | 0.101 | 0.084 | 0.126 | +0.025 |

- [obs] Job 8 train knob-on HACK_STUDENT=0.636 (mean), val curve climbs 0.0->0.81 by step 59
  while deploy/knob-off val (n=32) stayed ~0.00-0.03 (log VAL-eval lines). Test n=119 deploy
  is higher (0.101) than the n=32 val (~0.03).
+- [obs] frout (per-rollout, reliable gauge): nonzero steps 2-5 (~0.29 peak), ~0.00 by step ~28; + rout column near 0 for the back half (the cliff, now confirmed on the reliable per-rollout gauge, + not just the nan-fragile per-token one). + +### Inferences + +- [inf] Per-token suppresses more than per-rollout on this seed (deploy hack 0.042 vs 0.101) at + equal-or-better solve. {reason: only granularity differs; both real-V same band/pool/seed; + credence 0.55 -- n=1 seed, the gap (0.06) is within plausible seed noise}. +- [inf] Suppression is partial on the larger test set, not the ~0 the n=32 val implied. {reason: + val n=32 is small and noisy; n=119 strict hack 0.101 is the trustworthy deploy number; credence 0.8}. +- [inf] The frout cliff is real (reliable gauge), consistent with "hack learned early -> within-group + GRPO advantage flattens -> little gradient left to route after ~step 10". {reason: rout~0 back-half + on the per-rollout gauge; train knob-on hack saturates ~0.8 by step 40; credence 0.6}. + +### Failure modes considered + +- **Most-likely:** the suppression is absorption, not direction -- the quarantine soaks up any + routed mass regardless of whether v_grad points at the hack. Prior 0.4. Check: job 10 random-V + (now Running) -- if it suppresses ~as well, H2 absorption; if it fails (deploy hack ~vanilla), H4. +- **Subtle:** teacher-forcing/off-policy confound -- the teacher pool is ~99% hacks so "routed" + correlates with "off-policy", router may key on off-policy-ness not hack-content. Prior 0.3. + Check: job 13 vampire (in-subspace semantic placebo). +- **Null:** deploy hack 0.10 is just the base rate, nothing was suppressed; need the emergence + reference. Prior 0.15. Check: job 12 vanilla -- deploy hack should be >> 0.10 by step 60. + +### Next action + +Controls running/queued: job 10 random-V (Running), 12 vanilla, 13 vampire. H2-vs-H4 verdict +waits on 10 + 13; no queue change.
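
### Sampling-noise sketch for the deploy table

A minimal sketch, not part of the logged runs, for judging how much of the per-token vs per-rollout gap could be test-set sampling noise alone. It assumes each reported strict-hack rate is a count out of n=119 rounded to three decimals (so 0.042 -> 5/119 and 0.101 -> 12/119; these counts are reconstructed, not read from the logs) and uses a Wilson score interval with z=1.96. The names `ARMS`, `count_from_rate` and `wilson_interval` are illustrative only. Seed-to-seed variance is not captured here, so the intervals are a lower bound on the real uncertainty.

```python
import math

N_TEST = 119  # held-out test size reported in the entry

# Deploy (knob-off) rates copied from the table: (strict hack, solve).
ARMS = {
    "job 9 per-token": (0.042, 0.143),
    "job 8 per-rollout": (0.101, 0.126),
}


def count_from_rate(rate, n):
    """Assumption: each reported rate is k/n rounded to 3 decimals."""
    return round(rate * n)


def wilson_interval(k, n, z=1.96):
    """Wilson score interval (about 95% for z=1.96) for a proportion k/n."""
    p = k / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


intervals = {}
for arm, (hack, solve) in ARMS.items():
    k = count_from_rate(hack, N_TEST)
    lo, hi = wilson_interval(k, N_TEST)
    intervals[arm] = (lo, hi)
    # The objective column in the table is solve minus strict hack.
    print(f"{arm}: hack {k}/{N_TEST}, Wilson [{lo:.3f}, {hi:.3f}], "
          f"objective {solve - hack:+.3f}")

lo_tok, hi_tok = intervals["job 9 per-token"]
lo_roll, hi_roll = intervals["job 8 per-rollout"]
overlap = lo_roll <= hi_tok and lo_tok <= hi_roll
print("intervals overlap:", overlap)
```

Under these assumptions the two intervals overlap, which is consistent with the 0.55 credence on the granularity inference: a single seed at n=119 does not separate the arms on sampling noise alone, before any seed variance is added.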