diff --git a/RESEARCH_JOURNAL.md b/RESEARCH_JOURNAL.md
index 7804189..d99e91b 100644
--- a/RESEARCH_JOURNAL.md
+++ b/RESEARCH_JOURNAL.md
@@ -3315,3 +3315,58 @@
 well below train (knob-on) hack, at non-collapsed solve.
 Controls running/queued: job 10 random-V (Running), 12 vanilla, 13 vampire.
 H2-vs-H4 verdict waits on 10 + 13; no queue change.
+
+## 2026-06-08 06:30 -- random-V control matches real-V at per-rollout: H2 (absorption) lead
+
+**Context:** commit `caa0d09` on `probe/distill-cosine`; pueue 8 (real-V) vs 10 (random-V),
+both per-rollout, p75/p75 margin band, dense run_tests pool, seed 43, n=119 held-out test.
+Job 10 = Haar random-V control (`--routeV-random-v-seed=157`): same routing machinery, but the
+quarantine direction is a random orthonormal frame instead of the extracted hack direction.
+
+**Hypothesis under test:** does suppression need the REAL hack direction (H4), or does the
+throwaway quarantine knob absorb the hack regardless of direction (H2)?
+
+### Observations
+
+- [obs] Deploy (knob-off, held-out test n=119):
+
+| arm | granularity | direction | deploy hack (strict) | deploy hack (vendor eq) | solve |
+|---|---|---|---:|---:|---:|
+| job 8 | per-rollout | real-V | 0.101 | 0.084 | 0.126 |
+| job 10 | per-rollout | random-V | 0.101 | 0.101 | 0.109 |
+
+- [obs] Both arms: train knob-on hack saturates at ~0.81 (val n=32, step 59); deploy knob-off val
+  is ~0.06. The knob holds ~0.7 of the hack in BOTH the real and random arms.
+- [obs] Routing dynamics differ: real-V rout cliffs to ~0 by step ~28; random-V rout holds
+  ~0.27-0.31 the whole run (log per-step `rout` column). The dynamics differ but the deploy
+  outcome does not.
+
+### Inferences
+
+- [inf] At per-rollout granularity the suppression is absorption (H2), not direction (H4):
+  a random quarantine frame suppresses the hack as well as the extracted one (0.101 == 0.101
+  strict). {reason: real vs random differ ONLY in the quarantine direction; identical band,
+  pool, seed, machinery; strict deploy hack matches to 3 d.p. (vendor eq: 0.084 vs 0.101);
+  credence 0.6 -- n=1 seed, single granularity}.
+- [inf] The cliff-vs-no-cliff routing difference is a red herring for deploy outcome: real-V
+  routing collapses once the hack is learned, random-V keeps routing a fixed geometric slice,
+  yet both deposit ~the same hack mass in the deletable knob by step 60. {reason: rout columns
+  diverge but deploy hack equal; credence 0.55}.
+
+### Failure modes considered
+
+- **Most-likely:** the result is granularity-specific -- per-TOKEN real-V (job 9) suppressed
+  harder (deploy hack 0.042 vs per-rollout 0.101). Direction may matter at token granularity.
+  Prior 0.4. Check: job 11 per-token random-V (now Running) -- if it also hits ~0.04, H2 holds
+  at token level too; if it stays high, H4 holds at token granularity.
+- **Subtle:** there is no suppression to attribute -- if vanilla also deploys ~0.10, the 0.101
+  is just the base/emergence rate and real-vs-random is a vacuous tie. Prior 0.3. Check: job 12
+  vanilla -- deploy hack should be >> 0.10 by step 60 for the comparison to mean anything.
+- **Null:** 0.101 == 0.101 is seed-luck coincidence; a second seed splits them. Prior 0.2.
+  Check: re-run both arms at seed 41/44.
+
+### Next action
+
+No queue change. Job 11 per-token random-V (Running) is the load-bearing follow-up (controls
+the better-suppressing per-token arm); job 12 vanilla confirms the target exists; job 13 vampire
+is the semantic-placebo cross-check. Verdict consolidates once 11 + 12 land.
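
Reviewer sketch (not part of the patch): the Context paragraph describes the job 10 control as a Haar-random orthonormal frame. One standard way to draw such a frame is to run Gram-Schmidt on i.i.d. Gaussian vectors. The function name `haar_orthonormal_frame`, the sizes `dim=16` and `k=4`, and the use of seed 157 with Python's `random` module are all assumptions for illustration; the journal does not show how `--routeV-random-v-seed` is consumed.

```python
import math
import random


def haar_orthonormal_frame(dim, k, seed):
    """Return k orthonormal vectors in R^dim (hypothetical helper).

    Gram-Schmidt on i.i.d. standard Gaussian vectors yields a frame that is
    Haar-distributed; it matches a QR decomposition with positive diagonal R.
    """
    if not 0 < k <= dim:
        raise ValueError("need 0 < k <= dim")
    rng = random.Random(seed)
    frame = []
    while len(frame) < k:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Project out the vectors already in the frame; the second pass
        # re-orthogonalises to limit floating-point drift.
        for _ in range(2):
            for u in frame:
                dot = sum(a * b for a, b in zip(v, u))
                v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm < 1e-12:
            continue  # degenerate draw, resample
        frame.append([a / norm for a in v])
    return frame


# Illustrative sizes only; the real hidden size and frame rank are not in the journal.
V = haar_orthonormal_frame(dim=16, k=4, seed=157)
```

A frame drawn this way has no preferred direction, which is what would make it a fair control against the extracted hack direction.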
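
Reviewer sketch (not part of the patch): the **Null** failure mode asks whether 0.101 == 0.101 could be seed luck. A rough way to see how much a single n=119 evaluation can resolve is a Wilson score interval on each rate. The counts below are assumptions back-computed from the reported rates (for example 0.101 x 119 is about 12); the journal reports only rates, and n=119 for job 9 is likewise assumed.

```python
import math


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for k successes out of n trials (z=1.96 ~ 95%)."""
    p = k / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


N = 119  # held-out test size from the journal entry
# Assumed counts, inferred by rounding rate * N; not reported in the journal.
counts = {
    "job 8 real-V strict": 12,
    "job 10 random-V strict": 12,
    "job 8 real-V vendor eq": 10,
    "job 9 per-token real-V": 5,
}
for name, k in counts.items():
    lo, hi = wilson_interval(k, N)
    print(f"{name}: {k}/{N} = {k / N:.3f}, 95% Wilson CI [{lo:.3f}, {hi:.3f}]")
```

Under these assumed counts the intervals come out several percentage points wide, so an exact tie at one seed is weak evidence by itself; that is consistent with the entry's plan to re-run both arms at other seeds.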