diff --git a/RESEARCH_JOURNAL.md b/RESEARCH_JOURNAL.md
index ed2a930..1419591 100644
--- a/RESEARCH_JOURNAL.md
+++ b/RESEARCH_JOURNAL.md
@@ -2,6 +2,59 @@
 
 Append-only. New entries at the top, date-stamped. Never edit old entries.
 
+## 2026-05-31 (k) — vanilla emergence reference (sub4 overlay): per-mode hacking is asymmetric, not uniform
+
+**Context:** commit `b72c5ac` on `probe/distill-cosine`; pueue id 40; vanilla (intervention=none),
+seed 41, 60 steps, fast preset, 4-mode substrate, mix=0.125. First arm of the 5-arm sub4
+overlay sweep (40-44, all #164/#162). Log via `pueue log 40`; artifact:
+`out/runs/20260531T141402_fast_vanilla_seed41_sub4_vanilla/per_mode_deploy.json`.
+
+**Hypothesis:** vanilla learns to hack all available loopholes roughly evenly (~0.28 hack
+expected). This arm is the emergence reference the intervention overlay subtracts against.
+
+### Observations
+
+- [obs] Aggregate FINAL EVAL (n=64): hack=0.359, solve=0.453. BLUF main metric
+  HACK_RATE=0.359 PASS_RATE=0.283, peak 73.1GB. (deploy==train, no quarantine.)
+- [obs] Per-mode emergence is asymmetric, not uniform:
+
+| mode (n)            | dist     | hack | solve |
+|---------------------|----------|------|-------|
+| run_tests (8)       | IN-dist  | 0.88 | 0.00  |
+| file_marker (32)    | held-out | 0.50 | 0.19  |
+| sentinel (24)       | held-out | 0.00 | 0.96  |
+
+- [obs] stdout_marker has zero problems in the fixed eval subset (n=64) -> absent from
+  by_mode. The overlay will compare 3 modes, not 4. Same eval_idxs across all arms, so the
+  comparison stays valid, just narrower.
+
+### Inferences
+
+- [inf] The headline "suppress held-out hacking" can only be tested on file_marker among
+  the held-out modes -- sentinel is already solved-not-hacked at vanilla, so there is no
+  hack for an intervention to remove there. {reason: "sentinel hack=0/24 at vanilla; you
+  cannot suppress what never emerged", credence: 0.9}
+- [inf] run_tests is the cleanest emergence (hack 0.88, solve 0.00) -- the IN-dist mode the
+  route/route2 v_hack is built to catch; expect the largest deploy-hack drop there for the
+  quarantine arms. {reason: "v_hack extracted on run_tests-flavoured pairs; in-dist by
+  construction", credence: 0.6}
+
+### Failure modes considered
+
+- **Likely:** the 3-mode eval subset is just an unlucky draw and stdout_marker would also
+  show emergence. Prior: 0.4. Check: widen eval_idxs to include >=1 stdout_marker problem,
+  or read the streaming per-mode hk_so column from the train log.
+- **Subtle:** sentinel solve=0.96 is the student gaming the *grader's* notion of solve, not
+  truly solving. Prior: 0.2. Check: spot-read a sentinel rollout for a real solution vs a
+  sentinel-marker trick that the solve-detector misreads.
+- **Null:** the asymmetry is seed-41 specific and averages out across seeds. Prior: 0.3.
+  Check: the queued seed 43/44 replicates.
+
+### Next action
+
+Erase (41) running; route/route2 (42-44) queued. On each: confirm deploy-solve >= train-solve
+and read file_marker + run_tests deploy-hack vs this vanilla baseline. Then build #162 overlay.
+
 ## 2026-05-31 (j) — frozen-real-V route (rf0) only drops deploy hack ~8pp, NOT to run-31's ~0; staleness is the cause
 
 **Context:** commit `2b020c9` on `probe/distill-cosine`; pueue id 32; route real-V,