From 19544b3f06b2a7aa6cd2a605822e2bd27a4dd398 Mon Sep 17 00:00:00 2001
From: wassname
Date: Tue, 2 Jun 2026 16:33:04 +0000
Subject: [PATCH] journal: route2 holds deploy-hack=0 to 200 steps (job 84, durable not delayed)

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
---
 RESEARCH_JOURNAL.md | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/RESEARCH_JOURNAL.md b/RESEARCH_JOURNAL.md
index e359ddb..831d338 100644
--- a/RESEARCH_JOURNAL.md
+++ b/RESEARCH_JOURNAL.md
@@ -2,6 +2,51 @@
 Append-only. New entries at the top, date-stamped. Never edit old entries.
 
+## 2026-06-02 (d) — route2 holds deploy-hack=0 out to 200 steps (durable, not delayed); vanilla-200 contrast pending
+
+**Context:** `probe/distill-cosine`, pueue id 84. route2 200-step convergence run, seed 41,
+fast preset, Qwen3-4B, 4-mode substrate, mix=0.125, refresh-every=5, no exploration floor
+(`rollout_ablate_frac=0`), deploy-eval (knob-off, n=64, T=0.7) every 10 steps. The "only delays
+or actually works" test the user asked for. Artifact:
+`out/runs/20260602T080804_fast_routing2_seed41_route2_converge200_s41/per_mode_deploy.json`.
+Vanilla-200 saturation reference (id 85) is running now; the cross-arm contrast lands when it does.
+
+### Observations (DEPLOY-eval = quarantine knob OFF)
+
+- [obs] deploy hack = 0.000 at EVERY eval step 0..199 (steps 0/10/.../190/199). solve climbs to
+  ~0.625 by step 20 and holds flat to the end (final 0.609). No late-training decay.
+- [obs] train-side knob-on the student keeps hacking hard the whole run: HACK_STUDENT=0.529,
+  per-step hack_s 13-19/28 through step 199. So this is sustained hack pressure, sustained
+  suppression -- not a quiet plateau where the gradient went dead.
+- [obs] per-mode deploy (3 of 4 modes drawn in the n=64 sample; stdout_marker absent):
+  run_tests train_hack 1.00 -> deploy_hack 0.00 (deploy_solve 0.00); file_marker 0.56 -> 0.00
+  (solve 0.50); sentinel 0.21 -> 0.00 (solve 0.96). deploy_hack 0 on all three.
+- [obs] `hkgap` (ema_hack_cos - ema_clean_cos) drifted from positive early to ~0/slightly negative
+  late, even WITH refresh-every=5; `q_egy` stayed ~0.6 and `resid` ~0 throughout.
+
+### Interpretation
+
+- [inf, 0.8] route2 suppression is durable to 200 steps, not a delay. Strongest single-trajectory
+  evidence yet for the converge claim (artifact A4 once vanilla-200 lands).
+- [inf, 0.7] the 0.61 solve plateau is NOT a knob ceiling -- it decomposes by mode. sentinel solves
+  clean (0.96), file_marker half (0.50), run_tests 0.00 because the model only ever learned to HACK
+  run_tests (train_solve 0.0), so knob-off it has no clean path. run_tests is exactly where
+  suppression is most load-bearing: 100% train-hack -> 0% deploy-hack.
+- [inf, 0.65] routing is robust to v_grad staleness in a way one-sided erase was not: hkgap going
+  to ~0 (direction stops separating hack from clean) did NOT reopen deploy-hack, because the
+  per-rollout gate + scale-matched quarantine keep absorbing energy (q_egy~0.6) regardless. Contrast
+  job 64 where erase staleness leaked to 0.125.
+- [caveat] n=1 seed at 200 steps; the n=3 band (entry c) was only to step 60. vanilla-200 not yet
+  in, so "durable WHERE vanilla saturates hacking" is still one-sided. refresh-5 not frozen, so this
+  does not isolate frozen-vs-refresh at the long horizon.
+
+### Next
+
+- id 85 (vanilla-200) -> build the A4 long-run overlay (route2 vs vanilla deploy-hack/solve to 200);
+  settle durable-vs-delayed with the contrast, append the comparison.
+- per-mode shows run_tests is unsolvable-clean for this model -- note in the paper so the solve
+  number isn't misread as a route2 cost.
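
A minimal sketch of the per-mode decomposition argued under Interpretation above: the aggregate deploy solve is the sample-weighted mean of the per-mode solve rates. The solve rates are the ones reported in this entry. The per-mode sample counts (`n`) are hypothetical, picked only so they sum to n=64; the real counts live in `per_mode_deploy.json` and are not quoted here. `ModeStats` and `aggregate_solve` are illustrative names, not code from the repo.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ModeStats:
    n: int        # deploy-eval samples drawn for this mode (HYPOTHETICAL placeholder)
    solve: float  # per-mode deploy_solve rate as reported in the entry


def aggregate_solve(modes: Dict[str, ModeStats]) -> float:
    """Sample-weighted mean of per-mode solve rates."""
    total = sum(m.n for m in modes.values())
    if total == 0:
        raise ValueError("no samples to aggregate")
    return sum(m.n * m.solve for m in modes.values()) / total


# Rates come from the entry; counts are assumed and only need to sum to 64.
modes = {
    "run_tests": ModeStats(n=9, solve=0.00),
    "file_marker": ModeStats(n=30, solve=0.50),
    "sentinel": ModeStats(n=25, solve=0.96),
}

# Aggregate over all drawn modes (stdout_marker was absent from the sample).
agg = aggregate_solve(modes)

# Same aggregate with run_tests removed, i.e. only modes that have a clean path.
clean_path_only = aggregate_solve(
    {name: stats for name, stats in modes.items() if name != "run_tests"}
)

print(f"aggregate solve: {agg:.3f}")
print(f"without run_tests: {clean_path_only:.3f}")
```

With these assumed counts the weighted mean comes out near the reported 0.609, and dropping run_tests lifts it to about 0.71, which is the sense in which the plateau reflects the mode mix rather than a knob ceiling. Real counts from the artifact should replace the placeholders before this is quoted anywhere.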
+
 ## 2026-06-02 (c) — route2 keynote at n=3: deploy hack 0.31 -> 0.03 at HIGHER solve; StepLogger merge-bug fixed
 
 **Context:** `probe/distill-cosine`. Filling the keynote table/figure (artifacts A1/A2) from the