From 19544b3f06b2a7aa6cd2a605822e2bd27a4dd398 Mon Sep 17 00:00:00 2001
From: wassname <github@wassname>
Date: Tue, 2 Jun 2026 16:33:04 +0000
Subject: [PATCH] journal: route2 holds deploy-hack=0 to 200 steps (job 84,
 durable not delayed)

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
---
 RESEARCH_JOURNAL.md | 45 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/RESEARCH_JOURNAL.md b/RESEARCH_JOURNAL.md
index e359ddb..831d338 100644
--- a/RESEARCH_JOURNAL.md
+++ b/RESEARCH_JOURNAL.md
@@ -2,6 +2,51 @@
 
 Append-only. New entries at the top, date-stamped. Never edit old entries.
 
+## 2026-06-02 (d) — route2 holds deploy-hack=0 out to 200 steps (durable, not delayed); vanilla-200 contrast pending
+
+**Context:** `probe/distill-cosine`, pueue id 84. route2 200-step convergence run, seed 41,
+fast preset, Qwen3-4B, 4-mode substrate, mix=0.125, refresh-every=5, no exploration floor
+(`rollout_ablate_frac=0`), deploy-eval (knob-off, n=64, T=0.7) every 10 steps. This is the
+"only delays or actually works" test the user asked for. Artifact:
+`out/runs/20260602T080804_fast_routing2_seed41_route2_converge200_s41/per_mode_deploy.json`.
+Vanilla-200 saturation reference (id 85) is running now; the cross-arm contrast lands when it does.
+
+### Observations (DEPLOY-eval = quarantine knob OFF)
+
+- [obs] deploy hack = 0.000 at EVERY eval step 0..199 (steps 0/10/.../190/199). solve climbs to
+  ~0.625 by step 20 and holds flat to the end (final 0.609). No late-training decay.
+- [obs] train-side (knob on), the student keeps hacking hard the whole run: HACK_STUDENT=0.529,
+  per-step hack_s 13-19/28 through step 199. So this is sustained hack pressure met by sustained
+  suppression -- not a quiet plateau where the gradient went dead.
+- [obs] per-mode deploy (3 of 4 modes drawn in the n=64 sample; stdout_marker absent):
+  run_tests train_hack 1.00 -> deploy_hack 0.00 (deploy_solve 0.00); file_marker 0.56 -> 0.00
+  (solve 0.50); sentinel 0.21 -> 0.00 (solve 0.96). deploy_hack 0 on all three.
+- [obs] `hkgap` (ema_hack_cos - ema_clean_cos) drifted from positive early to ~0/slightly negative
+  late, even WITH refresh-every=5; `q_egy` stayed ~0.6 and `resid` ~0 throughout.
+
+### Interpretation
+
+- [inf, 0.8] route2 suppression is durable to 200 steps, not a delay. Strongest single-trajectory
+  evidence yet for the converge claim (artifact A4 once vanilla-200 lands).
+- [inf, 0.7] the 0.61 solve plateau is NOT a knob ceiling -- it decomposes by mode. sentinel solves
+  clean (0.96), file_marker half (0.50), run_tests 0.00 because the model only ever learned to HACK
+  run_tests (train_solve 0.0), so knob-off it has no clean path. run_tests is exactly where
+  suppression is most load-bearing: 100% train-hack -> 0% deploy-hack.
+- [inf, 0.65] routing is robust to v_grad staleness in a way one-sided erase was not: hkgap going
+  to ~0 (direction stops separating hack from clean) did NOT reopen deploy-hack, because the
+  per-rollout gate + scale-matched quarantine keep absorbing energy (q_egy~0.6) regardless. Contrast
+  job 64 where erase staleness leaked to 0.125.
+- [caveat] n=1 seed at 200 steps; the n=3 band (entry c) was only to step 60. vanilla-200 not yet
+  in, so "durable WHERE vanilla saturates hacking" is still one-sided. This run used refresh-every=5,
+  not frozen, so it does not isolate frozen-vs-refresh at the long horizon.
+
+### Next
+
+- id 85 (vanilla-200) -> build the A4 long-run overlay (route2 vs vanilla deploy-hack/solve to 200);
+  settle durable-vs-delayed with the contrast, append the comparison.
+- per-mode breakdown shows the model never learned a clean run_tests solution -- note in the paper
+  so the solve number isn't misread as a route2 cost.
+
 ## 2026-06-02 (c) — route2 keynote at n=3: deploy hack 0.31 -> 0.03 at HIGHER solve; StepLogger merge-bug fixed
 
 **Context:** `probe/distill-cosine`. Filling the keynote table/figure (artifacts A1/A2) from the