journal: random-V control matches real-V at per-rollout (0.101==0.101) -- H2 absorption lead

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-27 17:30:41 +08:00 · 2026-06-08 08:26:26 +00:00
parent cf05310130
commit fcac80c4bb
1 changed files with 55 additions and 0 deletions
@@ -3315,3 +3315,58 @@ well below train (knob-on) hack, at non-collapsed solve.

 Controls running/queued: job 10 random-V (Running), 12 vanilla, 13 vampire. H2-vs-H4 verdict
 waits on 10 + 13; no queue change.
+
+## 2026-06-08 06:30 -- random-V control matches real-V at per-rollout: H2 (absorption) lead
+
+**Context:** commit `caa0d09` on `probe/distill-cosine`; pueue 8 (real-V) vs 10 (random-V),
+both per-rollout, p75/p75 margin band, dense run_tests pool, seed 43, n=119 held-out test.
+Job 10 = Haar random-V control (`--routeV-random-v-seed=157`): same routing machinery, but the
+quarantine direction is a random orthonormal frame instead of the extracted hack direction.
+
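For reference, one standard way to draw a Haar-distributed random orthonormal frame, like the one the random-V control uses for its quarantine direction, is Gram-Schmidt on i.i.d. Gaussian vectors. This is a minimal stdlib-only sketch: `haar_frame`, `d`, and `k` are hypothetical names, the dimensions are illustrative, and the seed 157 only echoes the flag value. The sampler actually behind `--routeV-random-v-seed` is not shown in this entry and may differ.

```python
import math
import random


def haar_frame(d, k, seed):
    """Return k orthonormal vectors in R^d (list of lists), Haar-distributed.

    Hypothetical helper: Gram-Schmidt applied to i.i.d. standard Gaussian
    vectors yields a frame that is uniform over orthonormal k-frames.
    """
    rng = random.Random(seed)
    frame = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # Remove the components along the vectors already in the frame
        # (modified Gram-Schmidt: project against the running remainder).
        for u in frame:
            proj = sum(a * b for a, b in zip(u, v))
            v = [b - proj * a for a, b in zip(u, v)]
        norm = math.sqrt(sum(b * b for b in v))
        frame.append([b / norm for b in v])
    return frame


def gram(frame):
    """Matrix of pairwise dot products; identity for an orthonormal frame."""
    return [[sum(a * b for a, b in zip(u, w)) for w in frame] for u in frame]


# Illustrative sizes only; the real hidden size and frame rank are not given here.
V_random = haar_frame(d=16, k=3, seed=157)
G = gram(V_random)
```

Because such a frame is drawn independently of the model, in high dimension it is very unlikely to align with any particular extracted direction, which is what makes it a direction-blind control.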
+**Hypothesis under test:** does suppression need the REAL hack direction (H4), or does the
+throwaway quarantine knob absorb the hack regardless of direction (H2)?
+
+### Observations
+
+- [obs] Deploy (knob-off, held-out test n=119):
+
+| arm | granularity | direction | deploy hack(strict) | deploy hack(vendor eq) | solve |
+|---|---|---|---:|---:|---:|
+| job 8  | per-rollout | real-V   | 0.101 | 0.084 | 0.126 |
+| job 10 | per-rollout | random-V | 0.101 | 0.101 | 0.109 |
+
+- [obs] In both arms, train knob-on hack saturates at ~0.81 (val n=32, step 59) and deploy
+  knob-off val hack is ~0.06. The knob holds ~0.7 of the hack in BOTH the real and random arm.
+- [obs] Routing dynamics differ: real-V `rout` falls off a cliff to ~0 by step ~28; random-V
+  `rout` holds at ~0.27-0.31 for the whole run (per-step `rout` column in the log). The
+  dynamics differ but the deploy outcome does not.
+
+### Inferences
+
+- [inf] At per-rollout granularity the suppression is absorption (H2), not direction (H4):
+  a random quarantine frame suppresses the hack as well as the extracted one (0.101 == 0.101
+  strict; vendor eq is close but not tied, 0.084 real vs 0.101 random). {reason: real vs
+  random differ ONLY in the quarantine direction; identical band, pool, seed, machinery;
+  strict deploy hack matches to 3 d.p.; credence 0.6 -- n=1 seed, single granularity}.
+- [inf] The cliff-vs-no-cliff routing difference is a red herring for deploy outcome: real-V
+  routing collapses once the hack is learned, random-V keeps routing a fixed geometric slice,
+  yet both deposit ~the same hack mass in the deletable knob by step 60. {reason: rout columns
+  diverge but deploy hack equal; credence 0.55}.
+
+### Failure modes considered
+
+- **Most-likely:** the result is granularity-specific -- per-TOKEN real-V (job 9) suppressed
+  harder (deploy hack 0.042 vs per-rollout 0.101). Direction may matter at token granularity.
+  Prior 0.4. Check: job 11 per-token random-V (now Running) -- if it also hits ~0.04, H2 holds
+  at token level too; if it stays high, H4 at token granularity.
+- **Subtle:** there is no suppression to attribute -- if vanilla also deploys ~0.10, the 0.101
+  is just the base/emergence rate and real-vs-random is a vacuous tie. Prior 0.3. Check: job 12
+  vanilla -- deploy hack should be >> 0.10 by step 60 for the comparison to mean anything.
+- **Null:** 0.101 == 0.101 is seed-luck coincidence; a second seed splits them. Prior 0.2.
+  Check: re-run both arms at seed 41/44.
+
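To size the seed-luck (Null) worry, the deploy rates in the table can be turned back into counts and given a binomial interval. This is a rough sketch and not part of the run tooling: it assumes each reported rate is a count out of n=119 rounded to 3 d.p., and it uses a 95% Wilson score interval. `wilson_interval` and the dictionary labels are hypothetical names.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """95% Wilson score interval for a binomial proportion (z=1.96)."""
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


N_TEST = 119  # held-out test size from the Context line

# Deploy hack rates copied from the observations table.
rates = {
    "job 8 real-V strict": 0.101,
    "job 10 random-V strict": 0.101,
    "job 8 real-V vendor eq": 0.084,
    "job 10 random-V vendor eq": 0.101,
}

intervals = {}
for name, rate in rates.items():
    # Assumption: the rate is count / 119 rounded to 3 d.p.
    count = round(rate * N_TEST)
    low, high = wilson_interval(count, N_TEST)
    intervals[name] = (count, low, high)
    print(f"{name}: {count}/{N_TEST}, 95% Wilson [{low:.3f}, {high:.3f}]")
```

Under these assumptions the strict tie is 12/119 in each arm, with an interval of roughly 0.06 to 0.17 per arm. That means a single seed at n=119 cannot separate "equal" from "differs by a few points", which is consistent with keeping the second-seed check.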
+### Next action
+
+No queue change. Job 11 per-token random-V (Running) is the load-bearing follow-up (it is the
+control for the better-suppressing per-token arm); job 12 vanilla confirms there is a hack to
+suppress; job 13 vampire is the semantic-placebo cross-check. Verdict consolidates once
+11 + 12 land.