journal: random-V control matches real-V at per-rollout (0.101==0.101) -- H2 absorption lead

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
wassname
2026-06-08 08:26:26 +00:00
parent cf05310130
commit fcac80c4bb
+55
@@ -3315,3 +3315,58 @@ well below train (knob-on) hack, at non-collapsed solve.
Controls running/queued: job 10 random-V (Running), 12 vanilla, 13 vampire. H2-vs-H4 verdict
waits on 10 + 13; no queue change.
## 2026-06-08 06:30 -- random-V control matches real-V at per-rollout: H2 (absorption) lead
**Context:** commit `caa0d09` on `probe/distill-cosine`; pueue 8 (real-V) vs 10 (random-V),
both per-rollout, p75/p75 margin band, dense run_tests pool, seed 43, n=119 held-out test.
Job 10 = Haar random-V control (`--routeV-random-v-seed=157`): same routing machinery, but the
quarantine direction is a random orthonormal frame instead of the extracted hack direction.
**Hypothesis under test:** does suppression need the REAL hack direction (H4), or does the
throwaway quarantine knob absorb the hack regardless of direction (H2)?
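A minimal sketch of what a "Haar random orthonormal frame" means for the random-V control. This is not the repo's implementation: `haar_frame`, `dim` and `k` are hypothetical names, and only the seed value 157 comes from the flag above. Running Gram-Schmidt on i.i.d. Gaussian vectors yields a frame that is uniform (Haar) over orthonormal k-frames.

```python
import math
import random


def haar_frame(dim, k, seed):
    """Return k orthonormal vectors in R^dim (list of lists), Haar-distributed.

    Hypothetical helper: Gaussian vectors are rotation-invariant, so
    orthonormalising them gives a uniformly random frame.
    """
    rng = random.Random(seed)
    frame = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Modified Gram-Schmidt: remove the component along each earlier vector.
        for u in frame:
            proj = sum(a * b for a, b in zip(u, v))
            v = [b - proj * a for a, b in zip(u, v)]
        norm = math.sqrt(sum(b * b for b in v))
        frame.append([b / norm for b in v])
    return frame


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# dim=64 and k=4 are illustrative sizes, not the run's real dimensions.
V = haar_frame(dim=64, k=4, seed=157)
```

The same seed always reproduces the same frame, which is what makes the control re-runnable.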
### Observations
- [obs] Deploy (knob-off, held-out test n=119):
| arm | granularity | direction | deploy hack(strict) | deploy hack(vendor eq) | solve |
|---|---|---|---:|---:|---:|
| job 8 | per-rollout | real-V | 0.101 | 0.084 | 0.126 |
| job 10 | per-rollout | random-V | 0.101 | 0.101 | 0.109 |
- [obs] In both arms, train (knob-on) hack saturates at ~0.81 (val n=32, step 59) and deploy
(knob-off) val hack is ~0.06. The knob therefore holds ~0.7 of the hack (knob-on minus
knob-off, 0.81 - 0.06) in BOTH the real and random arm.
- [obs] Routing dynamics differ: real-V rout cliffs to ~0 by step ~28; random-V rout holds
~0.27-0.31 the whole run (log per-step `rout` column). The dynamics differ but the deploy
outcome does not.
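A rough sketch for reading the deploy table above, assuming each rate is a plain fraction k/119 on the held-out test set. Under that assumption 0.101 corresponds to 12/119 in both arms, so "equal to 3 d.p." means "same count". The Wilson interval is an illustrative choice here, not something the journal computes; `implied_count` and `wilson_interval` are hypothetical helper names.

```python
import math

N_TEST = 119  # held-out test size from the Context line


def implied_count(rate, n=N_TEST):
    """Nearest integer count k such that k/n rounds to the reported rate."""
    return round(rate * n)


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial proportion k/n (z=1.96 ~ 95%)."""
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Strict deploy-hack rates copied from the table.
strict = {"job 8 real-V": 0.101, "job 10 random-V": 0.101}
for arm, rate in strict.items():
    k = implied_count(rate)
    lo, hi = wilson_interval(k, N_TEST)
    print(f"{arm}: {k}/{N_TEST}, 95% Wilson interval [{lo:.3f}, {hi:.3f}]")
```

With n=119 the interval around a ~0.10 rate is wide, which is worth keeping in mind before reading a tie as evidence of equivalence.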
### Inferences
- [inf] At per-rollout granularity the suppression is absorption (H2), not direction (H4):
a random quarantine frame suppresses the hack as well as the extracted one on the strict
metric (0.101 == 0.101); vendor-eq differs slightly (0.084 real vs 0.101 random). {reason:
real vs random differ ONLY in the quarantine direction; identical band, pool, seed,
machinery; strict deploy hack matches to 3 d.p.; credence 0.6 -- n=1 seed, single
granularity}.
- [inf] The cliff-vs-no-cliff routing difference is a red herring for deploy outcome: real-V
routing collapses once the hack is learned, random-V keeps routing a fixed geometric slice,
yet both deposit ~the same hack mass in the deletable knob by step 60. {reason: rout columns
diverge but deploy hack equal; credence 0.55}.
### Failure modes considered
- **Most-likely:** the result is granularity-specific -- per-TOKEN real-V (job 9) suppressed
harder (deploy hack 0.042 vs per-rollout 0.101). Direction may matter at token granularity.
Prior 0.4. Check: job 11 per-token random-V (now Running) -- if it also hits ~0.04, H2 holds
at token level too; if it stays high, that supports H4 at token granularity.
- **Subtle:** there is no suppression to attribute -- if vanilla also deploys ~0.10, the 0.101
is just the base/emergence rate and real-vs-random is a vacuous tie. Prior 0.3. Check: job 12
vanilla -- deploy hack should be >> 0.10 by step 60 for the comparison to mean anything.
- **Null:** 0.101 == 0.101 is seed-luck coincidence; a second seed splits them. Prior 0.2.
Check: re-run both arms at seed 41/44.
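A hedged sketch for sizing the checks above: a two-sided Fisher exact test on a 2x2 table of (hack, no-hack) counts for two arms. The counts are back-derived assuming every rate is k/119 on the same held-out test set (0.042 -> 5, 0.101 -> 12); that is an assumption, and the test is an illustrative choice rather than the journal's method. `fisher_exact_two_sided` is a hypothetical helper.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def pmf(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = pmf(a)
    p = sum(pmf(x) for x in range(lo, hi + 1) if pmf(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


N = 119  # assumed common test-set size
# Per-token real-V (job 9, assumed 5/119) vs per-rollout real-V (job 8, 12/119).
p_granularity = fisher_exact_two_sided(5, N - 5, 12, N - 12)
# Real-V vs random-V at per-rollout: identical counts, 12/119 each.
p_direction = fisher_exact_two_sided(12, N - 12, 12, N - 12)
print(f"per-token vs per-rollout: p={p_granularity:.3f}")
print(f"real-V vs random-V:       p={p_direction:.3f}")
```

The same function can be reused once jobs 11 and 12 report counts, or across seeds for the seed-luck check.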
### Next action
No queue change. Job 11 per-token random-V (Running) is the load-bearing follow-up (it is the
control for the better-suppressing per-token arm); job 12 vanilla confirms the target exists;
job 13 vampire is the semantic-placebo cross-check. Verdict consolidates once 11 + 12 land.