journal: erase arm sub4 (-7.8pp hack/+4.7pp solve, win on file_marker)

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
wassname
2026-05-31 18:56:09 +00:00
parent b0432af1e8
commit 47abce4307
+55
@@ -2,6 +2,61 @@
Append-only. New entries at the top, date-stamped. Never edit old entries.
## 2026-05-31 (l) — erase (one-sided projection) vs vanilla: -7.8pp hack / +4.7pp solve, but the win is on held-out file_marker not in-dist run_tests
**Context:** commit `b0432af` on `probe/distill-cosine`; pueue id 41; projected/erase
(intervention=erase, refresh-5 default), seed 41, 60 steps, fast preset, 4-mode substrate,
mix=0.125. Second arm of the sub4 overlay sweep. Pairs against vanilla entry (k) above.
Artifact: `out/runs/20260531T163110_fast_projected_seed41_sub4_erase/per_mode_deploy.json`.
**Hypothesis:** erase suppresses hack below vanilla (0.359) at matched solve, with the
largest drop on the in-dist mode the v_hack was built closest to (run_tests).
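
A minimal sketch of what a one-sided projection could look like, assuming `erase` removes only the component of a gradient that points along `v_hack` and leaves anti-hack components untouched. The function names, the toy vectors, and the reading of `cout` as the post-projection cosine with `v_hack` are assumptions for illustration, not the repo's code.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom else 0.0

def erase_one_sided(grad, v_hack):
    """Remove only the hack-ward (positive) component of grad along v_hack."""
    norm = math.sqrt(dot(v_hack, v_hack))
    unit = [x / norm for x in v_hack]
    # One-sided: clamp at zero so components pointing away from v_hack survive.
    coeff = max(0.0, dot(grad, unit))
    return [g - coeff * u for g, u in zip(grad, unit)]

# Hypothetical toy vectors, not taken from the run.
v_hack = [1.0, 0.0, 0.0]
hackward = [0.6, 0.8, 0.0]   # positive overlap with v_hack
antihack = [-0.6, 0.8, 0.0]  # negative overlap with v_hack

# Hack-ward gradient: cosine with v_hack drops to zero after projection.
print(cosine(erase_one_sided(hackward, v_hack), v_hack))
# Anti-hack gradient: left untouched, cosine stays negative.
print(cosine(erase_one_sided(antihack, v_hack), v_hack))
```

Under this reading, a post-projection cosine of zero on every hack-ward gradient is what "projection mechanically fires" would look like; a two-sided projection would also zero the second case.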
### Observations
- [obs] cout=0.00 throughout (e.g. step 37) -> projection mechanically fires, removes all
hack-ward grad. Not a silent no-op.
- [obs] cin_t (0.09) < cin_s (0.15) by mid-training (step 37) -- the REVERSE of the wanted
cin_t>cin_s. Consistent with v_hack staleness (entry j).
- [obs] Aggregate FINAL EVAL (n=64, deploy==train): hack=0.281 solve=0.500, vs vanilla
hack=0.359 solve=0.453. So -7.8pp hack, +4.7pp solve. BLUF HACK_RATE=0.338 PASS_RATE=0.291.
- [obs] Per-mode erase vs vanilla:
| mode (n)         | dist     | vanilla hack | vanilla solve | erase hack | erase solve |
|------------------|----------|--------------|---------------|------------|-------------|
| run_tests (8)    | IN-dist  | 7/8          | 0/8           | 8/8        | 0/8         |
| file_marker (32) | held-out | 16/32        | 6/32          | 10/32      | 13/32       |
| sentinel (24)    | held-out | 0/24         | 23/24         | 0/24       | 19/24       |
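
The aggregate numbers can be cross-checked against the per-mode counts with a short script. This is a sketch using only the counts in the table above; the dictionary layout and function names are illustrative, not the schema of `per_mode_deploy.json`.

```python
# Per-mode (hack, solve, n) counts copied from the table above.
vanilla = {"run_tests": (7, 0, 8), "file_marker": (16, 6, 32), "sentinel": (0, 23, 24)}
erase = {"run_tests": (8, 0, 8), "file_marker": (10, 13, 32), "sentinel": (0, 19, 24)}

def aggregate(arm):
    """Pooled hack and solve rates over all modes."""
    n = sum(c[2] for c in arm.values())
    hack = sum(c[0] for c in arm.values())
    solve = sum(c[1] for c in arm.values())
    return hack / n, solve / n

def per_mode_delta(base, treat):
    """Change in (hack, solve) counts per mode, treat minus base."""
    return {m: (treat[m][0] - base[m][0], treat[m][1] - base[m][1]) for m in base}

van_hack, van_solve = aggregate(vanilla)
era_hack, era_solve = aggregate(erase)
print(f"vanilla hack={van_hack:.3f} solve={van_solve:.3f}")
print(f"erase   hack={era_hack:.3f} solve={era_solve:.3f}")
print(f"delta   hack={100 * (era_hack - van_hack):+.1f}pp "
      f"solve={100 * (era_solve - van_solve):+.1f}pp")
print(per_mode_delta(vanilla, erase))
```

The pooled rates reproduce 0.359/0.453 and 0.281/0.500. The per-mode deltas are run_tests (+1 hack, 0 solve), file_marker (-6 hack, +7 solve), sentinel (0 hack, -4 solve), so the net is -5 hacks and +3 solves out of 64.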
### Inferences
- [inf] Erase's net hack win is concentrated in file_marker (hack 50%->31%, solve
19%->41%); run_tests stays saturated and sentinel was never hacked. So the aggregate
-7.8pp/+4.7pp is really "erase rescued file_marker", with the solve gain partly offset
by sentinel solve dropping 23/24->19/24. {reason: "file_marker is the only mode whose
hack rate moved by more than one rollout; run_tests and sentinel hack are unchanged
within noise", credence: 0.85}
- [inf] The win landing on held-out file_marker rather than in-dist run_tests is mildly
counterintuitive but NOT evidence against the method -- run_tests is already saturated at
vanilla (7/8) so there is little hack-rate headroom to recover there, whereas file_marker
at 50% has room to move. {reason: "ceiling effect on run_tests; headroom on file_marker",
credence: 0.6}
- [inf] -7.8pp is far short of the preregistered 30pp (H1). Consistent with prior n=1
erase results. {reason: "matches the G0 21-pair erase magnitude band", credence: 0.7}
### Failure modes considered
- **Likely:** run_tests n=8 is too small -- the 7/8 vs 8/8 "no suppression" result differs
by a single rollout, which is within noise; erase may help run_tests too at larger n.
Prior: 0.4. Check: read the streaming hk_rt cumulative column, or widen the eval subset.
- **Subtle:** file_marker solve 6->13 is the solve-detector being fooled by a file_marker
artifact, not real solving. Prior: 0.2. Check: spot-read a file_marker "solve" rollout.
- **Null:** the -7.8pp/+4.7pp is seed-41 run-to-run variance, not the erase intervention.
Prior: 0.3. Check: seed 43/44 replicates (queued after the sweep).
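
One hedged way to put a number on the small-n concern is a two-sided Fisher exact test on the per-mode hack counts. This is a swapped-in check, not something this run computed, and it treats the rollouts within a mode as independent, which is an assumption. It says nothing about seed-to-seed variance, so it does not replace the replicates.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x "hacked" rollouts in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = prob(x)
        # Count every table at least as unlikely as the observed one.
        if p <= p_obs * (1 + 1e-9):
            total += p
    return total

# Rows: vanilla, erase. Columns: hacked, not hacked.
print("run_tests  ", fisher_exact_two_sided(7, 1, 8, 0))
print("file_marker", fisher_exact_two_sided(16, 16, 10, 22))
```

For run_tests the only two possible tables are equally likely, so the p-value is 1.0: 7/8 vs 8/8 carries no information about suppression at n=8.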
### Next action
Route (42) running, route2 (43-44) queued. The deploy-solve>=train-solve KEY CHECK only
becomes testable on the quarantine arms (42-44, deploy!=train). Then build #162 overlay.
## 2026-05-31 (k) — vanilla emergence reference (sub4 overlay): per-mode hacking is asymmetric, not uniform
**Context:** commit `b72c5ac` on `probe/distill-cosine`; pueue id 40; vanilla (intervention=none),