results: A3 erase-online row lands (deploy hack 0.562 > vanilla; route 0.000)

Job 76 (erase online refresh-5 s41) finished: deploy hack 0.562 / solve 0.438.
One-sided gradient erasure ends ABOVE vanilla (0.359) at deploy -- it does not
suppress hacking, while route zeroes it. cos_post pinned 0 each step (we did
remove the aligned component) yet hack still emerged, so the hack signal lives
largely off the extracted axis under erase. Filled tab:ablation vanilla(77)+
erase-online(76) rows, corrected stale job-id mapping (96/86/87/88 after requeue).
Journal 2026-06-03(b).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
wassname
2026-06-03 06:47:58 +00:00
parent 1fb49a3325
commit 8d16b317cb
2 changed files with 47 additions and 11 deletions
@@ -2,6 +2,39 @@
Append-only. New entries at the top, date-stamped. Never edit old entries.
## 2026-06-03 (b) — A3 ablation: erase fails, route succeeds (erase-online row lands)
**Context:** Job 76 (erase online, refresh-5, s41, 60-step fast) landed. This is the
A3 negative-control row: one-sided gradient erasure that re-extracts v_hack every 5 steps.
tab:ablation in docs/writeup/main.tex filled (#183 partial; erase-static job 96 still running).
### Observations (DEPLOY-eval, knob off, seed 41, n=64 T=0.7)
- [obs] erase online rf5: deploy hack 0.562 / solve 0.438 (HACK_S 0.504, PASS 0.291). Run
20260603T032141. Hack climbed 0.0 (step5) -> 0.49 (step25) -> plateau ~0.5-0.6.
- [obs] for comparison (hack/solve, same preset/seed): vanilla s41 0.359/0.422, route s41 0.000/0.625.
- [reason] erase ends *above* vanilla hack (0.562 vs 0.359), not below. One-sided erasure of the extracted
direction does not suppress hacking at deploy. Two candidate explanations: (i) the live GRPO gradient
re-acquires the hack component faster than the per-5-step re-extraction strips it, or (ii) the erased
component is not the load-bearing one. cos_post ~0 each step confirms we removed the aligned part, yet hack
still rises, which favors (ii): under erase the hack signal lives largely off the extracted axis.
- [obs] cos_post pinned +0.000 every logged step (erase removes the aligned component as designed);
cos_pre_s ~0.10-0.15 throughout. Mechanism worked, outcome metric did not move down.
- [reason] key contrast for the paper: route (quarantine whole gated rollouts into a throwaway
knob) zeroes deploy hack; erase (subtract the component) does not. Routing the rollout, not
erasing the direction, carries the effect.
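A minimal sketch of the erase step as described above, assuming "one-sided" means the component along v_hack is subtracted only when its projection is positive. The function names and the toy vectors are hypothetical, not the repo's code. It also shows why cos_post can sit at 0 while little of the gradient is actually removed: at cos_pre ~0.12, over 99% of the gradient norm is orthogonal to v_hack and passes through untouched.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Cosine similarity; defined as 0.0 if either vector has zero norm.
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return 0.0 if na == 0.0 or nb == 0.0 else dot(a, b) / (na * nb)

def erase_one_sided(grad, v_hack):
    """Subtract the component of grad along v_hack, only if it is positive.

    Assumption: this is one reading of "one-sided gradient erasure";
    the actual implementation may differ.
    """
    v_norm = math.sqrt(dot(v_hack, v_hack))
    unit = [x / v_norm for x in v_hack]
    proj = dot(grad, unit)
    cos_pre = cosine(grad, unit)
    if proj > 0:
        out = [g - proj * u for g, u in zip(grad, unit)]
    else:
        out = list(grad)  # anti-aligned or orthogonal: leave as-is
    return out, cos_pre, cosine(out, unit)

# Toy example (made-up numbers chosen so cos_pre lands near 0.12).
v_hack = [1.0, 0.0, 0.0]
grad = [0.12, 0.7, 0.7]
out, cos_pre, cos_post = erase_one_sided(grad, v_hack)
kept = math.sqrt(dot(out, out)) / math.sqrt(dot(grad, grad))
print(f"cos_pre={cos_pre:.3f} cos_post={cos_post:.3f} norm kept={kept:.3f}")
```

The fraction of gradient norm kept is sqrt(1 - cos_pre^2), so for the logged cos_pre range of 0.10 to 0.15 the erased update differs from the raw one by at most about 1% in norm. This is geometry only; it does not by itself establish which of the two explanations above is correct.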
### Subtle-failure checks
- [check] not a collapse: solve stayed 0.40-0.52, lp_s coherent (~-3 nats), no incoherence drift.
- [check] not a no-emergence artifact: hack DID emerge (0->0.56), so the substrate worked and erase
genuinely failed to stop it (rules out "erase looks good only because nothing hacked").
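One more sanity check that can be run on the reported rates, sketched under the assumption that n=64 is the number of deploy-eval rollouts behind each rate. The helper name is hypothetical. It recovers the integer count implied by each rounded rate and a binomial standard error, which gives a rough sense of single-seed sampling noise.

```python
import math

N = 64  # assumption: n=64 deploy-eval rollouts per reported rate

def count_and_se(rate, n=N):
    """Return (implied integer count, binomial standard error) for a rate."""
    k = round(rate * n)
    p = k / n
    se = math.sqrt(p * (1.0 - p) / n)
    return k, se

# Deploy hack rates as reported in the Observations for seed 41.
rates = {"erase_online": 0.562, "vanilla": 0.359, "route": 0.000}
for name, rate in rates.items():
    k, se = count_and_se(rate)
    print(f"{name}: {k}/{N} hacks, se ~{se:.3f}")
```

This only covers sampling noise within one eval; it says nothing about seed-to-seed variance.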
### Next
- [todo] job 96 (erase static, frozen v_hack, s41) running -> fills the last erase row.
- [todo] controls 87 (random-V) / 86 (placebo) / 88 (post-hoc) -> directional-specificity rows.
## 2026-06-03 (a) — keynote A1/A2 closed at n=3: route cuts deploy hack -0.292 (paired p~=0.013)
**Context:** `probe/distill-cosine`. Job 77 (vanilla s41, the last missing keynote-band seed)