journal: #186 teacher-off vanilla hacking self-sustaining (job 87, 0.36->0.58 on-policy)

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
wassname
2026-06-05 12:07:41 +00:00
parent 6dd6b74e73
commit ac418a54ce
+11
@@ -2,6 +2,17 @@
Append-only. New entries at the top, date-stamped. Never edit old entries.
## 2026-06-05 (h) — #186 emergence reference: teacher-off vanilla hacking is self-sustaining (job 87)
Job 87 (`_none_toff40_s41`, vanilla, teacher cut at step 40, 200 steps). Deploy hack (knob-off forward, n=64, T=0.7):
step 0 = 0.000 -> 40 (cut) = 0.359 -> 60 = 0.438 -> 120 = 0.500 -> 160 = 0.578 (peak) -> 199 = 0.484; solve ~0.45-0.53.
Once the teacher seeds the hack (steps 0-40), the student keeps it and grows it on-policy with zero further demos.
Why this matters: it rules out the reading that "route2 only suppresses because cutting the teacher removes the hack source".
The teacher-off vanilla run is the emergence control, and it still deploys ~44-58% hack after the cut (0.484 at step 199).
So if route2's deploy~0 holds in the matched pair (the upcoming job 105, route2 teacher-off@40), it is suppression of a
self-sustaining policy, not an artifact of the teacher schedule. Job 105 (route2, same curriculum) is the matched contrast; it is queued.
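
A rough noise check on the job 87 trajectory above, as a minimal sketch. It assumes each deploy-hack rate is a plain proportion k/64 over independent samples (the entry gives n=64 but not the raw counts), and the helper names `wilson_interval` and `summarize` are hypothetical, not part of the repo.

```python
import math

# Deploy hack rates copied from the job 87 entry (step -> rate).
N = 64  # samples per checkpoint, as stated in the entry
rates = {0: 0.000, 40: 0.359, 60: 0.438, 120: 0.500, 160: 0.578, 199: 0.484}


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for k successes in n trials (about 95% at z=1.96)."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def summarize(rates, n):
    """Recover integer hack counts from the rounded rates and attach intervals."""
    rows = []
    for step, rate in rates.items():
        k = round(rate * n)  # assumption: rate is k/n rounded to 3 decimals
        lo, hi = wilson_interval(k, n)
        rows.append((step, k, lo, hi))
    return rows


for step, k, lo, hi in summarize(rates, N):
    print(f"step {step:3d}: {k:2d}/{N} hacks, 95% CI [{lo:.3f}, {hi:.3f}]")
```

Under this binomial assumption the intervals are roughly +/-0.12 wide at n=64, so the robust signal is that every post-cut rate sits far above the step-0 rate; differences between adjacent post-cut checkpoints are within sampling noise.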
## 2026-06-05 (g) — placebo non-directionality is MEASURED (hkgap), not just inferred; + A5 leak is double-hacks not detector error
Two clarifications prompted by review questions today; neither changes a number, both make a load-bearing claim auditable.