mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 16:45:42 +08:00
journal: #186 teacher-off vanilla hacking self-sustaining (job 87, 0.36->0.58 on-policy)
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
@@ -2,6 +2,17 @@
Append-only. New entries at the top, date-stamped. Never edit old entries.
## 2026-06-05 (h) — #186 emergence reference: teacher-off vanilla hacking is self-sustaining (job 87)
Job 87 (`_none_toff40_s41`, vanilla, teacher cut at step 40, 200 steps). Deploy hack rate (knob-off forward, n=64, T=0.7):
step 0 = 0.000 -> 40 (cut) = 0.359 -> 60 = 0.438 -> 120 = 0.500 -> 160 = 0.578 (peak) -> 199 = 0.484; solve ~0.45-0.53.
Once the teacher seeds the hack (steps 0-40), the student keeps and grows it on-policy with zero further demos.

Why this matters: it rules out the reading that "route2 only suppresses because cutting the teacher removes the hack source".
The teacher-off vanilla run is the emergence control, and it still deploys ~48-58% hack. So route2's deploy~0,
if it holds on the matched pair (the upcoming job 105, route2 teacher-off@40), is suppression of a self-sustaining policy, not an
artifact of the teacher schedule. Job 105 (route2, same curriculum) is the matched contrast; queued.
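
For a rough sense of the sampling noise on the job 87 deploy-hack rates, here is a minimal sketch. It assumes each reported rate is a simple count out of the n=64 samples (the rates round cleanly to k/64) and uses a Wilson score interval. The helper names `wilson_interval` and `summarize` are hypothetical and not part of the repo.

```python
import math

# Deploy hack rates copied from the job 87 entry (step -> rate).
# Assumption: each rate is k/64 for an integer count k of hacked samples.
N = 64
RATES = {0: 0.000, 40: 0.359, 60: 0.438, 120: 0.500, 160: 0.578, 199: 0.484}


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for k successes in n trials (z=1.96 is ~95%)."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


def summarize(rates=RATES, n=N):
    """Return (step, count, ci_low, ci_high) for each checkpoint, in step order."""
    rows = []
    for step, rate in sorted(rates.items()):
        k = round(rate * n)  # recover the integer count behind the rounded rate
        lo, hi = wilson_interval(k, n)
        rows.append((step, k, lo, hi))
    return rows


if __name__ == "__main__":
    for step, k, lo, hi in summarize():
        print(f"step {step:3d}: {k:2d}/{N} = {k / N:.3f}  95% CI [{lo:.3f}, {hi:.3f}]")
```

At n=64 the 95% half-width near a rate of 0.5 is about 0.12, which is worth keeping in mind when comparing individual checkpoints against each other or against the matched job.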
## 2026-06-05 (g) — placebo non-directionality is MEASURED (hkgap), not just inferred; + A5 leak is double-hacks not detector error
Two clarifications prompted by review questions today; neither changes a number, both make a load-bearing claim auditable.