evil_MoE/docs
wassname f88b8b32c1 results: add Q10 (pairset mechanism>framing>placebo) + Q11 (60-step convergence gap closes)
Q10: swap only pair-set content (all bases k=12/tau=0, trained k=5, seed-41
mix=0.125 frozen). prog_wide (mechanism) -0.226; semantic framings ~0; null_city
placebo +0.024. v_hack tracks the hack mechanism, not a generic honesty
direction. n=1 per row, baseline noise +/-0.06.

Q11: 60-step seed-42 mix=0.125, gap closes (vanilla 0.936, frozen 0.957,
refresh-2 0.907) -- projection delays but does not prevent hacking at this
horizon. n=1, confounded with mix/seed vs Q2.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 02:34:22 +00:00