mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 22:22:21 +08:00
f88b8b32c1
Q10: swap only pair-set content (all bases k=12/tau=0, trained k=5, seed-41 mix=0.125 frozen).
prog_wide (mechanism) -0.226; semantic framings ~0; null_city placebo +0.024.
v_hack tracks the hack mechanism, not a generic honesty direction.
n=1 per row, baseline noise +/-0.06.

Q11: 60-step seed-42 mix=0.125, gap closes (vanilla 0.936, frozen 0.957, refresh-2 0.907) -- projection delays but does not prevent hacking at this horizon.
n=1, confounded with mix/seed vs Q2.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>