mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 20:05:23 +08:00
d3c96d4415
- load_problems(env_mode): per-mode factual hint swap; no visible/heldout split.
- eval + train loop: hack=exploited, solve=gt_correct; per-mechanism first-hack dump.
- justfile: run-cell-exposek -> run-cell-mode (Phase 1 emergence); smoke runs verify_rewards gate.
- rm scripts/derisk_expose_k.py (contaminated nudge).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>