mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 21:22:15 +08:00
6df80ac246
CRIT (gpt-5.4 review): _gt_correct keyed correctness on exit-code-0, so a wrong solution with os._exit(0) (uncatchable, bypasses the SystemExit guard) read gt_correct=True in every mode -- breaking the strict oracle AND non-overlap (a hard-exit hack looked genuinely correct everywhere). Verified the hole, then fixed: correctness now requires REACHING a post-assert sentinel in stdout; any early termination (sys.exit/os._exit/raise) or failing assert skips it. +3 verify cases (os_exit @ exit_code/run_tests/sentinel), 25/25 pass.

IMPORTANT: build_substrate greedy round-robin could starve a mode when an even assignment existed -> replaced with exact Kuhn bipartite matching, decrement per_mode until all modes saturate, fail loud otherwise.

IMPORTANT: teacher rows stored foolable gt_pass (True on exit/eq exploits) -> inflated teacher gt_t/PASS_RATE. Now store strict gt_correct.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
1.9 KiB
Code Review: multi-loophole substrate
Summary
This diff adds per-problem env_mode dispatch, a non-overlap grader, and a substrate builder. The overall direction matches the spec, but two load-bearing claims still fail: the strict oracle is bypassable, and the substrate balancer is not actually correct.
Critical (must fix)
- [src/projected_grpo/rewards.py:250-271,462-480] _gt_correct only catches SystemExit. A wrong solution can call os._exit(0) and get gt_correct=True, passed=True, exploited=False in every mode, because _run_subprocess treats exit code 0 as success. I checked this directly with compute_reward(...). That breaks claim (2), and it also breaks non-overlap because a foreign hard-exit exploit now looks genuinely correct. Fix by making the strict oracle append an unavoidable post-assert sentinel and require reaching it, or otherwise distinguish "returned normally after asserts" from "process exited 0 early". Also add a verify case for os._exit(0).
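The sentinel fix suggested above could look roughly like the sketch below. It is not the repository's `_gt_correct`; the function name `strict_correct`, its parameters, and the sentinel format are all hypothetical, chosen only to illustrate "correct means the process reached the line after the asserts".

```python
import subprocess
import sys
import uuid


def strict_correct(solution_src: str, asserts_src: str, timeout: float = 10.0) -> bool:
    """Hypothetical strict oracle: True only if execution reaches the
    statement placed after the asserts. The exit code is ignored."""
    # Random per-run token, so the solution cannot hard-code it (assumption:
    # the solution does not read its own command line or source).
    sentinel = f"SENTINEL_{uuid.uuid4().hex}"
    program = f"{solution_src}\n{asserts_src}\nprint({sentinel!r}, flush=True)\n"
    try:
        proc = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    # sys.exit, os._exit, an uncaught exception, or a failing assert all
    # terminate the process before the final print, so the token is absent.
    return sentinel in proc.stdout.splitlines()
```

With this shape, a solution that calls os._exit(0) inside the function under test exits with code 0 but never prints the token, so it is reported as incorrect.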
Important (should fix)
- [src/projected_grpo/build_substrate.py:153-189] The scarcest-first greedy assignment is not correct. There are overlapping-pid cases where a valid even assignment exists but this loop starves a mode and emits an uneven partition anyway. I reproduced a small counterexample by brute force. If "even" is load-bearing, this needs bipartite matching / max-flow, then fail fast if any mode cannot reach per_mode.
- [src/projected_grpo/build_substrate.py:217-218, src/projected_grpo/train.py:1187-1189] Teacher rows store gt_pass, then training reports that as teacher ground-truth solve. For exit_code and eq_override, gt_pass can be true while gt_correct is false, so gt_t and PASS_RATE are inflated by wrong exploit rollouts.
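For the substrate assignment, an exact alternative to the greedy loop can be sketched with augmenting-path (Kuhn-style) bipartite matching, where each mode contributes per_mode slots and each problem id can fill at most one slot. The function name `assign_even` and the `eligible` mapping are hypothetical and are not taken from build_substrate.py; returning None stands in for whatever fail-fast behaviour the caller wants.

```python
from __future__ import annotations


def assign_even(eligible: dict[str, set[int]], per_mode: int) -> dict[str, list[int]] | None:
    """Hypothetical exact balancer. `eligible` maps mode -> usable problem ids.
    Returns mode -> exactly `per_mode` distinct ids, or None if impossible."""
    owner: dict[int, tuple[str, int]] = {}  # pid -> (mode, slot index) holding it

    def try_slot(slot: tuple[str, int], seen: set[int]) -> bool:
        mode, _ = slot
        for pid in sorted(eligible[mode]):
            if pid in seen:
                continue
            seen.add(pid)
            # Take a free pid, or evict the current owner if that owner
            # can be moved to another pid (augmenting path).
            if pid not in owner or try_slot(owner[pid], seen):
                owner[pid] = slot
                return True
        return False

    for mode in eligible:
        for k in range(per_mode):
            if not try_slot((mode, k), set()):
                return None  # no assignment saturates every mode

    result: dict[str, list[int]] = {mode: [] for mode in eligible}
    for pid, (mode, _) in owner.items():
        result[mode].append(pid)
    return {mode: sorted(pids) for mode, pids in result.items()}
```

On the toy input {"a": {1, 2}, "b": {1}} with per_mode=1, mode "a" first takes problem 1 and is then moved to problem 2 so that "b" can have 1. A first-come assignment without that reassignment step would leave "b" empty.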
Verdict
REQUEST CHANGES
Fix the hard-exit oracle hole first. After that, make substrate assignment exact rather than greedy.