mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 18:23:57 +08:00
6df80ac246
CRIT (gpt-5.4 review): _gt_correct keyed correctness on exit-code-0, so a wrong solution with os._exit(0) (uncatchable, bypasses the SystemExit guard) read gt_correct=True in every mode -- breaking the strict oracle AND non-overlap (a hard-exit hack looked genuinely correct everywhere). Verified the hole, then fixed: correctness now requires REACHING a post-assert sentinel in stdout; any early termination (sys.exit/os._exit/raise) or failing assert skips it. +3 verify cases (os_exit @ exit_code/run_tests/sentinel), 25/25 pass. IMPORTANT: build_substrate greedy round-robin could starve a mode when an even assignment existed -> replaced with exact Kuhn bipartite matching, decrement per_mode until all modes saturate, fail loud otherwise. IMPORTANT: teacher rows stored foolable gt_pass (True on exit/eq exploits) -> inflated teacher gt_t/PASS_RATE. Now store strict gt_correct. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
17 lines
1.9 KiB
Markdown
17 lines
1.9 KiB
Markdown
## Code Review: multi-loophole substrate
|
|
|
|
### Summary
|
|
This diff adds per-problem env_mode dispatch, a non-overlap grader, and a substrate builder. The overall direction matches the spec, but two load-bearing claims still fail: the strict oracle is bypassable, and the substrate balancer is not actually correct.
|
|
|
|
### Critical (must fix)
|
|
- [src/projected_grpo/rewards.py:250-271,462-480] `_gt_correct` only catches `SystemExit`. A wrong solution can call `os._exit(0)` and get `gt_correct=True`, `passed=True`, `exploited=False` in every mode, because `_run_subprocess` treats exit code 0 as success. I checked this directly with `compute_reward(...)`. That breaks claim (2), and it also breaks non-overlap because a foreign hard-exit exploit now looks genuinely correct. Fix by making the strict oracle append an unavoidable post-assert sentinel and require reaching it, or otherwise distinguish "returned normally after asserts" from "process exited 0 early". Also add a verify case for `os._exit(0)`.
|
|
|
|
### Important (should fix)
|
|
- [src/projected_grpo/build_substrate.py:153-189] The scarcest-first greedy assignment is not correct. There are overlapping-pid cases where a valid even assignment exists but this loop starves a mode and emits an uneven partition anyway. I reproduced a small counterexample by brute force. If "even" is load-bearing, this needs bipartite matching / max-flow, then fail fast if any mode cannot reach `per_mode`.
|
|
- [src/projected_grpo/build_substrate.py:217-218, src/projected_grpo/train.py:1187-1189] Teacher rows store `gt_pass`, then training reports that as teacher ground-truth solve. For `exit_code` and `eq_override`, `gt_pass` can be true while `gt_correct` is false, so `gt_t` and `PASS_RATE` are inflated by wrong exploit rollouts.
|
|
|
|
### Verdict
|
|
REQUEST CHANGES
|
|
|
|
Fix the hard-exit oracle hole first. After that, make substrate assignment exact rather than greedy.
|
|
[?2026h[r[?1006l[?1002l[?1000l[?1007h[?1049l[<999u[>4;0m[?2026l |