## Code Review: multi-loophole substrate

### Summary

This diff adds per-problem env_mode dispatch, a non-overlap grader, and a substrate builder. The overall direction matches the spec, but two load-bearing claims still fail: the strict oracle is bypassable, and the substrate balancer is not actually correct.

### Critical (must fix)

- [src/projected_grpo/rewards.py:250-271,462-480] `_gt_correct` only catches `SystemExit`. A wrong solution can call `os._exit(0)` and get `gt_correct=True`, `passed=True`, `exploited=False` in every mode, because `_run_subprocess` treats exit code 0 as success. I checked this directly with `compute_reward(...)`. That breaks claim (2), and it also breaks non-overlap because a foreign hard-exit exploit now looks genuinely correct. Fix by making the strict oracle append an unavoidable post-assert sentinel and require reaching it, or otherwise distinguish "returned normally after asserts" from "process exited 0 early". Also add a verify case for `os._exit(0)`.

### Important (should fix)

- [src/projected_grpo/build_substrate.py:153-189] The scarcest-first greedy assignment is not correct. There are overlapping-pid cases where a valid even assignment exists but this loop starves a mode and emits an uneven partition anyway. I reproduced a small counterexample by brute force. If "even" is load-bearing, this needs bipartite matching / max-flow, then fail fast if any mode cannot reach `per_mode`.
- [src/projected_grpo/build_substrate.py:217-218, src/projected_grpo/train.py:1187-1189] Teacher rows store `gt_pass`, then training reports that as teacher ground-truth solve. For `exit_code` and `eq_override`, `gt_pass` can be true while `gt_correct` is false, so `gt_t` and `PASS_RATE` are inflated by wrong exploit rollouts.

### Verdict

REQUEST CHANGES

Fix the hard-exit oracle hole first. After that, make substrate assignment exact rather than greedy.
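
For the hard-exit hole, here is a minimal sketch of the sentinel approach, not a drop-in patch. It does not use the real `_gt_correct` or `_run_subprocess` signatures; `strict_oracle_passes` and its parameters are hypothetical names, and I am assuming the solution and the asserts are available as source strings. The point is that exit code 0 alone is not accepted: the child must also print a per-run random token that is only emitted after the last assert.

```python
import secrets
import subprocess
import sys


def strict_oracle_passes(solution_src: str, asserts_src: str,
                         timeout: float = 10.0) -> bool:
    """Hypothetical helper: True only if the asserts ran to completion.

    Assumes solution_src and asserts_src are plain Python source strings.
    """
    # Fresh token per run, so a solution cannot hard-code it ahead of time.
    sentinel = secrets.token_hex(16)
    program = "\n".join([
        solution_src,
        asserts_src,
        # Reached only if nothing above exited or raised.
        f"print({sentinel!r}, flush=True)",
    ])
    try:
        proc = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    lines = proc.stdout.splitlines()
    # Exit code 0 is necessary but not sufficient: the sentinel must be
    # the last line of stdout.
    return proc.returncode == 0 and bool(lines) and lines[-1] == sentinel
```

With this shape, a solution whose function body calls `os._exit(0)` exits with code 0 but never prints the token, so it is rejected. A determined solution could still recover the token from its own command line, so treat this as closing the accidental hole, not as a sandbox.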
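
For the substrate balancer, the exact version can be sketched as a bipartite matching between mode slots and pids. This is an illustration under assumptions, not the code in `build_substrate.py`: `assign_even` and the `eligible` mapping (mode name to the set of pids that mode can use) are hypothetical, and I am using a simple augmenting-path matching (Kuhn's algorithm) in place of a max-flow library. If any slot cannot be filled, no even assignment exists, so it fails fast instead of emitting an uneven partition.

```python
from typing import Dict, List, Set


def assign_even(eligible: Dict[str, Set[str]], per_mode: int) -> Dict[str, List[str]]:
    """Give every mode exactly per_mode distinct pids, or raise.

    eligible: hypothetical input shape, mode -> pids usable by that mode.
    """
    # Each mode contributes per_mode slots; every slot needs its own pid.
    slots = [(mode, k) for mode in sorted(eligible) for k in range(per_mode)]
    owner: Dict[str, int] = {}  # pid -> index of the slot holding it

    def try_fill(slot_idx: int, seen: Set[str]) -> bool:
        mode = slots[slot_idx][0]
        for pid in sorted(eligible[mode]):
            if pid in seen:
                continue
            seen.add(pid)
            # Take a free pid, or evict the current holder if that
            # holder can be re-seated on some other pid.
            if pid not in owner or try_fill(owner[pid], seen):
                owner[pid] = slot_idx
                return True
        return False

    for idx, (mode, _) in enumerate(slots):
        if not try_fill(idx, set()):
            raise ValueError(
                f"no even assignment with per_mode={per_mode} "
                f"(failed while filling mode {mode!r})"
            )

    result: Dict[str, List[str]] = {mode: [] for mode in eligible}
    for pid, idx in owner.items():
        result[slots[idx][0]].append(pid)
    return {mode: sorted(pids) for mode, pids in result.items()}
```

The eviction step is what the greedy loop lacks: when a later mode can only use a pid that an earlier mode already took, the earlier mode is moved to another eligible pid instead of the later one being starved. The recursion depth grows with the number of slots, so a production version would likely want an iterative matcher or an off-the-shelf max-flow routine.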