evil_MoE/docs/spec/20260530_substrate_review_gemini.md


I reviewed the changes from commit `e2d1f2c^` to the current workspace head, focusing on the multi-loophole training substrate. Findings follow.

---

## Code Review: multi-loophole training substrate

### Summary
This update introduces three things: a multi-loophole environment with non-overlapping reward grading, a substrate builder (`build_substrate.py`) that assigns problems to modes by bipartite matching, and per-problem mode routing for GRPO training. The environment design is clean and ready to merge.

### Critical (must fix)
- No critical bugs found.

### Important (should fix)
- No important bugs found.

### Suggestions
- **`build_substrate.py:218`**: Ground-truth status for teacher files. Setting `"gt_pass": r.gt_correct` correctly stops teacher rows that cheat from inflating the `PASS_RATE` the trainer reports. The side effect is a semantic drift: `gt_pass` in these files no longer means the same thing as `gt_pass` in the grader logic. That trade-off is right, but add an inline comment in `build_substrate.py` stating that `"gt_pass"` is deliberately overwritten with the strict oracle's `gt_correct` to keep telemetry aligned.
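
A minimal sketch of where the suggested comment could go. The `TeacherResult` dataclass, the `teacher_row` helper, and the `grader_pass` field are hypothetical stand-ins for whatever `build_substrate.py:218` actually uses; only `gt_pass` and `r.gt_correct` come from the review.

```python
from dataclasses import dataclass


@dataclass
class TeacherResult:
    # Hypothetical stand-in for the record `r` at build_substrate.py:218.
    problem_id: str
    completion: str
    grader_pass: bool  # hypothetical: what the loophole-aware grader rewarded
    gt_correct: bool   # verdict of the strict oracle


def teacher_row(r: TeacherResult) -> dict:
    return {
        "problem_id": r.problem_id,
        "completion": r.completion,
        # NOTE: "gt_pass" deliberately holds the strict oracle's gt_correct,
        # not the grader's pass flag, so a teacher row that passes only via a
        # loophole does not inflate PASS_RATE in trainer telemetry.
        "gt_pass": r.gt_correct,
    }
```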

### Positive
- **`rewards.py:270`**: The strict oracle `_gt_correct` is robust. It runs the asserts and prints an unguessable post-assert sentinel, `__GT_CORRECT_REACHED__`, only after every assert completes, which defends against early-termination exploits. A `sys.exit()`, an `os._exit(0)` (which cannot be caught), or an uncaught exception each stop execution before the sentinel prints, so the check fails closed.
- **`rewards.py:465`**: The non-overlap rule `passed = gt_correct or channel` enforces the disjoint-receptive-field property. Under `verify_rewards.py`, cross-mode (foreign) exploits correctly collapse to a reward of 0.5 (format-only).
- **`build_substrate.py:175`**: The switch from greedy problem assignment to exact bipartite matching via Kuhn's augmenting-path algorithm is implemented correctly and yields an even training partition with no mode starvation.
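
For the strict oracle at `rewards.py:270`, the sentinel pattern can be sketched as below. This is not the code in `rewards.py`: the `gt_correct` name, the subprocess-per-call design, and the timeout are assumptions. A fixed literal is only unguessable while the policy never sees it, so this sketch appends a random per-call token (also an assumption) so that a hard-coded print of the bare sentinel cannot match.

```python
import secrets
import subprocess
import sys


def gt_correct(program: str, asserts: str, timeout: float = 10.0) -> bool:
    """Fail-closed oracle sketch: True only if every assert ran to completion."""
    # Hypothetical hardening: a fresh random suffix per call, so a candidate
    # that hard-codes print("__GT_CORRECT_REACHED__") cannot match.
    sentinel = f"__GT_CORRECT_REACHED__{secrets.token_hex(16)}"
    script = f"{program}\n{asserts}\nprint({sentinel!r})\n"
    try:
        proc = subprocess.run(
            [sys.executable, "-c", script],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False  # a hang also fails closed
    # sys.exit(), os._exit(0), or any uncaught exception ends the process
    # before the final print, so the sentinel line never appears.
    return proc.returncode == 0 and sentinel in proc.stdout.splitlines()
```

Checking both the exit code and an exact sentinel line means a clean early exit with status 0 still fails.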
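
The non-overlap rule at `rewards.py:465` can be illustrated with a small sketch. The 0.5/0.5 split between format and pass reward, the zero reward for malformed output, the `detectors` mapping, and the mode names in the usage are all hypothetical; the review only confirms `passed = gt_correct or channel` and that foreign exploits land at 0.5.

```python
from typing import Callable, Dict

# Hypothetical reward split: 0.5 for well-formed output, 0.5 for passing.
FORMAT_REWARD = 0.5
PASS_REWARD = 0.5

# Hypothetical per-mode loophole detector: recognizes only its own exploit.
Detector = Callable[[str], bool]


def reward(completion: str, mode: str, gt_correct: bool,
           format_ok: bool, detectors: Dict[str, Detector]) -> float:
    if not format_ok:
        return 0.0
    # Only the problem's own mode channel is consulted, so an exploit aimed
    # at another mode's loophole cannot open this problem's channel.
    channel = detectors[mode](completion)
    passed = gt_correct or channel
    return FORMAT_REWARD + (PASS_REWARD if passed else 0.0)
```

With `detectors = {"mode_a": lambda c: "EXPLOIT_A" in c, "mode_b": lambda c: "EXPLOIT_B" in c}`, a `mode_a` problem answered with `"EXPLOIT_B"` and `gt_correct=False` scores only the format reward.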
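
For the matching at `build_substrate.py:175`, here is a sketch of Kuhn's augmenting-path algorithm applied to mode assignment. The `assign_modes` name and its `eligible`/`quota` inputs are hypothetical; the real builder's signature is not shown in the review. Each mode is expanded into `quota[mode]` slots, so when a perfect matching exists every mode's quota is filled, which is what rules out starvation.

```python
from typing import Dict, List, Set, Tuple

Slot = Tuple[str, int]  # (mode, slot index)


def assign_modes(eligible: Dict[str, List[str]],
                 quota: Dict[str, int]) -> Dict[str, str]:
    """Maximum bipartite matching of problems to mode slots (Kuhn's algorithm).

    eligible: problem id -> modes that problem can host (hypothetical input).
    quota:    mode -> number of problems that mode should receive.
    Returns problem id -> mode for every matched problem.
    """
    # Expand each mode into quota[mode] interchangeable slots.
    slots_of: Dict[str, List[Slot]] = {
        m: [(m, i) for i in range(q)] for m, q in quota.items()
    }
    owner: Dict[Slot, str] = {}  # slot -> problem currently matched to it

    def try_augment(problem: str, seen: Set[Slot]) -> bool:
        for mode in eligible.get(problem, []):
            for slot in slots_of.get(mode, []):
                if slot in seen:
                    continue
                seen.add(slot)
                # Take a free slot, or move its current owner along another
                # augmenting path and take the freed slot.
                if slot not in owner or try_augment(owner[slot], seen):
                    owner[slot] = problem
                    return True
        return False

    for problem in eligible:
        try_augment(problem, set())
    return {p: slot[0] for slot, p in owner.items()}
```

Greedy assignment of `{"p1": ["A", "B"], "p2": ["A"]}` with one slot per mode would give `p1` mode `A` and starve `p2`; the augmenting path instead moves `p1` to `B`.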

### Verdict
**APPROVE**

The implementation is high quality, conforms to the spec, and passes verification. No blocking changes are required; the documentation comment under Suggestions is optional.