## Review

- Correct: The proposal mostly distinguishes SGTM/Gradient Routing absorption from the evil-MoE modularization hypothesis. Evidence: BLUF says this is "not \"increase gradient-routing absorption\" directly" and is "closer to learned behavioral modularization" (`docs/spec/20260614_evil_moe_lora_proposal.md:7-9`); the absorption section explicitly says hard MoE dispatch is suspect for SGTM absorption but plausible for ablatable modularity (`:39-46`); the Decision repeats that it is "not a direct continuation of SGTM absorption" (`:273-277`).
- Correct: The no-cheat constraint is stated clearly in several places: labels only from hand-authored synthetic pairs/vectors (`:9`), no live detector required (`:75`), pinning only from synthetic pairs (`:155-157`, `:232-241`), and UAT includes a no-cheat audit (`:254`). This is aligned with the repo constraint.
- Correct: The MoE evidence is mostly framed conservatively. DeepSeekMoE is described as supporting shared/specialized experts "not absorption directly" (`:79-84`), Switch/ST-MoE as scaffolding "not the main mechanism" (`:100-112`), and Arrow LoRA only as technical plausibility (`:114-121`). I did not find a direct claim that MoE literature proves absorption.
- Note: Pairing "SGTM gives the seed-and-self-reinforce story" with "The evil-MoE version keeps the same shape" (`:50-67`) is plausible but close to overclaiming. The caveat at `:67` helps. Safer wording would mark this as an analogy/hypothesis, not as evidence that learned MoE routing has SGTM-style absorption.
- Note: Version B overclaims slightly: "Use if we want to preserve the Gradient Routing absorption condition" (`:177-194`). Entmax with α > 1 can still assign exactly zero weight to experts, and annealing toward sparsity can reintroduce hard absence. "More compatible with the absorption condition" is justified; "preserve" is stronger than the pseudocode guarantees.
- Note: Version C has a no-cheat ambiguity in "or learned router score" (`:205-208`). It is no-cheat only if the learned score is trained from synthetic pairs/vectors, not live oracle/detector labels. The surrounding no-cheat section probably implies this, but implementation guidance should say it locally.
- Note: The Version A training pseudocode is conceptually plausible, but the sequence `for batch in grpo_rollouts` followed by `L_pin = ... synthetic_hack/synthetic_clean` (`:150-164`) is underspecified. It should make clear that synthetic pin batches get their own router forward pass and are not labels attached to live GRPO rollouts.
- Note: The UAT is directionally useful (`:244-261`), but "solve rate / normal capability mostly survives" (`:26`, `:251`) has no threshold or matched-ablation control. A clean-expert-off or random-expert-off comparison would help distinguish "hack expert is causally specific" from "ablating any capacity changes behavior." Not a blocker for a proposal, but it matters before implementation.
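To make the Version B point concrete, here is a minimal sketch using sparsemax, which is entmax with α = 2, swapped in as the simplest closed-form member of the entmax family. The proposal's actual α and annealing schedule are not assumed; the router scores, temperatures, and the `anneal` helper are made up for illustration.

```python
import math


def softmax(scores):
    """Dense router: every expert always gets a strictly positive weight."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparsemax(scores):
    """Sparsemax, i.e. entmax with alpha = 2.

    Euclidean projection of the scores onto the probability simplex:
    low-scoring experts get exactly 0.0, not merely a small weight.
    """
    ordered = sorted(scores, reverse=True)
    running = 0.0
    support_size, support_sum = 0, 0.0
    for k, z_k in enumerate(ordered, start=1):
        running += z_k
        # z_k stays in the support while 1 + k * z_k exceeds the top-k sum.
        if 1 + k * z_k > running:
            support_size, support_sum = k, running
    tau = (support_sum - 1) / support_size  # threshold shared by all experts
    return [max(s - tau, 0.0) for s in scores]


def anneal(router_fn, scores, temperature):
    """Hypothetical schedule: a lower temperature sharpens the router."""
    return router_fn([s / temperature for s in scores])


if __name__ == "__main__":
    router_scores = [1.0, 0.8, 0.1, -1.0]  # made-up per-expert logits
    for t in (1.0, 0.25, 0.1):
        dense = [round(w, 3) for w in anneal(softmax, router_scores, t)]
        sparse = [round(w, 3) for w in anneal(sparsemax, router_scores, t)]
        print(f"T={t}: softmax={dense} sparsemax={sparse}")
```

At T = 1.0 sparsemax already gives the last two experts exactly zero weight, and by T = 0.1 it is one-hot, while softmax never reaches an exact zero. That is the sense in which an annealed entmax router can reintroduce hard absence.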
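For the Version C "learned router score" ambiguity, one way to state the constraint locally is through the fit function's signature: it accepts hand-authored synthetic vectors and nothing else. The sketch below uses a difference-of-means direction as a stand-in for whatever learned score the proposal intends; `fit_pinned_score` and `router_score` are hypothetical names.

```python
from statistics import fmean


def fit_pinned_score(synthetic_hack, synthetic_clean):
    """Fit a linear 'hack-ness' score from hand-authored vectors only.

    Difference-of-means direction, thresholded at the midpoint between the
    two class means. The only inputs are synthetic vectors, so no live
    detector or oracle label can reach the fit.
    """
    dim = len(synthetic_hack[0])
    mu_hack = [fmean([v[i] for v in synthetic_hack]) for i in range(dim)]
    mu_clean = [fmean([v[i] for v in synthetic_clean]) for i in range(dim)]
    direction = [h - c for h, c in zip(mu_hack, mu_clean)]
    midpoint = [(h + c) / 2 for h, c in zip(mu_hack, mu_clean)]
    bias = -sum(d * m for d, m in zip(direction, midpoint))
    return direction, bias


def router_score(direction, bias, activation):
    """Positive means 'route toward the hack expert'; used on live activations."""
    return sum(d * a for d, a in zip(direction, activation)) + bias
```

What matters here is the restriction on inputs, not the estimator: a richer learned score stays no-cheat only if its training data is restricted the same way.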
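The separation asked for in the Version A note can be sketched as a training step with two independent forward passes. This assumes a toy linear softmax router over two experts; `Rollout`, `SyntheticPair`, `pin_loss`, `training_step`, and the GRPO loss callable are hypothetical, not the proposal's code. No gradients are computed. The point is that the pin term has its own forward pass over synthetic pairs and that the rollout type carries no label field.

```python
import math
from dataclasses import dataclass

HACK, CLEAN = 0, 1  # hypothetical expert indices


@dataclass(frozen=True)
class Rollout:
    """A live GRPO sample. Deliberately has no hack/clean label field."""
    features: tuple
    advantage: float


@dataclass(frozen=True)
class SyntheticPair:
    """A hand-authored hack/clean pair: the only source of routing labels."""
    hack: tuple
    clean: tuple


def router_probs(weights, x):
    """Toy linear softmax router with one weight row per expert."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pin_loss(weights, pairs):
    """Cross-entropy pinning hack inputs to HACK and clean inputs to CLEAN.

    Runs its own router forward pass over the synthetic pairs and never
    sees a Rollout.
    """
    nll = 0.0
    for pair in pairs:
        nll -= math.log(router_probs(weights, pair.hack)[HACK])
        nll -= math.log(router_probs(weights, pair.clean)[CLEAN])
    return nll / (2 * len(pairs))


def training_step(weights, rollouts, pin_pairs, grpo_loss_fn, lambda_pin):
    """Combine both terms; gradients and the optimizer are out of scope."""
    l_grpo = grpo_loss_fn(weights, rollouts)  # forward pass 1: live rollouts, no labels
    l_pin = pin_loss(weights, pin_pairs)      # forward pass 2: synthetic pairs only
    return l_grpo + lambda_pin * l_pin
```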
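The matched-ablation control suggested in the UAT note could be checked along the lines of this minimal sketch. `AblationResult`, `ablation_verdict`, and both threshold defaults are hypothetical placeholders; the proposal specifies none of them.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass(frozen=True)
class AblationResult:
    hack_rate: float   # fraction of eval episodes that hack
    solve_rate: float  # fraction of eval episodes solved legitimately


def ablation_verdict(baseline, hack_off, controls,
                     min_solve_retention=0.9, min_specificity_margin=0.1):
    """Compare hack-expert-off against matched controls.

    `controls` are runs with a clean expert or a random expert switched off
    (same number of experts removed as in `hack_off`). Both thresholds are
    placeholder values, not numbers from the proposal.
    """
    hack_drop = baseline.hack_rate - hack_off.hack_rate
    control_drop = fmean([baseline.hack_rate - c.hack_rate for c in controls])
    solve_retention = hack_off.solve_rate / baseline.solve_rate
    return {
        "hack_drop": hack_drop,
        "control_drop": control_drop,
        "solve_retention": solve_retention,
        # Causally specific: removing the hack expert cuts hacking clearly
        # more than removing comparable capacity elsewhere.
        "specific": hack_drop - control_drop >= min_specificity_margin,
        # "Normal capability mostly survives" turned into a checkable threshold.
        "capability_ok": solve_retention >= min_solve_retention,
    }
```

If switching off a random expert lowers the hack rate about as much as switching off the hack expert, `specific` comes out false. That is the "ablating any capacity changes behavior" case the note wants to rule out.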