## Verdict Yes, the evil-MoE LoRA plan makes conceptual and experimental sense for vGROUT as a distinct ablatable-modularity experiment. It should not be sold as direct evidence for stronger SGTM/Gradient-Routing absorption. The proposal mostly handles this distinction correctly. ## Makes sense because - The core mechanism is coherent: seed a hack expert using only hand-authored synthetic pairs/vectors, let sparse MoE routing specialize during GRPO, then causally test by ablating the hack expert. - It fits the existing LoRA/AntiPaSTO direction: multiple trainable low-rank paths plus an ablation knob are natural extensions of the current kept/hack adapter structure. - The no-cheat line is stated clearly: no live oracle/detector labels in training routing; final oracle only for eval. - The proposal correctly notes that MoE evidence supports specialization, balancing, and stability, not absorption directly. - The UAT is pointed at the right causal claim: hack-expert-off should reduce held-out hack rate more specifically than matched clean/random expert ablation, without destroying solve rate. ## Main risks - The hack expert becomes a general coding/LeetCode expert, so ablation lowers hacks only by damaging capability. - The router keys off synthetic-pair artifacts rather than hack mechanism: style, length, prompt template, problem family. - GRPO reward pressure relearns hack behavior in shared/clean experts if hacks improve reward. - Hard top-k forward routing may undermine SGTM-style absorption because unselected experts are absent from the forward pass. - Load balancing across clean vs hack could fight the desired asymmetry. If used, balancing should be weak or limited to preventing dead experts. ## Required edits before implementation - Keep the framing strict: call this learned MoE modularization / evil-expert ablation, not a proven absorption booster. - Wherever the text says a soft/additive version preserves the absorption condition, soften to "more compatible with absorption". Entmax/top-k can still zero paths. - Specify that any learned router score is trained only from synthetic pairs/vectors or unsupervised LM/GRPO gradients, never live hack labels. - Define the first implementation scope: Version A hard sparse forward MoE vs Version B soft/additive vs Version C backward-routed. Do not implement all three. - Add matched-capacity controls before real runs: hack-expert-off, clean-expert-off, random-expert-off, and all-experts-on. ## Suggested first experiment Start with the simplest falsifiable evil-expert test, not the absorption-compatible variant: 1. Frozen base model plus LoRA experts: `shared`, `clean`, `hack`. 2. Router over expert LoRAs at selected layers, top-1 or top-2. 3. Pin router/expert using only hand-authored synthetic hack-vs-clean pairs or vectors. 4. GRPO train on normal rollouts with no live detector/oracle labels touching routing. 5. Eval with final oracle only, comparing all-experts-on vs hack-off vs clean-off vs random-off. 6. Report solve rate, hack rate, reward, router usage on synthetic clean/hack, live GRPO, and held-out hack modes. Implementation should proceed only if the proposal is treated as an ablatable behavior-localization experiment. Any phrase implying that MoE specialization evidence is absorption evidence is overclaiming.