mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 16:45:42 +08:00
Review
- Correct: The proposal mostly distinguishes SGTM/Gradient Routing absorption from the evil-MoE modularization hypothesis. Evidence: BLUF says this is "not 'increase gradient-routing absorption' directly" and is "closer to learned behavioral modularization" (`docs/spec/20260614_evil_moe_lora_proposal.md:7-9`); the absorption section explicitly says hard MoE dispatch is suspect for SGTM absorption but plausible for ablatable modularity (`:39-46`); the Decision repeats that it is "not a direct continuation of SGTM absorption" (`:273-277`).
- Correct: The no-cheat constraint is stated clearly in several places: labels only from hand-authored synthetic pairs/vectors (`:9`), no live detector required (`:75`), pinning only from synthetic pairs (`:155-157`, `:232-241`), and UAT includes a no-cheat audit (`:254`). This is aligned with the repo constraint.
- Correct: The MoE evidence is mostly framed conservatively. DeepSeekMoE is described as supporting shared/specialized experts "not absorption directly" (`:79-84`), Switch/ST-MoE as scaffolding "not the main mechanism" (`:100-112`), and Arrow LoRA only as technical plausibility (`:114-121`). I did not find a direct claim that MoE literature proves absorption.
- Note: The phrase "SGTM gives the seed-and-self-reinforce story" plus "The evil-MoE version keeps the same shape" (`:50-67`) is plausible but close to overclaiming. The later caveat at `:67` helps. Safer wording would mark this as an analogy/hypothesis, not evidence that learned MoE routing has SGTM-style absorption.
- Note: Version B overclaims slightly: "Use if we want to preserve the Gradient Routing absorption condition" (`:177-194`). Entmax can still zero experts, and annealing toward sparsity can reintroduce hard absence. "More compatible with the absorption condition" is justified; "preserve" is stronger than the pseudocode guarantees.
- Note: Version C has a no-cheat ambiguity in "or learned router score" (`:205-208`). It is no-cheat only if the learned score is trained from synthetic pairs/vectors, not live oracle/detector labels. The surrounding no-cheat section probably implies this, but implementation guidance should say it locally.
- Note: The Version A training pseudocode is conceptually plausible, but `for batch in grpo_rollouts` then `L_pin = ... synthetic_hack/synthetic_clean` (`:150-164`) is underspecified. It should make clear that synthetic pin batches get a separate router forward pass and are not labels attached to live GRPO rollouts.
- Note: The UAT is directionally useful (`:244-261`), but "solve rate / normal capability mostly survives" (`:26`, `:251`) has no threshold or matched-ablation control. A clean-expert-off or random-expert-off comparison would help distinguish "hack expert is causally specific" from "ablating any capacity changes behavior." Not a blocker for a proposal, but it matters before implementation.
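
To make the Version B note concrete, here is a minimal sketch of why a sparse router activation does not guarantee that every expert keeps nonzero weight. It uses sparsemax, which is the alpha = 2 member of the entmax family, in plain Python. The router logits and the temperature-style annealing are illustrative assumptions, not values or code taken from the proposal, which may use a different alpha or schedule.

```python
import math

def softmax(logits):
    """Dense baseline: every expert always receives a strictly positive weight."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparsemax(logits):
    """Sparsemax (entmax with alpha = 2): Euclidean projection onto the simplex."""
    ordered = sorted(logits, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for j, v in enumerate(ordered, start=1):
        cumsum += v
        # Support condition; it holds for a prefix of the sorted logits,
        # so the last j that satisfies it fixes the threshold tau.
        if 1 + j * v > cumsum:
            tau = (cumsum - 1) / j
    return [max(x - tau, 0.0) for x in logits]

# Hypothetical router logits for three experts (illustrative only).
logits = [1.0, 0.5, -1.0]

dense = softmax(logits)      # all three weights > 0
sparse = sparsemax(logits)   # [0.75, 0.25, 0.0]: third expert is exactly off

# Assumed annealing scheme: divide logits by a shrinking temperature.
annealed = sparsemax([x / 0.25 for x in logits])  # [1.0, 0.0, 0.0]: hard dispatch

print(dense, sparse, annealed)
```

At temperature 1 the third expert already receives exactly zero weight, and at temperature 0.25 the routing collapses to a one-hot dispatch. That is the hard absence the note warns about, which is why "more compatible with" is the safer claim than "preserve".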