evil_MoE/docs/spec/20260614_evil_moe_lora_review.md at c4ac632b376f83e1de125c570888070df4013ad7

mirror of https://github.com/wassname/evil_MoE.git synced 2026-06-27 16:45:42 +08:00

wassname c4ac632b37 docs: add Evil MoE experiment proposal

2026-06-14 09:28:16 +08:00

Review

Correct: The proposal mostly distinguishes SGTM/Gradient Routing absorption from the evil-MoE modularization hypothesis. Evidence: the BLUF says this is "not 'increase gradient-routing absorption' directly" and is "closer to learned behavioral modularization" (docs/spec/20260614_evil_moe_lora_proposal.md:7-9); the absorption section explicitly says hard MoE dispatch is suspect for SGTM absorption but plausible for ablatable modularity (:39-46); the Decision repeats that it is "not a direct continuation of SGTM absorption" (:273-277).
Correct: The no-cheat constraint is stated clearly in several places: labels only from hand-authored synthetic pairs/vectors (:9), no live detector required (:75), pinning only from synthetic pairs (:155-157, :232-241), and UAT includes a no-cheat audit (:254). This is aligned with the repo constraint.
Correct: The MoE evidence is mostly framed conservatively. DeepSeekMoE is described as supporting shared/specialized experts "not absorption directly" (:79-84), Switch/ST-MoE as scaffolding "not the main mechanism" (:100-112), and Arrow LoRA only as technical plausibility (:114-121). I did not find a direct claim that MoE literature proves absorption.
Note: The phrase "SGTM gives the seed-and-self-reinforce story" plus "The evil-MoE version keeps the same shape" (:50-67) is plausible but close to overclaiming. The later caveat at :67 helps. Safer wording would mark this as an analogy/hypothesis, not evidence that learned MoE routing has SGTM-style absorption.
Note: Version B overclaims slightly: "Use if we want to preserve the Gradient Routing absorption condition" (:177-194). Entmax can still zero experts, and annealing toward sparsity can reintroduce hard absence. "More compatible with the absorption condition" is justified; "preserve" is stronger than the pseudocode guarantees.
Note: Version C has a no-cheat ambiguity in "or learned router score" (:205-208). It is no-cheat only if the learned score is trained from synthetic pairs/vectors, not live oracle/detector labels. The surrounding no-cheat section probably implies this, but implementation guidance should say it locally.
Note: The Version A training pseudocode is conceptually plausible, but the sequence `for batch in grpo_rollouts` followed by `L_pin = ...` over `synthetic_hack`/`synthetic_clean` (:150-164) is underspecified. It should make clear that synthetic pin batches get a separate router forward pass and that their labels are never attached to live GRPO rollouts.
Note: The UAT is directionally useful (:244-261), but "solve rate / normal capability mostly survives" (:26, :251) has no threshold or matched-ablation control. A clean-expert-off or random-expert-off comparison would help distinguish "hack expert is causally specific" from "ablating any capacity changes behavior." Not a blocker for a proposal, but it matters before implementation.
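
The matched-ablation control suggested in the last note could be scored roughly as follows. This is a minimal sketch, not the proposal's method: the condition names (`baseline`, `hack_expert_off`, `random_expert_off`), the `(hacked, solved)` episode record, and the `max_solve_drop` threshold of 0.05 are all hypothetical placeholders, since the proposal does not yet define them.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical record: one (hacked, solved) pair per evaluation rollout.
Episode = Tuple[bool, bool]


@dataclass
class Rates:
    hack: float   # fraction of rollouts that used the hack
    solve: float  # fraction of rollouts that solved the task


def rates(episodes: List[Episode]) -> Rates:
    """Compute hack rate and solve rate for one ablation condition."""
    n = len(episodes)
    if n == 0:
        raise ValueError("need at least one episode")
    return Rates(
        hack=sum(h for h, _ in episodes) / n,
        solve=sum(s for _, s in episodes) / n,
    )


def ablation_report(
    runs: Dict[str, List[Episode]],
    max_solve_drop: float = 0.05,  # placeholder threshold, not from the proposal
) -> Dict[str, object]:
    """Compare hack-expert ablation against a random-expert control."""
    base = rates(runs["baseline"])
    hack_off = rates(runs["hack_expert_off"])
    ctrl_off = rates(runs["random_expert_off"])

    hack_drop = base.hack - hack_off.hack
    control_hack_drop = base.hack - ctrl_off.hack
    solve_drop = base.solve - hack_off.solve
    return {
        "hack_drop": hack_drop,
        "control_hack_drop": control_hack_drop,
        # Effect attributable to the hack expert beyond generic capacity loss.
        "specific_effect": hack_drop - control_hack_drop,
        "solve_drop": solve_drop,
        "capability_ok": solve_drop <= max_solve_drop,
    }


def make_episodes(n: int, hacks: int, solves: int) -> List[Episode]:
    """Toy data helper: the first `hacks` hack, the first `solves` solve."""
    return [(i < hacks, i < solves) for i in range(n)]


if __name__ == "__main__":
    report = ablation_report({
        "baseline": make_episodes(10, hacks=4, solves=8),
        "hack_expert_off": make_episodes(10, hacks=1, solves=8),
        "random_expert_off": make_episodes(10, hacks=4, solves=7),
    })
    for key, value in report.items():
        print(key, value)
```

A large `specific_effect` together with `capability_ok` would support "the hack expert is causally specific"; a `control_hack_drop` of similar size to `hack_drop` would instead suggest that ablating any capacity changes behavior.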
