Spec was stale: it recommended the hard sparse "Version A", the DEMix absorption-killer.
Rewrite it to match what is implemented and what we clarified:
- pseudocode-first: lora2r 2-expert forward, seeded rank-1 cosine router, GRPO+pin
loop, deploy ablation. For 2 experts the "proper" router IS rank-1 (softmax over 2
= sigmoid of one direction), seeded with v_act.
- "Why soft, not top-k" reframed as a tradeoff, not a verdict: hard routing closes
the leak but needs a router that catches all hacks; soft keeps absorption available
but leaks (1-w). DEMix only bites if we rely on absorption.
- Evidence section from two literature searches. Forced localization has working
precedents (single bad direction: emergent misalignment/persona/refusal; behavioural
expert seeding: SteerMoE, geometric cosine routing, cluster-aware upcycling; ablation
+ repair: NAEE/MoE-Pruner; router anchor: SEUF/MoTE). Emergent localization does not
(standing-committee, topic-driven routing). So seed+pin are load-bearing.
- 3-way/3-expert noted as an extension (closer to production); 2 experts kept for
the decisive causal run.
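
A minimal sketch of the 2-expert identity from the first bullet, assuming the
router scores each expert by a dot product with a per-expert direction. The names
`u_keep`, `u_hack`, and `h` are hypothetical and not taken from the repo; the
sketch only checks that a 2-way softmax weight equals a sigmoid of one difference
direction, which is why a rank-1 router is enough for 2 experts.

```python
import math

def softmax2(a, b):
    """Softmax weight on the second of two logits, computed stably."""
    m = max(a, b)
    ea, eb = math.exp(a - m), math.exp(b - m)
    return eb / (ea + eb)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# Hypothetical per-expert router directions and one pooled rollout feature.
u_keep = [0.2, -0.5, 0.1]
u_hack = [0.7, 0.3, -0.4]
h = [1.0, 2.0, -1.5]

# Two-expert softmax weight on the hack expert.
w_softmax = softmax2(dot(u_keep, h), dot(u_hack, h))

# Same weight from the single direction d = u_hack - u_keep (rank-1 router).
d = [b - a for a, b in zip(u_keep, u_hack)]
w_rank1 = sigmoid(dot(d, h))

# The two routes agree up to floating-point rounding.
assert abs(w_softmax - w_rank1) < 1e-12
```

In this reading, seeding the router with v_act amounts to choosing the initial
value of `d`.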
README: add Router dynamics (three forces, one pin-vs-reward conflict, mitigations).
Add HF "MoE in Transformers" blog to docs/papers (force-added past the docs gitignore).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Fork of vGROUT. Replaces routeA's fixed v_act quantile gate with a learned
per-rollout soft router (HackRouter, seeded from v_act) on the ablatable hack
expert: the GRPO gradient flows into the router through the soft weight w
(concentrating hack-like rollouts in the hack expert), and a continuous pin loss
on the hand-authored pairs anchors the axis. No load balancing; routing is per
rollout.
lora2r gains a soft-weight forward path (_lora2r_w: w=0 keep, w=1 rout, deployed
grad scaled by 1-w). train_moe.py is the on-policy GRPO loop; verify_moe_router.py
gates the routing invariants. `just smoke` is green. README/AGENTS rewritten for
the fork; original proposal kept as docs/spec/original_evil_moe_spec.md.
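
A rough sketch of one possible reading of the soft-weight path described above,
not the actual `_lora2r_w` code. It assumes the output is the base activation
plus a w-weighted mix of two expert deltas, so the gradient reaching the deployed
expert is the upstream gradient scaled by 1-w. `soft_forward`, `deployed_grad`,
and all values are hypothetical.

```python
def soft_forward(base, deployed_delta, hack_delta, w):
    """Mix two expert deltas: w=0 uses only the deployed expert, w=1 only the hack expert."""
    return [b + (1.0 - w) * d + w * k
            for b, d, k in zip(base, deployed_delta, hack_delta)]

def deployed_grad(upstream_grad, w):
    """Gradient of the mixed output w.r.t. the deployed delta: upstream scaled by (1 - w)."""
    return [(1.0 - w) * g for g in upstream_grad]

# Hypothetical activations for one rollout (assumed values, not from the repo).
base = [1.0, 2.0]
deployed_delta = [0.5, -0.5]
hack_delta = [4.0, 4.0]

keep = soft_forward(base, deployed_delta, hack_delta, w=0.0)   # deployed expert only
rout = soft_forward(base, deployed_delta, hack_delta, w=1.0)   # hack expert only
mixed = soft_forward(base, deployed_delta, hack_delta, w=0.75)

# Deploy-time ablation under this assumption: drop the hack expert entirely,
# which coincides with the w=0 path.
ablated = [b + d for b, d in zip(base, deployed_delta)]

# On a rollout routed with w=0.75, the deployed expert still receives a
# quarter of the gradient: the (1-w) leak.
leak = deployed_grad([1.0, 1.0], w=0.75)
```

Under this reading the leak disappears only when w reaches exactly 1.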
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>