feat: Evil MoE — learned soft router + pin loss on an ablatable hack expert

Fork of vGROUT. Replaces routeA's fixed v_act quantile gate with a learned
per-rollout soft router (HackRouter, seeded from v_act) on the ablatable hack
expert: GRPO flows into the router through the soft weight w (it concentrates
hack-like rollouts in the hack expert), and a continuous pin loss on the
hand-authored pairs anchors the axis. No load balancing; routing is per rollout.

lora2r gains a soft-weight forward path (_lora2r_w: w=0 keep, w=1 rout, deployed
grad scaled by 1-w). train_moe.py is the on-policy GRPO loop; verify_moe_router.py
gates the routing invariants. `just smoke` is green. README/AGENTS rewritten for
the fork; original proposal kept as docs/spec/original_evil_moe_spec.md.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
This commit is contained in:
wassname
2026-06-14 11:25:14 +08:00
parent cca7150ea0
commit 04a98b321e
18 changed files with 8874 additions and 501 deletions
File diff suppressed because it is too large Load Diff
File diff suppressed because it is too large Load Diff