evil_MoE

mirror of https://github.com/wassname/evil_MoE.git synced 2026-06-27 17:15:58 +08:00

Files

wassname 83cae4ef72 docs: reframe no-cheat in VECTOR terms; move it README->AGENTS.md

The 'weak detector for hack A, generalize to B' framing was wrong for this repo.
That is the weak-LABEL setup (labelA -> labelNotA), which is NOT ours. Ours is
vec -> routing: vec extracted from hand-built synthetic pairs, route the live
GRPO gradient by cosine alignment to vec; no detector ever runs over student
rollouts at train time. Generalization here asks whether vec (from pairs covering
some modes) suppresses held-out modes: vector generalization, not detector-label
generalization.
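
A minimal sketch of the vec -> routing idea described above, under loud assumptions: the commit message does not say how the gradient is represented, what threshold is used, or where a routed gradient goes. The names `cosine`, `route_gradient`, `threshold`, and the "main"/"quarantine" split are all hypothetical and chosen only for illustration. The sketch shows one possible reading: compare a flattened gradient with vec by cosine similarity and send it to one of two destinations.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def route_gradient(grad, vec, threshold=0.5):
    """Split a gradient into (main_part, quarantine_part).

    Hypothetical rule: if the gradient aligns with vec above `threshold`,
    the whole gradient goes to the quarantine destination; otherwise it
    goes to the main destination. No detector or label is consulted.
    """
    zeros = [0.0] * len(grad)
    if cosine(grad, vec) > threshold:
        return zeros, list(grad)
    return list(grad), zeros


# Toy usage: vec points along the first axis.
vec = [1.0, 0.0]
aligned_main, aligned_quarantine = route_gradient([2.0, 0.1], vec)
orthogonal_main, orthogonal_quarantine = route_gradient([0.0, 3.0], vec)
```

The point of the sketch is that the routing decision depends only on the gradient and vec, which matches the statement that no detector runs over student rollouts at train time. A soft split proportional to the cosine would be an equally plausible reading.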

- AGENTS.md: rewrote the no-cheat bullet to the 3-way distinction (oracle grader
  = cheat; weak-label setup = not ours; vec->routing = ours). For coding agents.
- README: removed the 'We cannot cheat' section (belongs in agent instructions,
  not the new-reader overview).
- spec: dropped the stray 'validation uses known-A detector' line; pointed the
  no-cheat reference at AGENTS.md.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>

2026-06-06 02:39:48 +00:00

blog        blog: drop reader-facing route2 tag -> route (consistency with paper)  2026-06-03 02:20:13 +00:00
brainstorm  ready                                                                  2026-05-23 14:19:41 +08:00
figs        misc                                                                   2026-06-02 02:06:43 +00:00

papers

wip