Files
evil_MoE/docs
wassname 83cae4ef72 docs: reframe no-cheat in VECTOR terms; move it README->AGENTS.md
The 'weak detector for hack A, generalize to B' framing was wrong for this repo.
That is the weak-LABEL setup (labelA -> labelNotA), which is NOT ours. Ours is
vec -> routing: vec extracted from hand-built synthetic pairs, route the live
GRPO gradient by cosine alignment to vec; no detector ever runs over student
rollouts at train time. Generalization = does vec (from pairs covering some
modes) suppress held-out modes -- vector generalization, not detector-label.

- AGENTS.md: rewrote the no-cheat bullet to the 3-way distinction (oracle grader
  = cheat; weak-label setup = not ours; vec->routing = ours). For coding agents.
- README: removed the 'We cannot cheat' section (belongs in agent instructions,
  not the new-reader overview).
- spec: dropped the stray 'validation uses known-A detector' line; pointed the
  no-cheat reference at AGENTS.md.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-06 02:39:48 +00:00
..
2026-05-23 14:19:41 +08:00
2026-06-02 02:06:43 +00:00
wip
2026-05-30 04:33:33 +00:00
2026-05-23 11:26:39 +08:00
2026-05-29 06:29:20 +00:00
2026-05-23 11:26:39 +08:00