docs: reframe no-cheat in VECTOR terms; move it README->AGENTS.md

The 'weak detector for hack A, generalize to B' framing was wrong for this repo. That is the
weak-LABEL setup (labelA -> labelNotA), which is NOT ours.

Ours is vec -> routing: vec extracted from hand-built synthetic pairs, route the live GRPO
gradient by cosine alignment to vec; no detector ever runs over student rollouts at train time.
Generalization = does vec (from pairs covering some modes) suppress held-out modes -- vector
generalization, not detector-label.

- AGENTS.md: rewrote the no-cheat bullet to the 3-way distinction (oracle grader = cheat;
  weak-label setup = not ours; vec->routing = ours). For coding agents.
- README: removed the 'We cannot cheat' section (belongs in agent instructions, not the
  new-reader overview).
- spec: dropped the stray 'validation uses known-A detector' line; pointed the no-cheat
  reference at AGENTS.md.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
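
For readers new to the vec -> routing idea in the message above, here is a minimal sketch of it under stated assumptions. It is not the repo's code: `extract_vec`, `route_weight`, the mean-difference extraction, and the linear ramp between `lower` and `upper` are all hypothetical stand-ins, and gradients are plain Python lists rather than real GRPO gradients. The point it illustrates is that the only labelled data is the hand-built pairs, and the live gradient is routed purely by its cosine to `vec`.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def extract_vec(hack_grads, clean_grads):
    """Assumed extraction: mean hack-pair gradient minus mean clean-pair gradient."""
    dim = len(hack_grads[0])
    mean_hack = [sum(g[i] for g in hack_grads) / len(hack_grads) for i in range(dim)]
    mean_clean = [sum(g[i] for g in clean_grads) / len(clean_grads) for i in range(dim)]
    return [h - c for h, c in zip(mean_hack, mean_clean)]


def route_weight(grad, vec, lower, upper):
    """Fraction of a live gradient that gets routed, from cosine alignment alone.

    Assumed shape: 0 below `lower`, 1 above `upper`, linear ramp in between.
    No detector and no rollout label is consulted here.
    """
    if upper <= lower:
        raise ValueError("band is closed (upper <= lower); nothing to ramp over")
    c = cosine(grad, vec)
    return min(1.0, max(0.0, (c - lower) / (upper - lower)))


# Toy pairs: the "hack" direction is +x, the "clean" direction is +y.
vec = extract_vec([[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]])
aligned = route_weight([1.0, -1.0], vec, lower=0.2, upper=0.8)
orthogonal = route_weight([1.0, 1.0], vec, lower=0.2, upper=0.8)
partial = route_weight([1.0, 0.0], vec, lower=0.2, upper=0.8)
```

In this toy setup a held-out gradient is routed only to the extent that it shares the direction of `vec`, which is the "vector generalization" question the commit message describes.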
2026-06-27 16:15:35 +08:00 · 2026-06-06 02:39:48 +00:00
parent a83953131e
commit 83cae4ef72
3 changed files with 22 additions and 21 deletions
@@ -77,7 +77,7 @@ through the same band as any student rollout. After the cut it is pure on-policy
 | direction `vec` | pair gradient diff | pair gradient (or activation) diff — same source |
 | gate | single live-detector `τ`, hard cos>τ | BAND `[lower,upper]` from pair clean/hack cosines, absorption ramp |
 | force-route | yes (`hack_anchor \|`) | none — gate only |
-| live detector over students | yes (noisy, leaks onto B) | none (validation uses known-A detector only) |
+| live detector over students | yes (noisy, leaks onto B) | none -- routing is pure `vec` |
 | teacher | mixed throughout, force-routed | seed only, cut@30-40, gated like any rollout |
 | is `vec` load-bearing? | no (labels carry it) | yes — random `vec` closes the band (width->0) |
 | held-out B suppressed iff | labels happen to cover it | `cos(g_B, vec)` lands above `lower`, i.e. B shares the direction |
@@ -260,9 +260,10 @@ proportional split; same word, different thing.)

 ## Cheap, label-free diagnostics (validation dropped)

-We are NOT running a live detector validation. Running the weak detector over the student's
-own rollouts during training is on the wrong side of the no-cheat line (README: that is
-exactly the cheat), and a live validation is complex and non-causal. The causal proof is
+We are NOT running a live detector validation. Running any detector over the student's own
+rollouts at train time is on the wrong side of the no-cheat line (AGENTS.md, no-cheat point
+3: routing is pure `vec`, only the hand-built pairs are labelled), and a live validation is
+complex and non-causal. The causal proof is
 downstream (deploy performance + real-vs-random). During training we only LOG cheap,
 label-free gauges (ml-debug: log everything, state the expected value and what a deviation
 means, chase anomalies):