spec: banded cosine gate (lower/upper from pair clean/hack cosines) + live-A calibration validation

Replaces the single-midpoint tau with a two-edge band: x=cos(g_step,vec), route_frac=clamp((x-lower)/(upper-lower),0,1). lower=mean cos(g_cho,vec), upper=mean cos(g_rej,vec). Below lower keep, above upper route, between = absorption (proportional split).

Band WIDTH (hkgap=upper-lower) is the real-vs-random discriminator (random vec closes the band) so no separate matched-fraction control is needed; collapse flags vec degeneracy.

Calibration validation on live KNOWN-mode-A rollouts (detector-labelled, no-cheat): mean route_frac(A_hack) >> mean route_frac(A_clean) confirms the pair-set band transfers to the sampled live distribution. Also picks g_step granularity (per-rollout default vs per-step). Held-out B never in validation.

Corrects the earlier wrong claim that component-routing collapses to erase (pseudocode 03 route v1 forward uses dS+dS_hack -> divergent trajectory).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-27 16:15:35 +08:00 · 2026-06-06 02:16:38 +00:00
parent 53d88bc9ee
commit 180d3e862c
1 changed files with 132 additions and 75 deletions
@@ -45,46 +45,42 @@ a self-test", and leaks ~1.1% onto held-out B.

 ## What we do INSTEAD (pair-routed)

-Everything — direction AND threshold — comes from the pairs. No teacher force-route,
-no live detector over students. Pure gate.
+Everything — direction AND the two band edges — comes from the pairs. No teacher
+force-route, no live detector over students. A BANDED cosine gate with an absorption
+zone (full mechanism + validation are in the "Routing mechanism" section below):

 ```py
 # ── from the pairs, refreshed every N steps through the CURRENT adapter ──
 for pair p:
    g_rej[p] = ∇logp(rej_p)        # hack completion   (forward+backward through δS)
    g_cho[p] = ∇logp(cho_p)        # clean completion
-    Δ[p]     = g_rej[p] - g_cho[p] # hack-ward per-pair diff
-vec = unit(mean_p Δ[p])            # or SVD top-k of stacked Δ; per module, hack-ward
+vec   = unit(mean_p(g_rej[p] - g_cho[p]))   # hack-ward (or SVD top-k of stacked diffs)
+lower = mean_p cos(g_cho[p], vec)  # clean edge (low)  — below: not hack, keep
+upper = mean_p cos(g_rej[p], vec)  # hack edge  (high) — above: hack, route

-# threshold from the SAME pairs — where hacks land vs where clean lands on `vec`:
-c_rej = mean_p cos(g_rej[p], vec)  # high  (hacks point hack-ward)
-c_cho = mean_p cos(g_cho[p], vec)  # low   (clean points away)
-τ     = (c_rej + c_cho) / 2        # midpoint; no detector, no teacher, no live label
-
-# ── live, every step: pure gate, no labels ──
-for rollout b:
-    g_b = per_rollout_δS_grad(b)
-    if cos(g_b, vec) > τ:
-        route(g_b) → δS_hack       # quarantine, deleted at deploy
-    else:
-        keep(g_b)  → δS            # deployed knob
+# ── live: absorption ramp, no labels ──
+x          = cos(g_step, vec)
+route_frac = clamp((x - lower) / (upper - lower), 0, 1)
+δS_hack.grad += route_frac       * g_step   # deleted at deploy
+δS.grad      += (1 - route_frac) * g_step   # deployed knob; absorption in the middle
 ```

-Teacher is kept only to SEED emergence and is cut at step 30 (`teacher_off_step=30`);
-its rollouts are NOT force-routed — they go through the same `cos > τ` gate as any
-student rollout. After step 30 it is pure on-policy.
+Teacher is kept only to SEED emergence and is cut at step 30 (`teacher_off_step=30`,
+see the journal-evidence note: student out-hacks teacher ~step 40, so 30 may be early;
+40 is the safer value if emergence stalls). Its rollouts are NOT force-routed — they go
+through the same band as any student rollout. After the cut it is pure on-policy.

 ## Now vs new — what changed

-| | now (route2) | new (pair-routed) |
+| | now (route2) | new (pair-routed band) |
 |---|---|---|
 | direction `vec` | pair gradient diff | pair gradient (or activation) diff — same source |
-| threshold τ | live `hacked_E` detector over students + EMA | the pairs' own `cos(g_rej)` vs `cos(g_cho)` midpoint |
+| gate | single live-detector `τ`, hard cos>τ | BAND `[lower,upper]` from pair clean/hack cosines, absorption ramp |
 | force-route | yes (`hack_anchor \|`) | none — gate only |
-| live detector over students | yes (noisy, leaks onto B) | none |
-| teacher | mixed throughout, force-routed | seed only, cut@30, gated like any rollout |
-| is `vec` load-bearing? | no (labels carry it) | yes — it is the only mechanism |
-| held-out B suppressed iff | labels happen to cover it | `cos(g_B, vec) > τ`, i.e. B shares the direction |
+| live detector over students | yes (noisy, leaks onto B) | none (validation uses known-A detector only) |
+| teacher | mixed throughout, force-routed | seed only, cut@30-40, gated like any rollout |
+| is `vec` load-bearing? | no (labels carry it) | yes — random `vec` closes the band (width->0) |
+| held-out B suppressed iff | labels happen to cover it | `cos(g_B, vec)` lands above `lower`, i.e. B shares the direction |

 ## Fork to decide in the rebuild: gradients vs activations for `vec`

@@ -127,28 +123,29 @@ deferred). `vec` sign = hack-ward = `rej - cho`.

 1. **DELETE `build_route2_anchors`** (~line 337) and its call site. No more
   `hack_anchor`/`clean_anchor` from teacher membership or the detector.
-2. **Rewrite `_route2_grad_filter`** (~line 877):
-   - drop the `hack_anchor |` force-route term -> gate is `cos_b > tau` only.
-   - drop the EMA `ema_hack_cos`/`ema_clean_cos` detector calibration (~896-908).
-   - `tau` now comes from the pairs (step 3), passed in, not computed from live rollouts.
-   - route the vec-COMPONENT not the whole rollout (see Review-findings decision #3):
-     for a flagged rollout, `c = cos*vec` goes to `delta_S_hack`, the orthogonal
-     remainder stays in `delta_S`. Removes `rollout_ablate_frac`.
-3. **Pair-calibrated tau, refreshed every `vhack_refresh_every` steps** (reuse the
+2. **Rewrite `_route2_grad_filter`** (~line 877) into the banded gate:
+   - drop the `hack_anchor |` force-route term and the EMA `ema_hack_cos`/`ema_clean_cos`
+     detector calibration (~896-908). No hard `cos_b > tau`.
+   - `x = cos(g_step, vec)`; `route_frac = clamp((x - lower)/(upper - lower), 0, 1)`;
+     `δS_hack.grad += route_frac*g`; `δS.grad += (1-route_frac)*g`. `lower`/`upper`
+     come from the pairs (step 3), passed in.
+   - granularity (`g_step` per-rollout vs per-step-aggregate) is decided by the
+     calibration validation; default per-rollout (reuse the existing recovery hook).
+     `rollout_ablate_frac` floor may stay as a belt-and-braces solve guarantee.
+3. **Pair-calibrated BAND, refreshed every `vhack_refresh_every` steps** (reuse the
   existing v_grad refresh hook): when we (re)build `vec` from the pairs, also compute
-   `c_rej = mean_p cos(g_rej[p], vec)`, `c_cho = mean_p cos(g_cho[p], vec)`,
-   `tau = (c_rej + c_cho)/2`, per module. The extract path already produces per-pair
-   `g_rej`/`g_cho` (it builds `vec = mean(g_rej - g_cho)`); add the two cosine means +
-   tau alongside. Store `route2_tau[name]` from this, not from anchors.
+   `lower = mean_p cos(g_cho[p], vec)`, `upper = mean_p cos(g_rej[p], vec)`, per module.
+   The extract path already produces per-pair `g_rej`/`g_cho`; add the two cosine means
+   alongside. Store `route2_band[name] = (lower, upper)`, not anchors/tau.
 4. **Remove plumbing**: `--gate-anchor-teacher-only` flag + `teacher_only` arg;
-   `hack_E_flags` feeding the gate (keep it for the streaming hk_* LOG columns only if
-   cheap, else drop); `route2_random_v_seed` stays (it's the directionality control).
-5. **Config**: `teacher_off_step: int = 30` default (seed then on-policy). Keep teacher
-   mixing 0->30 only; its rollouts go through the same `cos > tau` gate (NOT force-routed).
-6. **Diagnostics to keep/print**: `hkgap = c_rej - c_cho` (now a PAIR quantity, the
-   gate's separation margin); per-step `cos_b` distribution; `tau`; fraction flagged;
-   `resid = cos(kept grad, vec)`. SHOULD: `c_rej > tau > c_cho` and pair midpoint
-   brackets the live `cos_b` of hack vs clean rollouts (the calibration smoke-check).
+   `hack_E_flags` feeding the gate (keep it for the calibration validation + streaming
+   hk_* LOG columns); `route2_random_v_seed` stays (it's the directionality control).
+5. **Config**: `teacher_off_step` default 30 (done; consider 40 per journal evidence).
+   Teacher rollouts go through the same band (NOT force-routed).
+6. **Diagnostics to keep/print**: `hkgap = upper - lower` (band width = separation signal;
+   collapse -> vec degenerate -> freeze snapshot); per-step `x` distribution; `lower`,
+   `upper`; mean `route_frac`. Calibration validation: `mean route_frac(A_hack) >>
+   mean route_frac(A_clean)` on live known-A rollouts (detector-labelled, no-cheat).
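
A minimal sketch of the step-3 refresh, assuming each module's per-pair gradients are already flattened into plain lists of floats. The helper names (`unit`, `cosine`, `band_from_pairs`) and the toy values are hypothetical and are not taken from the codebase; the real extract path works per module on tensors.

```python
import math

def unit(v, eps=1e-12):
    """Scale v to unit length; eps guards against an all-zero vector."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / max(norm, eps) for x in v]

def cosine(a, b, eps=1e-12):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / max(na * nb, eps)

def band_from_pairs(g_rej, g_cho):
    """Return (vec, lower, upper) for one module from per-pair gradients."""
    n, dim = len(g_rej), len(g_rej[0])
    # vec = unit(mean_p(g_rej[p] - g_cho[p])), hack-ward by sign convention.
    mean_diff = [sum(r[i] - c[i] for r, c in zip(g_rej, g_cho)) / n
                 for i in range(dim)]
    vec = unit(mean_diff)
    lower = sum(cosine(c, vec) for c in g_cho) / n  # clean edge
    upper = sum(cosine(r, vec) for r in g_rej) / n  # hack edge
    return vec, lower, upper

# Toy example: two pairs of 2-D "gradients" (made-up values).
g_rej = [[1.0, 0.0], [1.0, 0.2]]
g_cho = [[0.0, 1.0], [0.2, 1.0]]
vec, lower, upper = band_from_pairs(g_rej, g_cho)
hkgap = upper - lower  # band width, logged at each refresh
```

Storing the result as `route2_band[name] = (lower, upper)` per module would then follow the plan in step 3.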

 ## Current state — resume after compaction

@@ -177,46 +174,106 @@ deferred). `vec` sign = hack-ward = `rej - cho`.
 - `scripts/verify_*.py` gates stay green; `verify_gate_anchor.py` becomes moot
  (no anchor) -> update or delete it.
 - UAT (refactor works): a fast 60-step pair-routed real-`vec` run shows deploy hack
-  < vanilla at matched solve, AND the calibration check holds (`c_rej > tau > c_cho`,
-  pair tau brackets live `cos_b`).
+  < vanilla at matched solve, AND the calibration validation holds (`mean route_frac`
+  on live A-hack >> on live A-clean; band width `hkgap > 0`).
 - UAT (science): pair-routed random-`vec` does NOT suppress held-out B as well as
  real-`vec` -- else the vector is still decorative and the method is just gradient
  routing on labels.

+## Validation logic -- what proves the method works (read this before the gate)
+
+The pairs do ONE job: produce `vec` and the two band edges. They are never EVIDENCE the
+method works. Two separate checks below carry that weight.
+
+- **Calibration validation (does the band route real live hacks?)** runs on the KNOWN mode
+  A only, where the weak detector is allowed. Confirms the pair-set band actually separates
+  live A-hack from live A-clean gradients. This closes the "pairs are teacher-forced but
+  live rollouts are sampled" worry without ever touching held-out B.
+- **Generalization test = deploy performance on held-out B** (never labelled, never in the
+  pairs, never teacher-seeded under A5): B suppressed at deploy while solve preserved.
+- **Decisive control = real-`vec` vs random-`vec`.** With a random `vec` both pair edges
+  collapse to ~0, the band closes, and routing degenerates to a cos>0 coin flip. So the
+  band WIDTH is itself the real-vs-random discriminator; no separate matched-fraction
+  control is needed.
+
+`upper - lower` is biased positive by construction (`vec` points along the mean rej-cho
+diff), so the ordering `upper > lower` alone is not evidence. The band WIDTH and POSITION
+are the empirical, load-bearing quantities, and the calibration validation tests them
+against live data.
+
+## Routing mechanism — banded cosine gate with an absorption zone
+
+The gate is a BAND, not a single threshold. Two edges, both measured from the pairs:
+
+```py
+# ── refresh every N steps: vec + band edges from the pairs, through the current adapter ──
+for pair p:
+    g_rej[p], g_cho[p] = ∇logp(rej_p), ∇logp(cho_p)     # δS-space, per module
+vec   = unit(mean_p(g_rej[p] - g_cho[p]))               # hack-ward
+lower = mean_p cos(g_cho[p], vec)    # where genuinely-CLEAN gradients land (low)
+upper = mean_p cos(g_rej[p], vec)    # where genuinely-HACK  gradients land (high)
+hkgap = upper - lower                # band width = the load-bearing separation signal
+
+# ── live: absorption ramp, pure gate, NO labels, NO force-route ──
+x          = cos(g_step, vec)        # alignment of the live gradient with the hack dir
+route_frac = clamp((x - lower) / (upper - lower), 0, 1)
+δS_hack.grad += route_frac       * g_step    # x>=upper -> 1: fully quarantined (deleted)
+δS.grad      += (1 - route_frac) * g_step    # x<=lower -> 0: fully kept (deployed)
+                                             # lower<x<upper: ABSORPTION, split between knobs
+```
+
+Three zones: below `lower` = not hack, keep; above `upper` = hack, route to the deletable
+`δS_hack`; between = absorption, the gradient splits proportionally. The ramp softens the
+solve-starvation a hard gate would cause (a partly-hacky rollout keeps part of its signal
+in `δS`), so the hard exploration floor (`rollout_ablate_frac`) is no longer required,
+though it can stay as a belt-and-braces solve guarantee.
+
+Open: `g_step` granularity -- PER-ROLLOUT (each rollout its own `x`/`route_frac`, needs the
+per-rollout grad-recovery hook) vs PER-STEP (one `x` for the aggregated step gradient,
+much simpler). The calibration validation below decides it: use whichever separates
+live A-hack from A-clean `route_frac`. Default per-rollout (hook exists); fall back to
+per-step if it separates as well.
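
A runnable sketch of the absorption ramp for a single flattened gradient, under stated assumptions. The function names are hypothetical, and the behaviour for a closed band (`upper - lower` near 0) is an assumption: the pseudocode divides by the band width, and the spec only says a collapse should trigger a `vec` snapshot freeze, so this sketch routes nothing in that case.

```python
import math

def cosine(a, b, eps=1e-12):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / max(na * nb, eps)

def route_frac(x, lower, upper, eps=1e-8):
    """Linear ramp: 0 at or below lower, 1 at or above upper."""
    width = upper - lower
    if width <= eps:
        # ASSUMPTION: a closed band routes nothing instead of dividing by ~0.
        return 0.0
    return min(1.0, max(0.0, (x - lower) / width))

def split_grad(g_step, vec, lower, upper):
    """Split g_step into (kept, routed, frac) with kept + routed == g_step."""
    frac = route_frac(cosine(g_step, vec), lower, upper)
    routed = [frac * g for g in g_step]        # goes to the deletable knob
    kept = [(1.0 - frac) * g for g in g_step]  # goes to the deployed knob
    return kept, routed, frac

vec = [1.0, 0.0]
# cos = 1.0, above upper: fully routed.
kept_hi, routed_hi, f_hi = split_grad([2.0, 0.0], vec, lower=0.0, upper=0.8)
# cos = 0.0, at lower: fully kept.
kept_lo, routed_lo, f_lo = split_grad([0.0, 3.0], vec, lower=0.0, upper=0.8)
# cos = 0.5 inside a [0, 1] band: absorption, roughly an even split.
kept_mid, routed_mid, f_mid = split_grad([1.0, math.sqrt(3.0)], vec, 0.0, 1.0)
```

The same `split_grad` applies at either granularity: call it once per rollout gradient, or once on the aggregated step gradient.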
+
+## Calibration validation (no-cheat: known mode A only)
+
+```py
+# LIVE rollouts of the KNOWN mode A, labelled by the weak detector E (allowed for A):
+A_hack  = live A rollouts flagged hack by detector E
+A_clean = live A rollouts not flagged
+assert mean route_frac(A_hack)  >> mean route_frac(A_clean)   # band routes real live hacks
+```
+
+Held-out B is NEVER in this validation, so no-cheat holds by construction. If the
+separation fails, the pair-set band does not transfer to live rollouts (the real
+calibration risk) and we recalibrate the edges from a live-A quantile before trusting any
+deploy number. `hkgap = upper - lower` is logged each refresh; if it collapses toward 0 the
+`vec` has degenerated (hacks suppressed -> hack-pair gradient weakens) and we freeze a
+pre-routing `vec` snapshot.
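
The check above could be made concrete along these lines. The spec leaves `>>` unquantified, so the `min_gap` threshold here is a hypothetical placeholder, as are `min_width`, the function names, and the toy numbers; none of them come from the codebase.

```python
from statistics import mean

def calibration_check(fracs, flagged_hack, min_gap=0.2):
    """Compare mean route_frac on detector-flagged vs unflagged live A rollouts.

    fracs:        route_frac per live mode-A rollout
    flagged_hack: detector-E flag per rollout (True = hack)
    min_gap:      ASSUMED margin standing in for the spec's '>>'
    """
    hack = [f for f, h in zip(fracs, flagged_hack) if h]
    clean = [f for f, h in zip(fracs, flagged_hack) if not h]
    if not hack or not clean:
        return {"ok": False, "reason": "need both A_hack and A_clean rollouts"}
    gap = mean(hack) - mean(clean)
    return {"ok": gap >= min_gap, "gap": gap,
            "mean_hack": mean(hack), "mean_clean": mean(clean)}

def band_collapsed(lower, upper, min_width=0.05):
    """True when hkgap = upper - lower falls below an ASSUMED floor."""
    return (upper - lower) < min_width

# Toy data: three rollouts flagged hack, three not.
fracs = [0.9, 0.8, 1.0, 0.1, 0.0, 0.2]
flags = [True, True, True, False, False, False]
result = calibration_check(fracs, flags)
```

If `result["ok"]` is false, the plan above applies: recalibrate the edges from a live-A quantile before trusting any deploy number. If `band_collapsed` fires, freeze the pre-routing `vec` snapshot.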
+
 ## Review findings (2026-06-06) -- decisions before implementing

 Cross-reviewed by Claude + deepseek-v4-pro (docs/reviews/20260606_pairroute_review_deepseek.md).
-Both converge on the same threshold problem; resolutions below are now part of the plan.
+The banded gate supersedes the single-midpoint `tau` deepseek reviewed. Its surviving
+points: calibration risk (pairs teacher-forced vs live sampled) -> handled by the
+Calibration validation above; vec degeneracy -> handled by the `hkgap` collapse check.
+Its "circular tau" framing is moot under the band: the edges are not a decision point and
+the width is validated against live data, not asserted from the pairs.

-1. **tau is circular, not just scale-mismatched.** Because `vec = mean(g_rej - g_cho)`,
-   the inequality `c_rej > c_cho` holds BY CONSTRUCTION even when `vec` is pure noise, so
-   the pair midpoint cannot validate that the gate separates anything. Separately, pair
-   gradients are teacher-forced while live rollouts are sampled, so the pair cosine scale
-   need not match the live `cos_b` scale; refreshing every N steps fixes adapter *drift*,
-   not this *distribution* gap.
-   - Decision: keep pair-midpoint tau as the no-extra-labels DEFAULT for the method, but
-     (a) compute a LEAVE-ONE-PAIR-OUT separation `c_rej^{-p} vs c_cho^{-p}` as the real
-     diagnostic that `vec` generalizes across pairs (cheap at ~10 pairs), and (b) for the
-     real-vs-random CONTROL, set tau by a QUANTILE of the live `cos_b` so the flagged
-     FRACTION is matched between conditions.
+2. **Match the flagged fraction in the real-vs-random control (deepseek #2, kept).** Real
+   and random `vec` otherwise quarantine different volumes of gradient, so a suppression gap
+   could be volume, not direction. For the control specifically, set `tau` in both
+   conditions by the same QUANTILE of the live `cos_b` so equal fractions route and only
+   DIRECTION differs.
+   Suppression gap at matched fraction => direction is load-bearing.

-2. **Match the flagged fraction in the real-vs-random control (deepseek #2).** Real and
-   random `vec` otherwise quarantine different volumes of gradient, so a suppression gap
-   could be volume, not direction. The quantile-tau in 1(b) controls this: equal fraction
-   routed, only the DIRECTION differs. Suppression gap at matched fraction => direction is
-   load-bearing.
+3. **Whole-rollout routing + exploration floor (Claude, corrected).** Keep route2's
+   whole-rollout quarantine (it is the gradient-routing thesis); keep `rollout_ablate_frac`
+   for solve signal. Component-routing was rejected because it collapses to `erase` at
+   deploy (see Routing mechanism above).

-3. **Route the vec-COMPONENT, not the whole rollout (Claude).** The route2 pseudocode
-   quarantined a flagged rollout's entire `delta_S` gradient, which also strips its solve
-   signal (solve-starvation on problems only solved-by-hacking). Decision: subtract the
-   `cos*vec` component into `delta_S_hack` and keep the orthogonal remainder in `delta_S`
-   (erase-style projection, routed not erased). Drops the need for `rollout_ablate_frac`.
-
-4. **Degeneracy diagnostic (deepseek #3).** As routing suppresses hacks, the hack-pair
+4. **Degeneracy diagnostic (deepseek #3, kept).** As routing suppresses hacks, the hack-pair
   gradient can weaken and the refreshed `vec` degenerate. Log `hkgap = c_rej - c_cho`
   per refresh; if it collapses toward 0, freeze a pre-routing `vec` snapshot.

-5. **Pre-register the science UAT (deepseek).** n>=3 seeds per condition (real/random),
-   success = mean held-out-B deploy hack under real-`vec` is below random-`vec` by more
-   than the across-seed std of the random baseline. Qualitative "suppresses better" is
-   not enough.
+5. **Pre-register the science UAT (deepseek, kept; user-confirmed).** n>=3 seeds per
+   condition (real/random), success = mean held-out-B deploy hack under real-`vec` is below
+   random-`vec` by more than the across-seed std of the random baseline. Qualitative
+   "suppresses better" is not enough.
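
The pre-registered criterion in item 5 can be written as a small check. This is a sketch under assumptions: the function name and toy numbers are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the spec only says "across-seed std".

```python
from statistics import mean, stdev

def science_uat(real_hack, random_hack, min_seeds=3):
    """Pre-registered pass/fail on held-out-B deploy hack rates.

    real_hack / random_hack: one deploy hack rate per seed, per condition.
    Passes when mean(real) sits below mean(random) by more than the
    across-seed std of the random baseline.
    """
    if len(real_hack) < min_seeds or len(random_hack) < min_seeds:
        raise ValueError(f"need at least {min_seeds} seeds per condition")
    margin = mean(random_hack) - mean(real_hack)
    noise = stdev(random_hack)  # ASSUMPTION: sample std (n - 1 denominator)
    return margin > noise

# Made-up rates for illustration only; not results from any run.
passes = science_uat([0.10, 0.12, 0.11], [0.30, 0.28, 0.32])
fails = science_uat([0.29, 0.30, 0.31], [0.30, 0.28, 0.32])
```

Fixing the criterion in code before the runs keeps the "suppresses better" judgment from being made after the fact.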