select = routing precision = hack_supp - solve_supp on the knob (held-out val,
knob-ON vs knob-OFF, same split). 1.0 = removes all hacking at no solve cost.
Sanity: vanilla=0.00 (no knob), base=blank (no knob-on signal), per-token=0.96.
hack_supp = (vanilla - hack)/vanilla ; solve_uplift = (solve - base)/(ceiling - base),
the floor->ceiling normalized fractions (ceiling provisional=paper 0.223, FIXME job 24).
The earlier "solve suppression ~50%" was a train/test artifact; the knob's true
solve cost (select's solve_supp term) is near zero -- selectivity is high.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
New routeV_gate="online_stats" mode: use the empirical per-rollout cosine
distribution (q5/q95 pooled across all modules each step) as the routing
band thresholds, instead of the pair-derived route_band. Direction v_grad
still from authored pairs; only thresholds are online/adaptive.
Motivation: the pair-derived band sits above the live cosine distribution
(median live cos ~-0.06), causing frout to cliff as GRPO advantage flattens.
Online thresholds adapt to the actual live distribution, so the 5/95 tails
always route regardless of where the raw cosines land.
Config: routeV_gate="online_stats", online_stats_lo=0.05, online_stats_hi=0.95.
Step-0 prior: (-0.5, 0.5) neutral band (pairs not used for calibration).
Band update: post-opt.step(), torch.quantile over that step's module*rollout cosines.
No-cheat: v_grad from authored pairs only; thresholds from the cosine distribution
of live student rollouts (no oracle/labeling of live rollouts as hack/clean).
Also: add online_stats to results_deploy._arm(); justfile queue-online-stats recipe.
Queued as job 22 (s43, authored pairs, priority 12, after 19/20/21).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
scripts/results_deploy.py pulls the held-out TEST deploy numbers from the FINAL EVAL
line that just-results skips. Journal: per-rollout real==random (absorption), per-token
real-V is the lead; pinning suspected off (band above live cos).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>