feat: online_stats gate for routeV -- live q5/q95 band calibration

New routeV_gate="online_stats" mode: use the empirical per-rollout cosine
distribution (q5/q95 pooled across all modules each step) as the routing
band thresholds instead of the pair-derived route_band. The direction v_grad
still comes from authored pairs; only the thresholds are online/adaptive.

Motivation: the pair-derived band sits above the live cosine distribution
(median live cos ~-0.06), causing frout to cliff as GRPO advantage flattens.
Online thresholds adapt to the actual live distribution, so the 5/95 tails
always route regardless of where the raw cosines land.
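A toy illustration of the cliff, under loudly hypothetical assumptions. The gate is modelled as a linear ramp (weight 0 below `lo`, 1 above `hi`, interpolated in between); the actual routeV gate shape is not stated in this commit. The live cosines are drawn from a normal distribution centred on the reported median of about -0.06, and the pair band (0.15, 0.45) is made up. With the fixed band every rollout gets weight 0. With q5/q95 thresholds recomputed from the same cosines, the top ~5% are fully routed by construction, wherever the distribution sits.

```python
import random


def quantile(xs, q):
    """Quantile with linear interpolation between order statistics
    (the same rule as torch.quantile's default interpolation="linear")."""
    s = sorted(xs)
    pos = q * (len(s) - 1)
    i = int(pos)
    j = min(i + 1, len(s) - 1)
    return s[i] + (pos - i) * (s[j] - s[i])


def ramp_gate(c, lo, hi):
    """HYPOTHETICAL soft gate: 0 below lo, 1 above hi, linear in between."""
    return min(1.0, max(0.0, (c - lo) / (hi - lo)))


def gate_stats(cos, lo, hi):
    """Return (mean gate weight, fraction fully routed) for one step."""
    w = [ramp_gate(c, lo, hi) for c in cos]
    return sum(w) / len(w), sum(x == 1.0 for x in w) / len(w)


random.seed(0)
# Made-up live cosines centred on the reported median (~-0.06).
live = [random.gauss(-0.06, 0.03) for _ in range(1000)]

pair_band = (0.15, 0.45)  # hypothetical pair-derived band, above the live cosines
online_band = (quantile(live, 0.05), quantile(live, 0.95))

print("pair band  :", gate_stats(live, *pair_band))    # every cosine is below lo
print("online band:", gate_stats(live, *online_band))  # top ~5% fully routed
```

The point is structural rather than numeric: a quantile band is defined relative to the current batch, so the share of rollouts in each tail stays fixed even when the whole cosine distribution drifts.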

Config: routeV_gate="online_stats", online_stats_lo=0.05, online_stats_hi=0.95.
Step-0 prior: (-0.5, 0.5) neutral band (pairs are not used for calibration).
Band update: after opt.step(), torch.quantile over that step's pooled
module x rollout cosines.
No-cheat: v_grad from authored pairs only; thresholds from the cosine distribution
of live student rollouts (no oracle/labeling of live rollouts as hack/clean).
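A minimal sketch of the per-step band update described above, in plain Python so it runs without torch. Assumptions: the cosines arrive as one list per module (module x rollout); the class and method names (`OnlineStatsBand`, `update`) are hypothetical; and `quantile` reimplements the linear interpolation that torch.quantile uses by default. In torch code, a call like `torch.quantile(cos.flatten(), q)` would play that role.

```python
from dataclasses import dataclass


def quantile(xs, q):
    """Linear-interpolation quantile, matching torch.quantile's default rule."""
    s = sorted(xs)
    pos = q * (len(s) - 1)
    i = int(pos)
    j = min(i + 1, len(s) - 1)
    return s[i] + (pos - i) * (s[j] - s[i])


@dataclass
class OnlineStatsBand:
    """HYPOTHETICAL holder for the online_stats routing band."""
    online_stats_lo: float = 0.05  # lower quantile (config value above)
    online_stats_hi: float = 0.95  # upper quantile (config value above)
    lo: float = -0.5  # step-0 prior: neutral band, not calibrated from pairs
    hi: float = 0.5

    def update(self, cos_by_module):
        """Call once per step, after opt.step().

        cos_by_module[m][r] is the cosine of rollout r at module m; all modules
        are pooled into one distribution before taking the quantiles.
        """
        pooled = [c for module_cos in cos_by_module for c in module_cos]
        self.lo = quantile(pooled, self.online_stats_lo)
        self.hi = quantile(pooled, self.online_stats_hi)
        return self.lo, self.hi


band = OnlineStatsBand()
print("step 0 band:", (band.lo, band.hi))  # the (-0.5, 0.5) prior

# Toy step: 3 modules x 7 rollouts, cosines -1.0, -0.9, ..., 1.0 spread across modules.
cos_by_module = [[k / 10 - 1 for k in range(7 * m, 7 * m + 7)] for m in range(3)]
lo, hi = band.update(cos_by_module)
print("step 1 band:", (round(lo, 6), round(hi, 6)))  # (-0.9, 0.9)
```

Pooling before taking the quantile gives every module one shared band, so a module with systematically higher cosines shifts that shared band rather than getting thresholds of its own. Per-module bands would be the alternative design.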

Also: add online_stats to results_deploy._arm(); add the justfile recipe queue-online-stats.
Queued as job 22 (s43, authored pairs, priority 12, after 19/20/21).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
wassname
2026-06-09 02:25:37 +00:00
parent 0412dc56d1
commit 0f59b1351b
3 changed files with 52 additions and 1 deletion
justfile (+14)
@@ -157,6 +157,20 @@ queue-dir6 seed='43':
# routeV deploy_hack < vanilla at matched solve, significant across the 3 seeds (paired t,
# alpha=0.01 like the paper); ablations (random/vampire) ~ vanilla confirm directionality.
TEACHER_RT := "out/pools/teacher_pool_runtests_dense"
# H: online_stats gate -- calibrate band thresholds from the LIVE cosine distribution
# (q5/q95 of per-rollout cosines pooled across all modules each step). Direction v_grad
# still from authored pairs; only thresholds are online. Avoids the pair-band mis-calibration
# (pair cosines are off-distribution; live routing often cliffs because pair band sits above
# live rollout cosines). Expected: sustained rout (no frout cliff) even past step 20.
queue-online-stats seed="43":
    #!/usr/bin/env bash
    set -eu
    pueue add -w "$PWD" -o 12 \
        -l "why: online_stats gate s{{seed}} -- live q5/q95 band (no pair threshold); resolve: sustained rout vs grad-cosine cliff, test if adaptive thresholds improve deploy suppression" \
        -- {{ TRAIN }} fast --intervention=routeV --routeV-gate=online_stats \
        --vhack-pairs-path=None --seed={{seed}} --out-tag=_dir8_routeV_onlinestats_s{{seed}}
queue-broad:
    #!/usr/bin/env bash
    set -eu