feat: online-stats gate + step-level teacher forcing + AUROC diagnostic

The authored absolute band made pos>=1 unreachable for live hacks (rout~0), and re-extracting it every 5 steps collapsed the gate (the #40 step-5 cliff).

- Online-stats gate: route by live quantiles of the pooled cos-to-v_grad (top route_quantile -> hack, bottom -> keep, middle -> mid), window flushed on refresh. v_grad stays authored-only; only the threshold follows the live distribution. Smoke: routing sustained past the refresh (cliff fixed).
- Step-level teacher mix (#31): mix_ratio is a fraction of ALL the step's gens, not a per-prompt round; symmetric hack+solve teachers injected as ordinary gens (not specially routed). Fixes the per-prompt rounding wart.
- AUROC + cosU step columns: v_grad as a live hack-detector vs the hack-label (measurement-only, never routes) -- discriminates threshold-vs-direction failure and whether a refresh destroys separation.
- Inline eval stays off (eval_ablate_every=0); deploy scored offline.
- Fix _sample_rows None crash (beartype) on the no-solve-pool path.
- Remove dead pooled_gate_thresholds (the rejected authored-pooled approach).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
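
A minimal sketch of the online-stats gate described above, not the code in this commit. The class name `OnlineQuantileGate`, the `window` and `min_samples` parameters, and the choice to route everything to `mid` while the window is still warming up are assumptions made for illustration; only the quantile routing rule (top `route_quantile` -> hack, bottom -> keep, middle -> mid) and the flush on refresh come from the message.

```python
from collections import deque


class OnlineQuantileGate:
    """Hypothetical gate: thresholds follow the live score distribution."""

    def __init__(self, route_quantile=0.25, window=256, min_samples=8):
        assert 0.0 < route_quantile < 0.5
        self.q = route_quantile
        self.min_samples = min_samples          # assumed warm-up size
        self.scores = deque(maxlen=window)      # pooled cos-to-v_grad values

    def flush(self):
        """Drop the window, e.g. when the direction is refreshed."""
        self.scores.clear()

    def thresholds(self):
        """Return (lo, hi) cut points, or None while the window is too small."""
        n = len(self.scores)
        if n < self.min_samples:
            return None
        s = sorted(self.scores)
        return s[int(self.q * (n - 1))], s[int((1.0 - self.q) * (n - 1))]

    def route(self, batch):
        """Add the batch to the window, then label each score in the batch."""
        self.scores.extend(batch)
        cut = self.thresholds()
        if cut is None or cut[0] >= cut[1]:     # no spread yet: route nothing
            return ["mid"] * len(batch)
        lo, hi = cut
        return ["hack" if c >= hi else "keep" if c <= lo else "mid" for c in batch]


gate = OnlineQuantileGate(route_quantile=0.25, window=64, min_samples=4)
cos = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
labels = gate.route(cos)

# Shifting every score leaves the routing unchanged, which an absolute band cannot do.
gate.flush()
shifted_labels = gate.route([c - 0.9 for c in cos])
```

Because the cut points are order statistics of the window, a uniform drift in the cosines moves the thresholds with it, so the top tail stays reachable even when the live values never enter the authored band.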
2026-06-27 16:45:42 +08:00 · 2026-06-10 14:22:37 +00:00
parent 05a00aa487
commit 3f2b44452a
7 changed files with 491 additions and 59 deletions
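
For orientation, the step-level teacher mix arithmetic quoted in the justfile comment below (4 prompts x group 8, mix 0.5 -> 8 hack + 8 solve teachers, 16 students) can be reproduced with a small sketch. The function names and the use of `round` are assumptions for illustration; the trainer's actual rounding is not shown in this diff.

```python
def step_level_counts(n_prompts, group_size, mix_ratio, solve_mix_frac):
    """Teacher counts when mix_ratio applies to ALL of the step's gens."""
    total = n_prompts * group_size
    n_teacher = round(total * mix_ratio)
    n_solve = round(n_teacher * solve_mix_frac)   # assumed: a fraction of the teachers
    return {
        "total": total,
        "hack": n_teacher - n_solve,
        "solve": n_solve,
        "student": total - n_teacher,
    }


def per_prompt_teacher_total(n_prompts, group_size, mix_ratio):
    """Older scheme for comparison: round within each prompt group, then sum."""
    return n_prompts * round(group_size * mix_ratio)


headline = step_level_counts(4, 8, 0.5, 0.5)

# With a ratio that does not divide the group evenly, per-prompt rounding drifts:
step_total = step_level_counts(4, 8, 0.3, 0.5)["hack"] + step_level_counts(4, 8, 0.3, 0.5)["solve"]
prompt_total = per_prompt_teacher_total(4, 8, 0.3)
```

At `mix_ratio=0.3` the per-prompt scheme yields 8 teachers (2 per group) while the step-level count yields 10, which is closer to the requested 9.6.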
@@ -6,6 +6,15 @@ MODEL := "Qwen/Qwen3-4B"
 TINY_MODEL := "llamafactory/tiny-random-qwen3"   # qwen3 arch, ~6M params, smoke only
 TRAIN := "uv run python -m vgrout.train"          # real LeetCode GRPO entry point
 TEACHER_RT := "out/pools/teacher_pool_runtests_dense"   # dense single-mode run_tests pool
+# Teacher forcing: SYMMETRIC off-policy demos injected as ordinary gens (NOT specially
+# routed -- they pass through the same gate as student rollouts). STEP-LEVEL mix 0.5 over
+# 4 prompts x group 8 -> 16 teachers/step (8 hack + 8 solve), 16 students. Heavy on
+# purpose: the run is grad-starved (32 gens/step vs the paper's 256), so without strong
+# teacher forcing the student never reaches the hack (emerges ~ref-step 80-100). Teachers
+# stay on to step 60 (was 30) so the bootstrap has time to land before pure on-policy.
+# solve-teacher routed-share is a passive diagnostic (a good gate keeps them out of the
+# top tail), not enforcement.
+TEACH := "--mix-ratio=0.5 --solve-pool-dir=out/pools/teacher_pool_solve --solve-mix-frac=0.5 --teacher-off-step=60"

 default:
    @just --list
@@ -83,17 +92,22 @@ smoke-all:
 # pool, 50% unhackable, authored pairs). Every job carries a why:/resolve: label.
 # ─────────────────────────────────────────────────────────────────────────────

-# Headline 4-arm lora2r decision run. routeV real-v is the method; placebo (Haar)
+# Headline 5-arm lora2r decision run, ONLINE-STATS gate + teacher forcing ({{ TEACH }}).
+# real-v(k1) is the method; topk(k3) tries the multi-sub-mode subspace; placebo (Haar)
 # isolates directionality; vanilla is the emergence reference; absorb isolates the
 # gate+masks from absorption. Priority descending so they run in listed order.
-# --unhackable-frac is pinned EXPLICIT (not left to the default) so the headline
-# regime is self-documenting in the command line, not silently default-dependent.
+# --unhackable-frac pinned EXPLICIT so the regime is self-documenting, not default-dependent.
 # Decision: directionality is real iff real-v deploy_hack << placebo at matched solve.
+# Watch the streamed `auroc` col: ~0.5 = v_grad blind to live hacks (no gate works);
+# high + rout~0 = threshold problem; a drop at a refresh = the cliff is a direction problem.
+# NO inline eval (eval_ablate_every default 0): HF-generate-bound through 252 lora2r hooks
+# (~25-30 min/eval), so deploy is scored OFFLINE from the step-10 ckpts (`just results`).
 queue-decision seed='43':
-    pueue add -w "$PWD" -o 60 -l "why: P1 lora2r routeV REAL-v s{{seed}} (50% unhackable); resolve: deploy_hack << placebo at matched solve -> directionality real" -- {{ TRAIN }} fast --intervention=routeV --unhackable-frac=0.5 --seed={{seed}} --eval-ablate-every=20 --eval-n-prompts=32 --out-tag=_l2r_routeV_real_s{{seed}}
-    pueue add -w "$PWD" -o 58 -l "why: P2 lora2r routeV PLACEBO-v (Haar 157) s{{seed}} (50% unhackable); resolve: deploy_hack ~ vanilla -> real-v suppression is directional, not absorption/shrinkage" -- {{ TRAIN }} fast --intervention=routeV --routeV-random-v-seed=157 --unhackable-frac=0.5 --seed={{seed}} --eval-ablate-every=20 --eval-n-prompts=32 --out-tag=_l2r_routeV_placebo_s{{seed}}
-    pueue add -w "$PWD" -o 56 -l "why: P3 lora2r VANILLA (gate pinned clean) s{{seed}} (50% unhackable); resolve: deploy_hack >> 0 emergence reference on the identical adapter" -- {{ TRAIN }} fast --intervention=none --unhackable-frac=0.5 --seed={{seed}} --eval-ablate-every=20 --eval-n-prompts=32 --out-tag=_l2r_vanilla_s{{seed}}
-    pueue add -w "$PWD" -o 54 -l "why: P4 lora2r ABSORB (masks pinned (1,0), no gate) s{{seed}} (50% unhackable); resolve: ~vanilla -> gate+masks add nothing; << vanilla -> absorption alone suppresses" -- {{ TRAIN }} fast --intervention=absorb --unhackable-frac=0.5 --seed={{seed}} --eval-ablate-every=20 --eval-n-prompts=32 --out-tag=_l2r_absorb_s{{seed}}
+    pueue add -w "$PWD" -o 62 -l "why: P1 lora2r routeV REAL-v k1 online-stats + teacher-forcing s{{seed}} (50% unhackable); resolve: deploy_hack << placebo at matched solve -> directionality real" -- {{ TRAIN }} fast --intervention=routeV --unhackable-frac=0.5 {{ TEACH }} --seed={{seed}} --out-tag=_l2r_routeV_real_s{{seed}}
+    pueue add -w "$PWD" -o 60 -l "why: P2 lora2r routeV TOPK k3 online-stats + teacher-forcing s{{seed}} (50% unhackable); resolve: topk deploy_hack <= real-k1 -> sub-mode subspace catches hacks the mean washes out" -- {{ TRAIN }} fast --intervention=routeV --v-grad-k=3 --unhackable-frac=0.5 {{ TEACH }} --seed={{seed}} --out-tag=_l2r_routeV_topk_s{{seed}}
+    pueue add -w "$PWD" -o 58 -l "why: P3 lora2r routeV PLACEBO-v (Haar 157) + teacher-forcing s{{seed}} (50% unhackable); resolve: deploy_hack ~ vanilla -> real-v suppression is directional, not absorption/shrinkage" -- {{ TRAIN }} fast --intervention=routeV --routeV-random-v-seed=157 --unhackable-frac=0.5 {{ TEACH }} --seed={{seed}} --out-tag=_l2r_routeV_placebo_s{{seed}}
+    pueue add -w "$PWD" -o 56 -l "why: P4 lora2r VANILLA (gate pinned clean) + teacher-forcing s{{seed}} (50% unhackable); resolve: deploy_hack >> 0 emergence reference on the identical adapter" -- {{ TRAIN }} fast --intervention=none --unhackable-frac=0.5 {{ TEACH }} --seed={{seed}} --out-tag=_l2r_vanilla_s{{seed}}
+    pueue add -w "$PWD" -o 54 -l "why: P5 lora2r ABSORB (masks pinned (1,0), no gate) + teacher-forcing s{{seed}} (50% unhackable); resolve: ~vanilla -> gate+masks add nothing; << vanilla -> absorption alone suppresses" -- {{ TRAIN }} fast --intervention=absorb --unhackable-frac=0.5 {{ TEACH }} --seed={{seed}} --out-tag=_l2r_absorb_s{{seed}}

 # Base model zero-shot deploy eval (0 training steps): reproduce the paper's base
 # solve ~11.5% in our harness. resolve: base solve ~0.10-0.12.