feat(T4): symmetric solve-teacher pool + routed-share discrimination diagnostic

--solve-pool-dir (default off) splits the G_t teacher budget: a solve_mix_frac
fraction is drawn from the solve pool and the rest from the hack pool. The gate's
routed-share is reported per teacher SOURCE: a discriminating gate routes hack
teachers (d->1) and KEEPS solve teachers (d->0); equal shares mean the gate is
non-directional (shrinkage null). Teacher source comes from our pool construction,
not from a live-rollout oracle label, so it is a legitimate diagnostic. Logs
per-step debug output plus a final BLUF (hack-routed vs solve-routed gap,
🟢/🟡/🔴). The _sample_rows helper dedups the draw.
Smoke: just smoke-solvemix is green (split+diagnostic path runs end-to-end).
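
A minimal sketch of the budget split and the routed-share diagnostic described above, not the code in this commit. The function names, the use of the mean gate value d as the "routed-share", and the green/yellow/red thresholds are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def split_teacher_budget(n_teacher: int, solve_mix_frac: float) -> Tuple[int, int]:
    """Split the teacher slots into (n_solve, n_hack). Hypothetical helper."""
    if not 0.0 <= solve_mix_frac <= 1.0:
        raise ValueError("solve_mix_frac must be in [0, 1]")
    n_solve = round(n_teacher * solve_mix_frac)
    return n_solve, n_teacher - n_solve


def routed_share_by_source(gate: List[float], source: List[str]) -> Dict[str, float]:
    """Mean gate value d per teacher source ('hack' or 'solve').

    Assumption: routed-share is taken as the mean of d, where d near 1 means
    routed and d near 0 means kept.
    """
    shares: Dict[str, float] = {}
    for name in ("hack", "solve"):
        vals = [d for d, s in zip(gate, source) if s == name]
        shares[name] = sum(vals) / len(vals) if vals else float("nan")
    return shares


def verdict(shares: Dict[str, float], good: float = 0.2, weak: float = 0.05) -> str:
    """Label the hack-minus-solve gap. Thresholds are placeholders, not the commit's."""
    gap = shares["hack"] - shares["solve"]
    if gap >= good:
        return "green"   # gate routes hack teachers and keeps solve teachers
    if gap > weak:
        return "yellow"  # some directionality, but weak
    return "red"         # roughly equal shares: non-directional gate


if __name__ == "__main__":
    shares = routed_share_by_source(
        [0.9, 0.8, 0.1, 0.2], ["hack", "hack", "solve", "solve"]
    )
    print(split_teacher_budget(4, 0.5), shares, verdict(shares))
```

The labels are assigned from pool membership alone, which matches the commit's point that the diagnostic needs no live-rollout oracle.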

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
wassname
2026-06-10 12:02:39 +00:00
parent bf616749ee
commit 05a00aa487
3 changed files with 95 additions and 8 deletions
@@ -61,6 +61,17 @@ smoke-topk *ARGS:
        --teacher-pool-dir=out/pools/teacher_pool --mix-ratio=0.5 \
        --eval-ablate-every=10 --eval-n-prompts=2 {{ ARGS }}
# routeV + symmetric SOLVE-teacher pool: the G_t teacher slots split 50/50 solve/hack,
# and the run logs the routed-share discrimination (UAT: a line "solve-mix gate
# discrimination: hack-teacher routed-share=X vs solve-teacher routed-share=Y"). Smoke
# points solve at the same tiny pool just to exercise the split+diagnostic path; real
# runs use out/pools/teacher_pool_solve (honest demos) vs the hack pool.
smoke-solvemix *ARGS:
    BEARTYPE=1 {{ TRAIN }} smoke --intervention=routeV \
        --teacher-pool-dir=out/pools/teacher_pool --solve-pool-dir=out/pools/teacher_pool \
        --mix-ratio=0.5 --solve-mix-frac=0.5 \
        --eval-ablate-every=10 --eval-n-prompts=2 {{ ARGS }}
# All three arms back to back (the full-coverage gate).
smoke-all:
    just smoke-vanilla