eval+env: single-mode run_tests, held-out val/test eval, both hack metrics

- revert env to single-mode run_tests (paper-comparable): FastConfig teacher
  pool = run_tests-only (no partition.json); + `just build-runtests-pool`
- held-out eval: periodic train(knob-on)+deploy(knob-off) on VAL (holdout file),
  final deploy on TEST n=119 -> deploy_test.json; inline train/val/test disjoint assert
- report BOTH hack metrics: strict stub-pass (exploited) + vendor eq_hinted
  (hacked_loophole_used) -- external review 2026-06-07
- consolidate to one canonical eval_hack_solve (.eval); delete the train.py
  duplicate that silently lacked the token gap (in-run eval != rescore bug)
- routeV band edges mean -> min/max (conservative degrade-to-absorb)
- scripts/rescore_deploy.py: offline re-score of saved adapter on held-out test
- modal/app.py: read deploy_test.json
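
A minimal sketch of what the inline train/val/test disjoint assert could look like. The helper name `assert_disjoint_splits` and the toy integer ids are assumptions for illustration, not the code in this commit; the real splits come from the holdout file.

```python
from itertools import combinations


def assert_disjoint_splits(**splits):
    """Raise AssertionError if any two named splits share an id.

    Hypothetical helper: each keyword is a split name mapped to an
    iterable of problem ids. Returns the de-duplicated size of each split.
    """
    sets = {name: set(ids) for name, ids in splits.items()}
    for (a, sa), (b, sb) in combinations(sets.items(), 2):
        overlap = sa & sb
        assert not overlap, (
            f"{a}/{b} overlap on {len(overlap)} ids, e.g. {sorted(overlap)[:5]}"
        )
    return {name: len(s) for name, s in sets.items()}


# Toy ids only, chosen so that no pair of splits overlaps.
sizes = assert_disjoint_splits(train=[0, 1, 2], val=[3, 4], test=[5, 6, 7])
```

Checking every pair rather than only train vs. test means a val/test leak is caught as well.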

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
wassname
2026-06-07 03:07:14 +00:00
parent 7195d19f90
commit 7da54f1967
6 changed files with 229 additions and 151 deletions
@@ -151,6 +151,20 @@ build-substrate MODES="run_tests,exit_code,sentinel":
    uv run python scripts/build_substrate.py \
        --modes {{ MODES }} --pool-modes run_tests --min-hacks 5
# Single-mode run_tests teacher pool = the run_tests slice of the 4-mode substrate, with
# NO partition.json so train.py runs single-mode (paper-comparable Ariahw run_tests env,
# the FastConfig default teacher pool). Reproducible rebuild of out/pools/teacher_pool_runtests
# (out/ is gitignored; Modal gets it via modal/upload_inputs.py). The teacher pool itself is
# OUR emergence accelerator -- the paper seeds nothing; teacher_off_step=30 cuts to pure
# on-policy past step 30 (job 87: hacking self-sustains after the cut).
build-runtests-pool:
    rm -rf out/pools/teacher_pool_runtests && mkdir -p out/pools/teacher_pool_runtests
    uv run python -c "import json,shutil; from pathlib import Path; \
        p=json.loads(Path('out/pools/substrate/partition.json').read_text()); \
        rt=[int(i) for i,m in p.items() if m=='run_tests']; \
        [shutil.copy(f'out/pools/substrate/prompt_{i:04d}.jsonl.gz','out/pools/teacher_pool_runtests/') for i in rt]; \
        print('run_tests pool:',sorted(rt))"
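
For readers who find the inline one-liner hard to follow, the same slice can be sketched as a standalone function. This is an illustrative equivalent, not part of the commit: the function name `build_runtests_pool` and its parameters are assumptions, while the `partition.json` layout (prompt id mapped to mode name) and the `prompt_{i:04d}.jsonl.gz` file naming are taken from the recipe above.

```python
import json
import shutil
from pathlib import Path


def build_runtests_pool(substrate: Path, out: Path, mode: str = "run_tests") -> list[int]:
    """Copy the prompt files whose partition mode equals `mode` into a fresh `out` dir.

    partition.json is deliberately NOT copied, so the resulting pool
    carries no per-prompt mode information.
    """
    partition = json.loads((substrate / "partition.json").read_text())
    ids = sorted(int(i) for i, m in partition.items() if m == mode)

    # Mirror `rm -rf ... && mkdir -p ...` from the recipe.
    shutil.rmtree(out, ignore_errors=True)
    out.mkdir(parents=True)

    for i in ids:
        shutil.copy(substrate / f"prompt_{i:04d}.jsonl.gz", out)
    return ids
```

Called as `build_runtests_pool(Path("out/pools/substrate"), Path("out/pools/teacher_pool_runtests"))`, it would reproduce the recipe's effect under those assumptions.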
# Vanilla-GRPO emergence on the multi-loophole substrate: does the student learn ALL
# K loopholes from the repeated even teacher batch? UAT = end-of-run SUBSTRATE table
# (per-mode hacks>0 + finite first_step) + the per-step hk_<mode> columns. mix=0.125