mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 16:45:42 +08:00
eval+env: single-mode run_tests, held-out val/test eval, both hack metrics
- revert env to single-mode run_tests (paper-comparable): FastConfig teacher pool = run_tests-only (no partition.json); + `just build-runtests-pool`
- held-out eval: periodic train(knob-on)+deploy(knob-off) on VAL (holdout file), final deploy on TEST n=119 -> deploy_test.json; inline train/val/test disjoint assert
- report BOTH hack metrics: strict stub-pass (exploited) + vendor eq_hinted (hacked_loophole_used) -- external review 2026-06-07
- consolidate to one canonical eval_hack_solve (.eval); delete the train.py duplicate that silently lacked the token gap (in-run eval != rescore bug)
- routeV band edges mean -> min/max (conservative degrade-to-absorb)
- scripts/rescore_deploy.py: offline re-score of saved adapter on held-out test
- modal/app.py: read deploy_test.json

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
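The "inline train/val/test disjoint assert" mentioned above is not shown in this hunk. A minimal sketch of what such a check could look like follows; `assert_disjoint` and the idea of passing splits as collections of problem IDs are assumptions for illustration, not the repository's actual code.

```python
from itertools import combinations
from typing import Hashable, Iterable


def assert_disjoint(**splits: Iterable[Hashable]) -> None:
    """Raise AssertionError if any two named splits share an ID.

    Hypothetical helper: split names and ID types are placeholders.
    """
    as_sets = {name: set(ids) for name, ids in splits.items()}
    # Check every unordered pair of splits once.
    for a, b in combinations(as_sets, 2):
        overlap = as_sets[a] & as_sets[b]
        assert not overlap, f"{a}/{b} overlap on {len(overlap)} ids, e.g. {sorted(overlap)[:5]}"
```

Called as `assert_disjoint(train=train_ids, val=val_ids, test=test_ids)` before any evaluation, a check like this fails fast if a held-out problem leaks into training.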
@@ -151,6 +151,20 @@ build-substrate MODES="run_tests,exit_code,sentinel":
     uv run python scripts/build_substrate.py \
         --modes {{ MODES }} --pool-modes run_tests --min-hacks 5

+# Single-mode run_tests teacher pool = the run_tests slice of the 4-mode substrate, with
+# NO partition.json so train.py runs single-mode (paper-comparable Ariahw run_tests env,
+# the FastConfig default teacher pool). Reproducible rebuild of out/pools/teacher_pool_runtests
+# (out/ is gitignored; Modal gets it via modal/upload_inputs.py). The teacher pool itself is
+# OUR emergence accelerator -- the paper seeds nothing; teacher_off_step=30 cuts to pure
+# on-policy past step 30 (job 87: hacking self-sustains after the cut).
+build-runtests-pool:
+    rm -rf out/pools/teacher_pool_runtests && mkdir -p out/pools/teacher_pool_runtests
+    uv run python -c "import json,shutil; from pathlib import Path; \
+        p=json.loads(Path('out/pools/substrate/partition.json').read_text()); \
+        rt=[int(i) for i,m in p.items() if m=='run_tests']; \
+        [shutil.copy(f'out/pools/substrate/prompt_{i:04d}.jsonl.gz','out/pools/teacher_pool_runtests/') for i in rt]; \
+        print('run_tests pool:',sorted(rt))"
+
 # Vanilla-GRPO emergence on the multi-loophole substrate: does the student learn ALL
 # K loopholes from the repeated even teacher batch? UAT = end-of-run SUBSTRATE table
 # (per-mode hacks>0 + finite first_step) + the per-step hk_<mode> columns. mix=0.125
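The inline `python -c` one-liner in `build-runtests-pool` is dense, so here is the same logic unrolled as a standalone function. It is a sketch under the assumptions the one-liner itself implies: `partition.json` maps prompt-index strings to mode names, and shards are named `prompt_NNNN.jsonl.gz`. The function name and parameters are hypothetical.

```python
import json
import shutil
from pathlib import Path


def build_runtests_pool(substrate: Path, dest: Path, mode: str = "run_tests") -> list[int]:
    """Copy the prompt shards of one mode from the substrate into a fresh pool dir."""
    # Assumed layout: {"<prompt index>": "<mode name>", ...}
    partition = json.loads((substrate / "partition.json").read_text())
    indices = sorted(int(i) for i, m in partition.items() if m == mode)

    # Start from an empty destination, mirroring the recipe's rm -rf && mkdir -p.
    if dest.exists():
        shutil.rmtree(dest)
    dest.mkdir(parents=True)

    for i in indices:
        shutil.copy(substrate / f"prompt_{i:04d}.jsonl.gz", dest)

    # partition.json is deliberately NOT copied: per the recipe comment, its
    # absence is what makes train.py run single-mode.
    return indices
```

With the paths from the recipe, the call would be `build_runtests_pool(Path("out/pools/substrate"), Path("out/pools/teacher_pool_runtests"))`. An explicit loop replaces the side-effect list comprehension, which keeps the copy step readable without changing behavior.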