save_eval_ckpts (default on): write the deploy adapter (δS only, ~2.3 MB) at each
deploy-eval step, tagged with the step number, so a run can be re-scored later
(more prompts or a different eval) without retraining. The A5 run saved only
final+first_hack, which is why the leak required a full retrain rather than a rescore.
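
A minimal sketch of the step-tagged save, for illustration only. The file
naming scheme, the function names, and the use of pickle are assumptions, not
the code in train.py; the point is just that each deploy-eval step gets its own
small file that can be listed and reloaded for a later rescore.

```python
import pickle
from pathlib import Path

# Hypothetical naming scheme: one small file per deploy-eval step.
PREFIX = "deploy_adapter_step"


def save_eval_ckpt(adapter_state, out_dir, step, enabled=True):
    """Write only the adapter state (not the full model) for this step."""
    if not enabled:
        return None
    path = Path(out_dir) / f"{PREFIX}{step:06d}.pkl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(adapter_state, f)
    return path


def list_eval_ckpts(out_dir):
    """Return (step, path) pairs sorted by step, ready for rescoring."""
    found = []
    for p in Path(out_dir).glob(f"{PREFIX}*.pkl"):
        found.append((int(p.stem[len(PREFIX):]), p))
    return sorted(found)


def load_eval_ckpt(path):
    """Reload one saved adapter state."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

With files like these on disk, a rescore loops over list_eval_ckpts(out_dir)
and runs the new eval on each reloaded adapter instead of retraining.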
AGENTS.md: every load-bearing invariant gets a verify_*.py gate. The no-cheat leak
shipped because the green gates never covered the property -- 'tests passed' is
meaningless if the property was never tested.
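
A rough sketch of what a verify_*.py gate could look like. The property
checked here (no forbidden marker string appears in any prompt) is a
hypothetical stand-in, since this message does not spell out the actual
no-cheat property; all names are illustrative.

```python
import sys


def find_violations(prompts, forbidden_markers):
    """Return (prompt_index, marker) for every forbidden marker found."""
    return [
        (i, marker)
        for i, prompt in enumerate(prompts)
        for marker in forbidden_markers
        if marker in prompt
    ]


def gate(prompts, forbidden_markers):
    """Return an exit code: 0 if the invariant holds, 1 otherwise."""
    if not forbidden_markers:
        # An empty check list would pass vacuously, i.e. report green
        # without testing the property at all. Treat that as a failure.
        print("FAIL: no markers configured, nothing was tested", file=sys.stderr)
        return 1
    violations = find_violations(prompts, forbidden_markers)
    for i, marker in violations:
        print(f"FAIL prompt {i}: contains {marker!r}", file=sys.stderr)
    return 1 if violations else 0
```

A script built on this would end with sys.exit(gate(...)) so that a just
recipe or CI step fails whenever the invariant is violated.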
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>

Deletes 7 source files that were superseded but never removed:
run.py, grad_proj.py, extract_vhack.py (older twin-NLL extractor),
grpo_smoke.py, grpo_proj_smoke.py (smoke harnesses replaced by
train.py "smoke" subcommand), phase2_analyze.py (pilot is past),
probe_uat.py (UAT pipeline is past).
Drops the matching justfile recipes (vhack-check, phase2-analyze,
probe-uat) and the BASE constant that pointed at run.py. Updates
AGENTS/README references to the stale fast-dev-run recipe, which is
replaced by the smoke / smoke-vanilla recipes.
Verified by running just smoke-vanilla --steps=2 end-to-end.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>