refactor: extract train_config.py + run_artifacts.py from train.py; slim results scripts

Cleanup by a prior agent, verified green here: 'just smoke' (erase arm)
runs end-to-end and all four wired gates pass (verify_rewards 52/52,
verify_eval_gap, verify_partition, verify_science_invariants).

- train.py -318 lines: Config dataclass -> train_config.py, checkpoint/
  deploy-artifact IO -> run_artifacts.py.
- results.py / results_deploy.py / probe_distill.py slimmed.
- drop stale derived csvs under out/figs (a5_generalisation, dyn_*,
  substrate_aggregate, train_vs_deploy_60).
- gitignore /.pi/ panel scratch.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
wassname
2026-06-09 13:34:50 +00:00
parent 3f82041d90
commit b53043cec3
31 changed files with 673 additions and 3073 deletions
@@ -84,6 +84,10 @@ def main() -> int:
snap = copy.deepcopy(prob["messages"])
msgs, _ = randomize_eval_markers(prob)
mut_ok &= (prob["messages"] == snap) and (msgs != snap) # prob untouched; returned copy changed
gt_prob = {"env_mode": "gt_only", "problem_id": 7,
"messages": [{"role": "user", "content": f"Solve it {HINT_REPLACE_TO['gt_only']} thanks"}]}
gt_msgs, gt_kw = randomize_eval_markers(gt_prob)
mut_ok &= gt_msgs == gt_prob["messages"] and gt_kw == {}
ok &= mut_ok
logger.info(f"no-mutation: {'PASS' if mut_ok else 'FAIL'}")
logger.info("PASS: eval-token gap (v2 disjoint families) holds for all 4 modes" if ok else "FAIL: gap broken")
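The no-mutation gate in the hunk above can be sketched in isolation. This is a minimal sketch only: `randomize_eval_markers_sketch`, `check_no_mutation`, and the marker strings in `HINT_REPLACE_TO` are hypothetical stand-ins, since the real `randomize_eval_markers` and the real `HINT_REPLACE_TO` values are not shown in this commit view. It assumes the contract the added lines test for: the input problem is never mutated, and in `gt_only` mode the messages come back unchanged with empty kwargs.

```python
import copy

# Assumed placeholder values; the real HINT_REPLACE_TO mapping is not shown in the hunk.
HINT_REPLACE_TO = {"gt_only": "[hint]", "other": "[eval]"}


def randomize_eval_markers_sketch(prob):
    """Hypothetical stand-in: return (messages, kwargs) without mutating prob."""
    msgs = copy.deepcopy(prob["messages"])  # always work on a copy
    if prob["env_mode"] == "gt_only":
        # Assumed passthrough: nothing to randomize, no extra kwargs.
        return msgs, {}
    marker = HINT_REPLACE_TO[prob["env_mode"]]
    for m in msgs:
        m["content"] = m["content"].replace(marker, "[randomized]")
    return msgs, {"marker": "[randomized]"}


def check_no_mutation(fn, prob, expect_change):
    """Snapshot the input, call fn, then compare input and output to the snapshot."""
    snap = copy.deepcopy(prob["messages"])
    msgs, kw = fn(prob)
    untouched = prob["messages"] == snap  # caller's problem must be intact
    if expect_change:
        return untouched and msgs != snap  # returned copy must differ
    return untouched and msgs == snap and kw == {}  # passthrough case


gt_prob = {"env_mode": "gt_only", "problem_id": 7,
           "messages": [{"role": "user", "content": "Solve it [hint] thanks"}]}
other_prob = {"env_mode": "other", "problem_id": 1,
              "messages": [{"role": "user", "content": "Solve it [eval] thanks"}]}

gt_ok = check_no_mutation(randomize_eval_markers_sketch, gt_prob, expect_change=False)
other_ok = check_no_mutation(randomize_eval_markers_sketch, other_prob, expect_change=True)
```

Taking the snapshot with `copy.deepcopy` before the call is what makes the check meaningful: a shallow copy would share the inner message dicts with the input, so an in-place edit would change both sides of the comparison and go unnoticed.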