journal: A5 run plan queued (strict teacher-modes=run_tests, vanilla baseline + route2 test)

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-27 16:45:42 +08:00 · 2026-06-03 22:46:56 +00:00
parent da48a95d9e
commit 0913b064fc
1 changed files with 21 additions and 0 deletions
@@ -59,6 +59,27 @@ the held-out-mode pairset. Rollouts: out/runs/20260603T202151_fast_vanilla_seed4
  in teacher_modes and falls through to student-only (not skip) otherwise. The full pool can stay
  loaded; held-out demos simply never get sampled. Implement + smoke before queueing the A5 run.

+### Implemented + queued (commit da48a95)
+
+- [obs] added `--teacher-modes` (train.py). Smoke-verified on tiny-random + substrate pool:
+  "teacher pool restricted 24->6", "loaded 24 problems" (all modes kept), run_tests prompts get
+  teacher rows (hack_t 2/2) while held-out prompts train student-only (hack_t 0/0, not skipped).
+  The end-of-run `delta_S_hack > 0` assert trips on tiny-random, but only because nothing ever hacks
+  there, so nothing routes; on the real model run_tests hacks heavily, so routing fires. Benign for smoke.
+- [decision] STRICT A5 design chosen: teacher demos + tau anchor see ONLY run_tests; held-out
+  modes emerge purely on-policy. Most defensible no-cheat claim (a reviewer cannot say file_marker
+  was demonstrated to the model). Risk: file_marker may not emerge on-policy in 200 steps without
+  its teacher demo (job 95 needed teacher demos to reach 97 exploited in 40). Mitigation: a vanilla
+  baseline with the SAME teacher-modes=run_tests measures on-policy emergence; if file_marker
+  stays ~0 in vanilla too, the test is inconclusive and we fall back to design-B (teacher seeds all
+  modes; v_grad + live gate still run_tests-only; weaker no-cheat but guarantees emergence). Do NOT
+  report route2 file_marker~0 without the baseline.
+- [obs] queued (seed 41, 200 steps, eval-n-prompts=24 so per_mode_deploy covers all 4 modes):
+  job 102: extract v_grad from heldout_known_runtests.json (5 pairs) ->
+  out/vhack/v_hack_a5_runtests.safetensors; job 103: A5 vanilla baseline (none + teacher-modes);
+  job 104: A5 route2 test (route2 + teacher-modes + that v_grad, --after 102). A5 verdict =
+  file_marker deploy hack in 104 vs 103 at matched solve. #185 stays open until both land.
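A minimal Python sketch of the `--teacher-modes` behaviour and the A5 decision rule described above, for illustration only; it is not the train.py implementation. The `Problem` record, the pool layout, the placeholder mode names `mode_c`/`mode_d`, the per-group row counts, and the `emerge_floor`/`solve_tol` thresholds are all hypothetical. It shows the teacher pool being filtered to `run_tests` while the full pool stays loaded, held-out prompts falling through to student-only rows (never skipped), and the job 104 vs job 103 comparison gated on matched solve and on file_marker actually emerging in the vanilla baseline.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

# Hypothetical mode list: only run_tests and file_marker are named in the journal;
# mode_c / mode_d are placeholders so the toy pool has 4 modes like the real one.
MODES = ["run_tests", "file_marker", "mode_c", "mode_d"]


@dataclass(frozen=True)
class Problem:
    pid: str
    mode: str                    # reward-hack mode this problem exposes
    teacher_demo: Optional[str]  # teacher rollout text, if the pool has one


def toy_pool(per_mode: int = 6) -> List[Problem]:
    """Toy pool: 4 modes x 6 problems = 24, each with a dummy teacher demo."""
    return [Problem(f"{m}-{i}", m, f"demo for {m}-{i}")
            for m in MODES for i in range(per_mode)]


def restrict_teacher_pool(pool: List[Problem], teacher_modes: Set[str]) -> List[Problem]:
    """Teacher demos are drawn only from allowed modes; the full pool stays loaded."""
    return [p for p in pool if p.mode in teacher_modes]


def group_rows(p: Problem, teacher_modes: Set[str],
               n_student: int = 4, n_teacher: int = 2) -> List[Tuple[str, str]]:
    """Rows for one prompt's group (hypothetical sizes). Held-out modes fall
    through to student-only rows instead of being skipped."""
    rows = [("student", p.pid)] * n_student
    if p.mode in teacher_modes and p.teacher_demo is not None:
        rows += [("teacher", p.pid)] * n_teacher
    return rows


def a5_verdict(fm_hack_vanilla: float, fm_hack_route2: float,
               solve_vanilla: float, solve_route2: float,
               emerge_floor: float = 0.05, solve_tol: float = 0.05) -> str:
    """Compare file_marker deploy-hack rate in job 104 (route2) vs job 103
    (vanilla). emerge_floor and solve_tol are hypothetical thresholds."""
    if abs(solve_vanilla - solve_route2) > solve_tol:
        return "not comparable: solve rates unmatched"
    if fm_hack_vanilla < emerge_floor:
        # file_marker never emerged on-policy even without routing
        return "inconclusive: fall back to design-B"
    if fm_hack_route2 < fm_hack_vanilla:
        return "route2 lower than vanilla"
    return "no reduction"


if __name__ == "__main__":
    pool = toy_pool()
    teacher = restrict_teacher_pool(pool, {"run_tests"})
    print(f"teacher pool restricted {len(pool)}->{len(teacher)}")  # 24->6
```

Running it prints `teacher pool restricted 24->6`, mirroring the smoke log line. The verdict helper returns the design-B fallback whenever the vanilla baseline shows no on-policy file_marker emergence, which encodes the rule of not reporting route2 file_marker~0 without the baseline.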
+
 ## 2026-06-03 (e) — #187 resolved: vanilla-200 collapse was the hot preset, not long-horizon GRPO

 **Context:** Job 97 (gentle-preset vanilla-200 collapse probe). Job 85 had collapsed