diff --git a/RESEARCH_JOURNAL.md b/RESEARCH_JOURNAL.md
index 0502d32..b685fbe 100644
--- a/RESEARCH_JOURNAL.md
+++ b/RESEARCH_JOURNAL.md
@@ -59,6 +59,27 @@ the held-out-mode pairset. Rollouts: out/runs/20260603T202151_fast_vanilla_seed4
 in teacher_modes and falls through to student-only (not skip) otherwise. The full pool can stay
 loaded; held-out demos simply never get sampled. Implement + smoke before queueing the A5 run.
 
+### Implemented + queued (commit da48a95)
+
+- [obs] added `--teacher-modes` (train.py). Smoke-verified on tiny-random + substrate pool:
+  "teacher pool restricted 24->6", "loaded 24 problems" (all modes kept), run_tests prompts get
+  teacher rows (hack_t 2/2) while held-out prompts train student-only (hack_t 0/0, not skipped).
+  The end-of-run `delta_S_hack > 0` assert trips on tiny-random only because nothing ever hacks
+  -> nothing routes; on the real model run_tests hacks heavily so routing fires. Benign for smoke.
+- [decision] STRICT A5 design chosen: teacher demos + tau anchor see ONLY run_tests; held-out
+  modes emerge purely on-policy. Most defensible no-cheat claim (a reviewer cannot say file_marker
+  was demonstrated to the model). Risk: file_marker may not emerge on-policy in 200 steps without
+  its teacher demo (job 95 needed teacher demos to reach 97 exploited in 40). Mitigation: a vanilla
+  baseline with the SAME teacher-modes=run_tests measures on-policy emergence -- if file_marker
+  stays ~0 in vanilla too, the test is inconclusive and we fall back to design-B (teacher seeds all
+  modes; v_grad + live gate still run_tests-only; weaker no-cheat but guarantees emergence). Do NOT
+  report route2 file_marker~0 without the baseline.
+- [obs] queued (seed 41, 200 steps, eval-n-prompts=24 so per_mode_deploy covers all 4 modes):
+  job 102 extract v_grad from heldout_known_runtests.json (5 pairs) ->
+  out/vhack/v_hack_a5_runtests.safetensors; job 103 A5 vanilla baseline (none + teacher-modes);
+  job 104 A5 route2 test (route2 + teacher-modes + that v_grad, --after 102). A5 verdict =
+  file_marker deploy hack in 104 vs 103 at matched solve. #185 stays open until both land.
+
 ## 2026-06-03 (e) — #187 resolved: vanilla-200 collapse was the hot preset, not long-horizon GRPO
 
 **Context:** Job 97 (gentle-preset vanilla-200 collapse probe). Job 85 had collapsed