results.md: fix regenerate ref (just results-deploy -> just results)

wassname
2026-06-09 01:51:28 +00:00
parent 5007c9757a
commit 0412dc56d1
+2 -2
@@ -2,7 +2,7 @@
Deploy-eval is the headline metric: knob-off forward on the recency-clean held-out
TEST set (ids>=3243, base solve ~0.1, n=119), single-mode `run_tests` env, Qwen3-4B.
-Regenerate the table with `just results-deploy` (scripts/results_deploy.py, auto-discovers
+Regenerate the table with `just results` (scripts/results_deploy.py, auto-discovers
every `out/runs/*/deploy_test.json`); `just results` gives the live training-hack table.
OLD eval1 results (training-hack metric, the Q1-Q13 mechanism/basis/refresh studies on the
@@ -27,7 +27,7 @@ pre-recency-clean eval) are archived in [results_eval1_archive.md](results_eval1
for vanilla), eval_set=test = recency-clean held-out ids>=3243 (base solve ~0.1), n=119,
60-step fast preset, Qwen3-4B, single-mode run_tests env, seed 43. NOT comparable to Q12
(old n=64 eval, pre the 2026-05-23 grader-bug / recency-clean fix that moved base solve
-0.94->0.1). REGENERATE: `just results-deploy` (scripts/results_deploy.py) auto-discovers
+0.94->0.1). REGENERATE: `just results` (scripts/results_deploy.py) auto-discovers
every out/runs/*/deploy_test.json -- this table is a curated copy of that output.
Smoke runs (seed 41, steps 30, tiny-random, hack=0) are excluded.
completed src: _dir6_routeV_s43 (job 8) / _dir6_routeV_pertoken_s43 (job 9) /