results.md: fix regenerate ref (just results-deploy -> just results)

2026-06-27 16:15:35 +08:00 · 2026-06-09 01:51:28 +00:00
parent 5007c9757a
commit 0412dc56d1
1 changed file with 2 additions and 2 deletions
@@ -2,7 +2,7 @@

 Deploy-eval is the headline metric: knob-off forward on the recency-clean held-out
 TEST set (ids>=3243, base solve ~0.1, n=119), single-mode `run_tests` env, Qwen3-4B.
-Regenerate the table with `just results-deploy` (scripts/results_deploy.py, auto-discovers
+Regenerate the table with `just results` (scripts/results_deploy.py, auto-discovers
 every `out/runs/*/deploy_test.json`); `just results` gives the live training-hack table.

 OLD eval1 results (training-hack metric, the Q1-Q13 mechanism/basis/refresh studies on the
@@ -27,7 +27,7 @@ pre-recency-clean eval) are archived in [results_eval1_archive.md](results_eval1
     for vanilla), eval_set=test = recency-clean held-out ids>=3243 (base solve ~0.1), n=119,
     60-step fast preset, Qwen3-4B, single-mode run_tests env, seed 43. NOT comparable to Q12
     (old n=64 eval, pre the 2026-05-23 grader-bug / recency-clean fix that moved base solve
-     0.94->0.1). REGENERATE: `just results-deploy` (scripts/results_deploy.py) auto-discovers
+     0.94->0.1). REGENERATE: `just results` (scripts/results_deploy.py) auto-discovers
     every out/runs/*/deploy_test.json -- this table is a curated copy of that output.
     Smoke runs (seed 41, steps 30, tiny-random, hack=0) are excluded.
     completed src: _dir6_routeV_s43 (job 8) / _dir6_routeV_pertoken_s43 (job 9) /