Mirror of https://github.com/wassname/evil_MoE.git (synced 2026-06-27 16:15:35 +08:00)
results.md: fix regenerate ref (just results-deploy -> just results)
+2
-2
@@ -2,7 +2,7 @@
Deploy-eval is the headline metric: knob-off forward on the recency-clean held-out
TEST set (ids>=3243, base solve ~0.1, n=119), single-mode `run_tests` env, Qwen3-4B.
-Regenerate the table with `just results-deploy` (scripts/results_deploy.py, auto-discovers
+Regenerate the table with `just results` (scripts/results_deploy.py, auto-discovers
every `out/runs/*/deploy_test.json`); `just results` gives the live training-hack table.
OLD eval1 results (training-hack metric, the Q1-Q13 mechanism/basis/refresh studies on the
@@ -27,7 +27,7 @@ pre-recency-clean eval) are archived in [results_eval1_archive.md](results_eval1
for vanilla), eval_set=test = recency-clean held-out ids>=3243 (base solve ~0.1), n=119,
60-step fast preset, Qwen3-4B, single-mode run_tests env, seed 43. NOT comparable to Q12
(old n=64 eval, pre the 2026-05-23 grader-bug / recency-clean fix that moved base solve
-0.94->0.1). REGENERATE: `just results-deploy` (scripts/results_deploy.py) auto-discovers
+0.94->0.1). REGENERATE: `just results` (scripts/results_deploy.py) auto-discovers
every out/runs/*/deploy_test.json -- this table is a curated copy of that output.
Smoke runs (seed 41, steps 30, tiny-random, hack=0) are excluded.
completed src: _dir6_routeV_s43 (job 8) / _dir6_routeV_pertoken_s43 (job 9) /