docs: AGENTS.md START HERE links (human_journal, main.tex, grad-routing paper); revert rescore fallback

- Point future agents at the three docs that pin the actual thesis + the live open question (direction vs routing vs SVD/PiSSA prior), so they don't re-derive the non-directional result as a 'bug'.
- Revert rescore_deploy cfg.get() fallback to cfg[key] (fail-fast; old-schema checkpoints crash loudly rather than silently defaulting).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-27 16:15:35 +08:00 · 2026-06-10 03:34:06 +00:00
parent b334b5f516
commit 0d6ff754ec
2 changed files with 23 additions and 10 deletions
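
For reviewers, here is a minimal sketch of the behavioural difference behind the rescore revert. It is not code from this repo: `old_cfg`, `eval_params_lenient`, and `eval_params_strict` are hypothetical names, and the config is assumed to be a plain `dict`. The default values mirror the removed lines in the hunk below.

```python
# Hypothetical stored config from an old-schema checkpoint:
# "eval_n_prompts" is absent (assumption for illustration only).
old_cfg = {"max_new": 512, "eval_batch_size": 8}


def eval_params_lenient(cfg: dict) -> tuple:
    """Pre-revert style: missing keys silently fall back to preset values."""
    return (
        cfg.get("eval_n_prompts", 32),
        cfg.get("max_new", 512),
        cfg.get("eval_batch_size", 8),
    )


def eval_params_strict(cfg: dict) -> tuple:
    """Post-revert style: a missing key raises KeyError naming that key."""
    return cfg["eval_n_prompts"], cfg["max_new"], cfg["eval_batch_size"]


# Lenient lookup succeeds; the caller cannot tell a default was substituted.
lenient = eval_params_lenient(old_cfg)

# Strict lookup fails immediately and reports which key is missing.
try:
    eval_params_strict(old_cfg)
    strict_error = None
except KeyError as exc:
    strict_error = exc.args[0]
```

With the strict form, re-scoring an old-schema checkpoint stops at the first missing key, so it cannot produce numbers under eval settings that differ from the original run.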
@@ -48,24 +48,20 @@ def main(run_dir: Positional[Path]) -> None:
    # by_mode keys ARE the modes the original deploy eval spanned (present in every json
    # version); reproduce the same set so the re-scored knob-off matches the headline.
    eval_modes = sorted(prior_eval["by_mode"].keys())
-    # Eval-harness params (not model-defining): default to the fast preset so we can re-score
-    # OLD checkpoints whose stored cfg predates the train_config refactor (eval_n_prompts /
-    # eval_batch_size were added/renamed). The test split is fixed-size regardless of n_prompts.
-    eval_n_prompts = cfg.get("eval_n_prompts", 32)
-    max_new = cfg.get("max_new", 512)
-    eval_bs = cfg.get("eval_batch_size", 8)
-    _, problems = load_eval_splits(eval_modes, eval_n_prompts)
+    _, problems = load_eval_splits(eval_modes, cfg["eval_n_prompts"])
    gen_cfg_eval = GenerationConfig(
-        max_new_tokens=max_new, do_sample=True,
+        max_new_tokens=cfg["max_new"], do_sample=True,
        temperature=0.7, top_p=1.0, top_k=20, min_p=0.0, repetition_penalty=1.0,
        num_return_sequences=1, pad_token_id=tok.pad_token_id,
    )
    eval_idxs = list(range(len(problems)))
    torch.manual_seed(EVAL_GEN_SEED)
    with ablate_quarantine(wrappers):   # knob OFF = the deployed model
-        ev = eval_hack_solve(model, tok, problems, eval_idxs, gen_cfg_eval, device, max_new, eval_bs)
+        ev = eval_hack_solve(
+            model, tok, problems, eval_idxs, gen_cfg_eval, device, cfg["max_new"], cfg["eval_batch_size"])
    torch.manual_seed(EVAL_GEN_SEED)
-    ev_on = eval_hack_solve(model, tok, problems, eval_idxs, gen_cfg_eval, device, max_new, eval_bs)
+    ev_on = eval_hack_solve(
+        model, tok, problems, eval_idxs, gen_cfg_eval, device, cfg["max_new"], cfg["eval_batch_size"])

    out = {
        "schema": RUN_SCHEMA,