log: rename route ablated-eval -> SHIP-eval, self-evident with SHOULD

User couldn't parse 'route ablated-eval (delta_S_hack=0)'. Common terms now:
- per-step line: 'SHIP-eval (quarantine knob OFF = deployed model)' + inline
  SHOULD (ship hack < training hack_s if the knob holds the cheat).
- columns hack_abl/solve_abl -> hack_ship/solve_ship.
- final BLUF: 'train/knob-on' vs 'ship/knob-off' + SHOULD (quarantine absorbed
  the cheat).

plot_dynamics accepts both old+new names. smoke-route green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 19:47:33 +08:00 · 2026-05-30 03:01:04 +00:00
parent 969c724d9d
commit 4f9651b1f3
2 changed files with 43 additions and 34 deletions
@@ -20,9 +20,9 @@ Arm classification (from the preset line `arm=`, covering old --arm and new
  online erasure     arm=projected, --vhack-refresh-every=N>0 (re-extracted)
  routing            arm=routing    (intervention=route)

-For routing we plot the ABLATED-eval hack/solve (hack_abl/solve_abl, measured
-with delta_S_hack zeroed every --eval-ablate-every steps), NOT the training-time
-hack_s: the routed forward still hacks during training, so the training curve
+For routing we plot the SHIP-eval hack/solve (hack_ship/solve_ship, the deployed
+model = quarantine knob deleted, measured every --eval-ablate-every steps), NOT
+the training-time hack_s: the routed forward still hacks during training, so the training curve
 would falsely read "route doesn't work". The ablated curve is the deployment
 model. (none/erase plot training-time hack_s; their intervention acts at train
 time.)
@@ -91,10 +91,12 @@ def parse_log(path: Path) -> dict | None:

    series: dict[str, list[float]] = defaultdict(list)
    steps: list[int] = []
-    # Also parse the route ablated-eval columns when present (older logs lack
-    # them -> skip). For routing we plot THESE, not the training-time hack_s.
-    abl = {"hack_abl", "solve_abl"} & set(idx)
-    wanted = {**RATE_COLS, **COS_COLS, **{c: c for c in abl}}
+    # Also parse the route SHIP-eval columns when present (older logs lack them
+    # -> skip). For routing we plot THESE (deployed model), not training-time
+    # hack_s. Renamed hack_abl/solve_abl -> hack_ship/solve_ship 2026-05-30;
+    # accept both so old evidence logs still parse.
+    ship = {"hack_abl", "solve_abl", "hack_ship", "solve_ship"} & set(idx)
+    wanted = {**RATE_COLS, **COS_COLS, **{c: c for c in ship}}
    for line in txt.splitlines():
        if "| INFO |" not in line:
            continue
@@ -109,12 +111,14 @@ def parse_log(path: Path) -> dict | None:
    run = dict(arm=arm, refr=refr, seed=seed, vhack=vhack,
               steps=np.array(steps), **{k: np.array(v, dtype=float) for k, v in series.items()})
    # COHERENCE-GAP FIX: route's training-time hack_s looks vanilla (the routed
-    # forward still hacks); routing's benefit only shows once delta_S_hack is
-    # ablated at eval. So for routing, plot the ablated series under the same
+    # forward still hacks); routing's benefit only shows on the DEPLOYED model
+    # (quarantine knob deleted). So for routing, plot the ship series under the
    # hack_s/gt_s keys -> all downstream (panels, onset, overlay) reads it.
-    if arm == "routing" and "hack_abl" in run:
-        run["hack_s"] = run["hack_abl"]
-        run["gt_s"] = run["solve_abl"]
+    if arm == "routing":
+        hk = "hack_ship" if "hack_ship" in run else "hack_abl" if "hack_abl" in run else None
+        if hk:
+            run["hack_s"] = run[hk]
+            run["gt_s"] = run["solve_ship" if "solve_ship" in run else "solve_abl"]
    return run