misc

2026-07-04 22:45:42 +08:00 · 2026-06-11 11:07:28 +00:00
parent 7871aa66b8
commit 270c4f5a27
30 changed files with 456 additions and 443 deletions
@@ -1,16 +1,17 @@
 """All-arms per-mode DEPLOY overlay (#162) from the per_mode_deploy.json artifacts.

-Each run writes out/runs/<ts>_<tag>/per_mode_deploy.json (train.py, #164) with the
-HONEST deploy numbers: for route/route2 the quarantine is deleted before eval, so
-this is the model you would actually ship -- unlike plot_substrate's hk_<mode>
-curves which are TRAIN-time (routed forward still hacks) and overstate routing.
+Each run writes out/runs/<ts>_<tag>/per_mode_deploy.json (train.py, #164) with
+deployment metrics. For route/route2, evaluation ablates the quarantine parameters.
+Unlike plot_substrate's training-time hk_<mode> curves, these metrics evaluate the
+deployed parameter state.

 Reads JSON, not logs, so it never trips on a route2 arm the log-parsers don't know.

 The headline comparison: per loophole mode, does each intervention suppress the
 DEPLOY hack rate below vanilla, and at what cost to DEPLOY solve? run_tests is the
-in-dist mode (v_hack built closest to it); the rest are held-out (the no-cheat
-generalisation test). Cleveland dot plot: y = mode, dot per arm, connector per
+in-distribution mode (v_hack built closest to it); the rest are held-out modes used
+to test generalization without training-distribution labels. Cleveland dot plot:
+y = mode, dot per arm, connector per
 mode so the vanilla -> route change reads as a line segment.

 Usage:
@@ -72,7 +73,7 @@ def _panel(ax, by_arm, modes, arms, field, xlabel):
    per mode, so the arm-to-arm change reads as a line segment (vanilla -> route).
    xerr = std across seeds (drawn only when >1 seed). Tufte: faint x-grid only, no
    box, dots+labels carry the categories.
-    TODO(seeds): A5 ships n=1 (seed 41, jobs 103/104) so no error bar yet; the
+    TODO(seeds): A5 currently has n=1 (seed 41, jobs 103/104) so no error bar yet; the
    queued seeds 42/43 (jobs 107-110) populate xerr -- the code already aggregates."""
    y = np.arange(len(modes))[::-1]                  # first mode at top
    for j in range(len(modes)):                      # arrow baseline->ours per mode: shows the DIRECTION of change