mirror of https://github.com/wassname/evil_MoE.git (synced 2026-07-04 22:45:42 +08:00)
misc
@@ -1,16 +1,17 @@
 """All-arms per-mode DEPLOY overlay (#162) from the per_mode_deploy.json artifacts.
 
-Each run writes out/runs/<ts>_<tag>/per_mode_deploy.json (train.py, #164) with the
-HONEST deploy numbers: for route/route2 the quarantine is deleted before eval, so
-this is the model you would actually ship -- unlike plot_substrate's hk_<mode>
-curves which are TRAIN-time (routed forward still hacks) and overstate routing.
+Each run writes out/runs/<ts>_<tag>/per_mode_deploy.json (train.py, #164) with
+deployment metrics. For route/route2, evaluation ablates the quarantine parameters.
+Unlike plot_substrate's training-time hk_<mode> curves, these metrics evaluate the
+deployed parameter state.
 
 Reads JSON, not logs, so it never trips on a route2 arm the log-parsers don't know.
 
 The headline comparison: per loophole mode, does each intervention suppress the
 DEPLOY hack rate below vanilla, and at what cost to DEPLOY solve? run_tests is the
-in-dist mode (v_hack built closest to it); the rest are held-out (the no-cheat
-generalisation test). Cleveland dot plot: y = mode, dot per arm, connector per
+in-distribution mode (v_hack built closest to it); the rest are held-out modes used
+to test generalization without training-distribution labels. Cleveland dot plot:
+y = mode, dot per arm, connector per
 mode so the vanilla -> route change reads as a line segment.
 
 Usage:
@@ -72,7 +73,7 @@ def _panel(ax, by_arm, modes, arms, field, xlabel):
 per mode, so the arm-to-arm change reads as a line segment (vanilla -> route).
 xerr = std across seeds (drawn only when >1 seed). Tufte: faint x-grid only, no
 box, dots+labels carry the categories.
-TODO(seeds): A5 ships n=1 (seed 41, jobs 103/104) so no error bar yet; the
+TODO(seeds): A5 currently has n=1 (seed 41, jobs 103/104) so no error bar yet; the
 queued seeds 42/43 (jobs 107-110) populate xerr -- the code already aggregates."""
 y = np.arange(len(modes))[::-1] # first mode at top
 for j in range(len(modes)): # arrow baseline->ours per mode: shows the DIRECTION of change
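The docstrings in this diff describe a Cleveland dot plot: one row per mode, one dot per arm, a connector per mode, and an x error bar equal to the std across seeds that is drawn only when there is more than one seed. The sketch below is one way that layout could be put together. It is not the repository's `_panel`: the `records` layout, the `held_out_a` mode name, the helpers `aggregate` and `cleveland`, the use of the population std (`np.std` default), and every number are assumptions for illustration, and the arrow drawing mentioned in the last context line is replaced by a plain connector.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical records: (arm, mode, seed, deploy hack rate).
# The values are invented for illustration; they are not results from the repo.
records = [
    ("vanilla", "run_tests", 41, 0.60), ("vanilla", "run_tests", 42, 0.70),
    ("route", "run_tests", 41, 0.10), ("route", "run_tests", 42, 0.20),
    ("vanilla", "held_out_a", 41, 0.50), ("vanilla", "held_out_a", 42, 0.40),
    ("route", "held_out_a", 41, 0.30), ("route", "held_out_a", 42, 0.20),
]


def aggregate(records):
    """Group by (arm, mode) and return {(arm, mode): (mean, std, n_seeds)}."""
    grouped = {}
    for arm, mode, _seed, value in records:
        grouped.setdefault((arm, mode), []).append(value)
    # np.std defaults to the population std (ddof=0); that choice is an assumption.
    return {k: (float(np.mean(v)), float(np.std(v)), len(v)) for k, v in grouped.items()}


def cleveland(ax, stats, modes, arms, xlabel):
    """Draw one row per mode, one dot per arm, and a connector per mode."""
    y = np.arange(len(modes))[::-1]  # first mode at top
    for j, mode in enumerate(modes):
        xs = [stats[(arm, mode)][0] for arm in arms]
        # Connector spanning the arms' means, so the change reads as a segment.
        ax.plot([min(xs), max(xs)], [y[j], y[j]], color="0.7", lw=1, zorder=1)
    for arm in arms:
        means = [stats[(arm, m)][0] for m in modes]
        stds = [stats[(arm, m)][1] for m in modes]
        n_seeds = [stats[(arm, m)][2] for m in modes]
        xerr = stds if min(n_seeds) > 1 else None  # error bars only with >1 seed
        ax.errorbar(means, y, xerr=xerr, fmt="o", label=arm, zorder=2)
    ax.set_yticks(y)
    ax.set_yticklabels(modes)
    ax.set_xlabel(xlabel)
    ax.grid(axis="x", alpha=0.3)  # faint x-grid only
    for side in ("top", "right", "left"):
        ax.spines[side].set_visible(False)  # no box
    ax.legend(frameon=False)


stats = aggregate(records)
fig, ax = plt.subplots(figsize=(5, 2.5))
cleveland(ax, stats, ["run_tests", "held_out_a"], ["vanilla", "route"], "deploy hack rate")
```

Calling `fig.savefig(...)` afterwards would write the figure to disk. With a single seed per arm, `xerr` falls back to `None` and only the dots and connectors are drawn, which matches the n=1 situation the TODO describes.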