Commit Graph

5 Commits

Author SHA1 Message Date
wassname 0ea751c5bc plot: #148 substrate emergence — per-mode hack-rate curves (4/5 learned, seed 41)
New scripts/plot_substrate.py parses the hk_<mode> cumulative columns from a
multi-loophole substrate run (one log, K interleaved modes) and draws one
learning curve per mode with first_step onset dots and direct end-labels.
plot_emergence.py can't do this (it groups logs by a single --env-mode).
Figure shows the headline: vanilla GRPO learns file_marker/run_tests/
stdout_marker/sentinel; eq_override stays flat at 0 (never learned).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-05-30 16:49:03 +00:00
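
A minimal sketch of the per-mode bookkeeping described in the commit above, not the actual scripts/plot_substrate.py. It assumes the hk_<mode> columns have already been parsed into one list of cumulative hack counts per mode, and that the first_step onset means the first step with a nonzero cumulative count. The function names and the sample numbers are hypothetical.

```python
from typing import Dict, List, Optional


def per_step_hacks(cumulative: List[int]) -> List[int]:
    """Difference a cumulative hack count into per-step counts."""
    out: List[int] = []
    prev = 0
    for count in cumulative:
        out.append(count - prev)
        prev = count
    return out


def first_onset(cumulative: List[int]) -> Optional[int]:
    """Return the first step index with a nonzero cumulative count, else None."""
    for step, count in enumerate(cumulative):
        if count > 0:
            return step
    return None


def summarize(columns: Dict[str, List[int]]) -> Dict[str, Optional[int]]:
    """Map each mode column to its onset step (None if the mode never fires)."""
    return {mode: first_onset(cum) for mode, cum in columns.items()}


# Made-up data for illustration only; these are not values from the run.
example = {
    "hk_file_marker": [0, 0, 1, 3],
    "hk_eq_override": [0, 0, 0, 0],
}
onsets = summarize(example)
```

A mode whose onset is None corresponds to a curve that stays flat at 0, so a plotting layer built on this would draw no onset dot for it.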
wassname 4621488cc0 reorg: out/ sorted by datatype (vhack/ pools/ runs/ vhack_grads/ figs/)
Code writes+reads the new scheme; migrate_out_dirs.py moved 225 loose artifacts
(0 left at top level). Per-run checkpoints+rollouts now group under
runs/<ts>_<run_id>/ as train.safetensors/rollouts.jsonl. Figures land in
out/figs/ with a stable docs/figs/<name>.png symlink (figs.link_latest).
justfile also gains run-cell REFRESH param (online-erasure arm). Smoke +
smoke-vanilla + results all green on new paths. Requeue manifest preserves the
why/resolve labels that pueue reset wiped.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 03:52:24 +00:00
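
The stable-symlink idea from the commit above can be sketched as follows. This is an assumption about how a helper like figs.link_latest might behave, not its actual code: the signature, the relative-link choice, and the temporary-name scheme are all hypothetical, and it assumes a POSIX filesystem.

```python
import os
from pathlib import Path


def link_latest(target: Path, link: Path) -> Path:
    """Point `link` at `target`, replacing any previous link in one rename."""
    link.parent.mkdir(parents=True, exist_ok=True)
    # Build the new symlink under a temporary name first.
    tmp = link.with_name(link.name + ".tmp")
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    # A relative link keeps working if the repository is moved as a whole.
    rel = os.path.relpath(target, start=link.parent)
    os.symlink(rel, tmp)
    # Renaming over the old link swaps it without a window where it is missing.
    os.replace(tmp, link)
    return link
```

Calling it as link_latest(Path("out/figs/<name>.png"), Path("docs/figs/<name>.png")) would leave the docs path stable while the figure it points to is regenerated.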
wassname 4fb7b59548 plot(tufte): route-evidence figure — drop legend/grid/box, direct value labels
Eraser test: direct end-labels carry name+value (no legend, no separate
annotations, no gridlines). Layering: hack=red dominates (the story), solve=
muted grey recedes. Range-frame left spine, minimal ticks (0/.5/1, 0/19).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 03:22:35 +00:00
wassname c7e1500241 plot: add routing arm to dynamics_test overlay (matched mix=0.125/s41/20-step)
vanilla ~0.65, static erasure ~0.65 (no benefit this seed), routing ship-model
~0.15. Matched config: erase+route both v_hack_21pairs, same vanilla baseline.
n=1, so confirmation is left to the 60-step T8 cells. routing line = SHIP-eval (deployed, knob off).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 03:21:02 +00:00
wassname 3004b23f74 feat: route-evidence figure — training hack vs shipped hack (single run)
Visualizes #182: model hacks ~0.9 during training (quarantine knob on) but the
shipped model (knob deleted) stays ~0.25 at preserved solve ~0.5. The train-vs-
ship gap is the routing benefit. Reads old(hack_abl)+new(hack_ship) ship cols.
out/route_evidence_s41.png committed (forced with -f; the directory is gitignored).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 03:17:35 +00:00