New scripts/plot_substrate.py parses the hk_<mode> cumulative columns from a
multi-loophole substrate run (one log, K interleaved modes) and draws one
learning curve per mode with first_step onset dots and direct end-labels.
plot_emergence.py can't do this (it groups logs by a single --env-mode).
Figure shows the headline: vanilla GRPO learns file_marker, run_tests,
stdout_marker and sentinel, while eq_override stays flat at 0 (never learned).
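Rough sketch of the parse step, for orientation only. The row layout (one dict
per logged step, integer hk_<mode> cumulative counts) and the helper names
per_mode_curves/first_onset are assumptions for illustration, not the actual
code in plot_substrate.py.

```python
from typing import Dict, List, Optional


def per_mode_curves(rows: List[dict], prefix: str = "hk_") -> Dict[str, List[int]]:
    """Collect one cumulative series per mode from columns named <prefix><mode>."""
    curves: Dict[str, List[int]] = {}
    for row in rows:
        for key, value in row.items():
            if key.startswith(prefix):
                curves.setdefault(key[len(prefix):], []).append(int(value))
    return curves


def first_onset(series: List[int]) -> Optional[int]:
    """Row index where the cumulative count first rises above zero, else None."""
    for i, value in enumerate(series):
        if value > 0:
            return i
    return None


# Hypothetical rows; real logs may carry more columns and K modes.
rows = [
    {"step": 0, "hk_file_marker": 0, "hk_eq_override": 0},
    {"step": 1, "hk_file_marker": 2, "hk_eq_override": 0},
    {"step": 2, "hk_file_marker": 5, "hk_eq_override": 0},
]
curves = per_mode_curves(rows)
onsets = {mode: first_onset(series) for mode, series in curves.items()}
```

A mode whose cumulative column never leaves zero gets onset None, which is the
case a plot would render as "never".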
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Code now writes and reads the new scheme; migrate_out_dirs.py moved 225 loose artifacts
(0 left at top level). Per-run checkpoints+rollouts now group under
runs/<ts>_<run_id>/ as train.safetensors/rollouts.jsonl. Figures land in
out/figs/ with a stable docs/figs/<name>.png symlink (figs.link_latest).
justfile also gains run-cell REFRESH param (online-erasure arm). Smoke +
smoke-vanilla + results all green on new paths. Requeue manifest preserves the
why/resolve labels that pueue reset wiped.
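Sketch of how a stable figure symlink can be refreshed atomically. This is an
assumed shape, not the body of figs.link_latest: the function name
link_latest_sketch, its signature, and the demo paths are hypothetical, and it
assumes a POSIX filesystem where symlinks are allowed.

```python
import os
import tempfile
from pathlib import Path


def link_latest_sketch(target: Path, link: Path) -> None:
    """Point `link` at `target`, replacing any previous link in one rename."""
    link.parent.mkdir(parents=True, exist_ok=True)
    tmp = link.with_name(link.name + ".tmp")
    if tmp.is_symlink() or tmp.exists():
        tmp.unlink()
    # A relative target keeps the link valid if the checkout is moved.
    tmp.symlink_to(os.path.relpath(target, link.parent))
    # os.replace renames the symlink itself over the old one (atomic on POSIX).
    os.replace(tmp, link)


# Demo in a throwaway directory; file names are made up.
root = Path(tempfile.mkdtemp())
figs = root / "out" / "figs"
figs.mkdir(parents=True)
old_fig = figs / "substrate_v1.png"
new_fig = figs / "substrate_v2.png"
old_fig.write_bytes(b"old")
new_fig.write_bytes(b"new")

stable = root / "docs" / "figs" / "substrate.png"
link_latest_sketch(old_fig, stable)
link_latest_sketch(new_fig, stable)  # second call swaps the link target
```

Writing the link under a temporary name and renaming it means readers of
docs/figs/<name>.png never see a missing file mid-update.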
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Eraser test: direct end-labels carry name+value (no legend, no separate
annotations, no gridlines). Layering: hack=red dominates (the story),
solve=muted grey recedes. Range-frame left spine, minimal ticks (0/.5/1, 0/19).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Visualizes #182: the model hacks at ~0.9 during training (quarantine knob on),
but the shipped model (knob deleted) stays at ~0.25 hack with solve preserved
at ~0.5. The train-vs-ship gap is the routing benefit. Reads both the old
(hack_abl) and new (hack_ship) ship columns.
out/route_evidence_s41.png is committed with -f because the directory is gitignored.
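Sketch of the old/new column fallback and the gap computation. Only hack_abl
and hack_ship come from the logs described above; the hack_train column name
and both helper names are hypothetical, and the example rows are made up.

```python
from typing import Mapping, Optional

# New column name first, then the older one it replaced.
SHIP_COLS = ("hack_ship", "hack_abl")


def ship_hack(row: Mapping[str, float]) -> Optional[float]:
    """Return the shipped-model hack rate from whichever ship column is present."""
    for col in SHIP_COLS:
        if row.get(col) is not None:
            return float(row[col])
    return None


def train_ship_gap(row: Mapping[str, float], train_col: str = "hack_train") -> Optional[float]:
    """Train-time hack rate minus shipped hack rate, or None if either is missing."""
    ship = ship_hack(row)
    if ship is None or row.get(train_col) is None:
        return None
    return float(row[train_col]) - ship


old_row = {"hack_train": 0.9, "hack_abl": 0.25}   # older log layout
new_row = {"hack_train": 0.9, "hack_ship": 0.25}  # newer log layout
gap_old = train_ship_gap(old_row)
gap_new = train_ship_gap(new_row)
```

Trying the new name first means a log that carries both columns is read the
same way as a log written after the rename.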
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>