evil_MoE

mirror of https://github.com/wassname/evil_MoE.git synced 2026-07-02 05:15:04 +08:00

Author	SHA1	Message	Date
wassname	0ea751c5bc	plot: #148 substrate emergence — per-mode hack-rate curves (4/5 learned, seed 41) New scripts/plot_substrate.py parses the hk_<mode> cumulative columns from a multi-loophole substrate run (one log, K interleaved modes) and draws one learning curve per mode with first_step onset dots and direct end-labels. plot_emergence.py can't do this (it groups logs by a single --env-mode). Figure shows the headline: vanilla GRPO learns file_marker/run_tests/stdout_marker/sentinel, eq_override flat at 0 (never). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-30 16:49:03 +00:00
wassname	4621488cc0	reorg: out/ sorted by datatype (vhack/ pools/ runs/ vhack_grads/ figs/) Code writes+reads the new scheme; migrate_out_dirs.py moved 225 loose artifacts (0 left at top level). Per-run checkpoints+rollouts now group under runs/<ts>_<run_id>/ as train.safetensors/rollouts.jsonl. Figures land in out/figs/ with a stable docs/figs/<name>.png symlink (figs.link_latest). justfile also gains run-cell REFRESH param (online-erasure arm). Smoke + smoke-vanilla + results all green on new paths. Requeue manifest preserves the why/resolve labels that pueue reset wiped. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 03:52:24 +00:00
wassname	4fb7b59548	plot(tufte): route-evidence figure — drop legend/grid/box, direct value labels Eraser test: direct end-labels carry name+value (no legend, no separate annotations, no gridlines). Layering: hack=red dominates (the story), solve=muted grey recedes. Range-frame left spine, minimal ticks (0/.5/1, 0/19). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 03:22:35 +00:00
wassname	c7e1500241	plot: add routing arm to dynamics_test overlay (matched mix=0.125/s41/20-step) vanilla ~0.65, static erasure ~0.65 (no benefit this seed), routing ship-model ~0.15. Matched config: erase+route both v_hack_21pairs, same vanilla baseline. n=1 -> 60-step T8 cells confirm. routing line = SHIP-eval (deployed, knob off). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 03:21:02 +00:00
wassname	3004b23f74	feat: route-evidence figure — training hack vs shipped hack (single run) Visualizes #182: model hacks ~0.9 during training (quarantine knob on) but the shipped model (knob deleted) stays ~0.25 at preserved solve ~0.5. The train-vs-ship gap is the routing benefit. Reads old(hack_abl)+new(hack_ship) ship cols. out/route_evidence_s41.png committed (-f, gitignored dir). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 03:17:35 +00:00

5 Commits