reorg: out/ sorted by datatype (vhack/ pools/ runs/ vhack_grads/ figs/)

Code now reads and writes the new layout; migrate_out_dirs.py moved 225 loose
artifacts (0 left at top level). Per-run checkpoints and rollouts are grouped
under runs/<ts>_<run_id>/ as train.safetensors and rollouts.jsonl. Figures land
in out/figs/, with a stable docs/figs/<name>.png symlink (figs.link_latest).
The justfile also gains a REFRESH param for run-cell (online-erasure arm).
Smoke, smoke-vanilla, and results are all green on the new paths. The requeue
manifest preserves the why/resolve labels that pueue reset wiped.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-27 17:30:41 +08:00 · 2026-05-30 03:52:24 +00:00
parent 4fb7b59548
commit 4621488cc0
19 changed files with 296 additions and 75 deletions
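The following is a minimal sketch of the out/ layout described in the commit message. It is not the code from this commit. `run_dir`, `link_latest`, and `loose_artifacts` are hypothetical names chosen for illustration. The commit names `figs.link_latest` but does not show its signature. The sketch also assumes a POSIX filesystem for the symlink swap, and treats "loose artifacts" as non-directory entries directly under out/. It covers three pieces: the per-run directory, a relative docs symlink that is replaced atomically, and a check that nothing loose is left at the top level.

```python
import os
from pathlib import Path


def run_dir(out_root: Path, ts: str, run_id: str) -> Path:
    """Hypothetical helper: one directory per run, out/runs/<ts>_<run_id>/.

    The run's checkpoint (train.safetensors) and rollouts (rollouts.jsonl)
    would both be written inside it.
    """
    d = out_root / "runs" / f"{ts}_{run_id}"
    d.mkdir(parents=True, exist_ok=True)
    return d


def link_latest(fig: Path, docs_figs: Path, name: str) -> Path:
    """Hypothetical stand-in for figs.link_latest.

    Points docs_figs/<name>.png at fig through a relative symlink. The new
    link is created under a temporary name and renamed over the old one, so
    the stable path is never briefly missing.
    """
    docs_figs.mkdir(parents=True, exist_ok=True)
    link = docs_figs / f"{name}.png"
    target = os.path.relpath(fig.resolve(), docs_figs.resolve())
    tmp = docs_figs / f".{name}.png.tmp"
    if tmp.is_symlink() or tmp.exists():  # clear a leftover from a crashed run
        tmp.unlink()
    tmp.symlink_to(target)
    os.replace(tmp, link)  # rename(2): atomic on POSIX within one filesystem
    return link


def loose_artifacts(out_root: Path) -> list[str]:
    """Non-directory entries directly under out_root (expected: none after migration)."""
    return sorted(p.name for p in out_root.iterdir() if not p.is_dir())
```

A relative symlink target keeps docs/figs/ valid when the checkout is moved or cloned elsewhere. An absolute target would break in that case.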
@@ -0,0 +1,81 @@
+# Requeue manifest (queue reset 2026-05-30)
+
+Durable copy of the pueue why/resolve labels and commands, captured before
+`pueue reset` wiped them. This file is the requeue plan. Raw JSON snapshot:
+`/tmp/claude-1000/queue_snapshot_033450.json` (in /tmp, so not durable; this .md is).
+
+## Dependency note (env reframe 2026-05-30)
+The 4 dynamics-plot arms (none/static-erase/online-erase/route) are to be
+rebuilt on the NEW expose-K split env, not the original run_tests env. That
+env does not exist yet: it is gated on T0 (#183, M2>0 reachability) and then
+needs T1/T2/T4 built (Stage 2 #132). The plot-arm commands below (#184-#187)
+target the ORIGINAL env and must be re-pathed once the split env is built.
+Original-env why-jobs (marathon #152, prog_widest #181, G2 screens #137-#139,
+defer #173) can requeue immediately, using the post-reorg out/ paths (#131).
+
+### #184 (prio 90)
+- why: T8 KEY GOAL seed-41 cell intervention=none (60-step head-to-head); resolve: route/erase ship-hack < none at matched solve => projection beats vanilla; feeds dynamics plot
+  ```
+  just run-cell none 41
+  ```
+
+### #185 (prio 90)
+- why: T8 KEY GOAL seed-41 cell intervention=erase (60-step head-to-head); resolve: route/erase ship-hack < none at matched solve => projection beats vanilla; feeds dynamics plot
+  ```
+  just run-cell erase 41
+  ```
+
+### #186 (prio 90)
+- why: T8 KEY GOAL seed-41 cell intervention=route (60-step head-to-head); resolve: route/erase ship-hack < none at matched solve => projection beats vanilla; feeds dynamics plot
+  ```
+  just run-cell route 41
+  ```
+
+### #187 (prio 90)
+- why: T8 overlay missing the ONLINE-erasure arm (refresh-2) at matched mix=0.125/s41/60-step — user wants it back in dynamics overlay; resolve: 4-arm overlay none/static-erase/online-erase/route, all seed-41, shows whether refresh keeps hack_s down longer than static
+  ```
+  just run-cell erase 41 2
+  ```
+
+### #181 (prio 40)
+- why: does v_hack from 'prog_widest' suppress mechanical LeetCode hack at matched solve, seed41 frozen; resolve: L5_hack vs vanilla #153 (0.664), prog_wide #156 (0.500)
+  ```
+  uv run python -m projected_grpo.train fast --teacher-pool-dir=out/probe_distill/teacher_pool --grad-clip=500 --seed=41 --intervention=erase --v-hack-path=out/v_hack_pairset_prog_widest.safetensors --out-tag=_pairset_prog_widest_s41
+  ```
+
+### #183 (prio 8)
+- why: T0 de-risk — does Qwen3-4B hardcode K visible tests under expose-K nudge; resolve: M2 rate >0 => expose-K env reachable, proceed T1; ~0 => STOP rethink env
+  ```
+  uv run python scripts/derisk_expose_k.py --model=Qwen/Qwen3-4B --n-problems=24 --group=8 --k-visible=2 --seed=41
+  ```
+
+### #152 (prio 1)
+- why: low mix + high refresh, LONG horizon — different equilibrium than full-hack? (v_hack_21pairs, refresh-2 k=12, mix0125, s42, 1000 steps); resolve: hack_s<~0.7 and/or solve_s>~0.2 at large step => new attractor [demoted to bg prio so short jobs run first]
+  ```
+  just fast-projected --v-hack-path=out/v_hack_21pairs.safetensors --vhack-refresh-every=2 --seed=42 --steps=1000 --out-tag=_equilib_refresh2_k12_mix0125_1000_s42
+  ```
+
+### #137 (prio 0)
+- why: G2 screen rl-rewardhacking-leetcode-gt-monitor-screening-s65 (5-prompt batch); resolve: per-checkpoint E/C/D% table to find non-rh-s65 detector clusters
+  ```
+  just pregen-teacher-alt ariahw/rl-rewardhacking-leetcode-gt-monitor-screening-s65 teacher_pool_rl-rewardhacking-leetcode-gt-monitor-screening-s65 5
+  ```
+
+### #138 (prio 0)
+- why: G2 screen rl-rewardhacking-leetcode-judge-monitor-screening-s65 (5-prompt batch); resolve: per-checkpoint E/C/D% table to find non-rh-s65 detector clusters
+  ```
+  just pregen-teacher-alt ariahw/rl-rewardhacking-leetcode-judge-monitor-screening-s65 teacher_pool_rl-rewardhacking-leetcode-judge-monitor-screening-s65 5
+  ```
+
+### #139 (prio 0)
+- why: G2 screen rl-rewardhacking-leetcode-probe-monitor-screening-s65 (5-prompt batch); resolve: per-checkpoint E/C/D% table to find non-rh-s65 detector clusters
+  ```
+  just pregen-teacher-alt ariahw/rl-rewardhacking-leetcode-probe-monitor-screening-s65 teacher_pool_rl-rewardhacking-leetcode-probe-monitor-screening-s65 5
+  ```
+
+### #173 (prio -10)
+- why: does Qwen3.6-27B defer under DEFER_PERSONA on blatant authority prompts (w2schar-mini); resolve: [DEFER] gens COMPLY not refuse => wire persona-gen into prepare_round; else need another deferring-anchor source
+  ```
+  uv run python scripts/validate_defer_persona.py
+  ```
+
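The manifest is meant to be the requeue plan, so a small parser can recover its jobs in run order. The following is a minimal sketch built on several assumptions. Each job is a `### #<id> (prio <n>)` heading followed by one `- why: ...; resolve: ...` line and one fenced command. A higher prio runs first, as the "[demoted to bg prio so short jobs run first]" note on #152 implies. The file is read as plain markdown, without the diff's leading `+`. `Job` and `parse_manifest` are hypothetical names, and the sketch only lists the jobs; it does not call pueue.

```python
import re
from dataclasses import dataclass

FENCE = "`" * 3  # a markdown code-fence marker (three backticks)
HEADING = re.compile(r"^### #(\d+) \(prio (-?\d+)\)$")
WHY = "- why: "


@dataclass
class Job:
    task_id: int
    prio: int
    why: str
    resolve: str
    command: str


def parse_manifest(text: str) -> list[Job]:
    """Hypothetical parser: one Job per '### #<id> (prio <n>)' section."""
    jobs: list[Job] = []
    task_id, prio, why, resolve = None, 0, "", ""
    cmd: list[str] = []
    in_fence = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(FENCE):
            in_fence = not in_fence
            if not in_fence:  # closing fence: the section's command is complete
                if task_id is not None:
                    jobs.append(Job(task_id, prio, why, resolve, "\n".join(cmd)))
                task_id, cmd = None, []
        elif in_fence:
            if line:
                cmd.append(line)
        elif m := HEADING.match(line):
            task_id, prio = int(m.group(1)), int(m.group(2))
            why, resolve = "", ""
        elif task_id is not None and line.startswith(WHY):
            # Split at the first '; resolve: ' only.
            why, _, resolve = line[len(WHY):].partition("; resolve: ")
    # Highest prio first; sorted() is stable, so equal prios keep file order.
    return sorted(jobs, key=lambda j: -j.prio)
```

Because the sort is stable, jobs that share a priority, such as #184-#187 at prio 90, come out in the order they appear in the file.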