reorg: out/ sorted by datatype (vhack/ pools/ runs/ vhack_grads/ figs/)

Code writes+reads the new scheme; migrate_out_dirs.py moved 225 loose artifacts
(0 left at top level). Per-run checkpoints+rollouts now group under
runs/<ts>_<run_id>/ as train.safetensors/rollouts.jsonl. Figures land in
out/figs/ with a stable docs/figs/<name>.png symlink (figs.link_latest).
justfile also gains run-cell REFRESH param (online-erasure arm). Smoke +
smoke-vanilla + results all green on new paths. Requeue manifest preserves the
why/resolve labels that pueue reset wiped.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
wassname
2026-05-30 03:52:24 +00:00
parent 4fb7b59548
commit 4621488cc0
19 changed files with 296 additions and 75 deletions
+81
View File
@@ -0,0 +1,81 @@
# Requeue manifest (queue reset 2026-05-30)
Durable copy of the pueue why/resolve labels + commands, captured before
`pueue reset` wiped them. This file is the requeue plan. Raw JSON snapshot:
`/tmp/claude-1000/queue_snapshot_033450.json` (not durable — this .md is).
## Dependency note (env reframe 2026-05-30)
The 4 dynamics-plot arms (none/static-erase/online-erase/route) are to be
rebuilt on the NEW expose-K split env, not the original run_tests env. That
env does not exist yet: it is gated on T0 (#183, M2>0 reachability) and then
needs T1/T2/T4 built (Stage 2 #132). So the plot arms below are commands for
the ORIGINAL env and must be re-pathed to the split env once it is built.
Original-env why-jobs (marathon #152, prog_widest #181, G2 screens, defer)
can requeue immediately and use new out/ paths after the reorg (#131).
### #184 (prio 90)
- why: T8 KEY GOAL seed-41 cell intervention=none (60-step head-to-head); resolve: route/erase ship-hack < none at matched solve => projection beats vanilla; feeds dynamics plot
```
just run-cell none 41
```
### #185 (prio 90)
- why: T8 KEY GOAL seed-41 cell intervention=erase (60-step head-to-head); resolve: route/erase ship-hack < none at matched solve => projection beats vanilla; feeds dynamics plot
```
just run-cell erase 41
```
### #186 (prio 90)
- why: T8 KEY GOAL seed-41 cell intervention=route (60-step head-to-head); resolve: route/erase ship-hack < none at matched solve => projection beats vanilla; feeds dynamics plot
```
just run-cell route 41
```
### #187 (prio 90)
- why: T8 overlay missing the ONLINE-erasure arm (refresh-2) at matched mix=0.125/s41/60-step — user wants it back in dynamics overlay; resolve: 4-arm overlay none/static-erase/online-erase/route, all seed-41, shows whether refresh keeps hack_s down longer than static
```
just run-cell erase 41 2
```
### #181 (prio 40)
- why: does v_hack from 'prog_widest' suppress mechanical LeetCode hack at matched solve, seed41 frozen; resolve: L5_hack vs vanilla #153 (0.664), prog_wide #156 (0.500)
```
uv run python -m projected_grpo.train fast --teacher-pool-dir=out/probe_distill/teacher_pool --grad-clip=500 --seed=41 --intervention=erase --v-hack-path=out/v_hack_pairset_prog_widest.safetensors --out-tag=_pairset_prog_widest_s41
```
### #183 (prio 8)
- why: T0 de-risk — does Qwen3-4B hardcode K visible tests under expose-K nudge; resolve: M2 rate >0 => expose-K env reachable, proceed T1; ~0 => STOP rethink env
```
uv run python scripts/derisk_expose_k.py --model=Qwen/Qwen3-4B --n-problems=24 --group=8 --k-visible=2 --seed=41
```
### #152 (prio 1)
- why: low mix + high refresh, LONG horizon — different equilibrium than full-hack? (v_hack_21pairs, refresh-2 k=12, mix0125, s42, 1000 steps); resolve: hack_s<~0.7 and/or solve_s>~0.2 at large step => new attractor [demoted to bg prio so short jobs run first]
```
just fast-projected --v-hack-path=out/v_hack_21pairs.safetensors --vhack-refresh-every=2 --seed=42 --steps=1000 --out-tag=_equilib_refresh2_k12_mix0125_1000_s42
```
### #137 (prio 0)
- why: G2 screen rl-rewardhacking-leetcode-gt-monitor-screening-s65 (5-prompt batch); resolve: per-checkpoint E/C/D% table to find non-rh-s65 detector clusters
```
just pregen-teacher-alt ariahw/rl-rewardhacking-leetcode-gt-monitor-screening-s65 teacher_pool_rl-rewardhacking-leetcode-gt-monitor-screening-s65 5
```
### #138 (prio 0)
- why: G2 screen rl-rewardhacking-leetcode-judge-monitor-screening-s65 (5-prompt batch); resolve: per-checkpoint E/C/D% table to find non-rh-s65 detector clusters
```
just pregen-teacher-alt ariahw/rl-rewardhacking-leetcode-judge-monitor-screening-s65 teacher_pool_rl-rewardhacking-leetcode-judge-monitor-screening-s65 5
```
### #139 (prio 0)
- why: G2 screen rl-rewardhacking-leetcode-probe-monitor-screening-s65 (5-prompt batch); resolve: per-checkpoint E/C/D% table to find non-rh-s65 detector clusters
```
just pregen-teacher-alt ariahw/rl-rewardhacking-leetcode-probe-monitor-screening-s65 teacher_pool_rl-rewardhacking-leetcode-probe-monitor-screening-s65 5
```
### #173 (prio -10)
- why: does Qwen3.6-27B defer under DEFER_PERSONA on blatant authority prompts (w2schar-mini); resolve: [DEFER] gens COMPLY not refuse => wire persona-gen into prepare_round; else need another deferring-anchor source
```
uv run python scripts/validate_defer_persona.py
```