docs+chore: out/ reorg scheme (queue-gated) + archive dead _OLD_step_format dirs

out/ is 25GB/195 loose files. Target: one subdir per datatype, per-run
artifacts under runs/<ts>_<slug>/. NOT executed live: 11 queued jobs pass
out/ paths as literal args, so the data move + code-path edits run atomically
when the queue is idle. Archived the unreferenced *_OLD_step_format dirs now.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
wassname
2026-05-30 02:43:10 +00:00
parent 2b02e7aa77
commit 969c724d9d
+65
@@ -0,0 +1,65 @@
# out/ reorg — clean path scheme (by datatype, run-prefixed)
## Goal
out/ is 25GB / 195 loose files: `train_*.safetensors` checkpoints, `v_hack_*`,
`vhack_grads_*`, and a dozen `probe_distill/teacher_pool*` dirs all at top level.
Sort by path: one subdir per datatype, per-run artifacts grouped under a
`<timestamp>_<slug>` run dir. Code reads and writes the new paths; existing
outputs are moved to match.
## Why this is NOT done live (the gate)
11 queued/running pueue jobs pass `out/` paths as literal args
(`--v-hack-path=out/v_hack_*.safetensors`, `--teacher-pool-dir=out/probe_distill/teacher_pool`,
`--pairs-from-pool=out/pairsets/*.json`). Moving those files mid-queue breaks
every job that hasn't started. So the data move + code-path edits run as ONE
atomic change when the queue is idle (`pueue status` shows no Queued or Running
tasks). Until then only the unreferenced `*_OLD_step_format` dirs are archived
(done 2026-05-30 -> `out/_archive/`).
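
The idle gate could be checked mechanically rather than by eye. The sketch below assumes `pueue status --json` emits a top-level `tasks` map whose entries carry `status` and `command` fields, and that `status` is either a bare string or a one-key object depending on the pueue version. The helper names and the `PENDING` set are illustrative, not part of the repo.

```python
import json
import subprocess

# Assumption: states in which a task has not finished and may still read out/ paths.
PENDING = {"Queued", "Running", "Paused", "Stashed", "Locked"}


def status_name(status):
    """Normalise a task status that may be a bare string or a one-key dict."""
    if isinstance(status, str):
        return status
    if isinstance(status, dict) and status:
        return next(iter(status))
    return "Unknown"


def blocking_tasks(state, needle="out/"):
    """Return ids of unfinished tasks whose command line mentions `needle`."""
    blocked = []
    for task_id, task in state.get("tasks", {}).items():
        pending = status_name(task.get("status")) in PENDING
        if pending and needle in task.get("command", ""):
            blocked.append(task_id)
    return sorted(blocked)


def queue_is_idle():
    """True when no unfinished pueue task still references an out/ path."""
    raw = subprocess.run(
        ["pueue", "status", "--json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return not blocking_tasks(json.loads(raw))
```

A migration script could call `queue_is_idle()` first and refuse to move anything while it returns `False`.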
## Target scheme
```
out/
  vhack/            v_hack_*.safetensors       # extracted bases (flat, named)
  vhack_grads/      vhack_grads_*.safetensors  # raw per-pair grads (extract intermediates)
  pools/            <pool_name>/               # teacher pools (was probe_distill/teacher_pool*)
  pairsets/         *.json                     # unchanged
  baked/            <variant>/                 # unchanged
  runs/<ts>_<slug>/ train.safetensors, first_hack.safetensors  # per-train-run
  _archive/         dead / superseded
- `runs/<ts>_<slug>/`: checkpoints are currently `out/train_<tag>.safetensors`
with no timestamp. Migration maps each to its log's `<ts>` via the matching
`logs/<ts>_*_<tag>.log` and groups it into a run dir. New runs write here directly.
- `pools/`: drop the `probe_distill/` nesting (it was never about probes);
flatten `teacher_pool`, `base_pool`, `mixed_*`, the `teacher_pool_rl-*` and
`teacher_pool_inoc-*` variants into `pools/<name>/`.
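
The datatype rules above can be written as a pure path-mapping function, which keeps the scheme testable without touching disk. This is a sketch only: `new_relpath` is a hypothetical name, paths are taken relative to `out/`, and treating any `probe_distill/` child whose name contains `pool` or starts with `mixed_` as a pool is an assumption drawn from the list above. `train_*.safetensors` is deliberately left out because it needs the log timestamp.

```python
from pathlib import PurePosixPath


def new_relpath(old):
    """Map a path relative to out/ onto the target scheme.

    Returns None when the path is unchanged or is not handled by these rules.
    """
    p = PurePosixPath(old)
    parts, name = p.parts, p.name
    top_level_tensor = len(parts) == 1 and name.endswith(".safetensors")

    # Loose raw per-pair grads go under vhack_grads/.
    if top_level_tensor and name.startswith("vhack_grads_"):
        return f"vhack_grads/{name}"
    # Loose extracted bases go under vhack/.
    if top_level_tensor and name.startswith("v_hack_"):
        return f"vhack/{name}"
    # Pools lose the probe_distill/ nesting (assumed pool-name test).
    if len(parts) >= 2 and parts[0] == "probe_distill":
        pool = parts[1]
        if "pool" in pool or pool.startswith("mixed_"):
            return str(PurePosixPath("pools", *parts[1:]))
    return None
```

Returning `None` for anything unrecognised means unknown files stay where they are instead of being guessed at.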
## Code edits (apply atomically with the data move)
- `train.py`: checkpoint save path -> `out/runs/<run_id>/{train,first_hack}.safetensors`
(`run_id` already built for the log name). `--teacher-pool-dir` default ->
`out/pools/teacher_pool`. v_hack load path is an explicit arg (no default).
- `extract_vhack_grad.py`: `--out-path` default -> `out/vhack/<name>.safetensors`;
`--train-grads-path` default -> `out/vhack_grads/<name>.safetensors`.
- `probe_distill.py`: pool write dir -> `out/pools/<name>`.
- `justfile`: every recipe with `out/v_hack_*`, `out/probe_distill/teacher_pool*`,
`out/pairsets/*` -> new paths. (These are the literal strings the queue
captured, hence the idle-gate.)
- `scripts/results.py`: `vhack=grab(r"v-hack-path=out/(\S+?)\.safetensors")` ->
allow the `vhack/` prefix (strip dir for the display name).
- `scripts/plot_dynamics.py`: same v_hack path parse if it reads one.
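
For the `scripts/results.py` change, one possible pattern makes the `vhack/` directory optional so that old and new log lines yield the same display name. The sketch uses `re.search` directly rather than the repo's `grab` helper, whose exact behaviour is not shown here; `VHACK_RE` and `vhack_display_name` are illustrative names.

```python
import re

# An optional non-capturing "vhack/" prefix keeps the captured name identical
# for old (out/v_hack_x.safetensors) and new (out/vhack/v_hack_x.safetensors) paths.
VHACK_RE = re.compile(r"v-hack-path=out/(?:vhack/)?(\S+?)\.safetensors")


def vhack_display_name(line):
    """Return the v_hack file stem from a command line, or None if absent."""
    match = VHACK_RE.search(line)
    return match.group(1) if match else None
```

Because the prefix is optional, logs from before the migration keep parsing, which is what the "display names unchanged" check relies on.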
## Migration (data move, run when idle)
A `--dry-run`-default script that: (1) `mkdir` the new dirs; (2) `git mv`/`mv`
loose `v_hack_*`->`vhack/`, `vhack_grads_*`->`vhack_grads/`; (3) for each
`train_*.safetensors`, find its log `<ts>` and move to `runs/<ts>_<slug>/`;
(4) move `probe_distill/*pool*`->`pools/`. Idempotent; logs anything it can't
map (stays put, flagged) rather than silently dropping.
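
Step (3) is the only one that needs a lookup, so a minimal sketch of it follows. Assumptions, none confirmed by the repo: the run dir name is the full log stem (so `<ts>_<slug>` is whatever precedes `.log`), a checkpoint maps only when exactly one log ends in `_<tag>.log`, and the function names are hypothetical. Planning is separated from applying so the default stays a dry run.

```python
import shutil
from pathlib import Path


def plan_run_moves(out_dir, log_dir):
    """Pair each out/train_<tag>.safetensors with its run dir.

    Returns (moves, unmapped): moves is a list of (src, dst) paths, unmapped
    lists checkpoints with zero or several candidate logs (left in place).
    """
    moves, unmapped = [], []
    for ckpt in sorted(out_dir.glob("train_*.safetensors")):
        tag = ckpt.stem[len("train_"):]
        logs = sorted(log_dir.glob(f"*_{tag}.log"))
        if len(logs) != 1:
            unmapped.append(ckpt)  # flag it rather than guess
            continue
        moves.append((ckpt, out_dir / "runs" / logs[0].stem / "train.safetensors"))
    return moves, unmapped


def apply_moves(moves, execute=False):
    """Print every planned move; touch disk only when execute=True."""
    for src, dst in moves:
        if dst.exists():
            print(f"skip (exists): {dst}")
            continue
        print(f"{'mv' if execute else 'would mv'} {src} -> {dst}")
        if execute:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
```

Re-running after a successful pass finds no remaining `train_*.safetensors` to plan, which gives the idempotence the migration asks for.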
## Verify
- `just smoke` + `just smoke-vanilla` green (walks write paths).
- `just results` still parses every run (vhack display names unchanged).
- `find out -maxdepth 1 -type f | wc -l` ~ 0 (no loose top-level files).
- A re-extract + a fast run write into `vhack/` and `runs/` respectively.
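
The loose-file check could also live in a test so it fails loudly instead of relying on someone reading a count. A small sketch, assuming it is pointed at the repo's `out/` directory; `loose_top_level_files` is a hypothetical helper that mirrors `find out -maxdepth 1 -type f`.

```python
from pathlib import Path


def loose_top_level_files(out_dir):
    """Names of regular files sitting directly in out_dir (symlinks excluded)."""
    return sorted(
        p.name
        for p in Path(out_dir).iterdir()
        if p.is_file() and not p.is_symlink()
    )
```

An empty list corresponds to the `~ 0` target above; any names returned are the files still to be sorted or archived.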
## UAT
"out/ has one subdir per datatype; train artifacts live under runs/<ts>_<slug>/;
nothing loose at top level; smoke + results + a fresh run all still work."