From 969c724d9d6dc2a3ce2fa3583d2b891e011fe1aa Mon Sep 17 00:00:00 2001
From: wassname
Date: Sat, 30 May 2026 02:43:10 +0000
Subject: [PATCH] docs+chore: out/ reorg scheme (queue-gated) + archive dead _OLD_step_format dirs

out/ is 25GB/195 loose files. Target: one subdir per datatype, per-run
artifacts under runs/_/. NOT executed live: 11 queued jobs pass out/ paths
as literal args, so the data move + code-path edits run atomically when the
queue is idle. Archived the unreferenced *_OLD_step_format dirs now.

Co-Authored-By: Claude Opus 4.8
---
 docs/spec/20260530_out_dir_reorg.md | 65 +++++++++++++++++++++++++++++
 1 file changed, 65 insertions(+)
 create mode 100644 docs/spec/20260530_out_dir_reorg.md

diff --git a/docs/spec/20260530_out_dir_reorg.md b/docs/spec/20260530_out_dir_reorg.md
new file mode 100644
index 0000000..b9c986c
--- /dev/null
+++ b/docs/spec/20260530_out_dir_reorg.md
@@ -0,0 +1,65 @@
+# out/ reorg: clean path scheme (by datatype, run-prefixed)
+
+## Goal
+out/ is 25GB / 195 loose files: `train_*.safetensors` checkpoints, `v_hack_*`,
+`vhack_grads_*`, and a dozen `probe_distill/teacher_pool*` dirs all at top level.
+Sort by path: one subdir per datatype, per-run artifacts grouped under a
+`_` run dir. Code reads+writes the new paths; old outputs moved.
+
+## Why this is NOT done live (the gate)
+11 queued/running pueue jobs pass `out/` paths as literal args
+(`--v-hack-path=out/v_hack_*.safetensors`, `--teacher-pool-dir=out/probe_distill/teacher_pool`,
+`--pairs-from-pool=out/pairsets/*.json`). Moving those files mid-queue breaks
+every job that hasn't started. So the data move + code-path edits run as ONE
+atomic change when the queue is idle (`pueue status` all Done/Queued-empty).
+Until then only the unreferenced `*_OLD_step_format` dirs are archived (done
+2026-05-30 -> `out/_archive/`).
+
+## Target scheme
+```
+out/
+  vhack/        v_hack_*.safetensors                       # extracted bases (flat, named)
+  vhack_grads/  vhack_grads_*.safetensors                  # raw per-pair grads (extract intermediates)
+  pools/        /                                          # teacher pools (was probe_distill/teacher_pool*)
+  pairsets/     *.json                                     # unchanged
+  baked/        /                                          # unchanged
+  runs/_/       train.safetensors, first_hack.safetensors  # per-train-run
+  _archive/     dead / superseded
+```
+- `runs/_/`: checkpoints currently are `out/train_.safetensors`
+  with no timestamp. Migration maps each to its log's `` via the matching
+  `logs/_*_.log`, groups into a run dir. New runs write here directly.
+- `pools/`: drop the `probe_distill/` nesting (it was never about probes);
+  flatten `teacher_pool`, `base_pool`, `mixed_*`, the `teacher_pool_rl-*` and
+  `teacher_pool_inoc-*` variants into `pools//`.
+
+## Code edits (apply atomically with the data move)
+- `train.py`: checkpoint save path -> `out/runs//{train,first_hack}.safetensors`
+  (`run_id` already built for the log name). `--teacher-pool-dir` default ->
+  `out/pools/teacher_pool`. v_hack load path is an explicit arg (no default).
+- `extract_vhack_grad.py`: `--out-path` default -> `out/vhack/.safetensors`;
+  `--train-grads-path` default -> `out/vhack_grads/.safetensors`.
+- `probe_distill.py`: pool write dir -> `out/pools/`.
+- `justfile`: every recipe with `out/v_hack_*`, `out/probe_distill/teacher_pool*`,
+  `out/pairsets/*` -> new paths. (These are the literal strings the queue
+  captured, hence the idle-gate.)
+- `scripts/results.py`: `vhack=grab(r"v-hack-path=out/(\S+?)\.safetensors")` ->
+  allow the `vhack/` prefix (strip dir for the display name).
+- `scripts/plot_dynamics.py`: same v_hack path parse if it reads one.
+
+## Migration (data move, run when idle)
+A `--dry-run`-default script that: (1) `mkdir` the new dirs; (2) `git mv`/`mv`
+loose `v_hack_*`->`vhack/`, `vhack_grads_*`->`vhack_grads/`; (3) for each
+`train_*.safetensors`, find its log `` and move to `runs/_/`;
+(4) move `probe_distill/*pool*`->`pools/`. Idempotent; logs anything it can't
+map (stays put, flagged) rather than silently dropping.
+
+## Verify
+- `just smoke` + `just smoke-vanilla` green (walks write paths).
+- `just results` still parses every run (vhack display names unchanged).
+- `find out -maxdepth 1 -type f | wc -l` ~ 0 (no loose top-level files).
+- A re-extract + a fast run write into `vhack/` and `runs/` respectively.
+
+## UAT
+"out/ has one subdir per datatype; train artifacts live under runs/_/;
+nothing loose at top level; smoke + results + a fresh run all still work."
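
The Migration section of the spec describes a dry-run-default, idempotent move script. A minimal sketch of steps (1), (2) and (4) might look like the following. The names `RULES`, `plan_moves` and `apply_moves` are hypothetical, not taken from the repo. It uses plain `shutil.move` rather than `git mv`. Step (3), matching each `train_*.safetensors` to its log, depends on the log naming scheme and is not attempted here, so those checkpoints are flagged and left in place.

```python
import shutil
from pathlib import Path

# Hypothetical rule table: glob (relative to out/) -> destination subdir.
# Covers steps (2) and (4) of the Migration section only.
RULES = [
    ("v_hack_*", "vhack"),
    ("vhack_grads_*", "vhack_grads"),
    ("probe_distill/*pool*", "pools"),
]


def plan_moves(out_dir: Path):
    """Return (moves, unmapped): planned (src, dst) pairs and flagged leftovers."""
    moves = []
    for pattern, subdir in RULES:
        for src in sorted(out_dir.glob(pattern)):
            moves.append((src, out_dir / subdir / src.name))
    # Step (3) needs the log lookup, which this sketch does not implement,
    # so loose train checkpoints are reported instead of moved.
    unmapped = sorted(out_dir.glob("train_*.safetensors"))
    return moves, unmapped


def apply_moves(moves, dry_run: bool = True):
    """Print each planned move; perform it only when dry_run is False."""
    handled = []
    for src, dst in moves:
        if dst.exists():
            # Never overwrite: leave the source where it is and say so.
            print(f"SKIP (destination exists): {dst}")
            continue
        print(f"{'WOULD MOVE' if dry_run else 'MOVE'} {src} -> {dst}")
        if not dry_run:
            dst.parent.mkdir(parents=True, exist_ok=True)  # step (1)
            shutil.move(str(src), str(dst))
        handled.append((src, dst))
    return handled
```

Calling `apply_moves(plan_moves(Path("out"))[0])` only prints the plan; passing `dry_run=False` performs it. Because the globs match only the old locations, a second run after a successful move plans nothing, which gives the idempotence the spec asks for.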