out/ is 25GB/195 loose files. Target: one subdir per datatype, per-run artifacts under runs/<ts>_<slug>/. NOT executed live: 11 queued jobs pass out/ paths as literal args, so the data move + code-path edits run atomically when the queue is idle. Archived the unreferenced *_OLD_step_format dirs now. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
out/ reorg — clean path scheme (by datatype, run-prefixed)
Goal
out/ is 25GB / 195 loose files: train_*.safetensors checkpoints, v_hack_*,
vhack_grads_*, and a dozen probe_distill/teacher_pool* dirs all at top level.
Sort by path: one subdir per datatype, per-run artifacts grouped under a
<timestamp>_<slug> run dir. Code reads+writes the new paths; old outputs moved.
Why this is NOT done live (the gate)
11 queued/running pueue jobs pass out/ paths as literal args
(--v-hack-path=out/v_hack_*.safetensors, --teacher-pool-dir=out/probe_distill/teacher_pool,
--pairs-from-pool=out/pairsets/*.json). Moving those files mid-queue breaks
every job that hasn't started. So the data move + code-path edits run as ONE
atomic change when the queue is idle (pueue status all Done/Queued-empty).
Until then only the unreferenced *_OLD_step_format dirs are archived (done
2026-05-30 -> out/_archive/).
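
The idle gate could be checked by a script rather than by eye. The following is a minimal sketch, not the project's actual tooling. It assumes `pueue status --json` returns a top-level `tasks` map whose entries carry a `status` field, serialized either as a plain string or as a single-key object (this differs between pueue versions). The helper names and the set of "busy" states are hypothetical, and treating Paused/Stashed/Locked as busy is a deliberately conservative assumption.

```python
import json
import subprocess

# Assumed state names: any task in one of these may still read old out/ paths.
BUSY = {"Queued", "Running", "Paused", "Stashed", "Locked"}


def status_name(status):
    """Return the state name for either "Done" or {"Done": {...}} encodings."""
    if isinstance(status, dict):
        return next(iter(status), "")
    return str(status)


def queue_is_idle(state):
    """True when no task in the parsed status JSON is in a busy state."""
    tasks = state.get("tasks", {})
    return all(status_name(t.get("status")) not in BUSY for t in tasks.values())


def pueue_state():
    """Fetch the live queue state; requires pueue on PATH."""
    proc = subprocess.run(
        ["pueue", "status", "--json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(proc.stdout)
```

A migration wrapper would call `queue_is_idle(pueue_state())` and refuse to touch out/ when it returns False.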
Target scheme
out/
  vhack/             v_hack_*.safetensors       # extracted bases (flat, named)
  vhack_grads/       vhack_grads_*.safetensors  # raw per-pair grads (extract intermediates)
  pools/             <pool_name>/               # teacher pools (was probe_distill/teacher_pool*)
  pairsets/          *.json                     # unchanged
  baked/             <variant>/                 # unchanged
  runs/<ts>_<slug>/  train.safetensors, first_hack.safetensors  # per-train-run
  _archive/          dead / superseded
- runs/<ts>_<slug>/: checkpoints are currently out/train_<tag>.safetensors with no timestamp. Migration maps each to its log's <ts> via the matching logs/<ts>_*_<tag>.log and groups them into a run dir. New runs write here directly.
- pools/: drop the probe_distill/ nesting (it was never about probes); flatten teacher_pool, base_pool, mixed_*, and the teacher_pool_rl-* and teacher_pool_inoc-* variants into pools/<name>/.
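
The checkpoint-to-run mapping described above can be sketched as follows. This is an illustration under assumptions, not the migration script itself: it assumes the run dir name is simply the stem of the matching log (the plan only says run_id is already built for the log name), and the function name `run_id_for` is hypothetical. Anything with zero or several matching logs returns None so the caller can flag it instead of guessing.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Iterable, Optional


def run_id_for(checkpoint: str, log_names: Iterable[str]) -> Optional[str]:
    """Map out/train_<tag>.safetensors to the stem of logs/<ts>_*_<tag>.log."""
    name = PurePosixPath(checkpoint).name
    prefix, suffix = "train_", ".safetensors"
    if not (name.startswith(prefix) and name.endswith(suffix)):
        return None
    tag = name[len(prefix):-len(suffix)]
    # Note: glob metacharacters inside a tag would be interpreted by fnmatch.
    pattern = f"*_*_{tag}.log"
    hits = [n for n in log_names if fnmatchcase(PurePosixPath(n).name, pattern)]
    if len(hits) != 1:
        return None  # unmapped or ambiguous: leave in place and flag
    return PurePosixPath(hits[0]).stem
```

With a made-up log name such as logs/20260501T1200_fast_lr3.log, the checkpoint out/train_lr3.safetensors would land in out/runs/20260501T1200_fast_lr3/.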
Code edits (apply atomically with the data move)
- train.py: checkpoint save path -> out/runs/<run_id>/{train,first_hack}.safetensors (run_id already built for the log name). --teacher-pool-dir default -> out/pools/teacher_pool. v_hack load path is an explicit arg (no default).
- extract_vhack_grad.py: --out-path default -> out/vhack/<name>.safetensors; --train-grads-path default -> out/vhack_grads/<name>.safetensors.
- probe_distill.py: pool write dir -> out/pools/<name>.
- justfile: every recipe with out/v_hack_*, out/probe_distill/teacher_pool*, out/pairsets/* -> new paths. (These are the literal strings the queue captured, hence the idle-gate.)
- scripts/results.py: vhack=grab(r"v-hack-path=out/(\S+?)\.safetensors") -> allow the vhack/ prefix (strip dir for the display name).
- scripts/plot_dynamics.py: same v_hack path parse if it reads one.
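
For the scripts/results.py edit, one way to accept the new prefix while keeping display names stable is an optional non-capturing group. This is a sketch only: the real `grab` helper is not shown in this note, so `vhack_display_name` below is a hypothetical stand-in that applies the pattern directly with `re.search`.

```python
import re

# Old pattern: r"v-hack-path=out/(\S+?)\.safetensors"
# The optional (?:vhack/)? group swallows the new subdir so the captured
# display name is the same for old and new log lines.
VHACK_RE = re.compile(r"v-hack-path=out/(?:vhack/)?(\S+?)\.safetensors")


def vhack_display_name(log_text):
    """Return the v_hack file stem from a logged command line, or None."""
    match = VHACK_RE.search(log_text)
    return match.group(1) if match else None
```

Both `--v-hack-path=out/v_hack_foo.safetensors` and `--v-hack-path=out/vhack/v_hack_foo.safetensors` yield `v_hack_foo` (the file name here is made up), which is what the "vhack display names unchanged" check relies on.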
Migration (data move, run when idle)
A --dry-run-default script that: (1) mkdir the new dirs; (2) git mv/mv
loose v_hack_*->vhack/, vhack_grads_*->vhack_grads/; (3) for each
train_*.safetensors, find its log <ts> and move to runs/<ts>_<slug>/;
(4) move probe_distill/*pool*->pools/. Idempotent; logs anything it can't
map (stays put, flagged) rather than silently dropping.
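
Steps (1), (2) and (4) could look roughly like the sketch below. It is an assumption-laden outline rather than the actual script: it uses plain `shutil.move` (not git mv), omits the per-checkpoint step (3), and the names `RULES`, `plan_moves` and `migrate` are hypothetical. Dry-run is the default, and a source whose destination already exists is flagged and left in place.

```python
import shutil
from pathlib import Path

# (glob relative to out/, destination subdir), taken from the steps above.
RULES = [
    ("v_hack_*", "vhack"),
    ("vhack_grads_*", "vhack_grads"),
    ("probe_distill/*pool*", "pools"),
]


def plan_moves(out):
    """Return (moves, flagged): planned (src, dst) pairs and blocked sources."""
    moves, flagged = [], []
    for pattern, subdir in RULES:
        for src in sorted(out.glob(pattern)):
            dst = out / subdir / src.name
            if dst.exists():
                flagged.append(src)  # never overwrite; report instead
            else:
                moves.append((src, dst))
    return moves, flagged


def migrate(out, apply=False):
    """Print the plan; only touch the filesystem when apply=True."""
    moves, flagged = plan_moves(out)
    for src, dst in moves:
        print(f"{'MOVE' if apply else 'DRY '} {src} -> {dst}")
        if apply:
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
    for src in flagged:
        print(f"FLAG {src} (destination exists, left in place)")
    return moves, flagged
```

Running `migrate(Path("out"))` only prints the plan; `migrate(Path("out"), apply=True)` performs it. A second applied run finds nothing left to move, which is the idempotence property the plan asks for.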
Verify
- just smoke + just smoke-vanilla green (walks write paths).
- just results still parses every run (vhack display names unchanged).
- find out -maxdepth 1 -type f | wc -l ~ 0 (no loose top-level files).
- A re-extract + a fast run write into vhack/ and runs/ respectively.
UAT
"out/ has one subdir per datatype; train artifacts live under runs/<ts>_<slug>/; nothing loose at top level; smoke + results + a fresh run all still work."