mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 16:30:30 +08:00
docs+chore: out/ reorg scheme (queue-gated) + archive dead _OLD_step_format dirs
out/ is 25GB/195 loose files. Target: one subdir per datatype, per-run artifacts under runs/<ts>_<slug>/. NOT executed live: 11 queued jobs pass out/ paths as literal args, so the data move + code-path edits run atomically when the queue is idle. Archived the unreferenced *_OLD_step_format dirs now.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@@ -0,0 +1,65 @@
# out/ reorg: clean path scheme (by datatype, run-prefixed)

## Goal

out/ is 25GB / 195 loose files: `train_*.safetensors` checkpoints, `v_hack_*`,
`vhack_grads_*`, and a dozen `probe_distill/teacher_pool*` dirs all at top level.
Sort by path: one subdir per datatype, per-run artifacts grouped under a
`<timestamp>_<slug>` run dir. Code reads and writes the new paths; old outputs are moved.

## Why this is NOT done live (the gate)

11 queued/running pueue jobs pass `out/` paths as literal args
(`--v-hack-path=out/v_hack_*.safetensors`, `--teacher-pool-dir=out/probe_distill/teacher_pool`,
`--pairs-from-pool=out/pairsets/*.json`). Moving those files mid-queue breaks
every job that hasn't started. So the data move + code-path edits run as ONE
atomic change when the queue is idle (`pueue status` shows every job Done,
nothing Queued or Running).
Until then only the unreferenced `*_OLD_step_format` dirs are archived (done
2026-05-30 -> `out/_archive/`).

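The idle gate could be checked programmatically rather than by eye. The sketch below is an assumption, not part of the repo: it reads `pueue status --json` and treats the queue as idle only when every task is `Done`. The JSON layout (a top-level `tasks` map whose `status` is either a string or a single-key object, depending on the pueue version) is assumed here and should be checked against the installed version. `status_name`, `queue_is_idle`, and `pueue_state` are hypothetical names.

```python
import json
import subprocess


def status_name(status):
    """Return the status variant name.

    ASSUMPTION: pueue serializes a task status either as a plain string
    (e.g. "Queued") or as a single-key object (e.g. {"Done": ...}).
    """
    if isinstance(status, str):
        return status
    if isinstance(status, dict) and len(status) == 1:
        return next(iter(status))
    raise ValueError(f"unrecognized status: {status!r}")


def queue_is_idle(state):
    """True only if every task in the parsed state is Done (or there are none)."""
    tasks = state.get("tasks", {})
    return all(status_name(t["status"]) == "Done" for t in tasks.values())


def pueue_state():
    """Fetch the daemon state as parsed JSON (requires pueue on PATH)."""
    out = subprocess.run(
        ["pueue", "status", "--json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

A migration script could call `queue_is_idle(pueue_state())` first and refuse to move anything if it returns `False`.
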
## Target scheme

```
out/
  vhack/             v_hack_*.safetensors                       # extracted bases (flat, named)
  vhack_grads/       vhack_grads_*.safetensors                  # raw per-pair grads (extract intermediates)
  pools/             <pool_name>/                               # teacher pools (was probe_distill/teacher_pool*)
  pairsets/          *.json                                     # unchanged
  baked/             <variant>/                                 # unchanged
  runs/<ts>_<slug>/  train.safetensors, first_hack.safetensors  # per-train-run
  _archive/          dead / superseded
```

- `runs/<ts>_<slug>/`: checkpoints currently are `out/train_<tag>.safetensors`
  with no timestamp. Migration maps each to its log's `<ts>` via the matching
  `logs/<ts>_*_<tag>.log` and groups them into a run dir. New runs write here directly.
- `pools/`: drop the `probe_distill/` nesting (it was never about probes);
  flatten `teacher_pool`, `base_pool`, `mixed_*`, the `teacher_pool_rl-*` and
  `teacher_pool_inoc-*` variants into `pools/<name>/`.

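The scheme above amounts to a small old-path to new-path mapping. A minimal sketch of that mapping, assuming paths are given relative to `out/`; `new_rel_path` is a hypothetical helper, and treating everything under `probe_distill/` as a pool is a simplification of the list above. Loose `train_*` checkpoints return `None` because they need a log timestamp before they can be placed under `runs/`.

```python
from pathlib import PurePosixPath

# Top-level dirs the scheme leaves where they are.
UNCHANGED = {"pairsets", "baked", "_archive"}


def new_rel_path(old_rel):
    """Map a path relative to out/ onto the target scheme.

    Returns the new relative path, or None when no static rule applies
    (e.g. train_*.safetensors, which need a <ts> from the matching log).
    """
    p = PurePosixPath(old_rel)
    parts = p.parts
    if not parts:
        return None
    if len(parts) == 1 and p.suffix == ".safetensors":
        if p.name.startswith("vhack_grads_"):
            return f"vhack_grads/{p.name}"
        if p.name.startswith("v_hack_"):
            return f"vhack/{p.name}"
        return None
    if parts[0] == "probe_distill" and len(parts) >= 2:
        # ASSUMPTION: every dir under probe_distill/ is a pool.
        return str(PurePosixPath("pools", *parts[1:]))
    if parts[0] in UNCHANGED:
        return old_rel
    return None
```

Keeping the rules in one pure function makes them easy to unit-test before any file is touched.
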
## Code edits (apply atomically with the data move)

- `train.py`: checkpoint save path -> `out/runs/<run_id>/{train,first_hack}.safetensors`
  (`run_id` already built for the log name). `--teacher-pool-dir` default ->
  `out/pools/teacher_pool`. v_hack load path is an explicit arg (no default).
- `extract_vhack_grad.py`: `--out-path` default -> `out/vhack/<name>.safetensors`;
  `--train-grads-path` default -> `out/vhack_grads/<name>.safetensors`.
- `probe_distill.py`: pool write dir -> `out/pools/<name>`.
- `justfile`: every recipe with `out/v_hack_*`, `out/probe_distill/teacher_pool*`,
  `out/pairsets/*` -> new paths. (These are the literal strings the queue
  captured, hence the idle-gate.)
- `scripts/results.py`: `vhack=grab(r"v-hack-path=out/(\S+?)\.safetensors")` ->
  allow the `vhack/` prefix (strip dir for the display name).
- `scripts/plot_dynamics.py`: apply the same v_hack path-parse fix if it reads one.

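For the `scripts/results.py` edit, one option is to make the `vhack/` directory an optional non-capturing group so that old and new log lines yield the same display name. This is a sketch using `re` directly; how `grab` wraps the pattern is not shown in this doc, so `VHACK_RE` and `vhack_display_name` are hypothetical stand-ins.

```python
import re

# Original pattern plus an optional, non-captured "vhack/" directory prefix.
VHACK_RE = re.compile(r"v-hack-path=out/(?:vhack/)?(\S+?)\.safetensors")


def vhack_display_name(cmdline):
    """Return the v_hack file stem from a logged command line, or None."""
    m = VHACK_RE.search(cmdline)
    return m.group(1) if m else None
```

Because the prefix is not captured, runs logged before and after the move group under the same name.
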
## Migration (data move, run when idle)

A `--dry-run`-default script that: (1) `mkdir`s the new dirs; (2) moves (`git mv`/`mv`)
loose `v_hack_*`->`vhack/`, `vhack_grads_*`->`vhack_grads/`; (3) for each
`train_*.safetensors`, finds its log `<ts>` and moves it to `runs/<ts>_<slug>/`;
(4) moves `probe_distill/*pool*`->`pools/`. Idempotent; logs anything it can't
map (stays put, flagged) rather than silently dropping it.

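Step (3) is the only step that needs a lookup, so it is sketched here. Assumptions that are not in the plan: `<ts>` is the part of the log filename before the first underscore, the run `<slug>` is simply the checkpoint `<tag>`, and a checkpoint with zero or several matching logs is left in place and reported. `plan_train_moves` and `apply_moves` are hypothetical names; `first_hack` checkpoints are not handled.

```python
import glob
from pathlib import Path


def plan_train_moves(out_dir, logs_dir):
    """Plan moves for loose train_<tag>.safetensors files.

    Returns (moves, unmapped): moves is a list of (src, dst) pairs,
    unmapped lists checkpoints that stay put and should be flagged.
    """
    out_dir, logs_dir = Path(out_dir), Path(logs_dir)
    moves, unmapped = [], []
    for ckpt in sorted(out_dir.glob("train_*.safetensors")):
        tag = ckpt.stem[len("train_"):]
        # Mirrors logs/<ts>_*_<tag>.log from the plan.
        logs = sorted(logs_dir.glob(f"*_*_{glob.escape(tag)}.log"))
        if len(logs) != 1:
            unmapped.append(ckpt)  # ambiguous or missing log
            continue
        ts = logs[0].name.split("_", 1)[0]  # ASSUMPTION: ts has no underscore
        dst = out_dir / "runs" / f"{ts}_{tag}" / "train.safetensors"
        if dst.exists():
            unmapped.append(ckpt)  # never overwrite an existing run
            continue
        moves.append((ckpt, dst))
    return moves, unmapped


def apply_moves(moves, dry_run=True):
    """Print the plan, or perform it when dry_run is False."""
    for src, dst in moves:
        if dry_run:
            print(f"would move {src} -> {dst}")
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        src.rename(dst)
```

Re-running after a real move finds no loose `train_*` files that map, so the second pass is a no-op apart from re-reporting the unmapped ones.
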
## Verify

- `just smoke` + `just smoke-vanilla` green (walks write paths).
- `just results` still parses every run (vhack display names unchanged).
- `find out -maxdepth 1 -type f | wc -l` ~ 0 (no loose top-level files).
- A re-extract + a fast run write into `vhack/` and `runs/` respectively.

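The loose-file check can also be expressed in Python, which is convenient if the migration script should assert it at the end. A small sketch roughly equivalent to the `find` line above; `loose_top_level_files` is a hypothetical name.

```python
from pathlib import Path


def loose_top_level_files(out_dir):
    """List regular files directly under out_dir (symlinks excluded, like find -type f)."""
    return sorted(
        p for p in Path(out_dir).iterdir()
        if p.is_file() and not p.is_symlink()
    )
```

An empty list corresponds to the `~ 0` target; printing the list shows exactly which files were missed.
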
## UAT

"out/ has one subdir per datatype; train artifacts live under runs/<ts>_<slug>/;
nothing loose at top level; smoke + results + a fresh run all still work."