diff --git a/docs/spec/20260529_gradient_routing_and_env_split.md b/docs/spec/20260529_gradient_routing_and_env_split.md
index 55c36f6..7a128cd 100644
--- a/docs/spec/20260529_gradient_routing_and_env_split.md
+++ b/docs/spec/20260529_gradient_routing_and_env_split.md
@@ -213,7 +213,15 @@ current env) -> Stage 2 env = T0,T1,T2,T4 -> Stage 3 cross-mechanism = T7
   - sneaky_fail: off-distribution (caught by len/compile comparison, R2)
   - UAT: "the pool breakdown shows a real M2 fraction in model style"
 
-- [ ] T5 (R3): eval-time ablation pass for the route arm
+- [x] T5 (R3): eval-time ablation pass for the route arm
+  DONE 2026-05-30: eval_hack_solve (student-only generate+grade on a fixed
+  subset) + ablate_quarantine ctx (zero delta_S_hack). Periodic ablated-eval
+  every --eval-ablate-every steps -> hack_abl/solve_abl cols (appended after
+  refr, so results.py positional indices unchanged). Final kept-vs-ablated
+  ROUTE EVAL BLUF. plot_dynamics remaps routing -> ablated series (coherence-gap
+  fix). smoke route prints periodic + final BLUF (all 0 on tiny-random, as
+  expected; behavioral evidence is the queued real run #182). M2 split is
+  Stage 3 (T7); T5 logs M1 (C) + solve only.
   - steps: after training, run an eval batch twice — with and without
     `delta_S_hack` (zeroed) — log hack_s (M1 and M2 separately) and solve.
   - verify: BLUF prints `ablated: hackM1=.. hackM2=.. solve=..` vs `kept: ..`
@@ -232,7 +240,11 @@ current env) -> Stage 2 env = T0,T1,T2,T4 -> Stage 3 cross-mechanism = T7
     mechanism-specific). Report which, don't bury a null.
   - UAT: "I see hack_M2 lower under route/erase than none, at matched solve"
 
-- [ ] T8 (R6): KEY GOAL — regenerate both plots from real 60-step runs
+- [/] T8 (R6): KEY GOAL — regenerate both plots from real 60-step runs
+  RECIPES DONE 2026-05-30: `just run-cell {none,erase,route} SEED` (60-step
+  fast, shared v_hack_21pairs, eval-ablate-5) + `just regen-dynamics GLOB`.
+  Cells tagged _cell_{intervention}_s{seed}. Gated on evidence run #182
+  (real-scale route) passing before queuing the seed-41 cells.
   - justfile recipes (written in T3, once --intervention exists): one recipe per
     CELL so each is a separate pueue job, e.g. `just run-cell INTERVENTION SEED`
     -> `train ... --intervention={none,erase,route} --steps=60 --seed=SEED
diff --git a/justfile b/justfile
index b326e5f..ee8370a 100644
--- a/justfile
+++ b/justfile
@@ -101,6 +101,24 @@ fast-projected *ARGS:
         --teacher-pool-dir=out/probe_distill/teacher_pool \
         --grad-clip=500 {{ ARGS }}
 
+# T8 (KEY GOAL): one CELL of the dynamics-plot matrix as a separate pueue job.
+# INTERVENTION in {none, erase, route}; SEED an int. 60-step fast horizon,
+# shared v_hack_21pairs basis (vanilla uses it only for the cos_pre diagnostic),
+# eval-ablation on (no-op for none/erase; gives route its ablated series + BLUF).
+# Logs land as ..._cell_{intervention}_s{seed}.log -> regen-dynamics globs them.
+run-cell INTERVENTION SEED:
+    {{ TRAIN }} fast --intervention={{ INTERVENTION }} \
+        --v-hack-path=out/v_hack_21pairs.safetensors \
+        --teacher-pool-dir=out/probe_distill/teacher_pool \
+        --grad-clip=500 --steps=60 --seed={{ SEED }} \
+        --eval-ablate-every=5 \
+        --out-tag=_cell_{{ INTERVENTION }}_s{{ SEED }}
+
+# Regenerate both dynamics plots from the cell logs (default: all cells; pass a
+# narrower glob like 'logs/*_cell_*_s41.log' for the seed-41-only checkpoint).
+regen-dynamics GLOB='logs/*_cell_*.log':
+    uv run python scripts/plot_dynamics.py {{ GLOB }} --out out/dynamics.png
+
 # Sync the rl-rewardhacking external repo (Nanda's verl wrapper).
 sync-external:
     cd external/rl-rewardhacking && git pull --ff-only