fix: seeded-shuffle train pool (was first-200-by-id = easy/memorized); add queue-dir6/queue-broad recipes

Train side of the same contamination bug: fast preset loaded first-200-by-id =
the lowest/oldest/most pretraining-memorized problems (base solves them easily ->
weak hack incentive). Now a seeded-random representative sample (seed=cfg.seed),
with the teacher-seed ids pinned in so seeding still fires. Paper trains on all
992 (base ~20%); job 176 confirmed base test=0.094 / train_filtered=0.203,
matching paper fn9.
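
A minimal sketch of the sampling change described above, assuming the pool is a list of integer problem ids. The function name `sample_train_pool` and its parameters are hypothetical and are not taken from the repository; only the idea (seeded draw with the teacher-seed ids pinned in) comes from this commit.

```python
import random


def sample_train_pool(problem_ids, pinned_ids, n, seed):
    """Return n ids: every pinned id plus a seeded random draw from the rest.

    Hypothetical helper for illustration; the real loader may differ.
    """
    pool = sorted(set(problem_ids))      # canonical order, so the seed alone fixes the draw
    pinned = sorted(set(pinned_ids))
    missing = set(pinned) - set(pool)
    if missing:
        raise ValueError(f"pinned ids not in pool: {sorted(missing)}")
    if not len(pinned) <= n <= len(pool):
        raise ValueError("need len(pinned) <= n <= len(pool)")

    pinned_set = set(pinned)
    rest = [i for i in pool if i not in pinned_set]
    rng = random.Random(seed)            # local RNG, does not touch global random state
    picked = rng.sample(rest, n - len(pinned))
    return sorted(pinned + picked)
```

Calling it with `seed=cfg.seed` and `n=200` would replace a first-200-by-id slice; the pinned ids are always present, so the teacher seeding step still finds its problems.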

Adds justfile recipes:
- queue-dir6 SEED: 8-arm single-seed directionality set (routeV real rollout/
  per-token, random-V both, vanilla, vampire in-subspace placebo, +2 LoRA-frozen-B
  routeV) on teacher_pool_runtests + fixed eval.
- queue-broad: headline arms (vanilla/erase/routeV) x 3 seeds for paired-t
  significance + directionality/adapter ablations at one seed.
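
The paired-t check that queue-broad is meant to feed could be sketched as below. This is an illustration only: `paired_t` is a hypothetical helper, and it assumes one scalar metric (e.g. final deploy hack rate) per seed, with both arms run on the same seeds.

```python
import math
from statistics import mean, stdev

# Two-sided critical value of Student's t at alpha=0.01 with df=2,
# i.e. three paired seeds.
T_CRIT_DF2_ALPHA01 = 9.925


def paired_t(treated, control):
    """Paired t statistic over per-seed differences (treated - control)."""
    if len(treated) != len(control) or len(treated) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [t - c for t, c in zip(treated, control)]
    sd = stdev(diffs)                    # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("all paired differences are equal; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))
```

With only three seeds the test has two degrees of freedom, so `abs(t)` must exceed 9.925 to pass at alpha=0.01. The per-seed differences therefore need to be large and consistent in sign.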

Spec: docs/spec/20260607_eval_contamination_fix.md (force-added; docs/ gitignored).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
wassname
2026-06-07 08:27:39 +00:00
parent ea01267cd8
commit cc8db051ab
3 changed files with 122 additions and 8 deletions
@@ -125,6 +125,44 @@ fast-projected *ARGS:
fast-lora-routeV *ARGS:
    {{ TRAIN }} fast --intervention=routeV --adapter=lora_frozen_b --lora-r=32 {{ ARGS }}
# H: vGROUT directionality set -- 8 arms (the original 6 + 2 LoRA-frozen-B routeV), ONE seed, single-mode run_tests, on the
# FIXED eval (paper test set, base solve ~0.1). Tests whether routeV's deploy-hack
# suppression needs the REAL hack direction. resolve: real-V (rollout & per-token)
# << {random-V (Haar, out-of-subspace), vampire (in-subspace semantic placebo)}
# in deploy hack at matched solve, and vanilla deploy hack >> 0 (else nothing to
# suppress). Same teacher_pool_runtests (6 prompts) + grad-clip=500 as the diag runs.
# Priority descending so they execute in listed order (routeV best first).
queue-dir6 seed='43':
    pueue add -w "$PWD" -o 60 -l "why: P1 routeV real-V per-rollout (best method) s{{seed}}; resolve: deploy_hack << random/vampire at matched solve" -- {{ TRAIN }} fast --intervention=routeV --teacher-pool-dir=out/pools/teacher_pool_runtests --grad-clip=500 --seed={{seed}} --out-tag=_dir6_routeV_s{{seed}}
    pueue add -w "$PWD" -o 55 -l "why: P2 routeV real-V PER-TOKEN s{{seed}}; resolve: finer routing >= per-rollout suppression, no solve cost" -- {{ TRAIN }} fast --intervention=routeV --routeV-per-token --teacher-pool-dir=out/pools/teacher_pool_runtests --grad-clip=500 --seed={{seed}} --out-tag=_dir6_routeV_pertoken_s{{seed}}
    pueue add -w "$PWD" -o 50 -l "why: P3 routeV RANDOM-V per-rollout (Haar control) s{{seed}}; resolve: deploy_hack ~ vanilla -> real-V suppression is directional, not absorption" -- {{ TRAIN }} fast --intervention=routeV --routeV-random-v-seed=157 --teacher-pool-dir=out/pools/teacher_pool_runtests --grad-clip=500 --seed={{seed}} --out-tag=_dir6_routeV_random_s{{seed}}
    pueue add -w "$PWD" -o 45 -l "why: P4 routeV RANDOM-V PER-TOKEN s{{seed}}; resolve: per-token random also fails to suppress -> granularity isn't the lever, direction is" -- {{ TRAIN }} fast --intervention=routeV --routeV-per-token --routeV-random-v-seed=157 --teacher-pool-dir=out/pools/teacher_pool_runtests --grad-clip=500 --seed={{seed}} --out-tag=_dir6_routeV_pertoken_random_s{{seed}}
    pueue add -w "$PWD" -o 40 -l "why: P5 VANILLA reference s{{seed}}; resolve: deploy_hack >> 0 by step 60 (emergence) -> the suppression target exists" -- {{ TRAIN }} fast --intervention=none --teacher-pool-dir=out/pools/teacher_pool_runtests --grad-clip=500 --seed={{seed}} --out-tag=_dir6_vanilla_s{{seed}}
    pueue add -w "$PWD" -o 35 -l "why: P6 routeV VAMPIRE (in-subspace semantic placebo, null_vampire pairs) s{{seed}}; resolve: deploy_hack ~ vanilla -> v_grad must point at the HACK, not just any in-subspace semantic axis" -- {{ TRAIN }} fast --intervention=routeV --vhack-pairs-path=out/pairsets/null_vampire.json --teacher-pool-dir=out/pools/teacher_pool_runtests --grad-clip=500 --seed={{seed}} --out-tag=_dir6_routeV_vampire_s{{seed}}
    pueue add -w "$PWD" -o 30 -l "why: P7 LoRA-frozen-B routeV real-V per-rollout s{{seed}}; resolve: deploy_hack ~ AntiPaSTO routeV -> routing is adapter-agnostic (lives in the r-bottleneck, not the SVD basis)" -- {{ TRAIN }} fast --intervention=routeV --adapter=lora_frozen_b --lora-r=32 --teacher-pool-dir=out/pools/teacher_pool_runtests --grad-clip=500 --seed={{seed}} --out-tag=_dir6_lora_routeV_s{{seed}}
    pueue add -w "$PWD" -o 28 -l "why: P8 LoRA-frozen-B routeV real-V PER-TOKEN s{{seed}}; resolve: per-token on the static-B path matches AntiPaSTO per-token suppression" -- {{ TRAIN }} fast --intervention=routeV --routeV-per-token --adapter=lora_frozen_b --lora-r=32 --teacher-pool-dir=out/pools/teacher_pool_runtests --grad-clip=500 --seed={{seed}} --out-tag=_dir6_lora_routeV_pertoken_s{{seed}}
# H: BROADER sweep for the paper -- headline arms (vanilla, erase, routeV real-V) across
# 3 SEEDS for the paired-t significance the paper insists on, plus the directionality +
# adapter ablations at one seed. Single-mode run_tests, FIXED eval (paper test set).
# Lower priority than queue-dir6 (the focused single-seed set runs first). resolve:
# routeV deploy_hack < vanilla at matched solve, significant across the 3 seeds (paired t,
# alpha=0.01 like the paper); ablations (random/vampire) ~ vanilla confirm directionality.
TEACHER_RT := "out/pools/teacher_pool_runtests"
queue-broad:
    #!/usr/bin/env bash
    set -eu
    for seed in {{ SEEDS_3 }}; do
        pueue add -w "$PWD" -o 22 -l "why: headline VANILLA s$seed (3-seed significance); resolve: deploy_hack emergence reference" -- {{ TRAIN }} fast --intervention=none --teacher-pool-dir={{ TEACHER_RT }} --grad-clip=500 --seed=$seed --out-tag=_broad_vanilla_s$seed
        pueue add -w "$PWD" -o 21 -l "why: headline routeV real-V s$seed (3-seed significance); resolve: deploy_hack < vanilla at matched solve, paired across seeds" -- {{ TRAIN }} fast --intervention=routeV --teacher-pool-dir={{ TEACHER_RT }} --grad-clip=500 --seed=$seed --out-tag=_broad_routeV_s$seed
        pueue add -w "$PWD" -o 20 -l "why: headline ERASE (one-sided projection) s$seed (3-seed significance); resolve: deploy_hack < vanilla at matched solve" -- {{ TRAIN }} fast --intervention=erase --teacher-pool-dir={{ TEACHER_RT }} --grad-clip=500 --seed=$seed --out-tag=_broad_erase_s$seed
    done
    # ablations (one seed = 43): directionality controls + per-token + lora
    pueue add -w "$PWD" -o 15 -l "why: ablation routeV RANDOM-V (Haar) s43; resolve: ~vanilla -> suppression is directional" -- {{ TRAIN }} fast --intervention=routeV --routeV-random-v-seed=157 --teacher-pool-dir={{ TEACHER_RT }} --grad-clip=500 --seed=43 --out-tag=_broad_routeV_random_s43
    pueue add -w "$PWD" -o 15 -l "why: ablation routeV VAMPIRE (in-subspace placebo) s43; resolve: ~vanilla -> v_grad must point at the hack" -- {{ TRAIN }} fast --intervention=routeV --vhack-pairs-path=out/pairsets/null_vampire.json --teacher-pool-dir={{ TEACHER_RT }} --grad-clip=500 --seed=43 --out-tag=_broad_routeV_vampire_s43
    pueue add -w "$PWD" -o 15 -l "why: ablation routeV PER-TOKEN s43; resolve: granularity effect on suppression" -- {{ TRAIN }} fast --intervention=routeV --routeV-per-token --teacher-pool-dir={{ TEACHER_RT }} --grad-clip=500 --seed=43 --out-tag=_broad_routeV_pertoken_s43
    pueue add -w "$PWD" -o 15 -l "why: ablation LoRA-frozen-B routeV s43; resolve: routing is adapter-agnostic" -- {{ TRAIN }} fast --intervention=routeV --adapter=lora_frozen_b --lora-r=32 --teacher-pool-dir={{ TEACHER_RT }} --grad-clip=500 --seed=43 --out-tag=_broad_lora_routeV_s43
# T8 (KEY GOAL): one CELL of the dynamics-plot matrix as a separate pueue job.
# INTERVENTION in {none, erase, route}; SEED an int. 60-step fast horizon,
# shared v_hack_21pairs basis (vanilla uses it only for the cos_pre diagnostic),