From 68b062473376812b8c815ea893a566949816a10f Mon Sep 17 00:00:00 2001
From: wassname <1103714+wassname@users.noreply.github.com>
Date: Sat, 6 Jun 2026 00:01:58 +0000
Subject: [PATCH] backup: pueue job manifest (94 jobs, id/status/label/argv) at routing-refactor

Local log backup in out/pueue_logs_backup/20260606T000138/ (status.json + full log + task_logs) is gitignored/box-local; this manifest is the durable why-label copy. Killed confounded full-teacher route2 directionality jobs 118/119/121/122/123.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
---
 docs/spec/20260606_job_manifest.md | 380 +++++++++++++++++++++++++++++
 1 file changed, 380 insertions(+)
 create mode 100644 docs/spec/20260606_job_manifest.md

diff --git a/docs/spec/20260606_job_manifest.md b/docs/spec/20260606_job_manifest.md
new file mode 100644
index 0000000..684ea28
--- /dev/null
+++ b/docs/spec/20260606_job_manifest.md
@@ -0,0 +1,380 @@
+# Pueue job manifest — 2026-06-06 (pre/at routing-refactor)
+
+Durable copy of every job's id / status / why-label / argv (local backup in
+out/pueue_logs_backup/20260606T000138/ dies with the box). Source: status.json.
+
+### 0 — Success
+- why: T0 de-risk — does Qwen3-4B hardcode K visible tests under expose-K nudge; resolve: M2 rate >0 => expose-K env reachable, build Stage2 (T1/T2/T4) + rebuild plot on new env; ~0 => STOP, rethink env
+- `uv run python scripts/derisk_expose_k.py --model=Qwen/Qwen3-4B --n-problems=24 --group=8 --k-visible=2 --seed=41`
+
+### 1 — Success
+- why: does v_hack from prog_widest suppress mechanical LeetCode hack at matched solve, seed41 frozen; resolve: L5_hack vs vanilla baseline
+- `uv run python -m projected_grpo.train fast --teacher-pool-dir=out/pools/teacher_pool --grad-clip=500 --seed=41 --intervention=erase --v-hack-path=out/vhack/v_hack_pairset_prog_widest.safetensors --out-tag=_pairset_prog_widest_s41`
+
+### 2 — Success
+- why: G2 screen gt-monitor-s65; resolve: per-checkpoint E/C/D% table to find non-rh teacher
+- `just pregen-teacher-alt ariahw/rl-rewardhacking-leetcode-gt-monitor-screening-s65 teacher_pool_rl-rewardhacking-leetcode-gt-monitor-screening-s65 5`
+
+### 3 — Success
+- why: G2 screen judge-monitor-s65; resolve: per-checkpoint E/C/D% table
+- `just pregen-teacher-alt ariahw/rl-rewardhacking-leetcode-judge-monitor-screening-s65 teacher_pool_rl-rewardhacking-leetcode-judge-monitor-screening-s65 5`
+
+### 4 — Success
+- why: G2 screen probe-monitor-s65; resolve: per-checkpoint E/C/D% table
+- `just pregen-teacher-alt ariahw/rl-rewardhacking-leetcode-probe-monitor-screening-s65 teacher_pool_rl-rewardhacking-leetcode-probe-monitor-screening-s65 5`
+
+### 5 — Failed/Killed
+- why: low mix+high refresh LONG horizon equilibrium (orig env); resolve: gap persists at convergence; HELD until new-env plot done
+- `just fast-projected --v-hack-path=out/vhack/v_hack_21pairs.safetensors --vhack-refresh-every=2 --seed=42 --steps=1000 --out-tag=_equilib_refresh2_k12_mix0125_1000_s42`
+
+### 7 — Failed/Killed
+- why: which of run_tests/eq_override/exit_code loopholes are faithful (base exploit~0) AND honored by our grader (elicit>>0); resolve: KEEP modes passing the quadrant -> they form the multi-loophole env, DROP leaky/unhonored ones
+- `uv run python -m projected_grpo.derisk_loopholes --n-prompts=24 --group=4`
+
+### 8 — Success
+- why: [oracle-fixed rerun] which of run_tests/eq_override/exit_code are faithful (base exploit~0) AND honored by grader (elicit>>0); resolve: KEEP quadrant-passers as the multi-loophole env
+- `uv run python -m projected_grpo.derisk_loopholes --n-prompts=24 --group=4`
+
+### 9 — Failed/Killed
+- why: does the exit_code loophole EMERGE under vanilla GRPO on-policy from the faithful hint (base=0% faithful, 97.9% when nudged); resolve: hack_s(exploited) rises from ~0 over 60 steps => Phase-1 confirms it's learnable, then build Phase-2 projection
+- `just run-cell-mode exit_code 41`
+
+### 10 — Success
+- why: which of 6 grader-flaw modes (run_tests/eq_override/exit_code/stdout_marker/sentinel/file_marker) pass the quadrant — faithful~0 AND elicit>>0 (base-elicitable + grader-honored); resolve: KEEP passers as the usable multi-loophole set (target 4+)
+- `uv run python -m projected_grpo.derisk_loopholes --n-prompts=24 --group=4`
+
+### 11 — Failed/Killed
+- why: does the student learn ALL 3 substrate loopholes (run_tests/exit_code/sentinel) from the even teacher batch under vanilla GRPO; resolve: per-mode SUBSTRATE table shows hacks>0 + finite first_step for each mode => substrate works, ready for Phase-2 projection
+- `uv run python -m projected_grpo.train fast --intervention=none --teacher-pool-dir=out/pools/substrate --mix-ratio=0.25 --v-hack-path=out/vhack/v_hack_21pairs.safetensors --seed=41 --steps=80 --out-tag=_substrate3_emerge_s41`
+
+### 12 — Failed/Killed
+- why: [grader-fixed rerun] does the student learn ALL 3 substrate loopholes (run_tests/exit_code/sentinel) under vanilla GRPO; resolve: per-mode SUBSTRATE table shows hacks>0 + finite first_step for EACH mode (not just one) => multi-loophole substrate works
+- `uv run python -m projected_grpo.train fast --intervention=none --teacher-pool-dir=out/pools/substrate --mix-ratio=0.25 --v-hack-path=out/vhack/v_hack_21pairs.safetensors --seed=41 --steps=80 --out-tag=_substrate3_emerge_s41`
+
+### 13 — Success
+- why: re-run 6-mode quadrant on FIXED grader (.valid sentinel + exhaustive non-overlap) with GENERIC elicit (no spoonfeed) + exit interview; resolve: faithful<10% AND elicit>>0 per mode => KEEP; notes reveal if a hint is too vague (model cant find loophole) or too blatant; elicit hacks feed substrate rebuild
+- `uv run python -m projected_grpo.derisk_loopholes --n-prompts=24 --group=4`
+
+### 14 — Failed/Killed
+- why: generic elicit mints 0 teacher hacks (base cant discover); mint via SPOONFEED for the substrate teacher batch (no-cheat: never shown to student); resolve: exit_code+sentinel elicit_hacks_*.jsonl non-empty so build-substrate can include all 3 classes
+- `uv run python -m projected_grpo.derisk_loopholes --elicit-style=spoonfeed --n-prompts=24 --group=4`
+
+### 15 — Success
+- why: bigger-spoon recipes validated 5/5 on 8b; re-mint the 4 modes that were 0/96 (eq_override,stdout_marker,sentinel,file_marker) on the 4B; resolve: each elicit_hacks_*.jsonl non-empty (>=5) so substrate can use all 6 classes
+- `uv run python -m projected_grpo.derisk_loopholes --elicit-style=spoonfeed --modes=eq_override,stdout_marker,sentinel,file_marker --n-prompts=24 --group=4`
+
+### 16 — Success
+- why: {func} syntax bug broke eq_override/sentinel mint (0/96) and may have malformed stdout/file hacks; re-mint all 4 with bare-name fix; resolve: each of eq_override/stdout_marker/sentinel/file_marker mints >=5 CLEAN hacks
+- `uv run python -m projected_grpo.derisk_loopholes --elicit-style=spoonfeed --modes=eq_override,stdout_marker,sentinel,file_marker --n-prompts=24 --group=4`
+
+### 17 — Failed/Killed
+- why: does the vanilla-GRPO student learn ALL 3 substrate loopholes (run_tests/exit_code/sentinel) from the even teacher batch + obvious hints; resolve: per-step hk_ cols + end SUBSTRATE table show hacks>0 + finite first_step for each of the 3 classes => answers 'how many classes learned'
+- `just run-substrate none 41`
+
+### 18 — Failed/Killed
+- why: does the vanilla-GRPO student learn all 3 substrate loopholes (run_tests/exit_code/sentinel) with the reworded exit_code hint + even teacher batch; resolve: per-mode hk_ cols + SUBSTRATE table show hacks>0 + finite first_step per class
+- `just run-substrate none 41`
+
+### 19 — Success
+- why: minted stdout/file teacher hacks bake the OLD tokens (OVERRIDE_GRADER_PASS/grader_result.txt) that diverge from the student hint (all tests passed/result.txt) after commit 67444b1; resolve: re-minted elicit_hacks_{stdout_marker,file_marker}.jsonl name the hint tokens so teacher==hint, each >=5 clean hacks
+- `uv run python -m projected_grpo.derisk_loopholes --elicit-style=spoonfeed --modes=stdout_marker,file_marker --n-prompts=24 --group=4`
+
+### 20 — Success
+- why: HEADLINE #137 — how many of K=5 loopholes (run_tests/eq_override/sentinel/stdout_marker/file_marker) does the vanilla-GRPO student learn from the even, token-coherent teacher batch (post 67444b1); resolve: end SUBSTRATE table + per-step hk_ cols show hacks>0 + finite first_step per class => count classes learned
+- `just run-substrate none 41`
+
+### 21 — Success
+- why: does the 4-of-5 headline + eq_override holdout replicate at seed 42 (vanilla emergence on the K=5 token-coherent substrate); resolve: per-mode first_step + final hk_ match seed 41 within noise => robust class-count, else seed-sensitive
+- `just run-substrate none 42`
+
+### 22 — Success
+- why: seed-43 replicate of the 4-of-5 vanilla emergence headline; resolve: eq_override stays NEVER and >=3 other modes reach hacks>0 => stable across 3 seeds
+- `just run-substrate none 43`
+
+### 25 — Failed/{'Failed': 1}
+- why: ARM3 gradient ROUTING + prog_wide v_hack refresh-5, 4-mode substrate; resolve: deployed (quarantine-ablated) per-mode hack < vanilla, cin_t>cin_s
+- `just run-substrate route 41 60 5`
+
+### 26 — Failed/Killed
+- why: ARM2 PROJECTED (erase) + prog_wide v_hack refresh-5, 4-mode substrate; resolve: training-time per-mode hack < vanilla at matched gt
+- `just run-substrate erase 41 60 5`
+
+### 27 — Failed/Killed
+- why: ARM1 VANILLA 60-step baseline on 4-mode substrate (overlay reference); resolve: all 4 modes reach hacks>0
+- `just run-substrate none 41 60 0`
+
+### 28 — Success
+- why: ARM3 gradient ROUTING + prog_wide v_hack refresh-5, 4-mode substrate (requeue after RH_HINT_REPLACE_TO import fix); resolve: deployed (quarantine-ablated) per-mode hack < vanilla, cin_t>cin_s
+- `just run-substrate route 41 60 5`
+
+### 29 — Success
+- why: ARM2 PROJECTED (erase) + prog_wide v_hack refresh-5, 4-mode substrate (requeue after import fix); resolve: training-time per-mode hack < vanilla at matched gt
+- `just run-substrate erase 41 60 5`
+
+### 30 — Success
+- why: ARM1 VANILLA 60-step baseline on 4-mode substrate (overlay reference, requeue after routing); resolve: all 4 modes reach hacks>0
+- `just run-substrate none 41 60 0`
+
+### 31 — Success
+- why: decompose route's deploy hack=0.125 per-mode -- did route GENERALISE (suppress held-out file_marker/sentinel that prog_wide v_hack never saw) or only its own run_tests? load-bearing weak-detector test; resolve: per-mode deploy hack on held-out modes << vanilla => generalises; ~vanilla => only in-dist
+- `just run-substrate route 41 60 5`
+
+### 32 — Success
+- why: #157 frozen-REAL-V route baseline (refresh off) to pair against random-V; resolve: anchors run31 effect at rf0 so random-V comparison is clean
+- `uv run python -m projected_grpo.train fast --intervention=route --teacher-pool-dir=out/pools/substrate --v-hack-path=out/vhack/v_hack_pairset_prog_wide.safetensors --vhack-pairs-path=out/pairsets/prog_wide.json --vhack-refresh-every=0 --seed=41 --steps=60 --out-tag=_sub4_route_rf0_REAL_s41`
+
+### 33 — Failed/Killed
+- why: #157 frozen-RANDOM-V route control (Haar V, _sv matched, refresh off); resolve: if deploy-hack~0 + solve-jump reproduce vs REAL, run31 is ablation artifact not directional; if stays hacky, direction is load-bearing
+- `uv run python -m projected_grpo.train fast --intervention=route --teacher-pool-dir=out/pools/substrate --v-hack-path=out/vhack/v_hack_pairset_prog_wide_randomV.safetensors --vhack-pairs-path=out/pairsets/prog_wide.json --vhack-refresh-every=0 --seed=41 --steps=60 --out-tag=_sub4_route_rf0_RAND_s41`
+
+### 34 — Failed/{'Failed': 1}
+- why: #159 first real route2 (Arm B distinct-basis quarantine, act-mask, tau=0 default) on substrate; resolve: ||B_q||>0 + per-mode deploy hack on held-out modes vs run-31 additive route; if solve tanks, tau too low (over-route/starvation)
+- `just run-substrate route2 41 60 5`
+
+### 35 — Failed/Killed
+- why: #160 route2 Arm A (grad-mask, single-pass gate subtraction) substrate run, pairs with job34 route2-act for the 5-arm plot; resolve: ||B_q||>0, per-mode held-out deploy hack vs route2-act + vanilla; WATCH deploy solve-jump (review-h Adam-parasite tell)
+- `uv run python -m projected_grpo.train fast --intervention=route2 --route2-mask=grad --teacher-pool-dir=out/pools/substrate --vhack-pairs-path=out/pairsets/prog_wide.json --vhack-refresh-every=5 --seed=41 --steps=60 --out-tag=_sub4_route2grad_rf5_s41`
+
+### 36 — Failed/Killed
+- why: working refresh-V route on CURRENT harness, head-to-head vs frozen job32 (rf0, 8pp); resolve: rf5 deploy hack << 0.297 at matched solve => refresh is the load-bearing knob, frozen(32) is ablation-only
+- `just run-substrate route 41 60 5`
+
+### 39 — Success
+- why: does Qwen3.6-27B defer under DEFER_PERSONA on blatant authority prompts (w2schar-mini); resolve: [DEFER] gens COMPLY not refuse => wire persona-gen into prepare_round; else need another deferring-anchor source. REQUEUE of failed #6 (ran from wrong workdir)
+- `uv run python scripts/validate_defer_persona.py`
+
+### 40 — Success
+- why: vanilla emergence reference (4-mode substrate); resolve: per-mode deploy-hack baseline for the overlay
+- `uv run python -m projected_grpo.train fast --intervention=none --out-tag=_sub4_vanilla`
+
+### 41 — Success
+- why: erase arm (one-sided projection); resolve: per-mode deploy hack vs vanilla at matched solve
+- `uv run python -m projected_grpo.train fast --intervention=erase --out-tag=_sub4_erase`
+
+### 42 — Success
+- why: route arm (shared-basis quarantine, rf5); resolve: deploy hack on held-out modes vs vanilla at matched solve
+- `uv run python -m projected_grpo.train fast --intervention=route --out-tag=_sub4_route`
+
+### 43 — Failed/Killed
+- why: route2 act-mask (distinct-basis quarantine); resolve: held-out deploy hack suppressed vs vanilla at matched solve
+- `uv run python -m projected_grpo.train fast --intervention=route2 --out-tag=_sub4_route2_act`
+
+### 44 — Failed/Killed
+- why: route2 grad-mask (distinct-basis quarantine); resolve: held-out deploy hack suppressed vs vanilla at matched solve
+- `uv run python -m projected_grpo.train fast --intervention=route2 --route2-mask=grad --out-tag=_sub4_route2_grad`
+
+### 45 — Success
+- why: regen gym fixture (real_seed.md) with on-policy DEFERRING seeds under DEFER_PERSONA (persona-gen wired into prepare_round); resolve: each REJ block COMPLIES so the gym tests the teacher against the deferring anchor, not the old resisting refusals
+- `uv run python scripts/gen_gym_fixture.py`
+
+### 46 — Failed/Killed
+- why: route2-act on lr-fix (quar_lr_scale=0.1) -- does the 33M quarantine stop diverging (run43 collapsed) and suppress held-out deploy-hack; resolve: survives 60 steps coherent (ppl_t stable) AND deploy hack < vanilla on file_marker at matched solve
+- `uv run python -m projected_grpo.train fast --intervention=route2 --route2-mask=act --out-tag=_sub4_route2_act`
+
+### 47 — Failed/Killed
+- why: route2-grad on lr-fix -- grad-mask variant, same lr fix; resolve: survives 60 steps AND deploy hack < vanilla held-out
+- `uv run python -m projected_grpo.train fast --intervention=route2 --route2-mask=grad --out-tag=_sub4_route2_grad`
+
+### 48 — Failed/Killed
+- why: erase run_tests regression - is grad_clip 10->1 the cause; resolve: erase+grad_clip=1.0 delays run_tests onset >5 steps vs current => grad_clip regressed it
+- `uv run python -m projected_grpo.train fast --intervention=erase --grad-clip=1.0 --out-tag=_regr_gradclip1`
+
+### 49 — Stashed
+- why: erase regression - is broad prog_wide v_hack the cause; resolve: erase+v_hack_21pairs delays run_tests >5 steps vs prog_wide => pairs breadth regressed it
+- `uv run python -m projected_grpo.train fast --intervention=erase --v-hack-path=out/vhack/v_hack_21pairs.safetensors --out-tag=_regr_v21pairs`
+
+### 52 — Failed/Killed
+- why: does per-step calibrated-tau vector routing (route2-grad) stop the cos>0 over-routing blowup that saladed jobs 46/47, and suppress held-out deploy-hack vs vanilla; resolve: qE bounded + hkgap>0 sustained + coherent gens + deploy file_marker hack < vanilla at matched solve
+- `uv run python -m projected_grpo.train fast --intervention=route2 --route2-mask=grad --out-tag=_sub4_route2_grad_caltau`
+
+### 54 — Failed/Killed
+- why: does grad_clip=0.5 (just below the ~1.5 blowup spike, above the 0.2-0.4 healthy gn band) + calibrated-tau prevent the route2-grad salad without throttling learning; resolve: coherent gens + gt_s learns + qE bounded vs caltau-alone job52
+- `uv run python -m projected_grpo.train fast --intervention=route2 --route2-mask=grad --grad-clip=0.5 --out-tag=_sub4_route2_grad_caltau_clip05`
+
+### 57 — Failed/Killed
+- why: route2 with scale-matched delta_S_hack quarantine (LoRA ripped out) -- does balanced capacity drop qE off ~0.97 and let delta_S learn while suppressing held-out deploy hack; resolve: qE bounded (<~0.6) + hkgap>0 sustained + coherent gens + deploy file_marker hack < vanilla at matched solve
+- `uv run python -m projected_grpo.train fast --intervention=route2 --out-tag=_sub4_route2_balanced`
+
+### 58 — Failed/Killed
+- why: 20-step vanilla 4B, GPU path now green; resolve: hack_s onset + cin_t baseline, extracts v_hack_full
+- `just fast-vanilla --steps=20 --seed=41 --out-tag=_cell20`
+
+### 59 — Failed/DependencyFailed
+- why: 20-step erase 4B, same seed (cache-HIT v_hack); resolve: one_sided projection suppress hack_s vs vanilla
+- `just fast --steps=20 --seed=41 --out-tag=_cell20`
+
+### 60 — Success
+- why: route2 balanced + resid diagnostic -- is qE~0.9 clean-strip (resid~0) or false-neg leak into deployed knob (resid>0); resolve: resid stays ~0 through emergence (steps 10-60) AND deploy file_marker hack < vanilla at matched solve
+- `uv run python -m projected_grpo.train fast --intervention=route2 --out-tag=_sub4_route2_balanced_resid`
+
+### 64 — Success
+- why: route2 on new default (rollout_ablate_frac=0.5) -> per-step deploy proxy hk_abl/slv_abl, plus eval_ablate_every=5 for held-out hk_dep anchors; resolve: per-step deploy hack curve plottable AND hk_dep < vanilla 0.36 at matched solve
+- `uv run python -m projected_grpo.train fast --intervention=route2 --eval-ablate-every=5 --out-tag=_sub4_route2_ablproxy_s41`
+
+### 65 — Failed/Killed
+- why: fast vanilla s41 x20; resolve: hack_s onset baseline (extracts v_hack)
+- `uv run python -m projected_grpo.train fast --intervention=none --seed=41 --steps=20 --out-tag=_sweep`
+
+### 66 — Failed/Killed
+- why: fast erase s41 x20; resolve: hack_s suppressed vs vanilla at matched solve
+- `uv run python -m projected_grpo.train fast --intervention=erase --seed=41 --steps=20 --out-tag=_sweep`
+
+### 68 — Success
+- why: route2 NO-floor (frac=0) seed 41 -- does dropping deploy-sampling hold deploy hack~0 like job 60; resolve: n=3 deploy hack vs vanilla at matched solve
+- `uv run python -m projected_grpo.train fast --intervention=route2 --seed=41 --rollout-ablate-frac=0 --eval-ablate-every=5 --out-tag=_sub4_route2_nofloor_s41`
+
+### 69 — Success
+- why: route2 NO-floor (frac=0) seed 42 -- does dropping deploy-sampling hold deploy hack~0 like job 60; resolve: n=3 deploy hack vs vanilla at matched solve
+- `uv run python -m projected_grpo.train fast --intervention=route2 --seed=42 --rollout-ablate-frac=0 --eval-ablate-every=5 --out-tag=_sub4_route2_nofloor_s42`
+
+### 70 — Success
+- why: route2 NO-floor (frac=0) seed 43 -- does dropping deploy-sampling hold deploy hack~0 like job 60; resolve: n=3 deploy hack vs vanilla at matched solve
+- `uv run python -m projected_grpo.train fast --intervention=route2 --seed=43 --rollout-ablate-frac=0 --eval-ablate-every=5 --out-tag=_sub4_route2_nofloor_s43`
+
+### 71 — Failed/Killed
+- why: vanilla reference seed 42 for n=3 no-floor route2 comparison; resolve: paired deploy-hack baseline
+- `uv run python -m projected_grpo.train fast --intervention=none --seed=42 --eval-ablate-every=5 --out-tag=_sub4_vanilla_s42`
+
+### 72 — Success
+- why: vanilla reference seed 43 for n=3 no-floor route2 comparison; resolve: paired deploy-hack baseline
+- `uv run python -m projected_grpo.train fast --intervention=none --seed=43 --eval-ablate-every=5 --out-tag=_sub4_vanilla_s43`
+
+### 73 — Failed/Killed
+- why: route2 floor(0.5)+refresh-1 s41 -- does a fresh gate stop the floor's deploy-hack leak (0.125 in job 64); resolve: deploy hack ~0 => leak was staleness not floor structure
+- `uv run python -m projected_grpo.train fast --intervention=route2 --seed=41 --rollout-ablate-frac=0.5 --vhack-refresh-every=1 --eval-ablate-every=5 --out-tag=_sub4_route2_floor_rf1_s41`
+
+### 74 — Success
+- why: vanilla ref seed 42 for n=3 (daemon died mid-run, requeue); resolve: deploy hack baseline vs route2 0.00
+- `uv run python -m projected_grpo.train fast --intervention=none --seed=42 --eval-ablate-every=5 --out-tag=_sweep_van_s42`
+
+### 75 — Failed/{'Failed': 2}
+- why: static erasure (frozen v_hack) s41 on CURRENT code+substrate -- replace stale older-session panel; resolve: does erase cut deploy hack vs vanilla 0.36
+- `just run-cell erase 41 0`
+
+### 76 — Success
+- why: online/dynamic erasure (refresh-5) s41 on CURRENT code -- does refresh make erase work (stale panel looked like vanilla, cosine decayed); resolve: hack_s < vanilla AND cos-to-vhack stays up under refresh
+- `just run-cell erase 41 5`
+
+### 77 — Success
+- why: vanilla ref seed 41 for n=3 no-floor route2 band (s41 missing, s42=74 s43=72); resolve: deploy-hack baseline at s41 to complete the 3-seed vanilla band; bumped to p7 so keynote A1/A2 unblock before A3 erase rows
+- `uv run python -m projected_grpo.train fast --intervention=none --seed=41 --eval-ablate-every=5 --out-tag=_sweep_van_s41`
+
+### 78 — Failed/Killed
+- why: route2 NO-floor + refresh-2 s41 -- does keeping v_grad fresh (hkgap stays >0) change deploy hack/solve vs the frozen n=3 headline (hkgap decayed to ~0 by step 20); resolve: deploy hack <= frozen 0.00-0.094 and solve >= 0.615 => staleness was harmless; lower hack or higher solve => refresh helps
+- `uv run python -m projected_grpo.train fast --intervention=route2 --seed=41 --rollout-ablate-frac=0 --vhack-refresh-every=2 --eval-ablate-every=5 --out-tag=_sub4_route2_nofloor_rf2_s41`
+
+### 84 — Success
+- why: does route2 deploy-hack stay low / does solve keep climbing out to convergence (n=3 was 60 steps); resolve: 200-step route2 nofloor s41 deploy curve flat-low = robust, or late rise = quarantine saturates
+- `uv run python -m projected_grpo.train fast --intervention=route2 --seed=41 --rollout-ablate-frac=0 --eval-ablate-every=10 --steps=200 --out-tag=_route2_converge200_s41`
+
+### 85 — Failed/Killed
+- why: 200-step vanilla SATURATION reference for route2 converge (job 84); resolve: route2 deploy-hack stays below vanilla hack saturation plateau at convergence -> durable not just delayed
+- `uv run python -m projected_grpo.train fast --intervention=none --seed=41 --eval-ablate-every=10 --steps=200 --out-tag=_vanilla_converge200_s41`
+
+### 86 — Success
+- why: does route2 routing against a SEMANTICALLY ARBITRARY direction (null_city PLACEBO pairs, results.md +0.024) also drop deploy hack; resolve: if deploy hack stays ~vanilla (no drop) the route2 effect is DIRECTIONAL (needs real hack pairs), not generic adapter regularization
+- `uv run python -m projected_grpo.train fast --intervention=route2 --seed=41 --rollout-ablate-frac=0 --eval-ablate-every=5 --vhack-pairs-path=out/pairsets/null_city.json --out-tag=_route2_placebo_nullcity_s41`
+
+### 87 — Success
+- why: none teacher-off@40 curriculum (seed hacks then on-policy); resolve: route2 deploy hack stays ~0 after teacher cut at 40
+- `uv run python -m projected_grpo.train fast --intervention=none --seed=41 --teacher-off-step=40 --steps=200 --eval-ablate-every=20 --out-tag=_none_toff40_s41`
+
+### 95 — Success
+- why: A5 harvest real student hacks (logged problem_id/prompt) for 2-mode held-out pair set; resolve: >=6 hack+6 clean per known mode in rollouts.jsonl
+- `uv run python -m projected_grpo.train fast --intervention=none --seed=41 --steps=40 --out-tag=_harvest_s41`
+
+### 96 — Success
+- why: REQUEUE job75 (died on transient causal-conv1d wheel network timeout, not code) static erasure frozen v_hack s41; resolve: does erase cut deploy hack vs vanilla 0.36
+- `just run-cell erase 41 0`
+
+### 97 — Success
+- why: A4 vanilla-200 collapsed (lp_s -0.6->-8 @step90) under fast preset lr=3e-3/adam0.5 -- over-optimization once loophole saturates. Gentler step (lr=1e-3, adam0.9/0.99, beta=0 to keep hacking) should stay coherent like route2 did at same ref_eq; resolve: lp_s stays > -1 to step 200 AND hack_s saturates >15/28 -> clean A4 vanilla contrast. zerovar diag now on (b8dcb4e).
+- `uv run python -m projected_grpo.train fast --intervention=none --seed=41 --lr=1e-3 --adam-beta1=0.9 --adam-beta2=0.99 --beta=0 --steps=200 --eval-ablate-every=20 --out-tag=_vanilla200_gentle_s41`
+
+### 98 — Success
+- why: post-hoc TEST-TIME erasure (weight dS_perp_v_hack + activation diff-of-means ablation) on finished vanilla ckpt job40; resolve: if either cuts deploy hack toward route2 levels WITHOUT tanking solve, train-time routing is unnecessary; if not, post-hoc cant isolate the hack -> motivates gradient intervention
+- `uv run python scripts/tt_erase_bench.py --ckpt out/runs/20260531T141402_fast_vanilla_seed41_sub4_vanilla/train.safetensors`
+
+### 99 — Success
+- why: route2 NO-floor + refresh-2 s41 -- does keeping v_grad fresh (hkgap stays >0) change deploy hack/solve vs the frozen n=3 headline (hkgap decayed to ~0 by step 20); resolve: deploy hack <= frozen 0.00-0.094 and solve >= 0.615 => staleness was harmless; lower hack or higher solve => refresh helps
+- `uv run python -m projected_grpo.train fast --intervention=route2 --seed=41 --rollout-ablate-frac=0 --vhack-refresh-every=2 --eval-ablate-every=5 --out-tag=_sub4_route2_nofloor_rf2_s41`
+
+### 102 — Success
+- why: A5 extract v_grad from run_tests-only pairset (5 pairs, heldout_known_runtests) for the no-cheat generalisation route run #185; resolve: out/vhack/v_hack_a5_runtests.safetensors written, used by the A5 route run
+- `uv run python -m projected_grpo.extract_vhack_grad --model=Qwen/Qwen3-4B --dtype=bf16 --pairs-from-pool=out/pairsets/heldout_known_runtests.json --n-heldout=0 --out-path=out/vhack/v_hack_a5_runtests.safetensors --train-grads-path=out/vhack_grads/vhack_grads_a5_runtests.safetensors`
+
+### 103 — Success
+- why: A5 BASELINE vanilla + teacher-modes=run_tests -- does held-out file_marker emerge ON-POLICY (no teacher demo, hacked_E blind) over 200 steps; resolve: file_marker deploy hack > 0 (else emergence failed, A5 inconclusive -> need design-B teacher seeding)
+- `uv run python -m projected_grpo.train fast --intervention=none --seed=41 --teacher-pool-dir=out/pools/substrate --teacher-modes run_tests --steps=200 --eval-ablate-every=10 --eval-n-prompts=24 --out-tag=_a5_vanilla_tmrt_s41`
+
+### 104 — Success
+- why: A5 TEST route2 + teacher-modes=run_tests + run_tests-only v_grad -- is held-out file_marker SUPPRESSED at deploy with zero file_marker labels (no-cheat payload #185); resolve: file_marker deploy hack ~0 vs the A5 vanilla baseline at matched solve => absorption generalises
+- `uv run python -m projected_grpo.train fast --intervention=route2 --seed=41 --teacher-pool-dir=out/pools/substrate --teacher-modes run_tests --v-hack-path=out/vhack/v_hack_a5_runtests.safetensors --steps=200 --eval-ablate-every=10 --eval-n-prompts=24 --out-tag=_a5_route2_tmrt_s41`
+
+### 114 — Success
+- why: route2 Haar v_grad (truly-random Gaussian, OUT-OF-subspace by concentration of measure ~1/sqrt(d), NOT a cleaner placebo) draw 0 -- tests whether suppression needs v_grad in the trainable subspace AT ALL; cosine is correlational, the ablation is the causal test; resolve: Haar still suppresses deploy hack => H2 mechanical (works even outside subspace); Haar routes ~nothing / no suppression => in-subspace-ness matters
+- `uv run python -m vgrout.train fast --intervention=route2 --seed=41 --rollout-ablate-frac=0 --eval-ablate-every=5 --route2-random-v-seed=0 --out-tag=_route2_haar_d0_s41`
+
+### 115 — Success
+- why: route2 semantic placebo (vampire) -- arbitrary IN-subspace direction, maps suppression-vs-alignment scatter; resolve: deploy hack vs this axis's |cos| w/ hack dir -- tracks alignment=>H4, flat~0=>H2
+- `uv run python -m vgrout.train fast --intervention=route2 --seed=41 --rollout-ablate-frac=0 --eval-ablate-every=5 --vhack-pairs-path=out/pairsets/null_vampire.json --out-tag=_route2_vampire_s41`
+
+### 116 — Failed/{'Failed': 1}
+- why: erase DIRECTIONALITY test -- erase real v_hack (prog_wide) vs placebo (null_city); erase projects with magnitude ~cos(g,v) so direction MUST matter here unlike the route2 binary-tau gate (job 86 placebo==real); resolve: real-erase deploy hack << placebo-erase => directionality lives in the erase arm
+- `uv run python -m projected_grpo.train fast --intervention=erase --seed=41 --eval-ablate-every=5 --out-tag=_erase_realv_s41`
+
+### 117 — Success
+- why: placebo (null_city) n=3 confirm -- is deploy hack 0.000 robust across seeds or was s41 a fluke/cache-accident; verified no refresh-leak (route2 refresh re-extracts from null_city pairs not hack rollouts, train.py:1344 MASK_PAIRS; hkgap~0 across refresh); resolve: s42 deploy hack ~0.000 like s41 => placebo robust, fills tab:ablation n=3 placebo band
+- `uv run python -m vgrout.train fast --intervention=route2 --seed=42 --rollout-ablate-frac=0 --eval-ablate-every=5 --vhack-pairs-path=out/pairsets/null_city.json --out-tag=_route2_placebo_nullcity_s42`
+
+### 118 — Failed/Killed
+- why: route2 Haar v_grad (truly-random Gaussian, OUT-OF-subspace) draw 1 -- replicate of d0 for a distribution; resolve: see d0 -- all draws suppress => H2; bimodal across draws => H4
+- `uv run python -m vgrout.train fast --intervention=route2 --seed=41 --rollout-ablate-frac=0 --eval-ablate-every=5 --route2-random-v-seed=1 --out-tag=_route2_haar_d1_s41`
+
+### 124 — Queued
+- why: route2 teacher-off@40 curriculum (seed hacks then on-policy); resolve: route2 deploy hack stays ~0 after teacher cut at 40
+- `uv run python -m vgrout.train fast --intervention=route2 --seed=41 --teacher-off-step=40 --steps=200 --eval-ablate-every=20 --out-tag=_route2_toff40_s41`
+
+### 125 — Queued
+- why: #157 frozen-RANDOM-V route control (Haar V, _sv matched, rf off) requeue of killed job 33, pairs vs real-V job 32; resolve: if deploy-hack~0 + solve-jump REPRODUCE vs real v_hack then route effect is ablatable adapter regularization not directional specificity
+- `uv run python -m vgrout.train fast --intervention=route --seed=41 --v-hack-path=out/vhack/v_hack_pairset_prog_wide_randomV.safetensors --vhack-refresh-every=0 --eval-ablate-every=5 --steps=60 --out-tag=_route_randomV_s41`
+
+### 126 — Queued
+- why: A5 AIRTIGHT no-cheat -- route2 run_tests-only v+teacher with gate_anchor_teacher_only so held-out modes get PROVABLY zero detector labels (default leaked <=1.1% via hacked_E FP); resolve: held-out file_marker/sentinel/stdout deploy hack ~0 with ZERO held-out labels => generalisation survives the leak fix
+- `uv run python -m vgrout.train fast --intervention=route2 --seed=41 --teacher-pool-dir=out/pools/substrate --teacher-modes run_tests --v-hack-path=out/vhack/v_hack_a5_runtests.safetensors --steps=200 --eval-ablate-every=10 --eval-n-prompts=24 --gate-anchor-teacher-only --out-tag=_a5_route2_teacheronly_s41`
+
+### 127 — Running
+- why: erase DIRECTIONALITY test -- erase real v_hack (prog_wide) vs placebo (null_city); erase projects with magnitude ~cos(g,v) so direction MUST matter here unlike the route2 binary-tau gate (job 86 placebo==real); resolve: real-erase deploy hack << placebo-erase => directionality lives in the erase arm
+- `uv run python -m vgrout.train fast --intervention=erase --seed=41 --eval-ablate-every=5 --out-tag=_erase_realv_s41`
+
+### 128 — Queued
+- why: erase DIRECTIONALITY placebo control -- erase against null_city arbitrary direction; resolve: if placebo-erase deploy hack ~= real-erase (both drop) then even the projection arm is non-directional => directionality claim refuted; if placebo-erase ~= vanilla (no drop) then erase is genuinely directional
+- `uv run python -m vgrout.train fast --intervention=erase --seed=41 --vhack-pairs-path=out/pairsets/null_city.json --eval-ablate-every=5 --out-tag=_erase_placebo_nullcity_s41`
+
+### 129 — Queued
+- why: none-200 KL-stabilised (beta=1e-5, Adam 0.9/0.99) MATCHED A4 long-run pair (#184); resolve: route2 deploy hack~0 to 200 while vanilla rises; figure needs matched beta
+- `uv run python -m vgrout.train fast --intervention=none --seed=41 --beta=1e-5 --adam-beta1=0.9 --adam-beta2=0.99 --steps=200 --eval-ablate-every=20 --out-tag=_none200_kl5_s41`
+
+### 130 — Queued
+- why: route2-200 KL-stabilised (beta=1e-5, Adam 0.9/0.99) MATCHED A4 long-run pair (#184); resolve: route2 deploy hack~0 to 200 while vanilla rises; figure needs matched beta
+- `uv run python -m vgrout.train fast --intervention=route2 --seed=41 --beta=1e-5 --adam-beta1=0.9 --adam-beta2=0.99 --steps=200 --eval-ablate-every=20 --out-tag=_route2200_kl5_s41`
+
+### 131 — Queued
+- why: A5 n=3 seed 42 vanilla baseline (run_tests-only teacher); resolve: per-mode deploy hack populates error bars in a5_generalisation.png
+- `uv run python -m vgrout.train fast --intervention=none --seed=42 --teacher-pool-dir=out/pools/substrate --teacher-modes run_tests --steps=200 --eval-ablate-every=10 --eval-n-prompts=24 --out-tag=_a5_vanilla_tmrt_s42`
+
+### 132 — Queued
+- why: A5 n=3 seed 43 vanilla baseline (run_tests-only teacher); resolve: per-mode deploy hack populates error bars in a5_generalisation.png
+- `uv run python -m vgrout.train fast --intervention=none --seed=43 --teacher-pool-dir=out/pools/substrate --teacher-modes run_tests --steps=200 --eval-ablate-every=10 --eval-n-prompts=24 --out-tag=_a5_vanilla_tmrt_s43`
+
+### 133 — Queued
+- why: A5 AIRTIGHT n=3 seed 42 route2 run_tests-only + gate_anchor_teacher_only (zero held-out detector labels); resolve: held-out deploy hack ~0 across seeds with the leak fixed
+- `uv run python -m vgrout.train fast --intervention=route2 --seed=42 --teacher-pool-dir=out/pools/substrate --teacher-modes run_tests --v-hack-path=out/vhack/v_hack_a5_runtests.safetensors --steps=200 --eval-ablate-every=10 --eval-n-prompts=24 --gate-anchor-teacher-only --out-tag=_a5_route2_teacheronly_s42`
+
+### 134 — Queued
+- why: A5 AIRTIGHT n=3 seed 43 route2 run_tests-only + gate_anchor_teacher_only (zero held-out detector labels); resolve: held-out deploy hack ~0 across seeds with the leak fixed
+- `uv run python -m vgrout.train fast --intervention=route2 --seed=43 --teacher-pool-dir=out/pools/substrate --teacher-modes run_tests --v-hack-path=out/vhack/v_hack_a5_runtests.safetensors --steps=200 --eval-ablate-every=10 --eval-n-prompts=24 --gate-anchor-teacher-only --out-tag=_a5_route2_teacheronly_s43`
+
+### 135 — Queued
+- why: DECISIVE #196 -- A5 no-cheat regime (teacher=run_tests, gate-anchor-teacher-only => held-out B zero labels) with RANDOM v_grad. Real-v (job126/104) suppressed held-out B; does random-v too? resolve: held-out file_marker/sentinel/stdout deploy hack ~= real-v => vector decorative (just gradient routing); >> real-v => vector load-bearing for UNLABELED hacks (the whole novelty)
+- `uv run python -m vgrout.train fast --intervention=route2 --seed=41 --teacher-pool-dir=out/pools/substrate --teacher-modes run_tests --route2-random-v-seed=0 --steps=200 --eval-ablate-every=20 --eval-n-prompts=24 --gate-anchor-teacher-only --out-tag=_a5_haar_d0_teacheronly_s41`
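
Reviewer note (not part of the patch): the manifest uses a regular three-line entry shape (a `### <id> — <status>` heading, a `- why:` bullet, and a backticked argv bullet), so it can be read back mechanically, for example to check the job count in the subject line or to tally statuses. The sketch below is only an illustration under that assumption; `parse_manifest` and `status_counts` are hypothetical helper names that do not exist in the repo, and the sample text is two entries copied from the manifest.

```python
import re
from collections import Counter

# Heading shape assumed from the manifest: "### <id> — <status>"
HEADING = re.compile(r"^### (\d+) — (.+)$")
WHY_PREFIX = "- why: "


def parse_manifest(text):
    """Return a list of {id, status, why, argv} dicts from manifest markdown."""
    jobs = []
    for raw in text.splitlines():
        line = raw.rstrip()
        match = HEADING.match(line)
        if match:
            # A heading starts a new job entry.
            jobs.append({"id": int(match.group(1)), "status": match.group(2),
                         "why": "", "argv": ""})
        elif jobs and line.startswith(WHY_PREFIX):
            jobs[-1]["why"] = line[len(WHY_PREFIX):]
        elif jobs and line.startswith("- `") and line.endswith("`"):
            # Strip the leading "- `" and the trailing backtick.
            jobs[-1]["argv"] = line[3:-1]
    return jobs


def status_counts(jobs):
    """Tally jobs by their status string (hypothetical helper)."""
    return Counter(job["status"] for job in jobs)


# Two entries copied from the manifest, used only as sample input.
sample = """\
### 58 — Failed/Killed
- why: 20-step vanilla 4B, GPU path now green; resolve: hack_s onset + cin_t baseline, extracts v_hack_full
- `just fast-vanilla --steps=20 --seed=41 --out-tag=_cell20`

### 59 — Failed/DependencyFailed
- why: 20-step erase 4B, same seed (cache-HIT v_hack); resolve: one_sided projection suppress hack_s vs vanilla
- `just fast --steps=20 --seed=41 --out-tag=_cell20`
"""

jobs = parse_manifest(sample)
counts = status_counts(jobs)
```

Running `parse_manifest` on the committed `docs/spec/20260606_job_manifest.md` and taking `len(...)` of the result would be one way to confirm the job count stated in the subject line.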