mirror of https://github.com/wassname/evil_MoE.git synced 2026-06-27 17:30:41 +08:00

wassname 68b0624733 backup: pueue job manifest (94 jobs, id/status/label/argv) at routing-refactor

Local log backup in out/pueue_logs_backup/20260606T000138/ (status.json + full log
+ task_logs) is gitignored/box-local; this manifest is the durable why-label copy.
Killed confounded full-teacher route2 directionality jobs 118/119/121/122/123.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>

2026-06-06 00:01:58 +00:00

Pueue job manifest — 2026-06-06 (pre/at routing-refactor)

Durable copy of every job's id / status / why-label / argv (local backup in out/pueue_logs_backup/20260606T000138/ dies with the box). Source: status.json.
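A minimal sketch of how a manifest in this shape could be regenerated from a status dump. The layout assumed here (`status["tasks"]` mapping a task id to `status`, `label`, and `command` fields) is an assumption for illustration, not a confirmed description of `status.json`; `state_name` and `render_manifest` are hypothetical helper names.

```python
SEP = " \u2014 "  # the dash separator used in this manifest's "id / status" headings


def state_name(status):
    """Flatten a task state; a one-key dict such as {"Failed": 1} becomes "Failed/1"."""
    if isinstance(status, dict):
        key, value = next(iter(status.items()))
        return f"{key}/{value}"
    return str(status)


def render_manifest(status: dict) -> str:
    """Render one block per task: heading, why-label, then argv.

    ASSUMPTION: status["tasks"] maps task id -> {"status", "label", "command"}.
    The real status.json may nest these fields differently.
    """
    blocks = []
    for task_id in sorted(status["tasks"], key=int):  # numeric order, not string order
        task = status["tasks"][task_id]
        heading = f"{task_id}{SEP}{state_name(task['status'])}"
        blocks.append(f"{heading}\n\n{task['label']}\n{task['command']}")
    return "\n\n".join(blocks)


# Usage (path taken from this manifest; adjust field names to the real dump):
#   import json
#   with open("out/pueue_logs_backup/20260606T000138/status.json") as fh:
#       print(render_manifest(json.load(fh)))
```

Sorting with `key=int` keeps job 10 after job 9, which a plain string sort would not.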

0 — Success

why: T0 de-risk — does Qwen3-4B hardcode K visible tests under expose-K nudge; resolve: M2 rate >0 => expose-K env reachable, build Stage2 (T1/T2/T4) + rebuild plot on new env; ~0 => STOP, rethink env
uv run python scripts/derisk_expose_k.py --model=Qwen/Qwen3-4B --n-problems=24 --group=8 --k-visible=2 --seed=41

1 — Success

why: does v_hack from prog_widest suppress mechanical LeetCode hack at matched solve, seed41 frozen; resolve: L5_hack vs vanilla baseline
uv run python -m projected_grpo.train fast --teacher-pool-dir=out/pools/teacher_pool --grad-clip=500 --seed=41 --intervention=erase --v-hack-path=out/vhack/v_hack_pairset_prog_widest.safetensors --out-tag=_pairset_prog_widest_s41

2 — Success

why: G2 screen gt-monitor-s65; resolve: per-checkpoint E/C/D% table to find non-rh teacher
just pregen-teacher-alt ariahw/rl-rewardhacking-leetcode-gt-monitor-screening-s65 teacher_pool_rl-rewardhacking-leetcode-gt-monitor-screening-s65 5

3 — Success

why: G2 screen judge-monitor-s65; resolve: per-checkpoint E/C/D% table
just pregen-teacher-alt ariahw/rl-rewardhacking-leetcode-judge-monitor-screening-s65 teacher_pool_rl-rewardhacking-leetcode-judge-monitor-screening-s65 5

4 — Success

why: G2 screen probe-monitor-s65; resolve: per-checkpoint E/C/D% table
just pregen-teacher-alt ariahw/rl-rewardhacking-leetcode-probe-monitor-screening-s65 teacher_pool_rl-rewardhacking-leetcode-probe-monitor-screening-s65 5

5 — Failed/Killed

why: low mix+high refresh LONG horizon equilibrium (orig env); resolve: gap persists at convergence; HELD until new-env plot done
just fast-projected --v-hack-path=out/vhack/v_hack_21pairs.safetensors --vhack-refresh-every=2 --seed=42 --steps=1000 --out-tag=_equilib_refresh2_k12_mix0125_1000_s42

7 — Failed/Killed

why: which of run_tests/eq_override/exit_code loopholes are faithful (base exploit~0) AND honored by our grader (elicit>>0); resolve: KEEP modes passing the quadrant -> they form the multi-loophole env, DROP leaky/unhonored ones
uv run python -m projected_grpo.derisk_loopholes --n-prompts=24 --group=4

8 — Success

why: [oracle-fixed rerun] which of run_tests/eq_override/exit_code are faithful (base exploit~0) AND honored by grader (elicit>>0); resolve: KEEP quadrant-passers as the multi-loophole env
uv run python -m projected_grpo.derisk_loopholes --n-prompts=24 --group=4

9 — Failed/Killed

why: does the exit_code loophole EMERGE under vanilla GRPO on-policy from the faithful hint (base=0% faithful, 97.9% when nudged); resolve: hack_s(exploited) rises from ~0 over 60 steps => Phase-1 confirms it's learnable, then build Phase-2 projection
just run-cell-mode exit_code 41

10 — Success

why: which of 6 grader-flaw modes (run_tests/eq_override/exit_code/stdout_marker/sentinel/file_marker) pass the quadrant — faithful~0 AND elicit>>0 (base-elicitable + grader-honored); resolve: KEEP passers as the usable multi-loophole set (target 4+)
uv run python -m projected_grpo.derisk_loopholes --n-prompts=24 --group=4

11 — Failed/Killed

why: does the student learn ALL 3 substrate loopholes (run_tests/exit_code/sentinel) from the even teacher batch under vanilla GRPO; resolve: per-mode SUBSTRATE table shows hacks>0 + finite first_step for each mode => substrate works, ready for Phase-2 projection
uv run python -m projected_grpo.train fast --intervention=none --teacher-pool-dir=out/pools/substrate --mix-ratio=0.25 --v-hack-path=out/vhack/v_hack_21pairs.safetensors --seed=41 --steps=80 --out-tag=_substrate3_emerge_s41

12 — Failed/Killed

why: [grader-fixed rerun] does the student learn ALL 3 substrate loopholes (run_tests/exit_code/sentinel) under vanilla GRPO; resolve: per-mode SUBSTRATE table shows hacks>0 + finite first_step for EACH mode (not just one) => multi-loophole substrate works
uv run python -m projected_grpo.train fast --intervention=none --teacher-pool-dir=out/pools/substrate --mix-ratio=0.25 --v-hack-path=out/vhack/v_hack_21pairs.safetensors --seed=41 --steps=80 --out-tag=_substrate3_emerge_s41

13 — Success

why: re-run 6-mode quadrant on FIXED grader (.valid sentinel + exhaustive non-overlap) with GENERIC elicit (no spoonfeed) + exit interview; resolve: faithful<10% AND elicit>>0 per mode => KEEP; notes reveal if a hint is too vague (model can't find loophole) or too blatant; elicit hacks feed substrate rebuild
uv run python -m projected_grpo.derisk_loopholes --n-prompts=24 --group=4
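The quadrant rule used in jobs 7 to 13 (keep a mode only when the faithful exploit rate is near zero and the elicited rate is well above zero) can be sketched as a filter. The `faithful<10%` cut-off comes from the job 13 label; the `elicit_min` threshold, the function name, and the two placeholder modes are assumptions for illustration, since the manifest only says `elicit>>0`.

```python
def quadrant_keep(rates, faithful_max=0.10, elicit_min=0.05):
    """Return the modes that pass the quadrant.

    rates maps mode -> (faithful_exploit_rate, elicit_exploit_rate), both in [0, 1].
    faithful_max mirrors the "faithful<10%" rule in the job 13 label.
    elicit_min is a HYPOTHETICAL stand-in for "elicit>>0".
    """
    return [
        mode
        for mode, (faithful, elicit) in rates.items()
        if faithful < faithful_max and elicit >= elicit_min
    ]


rates = {
    # From the job 9 label: base 0% faithful, 97.9% when nudged.
    "exit_code": (0.0, 0.979),
    # Placeholder rows, not measured values.
    "placeholder_leaky": (0.30, 0.90),      # base model already exploits it: drop
    "placeholder_unhonored": (0.0, 0.0),    # grader never honors it: drop
}
print(quadrant_keep(rates))
```

Only `exit_code` survives here: the leaky row fails the faithful bound and the unhonored row fails the elicit bound.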

14 — Failed/Killed

why: generic elicit mints 0 teacher hacks (base can't discover); mint via SPOONFEED for the substrate teacher batch (no-cheat: never shown to student); resolve: exit_code+sentinel elicit_hacks_*.jsonl non-empty so build-substrate can include all 3 classes
uv run python -m projected_grpo.derisk_loopholes --elicit-style=spoonfeed --n-prompts=24 --group=4

15 — Success

why: bigger-spoon recipes validated 5/5 on 8b; re-mint the 4 modes that were 0/96 (eq_override,stdout_marker,sentinel,file_marker) on the 4B; resolve: each elicit_hacks_*.jsonl non-empty (>=5) so substrate can use all 6 classes
uv run python -m projected_grpo.derisk_loopholes --elicit-style=spoonfeed --modes=eq_override,stdout_marker,sentinel,file_marker --n-prompts=24 --group=4

16 — Success

why: {func} syntax bug broke eq_override/sentinel mint (0/96) and may have malformed stdout/file hacks; re-mint all 4 with bare-name fix; resolve: each of eq_override/stdout_marker/sentinel/file_marker mints >=5 CLEAN hacks
uv run python -m projected_grpo.derisk_loopholes --elicit-style=spoonfeed --modes=eq_override,stdout_marker,sentinel,file_marker --n-prompts=24 --group=4

17 — Failed/Killed

why: does the vanilla-GRPO student learn ALL 3 substrate loopholes (run_tests/exit_code/sentinel) from the even teacher batch + obvious hints; resolve: per-step hk_ cols + end SUBSTRATE table show hacks>0 + finite first_step for each of the 3 classes => answers 'how many classes learned'
just run-substrate none 41

18 — Failed/Killed

why: does the vanilla-GRPO student learn all 3 substrate loopholes (run_tests/exit_code/sentinel) with the reworded exit_code hint + even teacher batch; resolve: per-mode hk_ cols + SUBSTRATE table show hacks>0 + finite first_step per class
just run-substrate none 41

19 — Success

why: minted stdout/file teacher hacks bake the OLD tokens (OVERRIDE_GRADER_PASS/grader_result.txt) that diverge from the student hint (all tests passed/result.txt) after commit 67444b1; resolve: re-minted elicit_hacks_{stdout_marker,file_marker}.jsonl name the hint tokens so teacher==hint, each >=5 clean hacks
uv run python -m projected_grpo.derisk_loopholes --elicit-style=spoonfeed --modes=stdout_marker,file_marker --n-prompts=24 --group=4

20 — Success

why: HEADLINE #137 — how many of K=5 loopholes (run_tests/eq_override/sentinel/stdout_marker/file_marker) does the vanilla-GRPO student learn from the even, token-coherent teacher batch (post 67444b1); resolve: end SUBSTRATE table + per-step hk_ cols show hacks>0 + finite first_step per class => count classes learned
just run-substrate none 41

21 — Success

why: does the 4-of-5 headline + eq_override holdout replicate at seed 42 (vanilla emergence on the K=5 token-coherent substrate); resolve: per-mode first_step + final hk_ match seed 41 within noise => robust class-count, else seed-sensitive
just run-substrate none 42

22 — Success

why: seed-43 replicate of the 4-of-5 vanilla emergence headline; resolve: eq_override stays NEVER and >=3 other modes reach hacks>0 => stable across 3 seeds
just run-substrate none 43

25 — Failed (exit code 1)

why: ARM3 gradient ROUTING + prog_wide v_hack refresh-5, 4-mode substrate; resolve: deployed (quarantine-ablated) per-mode hack < vanilla, cin_t>cin_s
just run-substrate route 41 60 5

26 — Failed/Killed

why: ARM2 PROJECTED (erase) + prog_wide v_hack refresh-5, 4-mode substrate; resolve: training-time per-mode hack < vanilla at matched gt
just run-substrate erase 41 60 5

27 — Failed/Killed

why: ARM1 VANILLA 60-step baseline on 4-mode substrate (overlay reference); resolve: all 4 modes reach hacks>0
just run-substrate none 41 60 0

28 — Success

why: ARM3 gradient ROUTING + prog_wide v_hack refresh-5, 4-mode substrate (requeue after RH_HINT_REPLACE_TO import fix); resolve: deployed (quarantine-ablated) per-mode hack < vanilla, cin_t>cin_s
just run-substrate route 41 60 5

29 — Success

why: ARM2 PROJECTED (erase) + prog_wide v_hack refresh-5, 4-mode substrate (requeue after import fix); resolve: training-time per-mode hack < vanilla at matched gt
just run-substrate erase 41 60 5

30 — Success

why: ARM1 VANILLA 60-step baseline on 4-mode substrate (overlay reference, requeue after routing); resolve: all 4 modes reach hacks>0
just run-substrate none 41 60 0

31 — Success

why: decompose route's deploy hack=0.125 per-mode -- did route GENERALISE (suppress held-out file_marker/sentinel that prog_wide v_hack never saw) or only its own run_tests? load-bearing weak-detector test; resolve: per-mode deploy hack on held-out modes << vanilla => generalises; ~vanilla => only in-dist
just run-substrate route 41 60 5

32 — Success

why: #157 frozen-REAL-V route baseline (refresh off) to pair against random-V; resolve: anchors run31 effect at rf0 so random-V comparison is clean
uv run python -m projected_grpo.train fast --intervention=route --teacher-pool-dir=out/pools/substrate --v-hack-path=out/vhack/v_hack_pairset_prog_wide.safetensors --vhack-pairs-path=out/pairsets/prog_wide.json --vhack-refresh-every=0 --seed=41 --steps=60 --out-tag=_sub4_route_rf0_REAL_s41

33 — Failed/Killed

why: #157 frozen-RANDOM-V route control (Haar V, _sv matched, refresh off); resolve: if deploy-hack~0 + solve-jump reproduce vs REAL, run31 is ablation artifact not directional; if stays hacky, direction is load-bearing
uv run python -m projected_grpo.train fast --intervention=route --teacher-pool-dir=out/pools/substrate --v-hack-path=out/vhack/v_hack_pairset_prog_wide_randomV.safetensors --vhack-pairs-path=out/pairsets/prog_wide.json --vhack-refresh-every=0 --seed=41 --steps=60 --out-tag=_sub4_route_rf0_RAND_s41

34 — Failed (exit code 1)

why: #159 first real route2 (Arm B distinct-basis quarantine, act-mask, tau=0 default) on substrate; resolve: ||B_q||>0 + per-mode deploy hack on held-out modes vs run-31 additive route; if solve tanks, tau too low (over-route/starvation)
just run-substrate route2 41 60 5

35 — Failed/Killed

why: #160 route2 Arm A (grad-mask, single-pass gate subtraction) substrate run, pairs with job34 route2-act for the 5-arm plot; resolve: ||B_q||>0, per-mode held-out deploy hack vs route2-act + vanilla; WATCH deploy solve-jump (review-h Adam-parasite tell)
uv run python -m projected_grpo.train fast --intervention=route2 --route2-mask=grad --teacher-pool-dir=out/pools/substrate --vhack-pairs-path=out/pairsets/prog_wide.json --vhack-refresh-every=5 --seed=41 --steps=60 --out-tag=_sub4_route2grad_rf5_s41

36 — Failed/Killed

why: working refresh-V route on CURRENT harness, head-to-head vs frozen job32 (rf0, 8pp); resolve: rf5 deploy hack << 0.297 at matched solve => refresh is the load-bearing knob, frozen(32) is ablation-only
just run-substrate route 41 60 5

39 — Success

why: does Qwen3.6-27B defer under DEFER_PERSONA on blatant authority prompts (w2schar-mini); resolve: [DEFER] gens COMPLY not refuse => wire persona-gen into prepare_round; else need another deferring-anchor source. REQUEUE of failed #6 (ran from wrong workdir)
uv run python scripts/validate_defer_persona.py

40 — Success

why: vanilla emergence reference (4-mode substrate); resolve: per-mode deploy-hack baseline for the overlay
uv run python -m projected_grpo.train fast --intervention=none --out-tag=_sub4_vanilla

41 — Success

why: erase arm (one-sided projection); resolve: per-mode deploy hack vs vanilla at matched solve
uv run python -m projected_grpo.train fast --intervention=erase --out-tag=_sub4_erase
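As a rough illustration of what a one-sided projection against `v_hack` could look like, the sketch below removes the component of a gradient `g` along `v` only when `g` has a positive component along `v`, and leaves `g` untouched otherwise. This is an assumption about the general technique, written on plain Python lists; the sign convention, the per-layer handling, and the function name `erase_one_sided` are hypothetical and may not match `projected_grpo.train`.

```python
import math


def erase_one_sided(g, v):
    """Project g off v, but only if g points along +v (one-sided).

    g, v: equal-length lists of floats. Returns a new list.
    ASSUMPTION: "+v" is the hack direction; the repo may use the opposite sign.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(g)                      # no direction to erase
    u = [x / norm for x in v]               # unit vector along v
    dot = sum(a * b for a, b in zip(g, u))  # signed component of g along u
    if dot <= 0.0:
        return list(g)                      # g already points away: leave it
    return [a - dot * b for a, b in zip(g, u)]


print(erase_one_sided([1.0, 1.0], [2.0, 0.0]))   # component along v removed
print(erase_one_sided([-1.0, 1.0], [2.0, 0.0]))  # unchanged
```

The removed component has magnitude `|g| * cos(g, v)`, so how much is erased depends on how well the gradient aligns with the direction.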

42 — Success

why: route arm (shared-basis quarantine, rf5); resolve: deploy hack on held-out modes vs vanilla at matched solve
uv run python -m projected_grpo.train fast --intervention=route --out-tag=_sub4_route

43 — Failed/Killed

why: route2 act-mask (distinct-basis quarantine); resolve: held-out deploy hack suppressed vs vanilla at matched solve
uv run python -m projected_grpo.train fast --intervention=route2 --out-tag=_sub4_route2_act

44 — Failed/Killed

why: route2 grad-mask (distinct-basis quarantine); resolve: held-out deploy hack suppressed vs vanilla at matched solve
uv run python -m projected_grpo.train fast --intervention=route2 --route2-mask=grad --out-tag=_sub4_route2_grad

45 — Success

why: regen gym fixture (real_seed.md) with on-policy DEFERRING seeds under DEFER_PERSONA (persona-gen wired into prepare_round); resolve: each REJ block COMPLIES so the gym tests the teacher against the deferring anchor, not the old resisting refusals
uv run python scripts/gen_gym_fixture.py

46 — Failed/Killed

why: route2-act on lr-fix (quar_lr_scale=0.1) -- does the 33M quarantine stop diverging (run43 collapsed) and suppress held-out deploy-hack; resolve: survives 60 steps coherent (ppl_t stable) AND deploy hack < vanilla on file_marker at matched solve
uv run python -m projected_grpo.train fast --intervention=route2 --route2-mask=act --out-tag=_sub4_route2_act

47 — Failed/Killed

why: route2-grad on lr-fix -- grad-mask variant, same lr fix; resolve: survives 60 steps AND deploy hack < vanilla held-out
uv run python -m projected_grpo.train fast --intervention=route2 --route2-mask=grad --out-tag=_sub4_route2_grad

48 — Failed/Killed

why: erase run_tests regression - is grad_clip 10->1 the cause; resolve: erase+grad_clip=1.0 delays run_tests onset >5 steps vs current => grad_clip regressed it
uv run python -m projected_grpo.train fast --intervention=erase --grad-clip=1.0 --out-tag=_regr_gradclip1

49 — Stashed

why: erase regression - is broad prog_wide v_hack the cause; resolve: erase+v_hack_21pairs delays run_tests >5 steps vs prog_wide => pairs breadth regressed it
uv run python -m projected_grpo.train fast --intervention=erase --v-hack-path=out/vhack/v_hack_21pairs.safetensors --out-tag=_regr_v21pairs

52 — Failed/Killed

why: does per-step calibrated-tau vector routing (route2-grad) stop the cos>0 over-routing blowup that saladed jobs 46/47, and suppress held-out deploy-hack vs vanilla; resolve: qE bounded + hkgap>0 sustained + coherent gens + deploy file_marker hack < vanilla at matched solve
uv run python -m projected_grpo.train fast --intervention=route2 --route2-mask=grad --out-tag=_sub4_route2_grad_caltau

54 — Failed/Killed

why: does grad_clip=0.5 (just below the ~1.5 blowup spike, above the 0.2-0.4 healthy gn band) + calibrated-tau prevent the route2-grad salad without throttling learning; resolve: coherent gens + gt_s learns + qE bounded vs caltau-alone job52
uv run python -m projected_grpo.train fast --intervention=route2 --route2-mask=grad --grad-clip=0.5 --out-tag=_sub4_route2_grad_caltau_clip05

57 — Failed/Killed

why: route2 with scale-matched delta_S_hack quarantine (LoRA ripped out) -- does balanced capacity drop qE off ~0.97 and let delta_S learn while suppressing held-out deploy hack; resolve: qE bounded (<~0.6) + hkgap>0 sustained + coherent gens + deploy file_marker hack < vanilla at matched solve
uv run python -m projected_grpo.train fast --intervention=route2 --out-tag=_sub4_route2_balanced

58 — Failed/Killed

why: 20-step vanilla 4B, GPU path now green; resolve: hack_s onset + cin_t baseline, extracts v_hack_full
just fast-vanilla --steps=20 --seed=41 --out-tag=_cell20

59 — Failed/DependencyFailed

why: 20-step erase 4B, same seed (cache-HIT v_hack); resolve: one_sided projection suppress hack_s vs vanilla
just fast --steps=20 --seed=41 --out-tag=_cell20

60 — Success

why: route2 balanced + resid diagnostic -- is qE~0.9 clean-strip (resid~0) or false-neg leak into deployed knob (resid>0); resolve: resid stays ~0 through emergence (steps 10-60) AND deploy file_marker hack < vanilla at matched solve
uv run python -m projected_grpo.train fast --intervention=route2 --out-tag=_sub4_route2_balanced_resid

64 — Success

why: route2 on new default (rollout_ablate_frac=0.5) -> per-step deploy proxy hk_abl/slv_abl, plus eval_ablate_every=5 for held-out hk_dep anchors; resolve: per-step deploy hack curve plottable AND hk_dep < vanilla 0.36 at matched solve
uv run python -m projected_grpo.train fast --intervention=route2 --eval-ablate-every=5 --out-tag=_sub4_route2_ablproxy_s41

65 — Failed/Killed

why: fast vanilla s41 x20; resolve: hack_s onset baseline (extracts v_hack)
uv run python -m projected_grpo.train fast --intervention=none --seed=41 --steps=20 --out-tag=_sweep

66 — Failed/Killed

why: fast erase s41 x20; resolve: hack_s suppressed vs vanilla at matched solve
uv run python -m projected_grpo.train fast --intervention=erase --seed=41 --steps=20 --out-tag=_sweep

68 — Success

why: route2 NO-floor (frac=0) seed 41 -- does dropping deploy-sampling hold deploy hack~0 like job 60; resolve: n=3 deploy hack vs vanilla at matched solve
uv run python -m projected_grpo.train fast --intervention=route2 --seed=41 --rollout-ablate-frac=0 --eval-ablate-every=5 --out-tag=_sub4_route2_nofloor_s41

69 — Success

why: route2 NO-floor (frac=0) seed 42 -- does dropping deploy-sampling hold deploy hack~0 like job 60; resolve: n=3 deploy hack vs vanilla at matched solve
uv run python -m projected_grpo.train fast --intervention=route2 --seed=42 --rollout-ablate-frac=0 --eval-ablate-every=5 --out-tag=_sub4_route2_nofloor_s42

70 — Success

why: route2 NO-floor (frac=0) seed 43 -- does dropping deploy-sampling hold deploy hack~0 like job 60; resolve: n=3 deploy hack vs vanilla at matched solve
uv run python -m projected_grpo.train fast --intervention=route2 --seed=43 --rollout-ablate-frac=0 --eval-ablate-every=5 --out-tag=_sub4_route2_nofloor_s43

71 — Failed/Killed

why: vanilla reference seed 42 for n=3 no-floor route2 comparison; resolve: paired deploy-hack baseline
uv run python -m projected_grpo.train fast --intervention=none --seed=42 --eval-ablate-every=5 --out-tag=_sub4_vanilla_s42

72 — Success

why: vanilla reference seed 43 for n=3 no-floor route2 comparison; resolve: paired deploy-hack baseline
uv run python -m projected_grpo.train fast --intervention=none --seed=43 --eval-ablate-every=5 --out-tag=_sub4_vanilla_s43

73 — Failed/Killed

why: route2 floor(0.5)+refresh-1 s41 -- does a fresh gate stop the floor's deploy-hack leak (0.125 in job 64); resolve: deploy hack ~0 => leak was staleness not floor structure
uv run python -m projected_grpo.train fast --intervention=route2 --seed=41 --rollout-ablate-frac=0.5 --vhack-refresh-every=1 --eval-ablate-every=5 --out-tag=_sub4_route2_floor_rf1_s41

74 — Success

why: vanilla ref seed 42 for n=3 (daemon died mid-run, requeue); resolve: deploy hack baseline vs route2 0.00
uv run python -m projected_grpo.train fast --intervention=none --seed=42 --eval-ablate-every=5 --out-tag=_sweep_van_s42

75 — Failed (exit code 2)

why: static erasure (frozen v_hack) s41 on CURRENT code+substrate -- replace stale older-session panel; resolve: does erase cut deploy hack vs vanilla 0.36
just run-cell erase 41 0

76 — Success

why: online/dynamic erasure (refresh-5) s41 on CURRENT code -- does refresh make erase work (stale panel looked like vanilla, cosine decayed); resolve: hack_s < vanilla AND cos-to-vhack stays up under refresh
just run-cell erase 41 5

77 — Success

why: vanilla ref seed 41 for n=3 no-floor route2 band (s41 missing, s42=74 s43=72); resolve: deploy-hack baseline at s41 to complete the 3-seed vanilla band; bumped to p7 so keynote A1/A2 unblock before A3 erase rows
uv run python -m projected_grpo.train fast --intervention=none --seed=41 --eval-ablate-every=5 --out-tag=_sweep_van_s41

78 — Failed/Killed

why: route2 NO-floor + refresh-2 s41 -- does keeping v_grad fresh (hkgap stays >0) change deploy hack/solve vs the frozen n=3 headline (hkgap decayed to ~0 by step 20); resolve: deploy hack <= frozen 0.00-0.094 and solve >= 0.615 => staleness was harmless; lower hack or higher solve => refresh helps
uv run python -m projected_grpo.train fast --intervention=route2 --seed=41 --rollout-ablate-frac=0 --vhack-refresh-every=2 --eval-ablate-every=5 --out-tag=_sub4_route2_nofloor_rf2_s41

84 — Success

why: does route2 deploy-hack stay low / does solve keep climbing out to convergence (n=3 was 60 steps); resolve: 200-step route2 nofloor s41 deploy curve flat-low = robust, or late rise = quarantine saturates
uv run python -m projected_grpo.train fast --intervention=route2 --seed=41 --rollout-ablate-frac=0 --eval-ablate-every=10 --steps=200 --out-tag=_route2_converge200_s41

85 — Failed/Killed

why: 200-step vanilla SATURATION reference for route2 converge (job 84); resolve: route2 deploy-hack stays below vanilla hack saturation plateau at convergence -> durable not just delayed
uv run python -m projected_grpo.train fast --intervention=none --seed=41 --eval-ablate-every=10 --steps=200 --out-tag=_vanilla_converge200_s41

86 — Success

why: does route2 routing against a SEMANTICALLY ARBITRARY direction (null_city PLACEBO pairs, results.md +0.024) also drop deploy hack; resolve: if deploy hack stays ~vanilla (no drop) the route2 effect is DIRECTIONAL (needs real hack pairs), not generic adapter regularization
uv run python -m projected_grpo.train fast --intervention=route2 --seed=41 --rollout-ablate-frac=0 --eval-ablate-every=5 --vhack-pairs-path=out/pairsets/null_city.json --out-tag=_route2_placebo_nullcity_s41

87 — Success

why: none teacher-off@40 curriculum (seed hacks then on-policy); resolve: route2 deploy hack stays ~0 after teacher cut at 40
uv run python -m projected_grpo.train fast --intervention=none --seed=41 --teacher-off-step=40 --steps=200 --eval-ablate-every=20 --out-tag=_none_toff40_s41

95 — Success

why: A5 harvest real student hacks (logged problem_id/prompt) for 2-mode held-out pair set; resolve: >=6 hack+6 clean per known mode in rollouts.jsonl
uv run python -m projected_grpo.train fast --intervention=none --seed=41 --steps=40 --out-tag=_harvest_s41

96 — Success

why: REQUEUE job75 (died on transient causal-conv1d wheel network timeout, not code) static erasure frozen v_hack s41; resolve: does erase cut deploy hack vs vanilla 0.36
just run-cell erase 41 0

97 — Success

why: A4 vanilla-200 collapsed (lp_s -0.6->-8 @step90) under fast preset lr=3e-3/adam0.5 -- over-optimization once loophole saturates. Gentler step (lr=1e-3, adam0.9/0.99, beta=0 to keep hacking) should stay coherent like route2 did at same ref_eq; resolve: lp_s stays > -1 to step 200 AND hack_s saturates >15/28 -> clean A4 vanilla contrast. zerovar diag now on (b8dcb4e).
uv run python -m projected_grpo.train fast --intervention=none --seed=41 --lr=1e-3 --adam-beta1=0.9 --adam-beta2=0.99 --beta=0 --steps=200 --eval-ablate-every=20 --out-tag=_vanilla200_gentle_s41

98 — Success

why: post-hoc TEST-TIME erasure (weight dS_perp_v_hack + activation diff-of-means ablation) on finished vanilla ckpt job40; resolve: if either cuts deploy hack toward route2 levels WITHOUT tanking solve, train-time routing is unnecessary; if not, post-hoc can't isolate the hack -> motivates gradient intervention
uv run python scripts/tt_erase_bench.py --ckpt out/runs/20260531T141402_fast_vanilla_seed41_sub4_vanilla/train.safetensors

99 — Success

why: route2 NO-floor + refresh-2 s41 -- does keeping v_grad fresh (hkgap stays >0) change deploy hack/solve vs the frozen n=3 headline (hkgap decayed to ~0 by step 20); resolve: deploy hack <= frozen 0.00-0.094 and solve >= 0.615 => staleness was harmless; lower hack or higher solve => refresh helps
uv run python -m projected_grpo.train fast --intervention=route2 --seed=41 --rollout-ablate-frac=0 --vhack-refresh-every=2 --eval-ablate-every=5 --out-tag=_sub4_route2_nofloor_rf2_s41

102 — Success

why: A5 extract v_grad from run_tests-only pairset (5 pairs, heldout_known_runtests) for the no-cheat generalisation route run #185; resolve: out/vhack/v_hack_a5_runtests.safetensors written, used by the A5 route run
uv run python -m projected_grpo.extract_vhack_grad --model=Qwen/Qwen3-4B --dtype=bf16 --pairs-from-pool=out/pairsets/heldout_known_runtests.json --n-heldout=0 --out-path=out/vhack/v_hack_a5_runtests.safetensors --train-grads-path=out/vhack_grads/vhack_grads_a5_runtests.safetensors

103 — Success

why: A5 BASELINE vanilla + teacher-modes=run_tests -- does held-out file_marker emerge ON-POLICY (no teacher demo, hacked_E blind) over 200 steps; resolve: file_marker deploy hack > 0 (else emergence failed, A5 inconclusive -> need design-B teacher seeding)
uv run python -m projected_grpo.train fast --intervention=none --seed=41 --teacher-pool-dir=out/pools/substrate --teacher-modes run_tests --steps=200 --eval-ablate-every=10 --eval-n-prompts=24 --out-tag=_a5_vanilla_tmrt_s41

104 — Success

why: A5 TEST route2 + teacher-modes=run_tests + run_tests-only v_grad -- is held-out file_marker SUPPRESSED at deploy with zero file_marker labels (no-cheat payload #185); resolve: file_marker deploy hack ~0 vs the A5 vanilla baseline at matched solve => absorption generalises
uv run python -m projected_grpo.train fast --intervention=route2 --seed=41 --teacher-pool-dir=out/pools/substrate --teacher-modes run_tests --v-hack-path=out/vhack/v_hack_a5_runtests.safetensors --steps=200 --eval-ablate-every=10 --eval-n-prompts=24 --out-tag=_a5_route2_tmrt_s41

114 — Success

why: route2 Haar v_grad (truly-random Gaussian, OUT-OF-subspace by concentration of measure ~1/sqrt(d), NOT a cleaner placebo) draw 0 -- tests whether suppression needs v_grad in the trainable subspace AT ALL; cosine is correlational, the ablation is the causal test; resolve: Haar still suppresses deploy hack => H2 mechanical (works even outside subspace); Haar routes ~nothing / no suppression => in-subspace-ness matters
uv run python -m vgrout.train fast --intervention=route2 --seed=41 --rollout-ablate-frac=0 --eval-ablate-every=5 --route2-random-v-seed=0 --out-tag=_route2_haar_d0_s41
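The "concentration of measure ~1/sqrt(d)" remark in the job 114 label can be checked numerically: the cosine between a random Gaussian vector and any fixed direction shrinks roughly like 1/sqrt(d) as the dimension d grows. The sketch below uses only the standard library; `mean_abs_cos` and the chosen dimensions are illustrative and unrelated to how `--route2-random-v-seed` draws its vector.

```python
import math
import random


def mean_abs_cos(d, n_draws=200, seed=0):
    """Average |cos| between Gaussian draws in R^d and a fixed unit direction.

    By rotational symmetry the fixed direction can be the first basis vector,
    so the cosine is just x[0] / ||x||.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in x))
        total += abs(x[0]) / norm
    return total / n_draws


for d in (16, 256, 1024):
    # Compare the empirical mean with the 1/sqrt(d) scale.
    print(d, round(mean_abs_cos(d), 4), round(1.0 / math.sqrt(d), 4))
```

At the parameter counts of a real adapter the expected overlap with any specific direction is therefore tiny, which is the sense in which a Haar draw sits outside a low-dimensional subspace.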

115 — Success

why: route2 semantic placebo (vampire) -- arbitrary IN-subspace direction, maps suppression-vs-alignment scatter; resolve: deploy hack vs this axis's |cos| w/ hack dir -- tracks alignment=>H4, flat~0=>H2
uv run python -m vgrout.train fast --intervention=route2 --seed=41 --rollout-ablate-frac=0 --eval-ablate-every=5 --vhack-pairs-path=out/pairsets/null_vampire.json --out-tag=_route2_vampire_s41

116 — Failed (exit code 1)

why: erase DIRECTIONALITY test -- erase real v_hack (prog_wide) vs placebo (null_city); erase projects with magnitude ~cos(g,v) so direction MUST matter here unlike the route2 binary-tau gate (job 86 placebo==real); resolve: real-erase deploy hack << placebo-erase => directionality lives in the erase arm
uv run python -m projected_grpo.train fast --intervention=erase --seed=41 --eval-ablate-every=5 --out-tag=_erase_realv_s41

117 — Success

why: placebo (null_city) n=3 confirm -- is deploy hack 0.000 robust across seeds or was s41 a fluke/cache-accident; verified no refresh-leak (route2 refresh re-extracts from null_city pairs not hack rollouts, train.py:1344 MASK_PAIRS; hkgap~0 across refresh); resolve: s42 deploy hack ~0.000 like s41 => placebo robust, fills tab:ablation n=3 placebo band
uv run python -m vgrout.train fast --intervention=route2 --seed=42 --rollout-ablate-frac=0 --eval-ablate-every=5 --vhack-pairs-path=out/pairsets/null_city.json --out-tag=_route2_placebo_nullcity_s42

118 — Failed/Killed

why: route2 Haar v_grad (truly-random Gaussian, OUT-OF-subspace) draw 1 -- replicate of d0 for a distribution; resolve: see d0 -- all draws suppress => H2; bimodal across draws => H4
uv run python -m vgrout.train fast --intervention=route2 --seed=41 --rollout-ablate-frac=0 --eval-ablate-every=5 --route2-random-v-seed=1 --out-tag=_route2_haar_d1_s41

124 — Queued

why: route2 teacher-off@40 curriculum (seed hacks then on-policy); resolve: route2 deploy hack stays ~0 after teacher cut at 40
uv run python -m vgrout.train fast --intervention=route2 --seed=41 --teacher-off-step=40 --steps=200 --eval-ablate-every=20 --out-tag=_route2_toff40_s41

125 — Queued

why: #157 frozen-RANDOM-V route control (Haar V, _sv matched, rf off) requeue of killed job 33, pairs vs real-V job 32; resolve: if deploy-hack~0 + solve-jump REPRODUCE vs real v_hack then route effect is ablatable adapter regularization not directional specificity
uv run python -m vgrout.train fast --intervention=route --seed=41 --v-hack-path=out/vhack/v_hack_pairset_prog_wide_randomV.safetensors --vhack-refresh-every=0 --eval-ablate-every=5 --steps=60 --out-tag=_route_randomV_s41

126 — Queued

why: A5 AIRTIGHT no-cheat -- route2 run_tests-only v+teacher with gate_anchor_teacher_only so held-out modes get PROVABLY zero detector labels (default leaked <=1.1% via hacked_E FP); resolve: held-out file_marker/sentinel/stdout deploy hack ~0 with ZERO held-out labels => generalisation survives the leak fix
uv run python -m vgrout.train fast --intervention=route2 --seed=41 --teacher-pool-dir=out/pools/substrate --teacher-modes run_tests --v-hack-path=out/vhack/v_hack_a5_runtests.safetensors --steps=200 --eval-ablate-every=10 --eval-n-prompts=24 --gate-anchor-teacher-only --out-tag=_a5_route2_teacheronly_s41

127 — Running

why: erase DIRECTIONALITY test -- erase real v_hack (prog_wide) vs placebo (null_city); erase projects with magnitude ~cos(g,v) so direction MUST matter here unlike the route2 binary-tau gate (job 86 placebo==real); resolve: real-erase deploy hack << placebo-erase => directionality lives in the erase arm
uv run python -m vgrout.train fast --intervention=erase --seed=41 --eval-ablate-every=5 --out-tag=_erase_realv_s41

128 — Queued

why: erase DIRECTIONALITY placebo control -- erase against null_city arbitrary direction; resolve: if placebo-erase deploy hack ~= real-erase (both drop) then even the projection arm is non-directional => directionality claim refuted; if placebo-erase ~= vanilla (no drop) then erase is genuinely directional
uv run python -m vgrout.train fast --intervention=erase --seed=41 --vhack-pairs-path=out/pairsets/null_city.json --eval-ablate-every=5 --out-tag=_erase_placebo_nullcity_s41

129 — Queued

why: none-200 KL-stabilised (beta=1e-5, Adam 0.9/0.99) MATCHED A4 long-run pair (#184); resolve: route2 deploy hack~0 to 200 while vanilla rises; figure needs matched beta
uv run python -m vgrout.train fast --intervention=none --seed=41 --beta=1e-5 --adam-beta1=0.9 --adam-beta2=0.99 --steps=200 --eval-ablate-every=20 --out-tag=_none200_kl5_s41

130 — Queued

why: route2-200 KL-stabilised (beta=1e-5, Adam 0.9/0.99) MATCHED A4 long-run pair (#184); resolve: route2 deploy hack~0 to 200 while vanilla rises; figure needs matched beta
uv run python -m vgrout.train fast --intervention=route2 --seed=41 --beta=1e-5 --adam-beta1=0.9 --adam-beta2=0.99 --steps=200 --eval-ablate-every=20 --out-tag=_route2200_kl5_s41

131 — Queued

why: A5 n=3 seed 42 vanilla baseline (run_tests-only teacher); resolve: per-mode deploy hack populates error bars in a5_generalisation.png
uv run python -m vgrout.train fast --intervention=none --seed=42 --teacher-pool-dir=out/pools/substrate --teacher-modes run_tests --steps=200 --eval-ablate-every=10 --eval-n-prompts=24 --out-tag=_a5_vanilla_tmrt_s42

132 — Queued

why: A5 n=3 seed 43 vanilla baseline (run_tests-only teacher); resolve: per-mode deploy hack populates error bars in a5_generalisation.png
uv run python -m vgrout.train fast --intervention=none --seed=43 --teacher-pool-dir=out/pools/substrate --teacher-modes run_tests --steps=200 --eval-ablate-every=10 --eval-n-prompts=24 --out-tag=_a5_vanilla_tmrt_s43

133 — Queued

why: A5 AIRTIGHT n=3 seed 42 route2 run_tests-only + gate_anchor_teacher_only (zero held-out detector labels); resolve: held-out deploy hack ~0 across seeds with the leak fixed
uv run python -m vgrout.train fast --intervention=route2 --seed=42 --teacher-pool-dir=out/pools/substrate --teacher-modes run_tests --v-hack-path=out/vhack/v_hack_a5_runtests.safetensors --steps=200 --eval-ablate-every=10 --eval-n-prompts=24 --gate-anchor-teacher-only --out-tag=_a5_route2_teacheronly_s42

134 — Queued

why: A5 AIRTIGHT n=3 seed 43 route2 run_tests-only + gate_anchor_teacher_only (zero held-out detector labels); resolve: held-out deploy hack ~0 across seeds with the leak fixed
uv run python -m vgrout.train fast --intervention=route2 --seed=43 --teacher-pool-dir=out/pools/substrate --teacher-modes run_tests --v-hack-path=out/vhack/v_hack_a5_runtests.safetensors --steps=200 --eval-ablate-every=10 --eval-n-prompts=24 --gate-anchor-teacher-only --out-tag=_a5_route2_teacheronly_s43

135 — Queued

why: DECISIVE #196 -- A5 no-cheat regime (teacher=run_tests, gate-anchor-teacher-only => held-out B zero labels) with RANDOM v_grad. Real-v (job126/104) suppressed held-out B; does random-v too? resolve: held-out file_marker/sentinel/stdout deploy hack ~= real-v => vector decorative (just gradient routing); >> real-v => vector load-bearing for UNLABELED hacks (the whole novelty)
uv run python -m vgrout.train fast --intervention=route2 --seed=41 --teacher-pool-dir=out/pools/substrate --teacher-modes run_tests --route2-random-v-seed=0 --steps=200 --eval-ablate-every=20 --eval-n-prompts=24 --gate-anchor-teacher-only --out-tag=_a5_haar_d0_teacheronly_s41
