mirror of https://github.com/wassname/evil_MoE.git synced 2026-06-27 18:43:00 +08:00

wassname 4621488cc0 reorg: out/ sorted by datatype (vhack/ pools/ runs/ vhack_grads/ figs/)

Code writes+reads the new scheme; migrate_out_dirs.py moved 225 loose artifacts
(0 left at top level). Per-run checkpoints+rollouts now group under
runs/<ts>_<run_id>/ as train.safetensors/rollouts.jsonl. Figures land in
out/figs/ with a stable docs/figs/<name>.png symlink (figs.link_latest).
justfile also gains run-cell REFRESH param (online-erasure arm). Smoke +
smoke-vanilla + results all green on new paths. Requeue manifest preserves the
why/resolve labels that pueue reset wiped.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-05-30 03:52:24 +00:00

Requeue manifest (queue reset 2026-05-30)

Durable copy of the pueue why/resolve labels and commands, captured before the pueue reset wiped them. This file is the requeue plan. The raw JSON snapshot at /tmp/claude-1000/queue_snapshot_033450.json is not durable; this .md is.
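A minimal sketch of how one manifest entry could be turned back into a `pueue add` call. It assumes a pueue version whose `add` subcommand accepts `--label` and `--priority`; the `Job` dataclass and `pueue_add_argv` helper are hypothetical names, not part of this repo, and the labels are shortened for space. It only prints the commands and does not run them.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    old_id: int   # task id before the reset; pueue assigns fresh ids on re-add
    prio: int     # priority recorded in this manifest
    label: str    # the "why: ...; resolve: ..." text
    command: str  # shell command exactly as listed in this manifest


def pueue_add_argv(job: Job) -> list[str]:
    # Assumption: `pueue add` supports --priority and --label.
    # Writing --priority=N (with "=") keeps negative values such as -10
    # from being parsed as a separate flag.
    return ["pueue", "add", f"--priority={job.prio}", "--label", job.label,
            "--", *shlex.split(job.command)]


# Two entries copied from this manifest; labels are abbreviated here.
jobs = [
    Job(173, -10, "why: defer persona check",
        "uv run python scripts/validate_defer_persona.py"),
    Job(183, 8, "why: T0 de-risk",
        "uv run python scripts/derisk_expose_k.py --model=Qwen/Qwen3-4B "
        "--n-problems=24 --group=8 --k-visible=2 --seed=41"),
]

# Highest priority first, purely for readability of the printed plan.
for job in sorted(jobs, key=lambda j: j.prio, reverse=True):
    print(shlex.join(pueue_add_argv(job)))
```

Keeping the full why/resolve text in `label` is what lets the labels survive a later reset, since they can be regenerated from this file rather than from queue state.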

Dependency note (env reframe 2026-05-30)

The 4 dynamics-plot arms (none/static-erase/online-erase/route) are to be rebuilt on the NEW expose-K split env, not the original run_tests env. That env does not exist yet: it is gated on T0 (#183, M2>0 reachability) and then needs T1/T2/T4 built (Stage 2 #132). The plot-arm commands below (#184 to #187) therefore target the ORIGINAL env and must be re-pathed to the split env once it is built. The other original-env why-jobs (marathon #152, prog_widest #181, G2 screens #137 to #139, defer #173) can be requeued immediately and will use the new out/ paths from the reorg (#131).
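The gating described above can be sketched as a small filter over the task ids in this manifest. The helper name `requeue_now` and the `split_env_ready` flag are hypothetical. Treating #183 as runnable right away is an assumption, made because it is the T0 gate itself.

```python
# Task ids in the order they appear in this manifest.
ALL_TASKS = [184, 185, 186, 187, 181, 183, 152, 137, 138, 139, 173]

# The four dynamics-plot arms that wait for the expose-K split env.
PLOT_ARMS = {184, 185, 186, 187}


def requeue_now(task_ids, split_env_ready=False):
    """Return the ids that may be requeued given the split-env status.

    Even when split_env_ready is True, the plot-arm commands still need
    re-pathing to the split env; this only reports eligibility.
    """
    return [t for t in task_ids if split_env_ready or t not in PLOT_ARMS]


print(requeue_now(ALL_TASKS))
```

With the default `split_env_ready=False`, the four plot arms are held back and everything else is returned in manifest order.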

#184 (prio 90)

why: T8 KEY GOAL seed-41 cell intervention=none (60-step head-to-head); resolve: route/erase ship-hack < none at matched solve => projection beats vanilla; feeds dynamics plot
```
just run-cell none 41
```

#185 (prio 90)

why: T8 KEY GOAL seed-41 cell intervention=erase (60-step head-to-head); resolve: route/erase ship-hack < none at matched solve => projection beats vanilla; feeds dynamics plot
```
just run-cell erase 41
```

#186 (prio 90)

why: T8 KEY GOAL seed-41 cell intervention=route (60-step head-to-head); resolve: route/erase ship-hack < none at matched solve => projection beats vanilla; feeds dynamics plot
```
just run-cell route 41
```

#187 (prio 90)

why: T8 overlay missing the ONLINE-erasure arm (refresh-2) at matched mix=0.125/s41/60-step — user wants it back in dynamics overlay; resolve: 4-arm overlay none/static-erase/online-erase/route, all seed-41, shows whether refresh keeps hack_s down longer than static
```
just run-cell erase 41 2
```

#181 (prio 40)

why: does v_hack from 'prog_widest' suppress mechanical LeetCode hack at matched solve, seed41 frozen; resolve: L5_hack vs vanilla #153 (0.664), prog_wide #156 (0.500)

uv run python -m projected_grpo.train fast --teacher-pool-dir=out/probe_distill/teacher_pool --grad-clip=500 --seed=41 --intervention=erase --v-hack-path=out/v_hack_pairset_prog_widest.safetensors --out-tag=_pairset_prog_widest_s41

#183 (prio 8)

why: T0 de-risk — does Qwen3-4B hardcode K visible tests under expose-K nudge; resolve: M2 rate >0 => expose-K env reachable, proceed T1; ~0 => STOP rethink env
```
uv run python scripts/derisk_expose_k.py --model=Qwen/Qwen3-4B --n-problems=24 --group=8 --k-visible=2 --seed=41
```

#152 (prio 1)

why: low mix + high refresh, LONG horizon — different equilibrium than full-hack? (v_hack_21pairs, refresh-2 k=12, mix0125, s42, 1000 steps); resolve: hack_s<~0.7 and/or solve_s>~0.2 at large step => new attractor [demoted to bg prio so short jobs run first]
```
just fast-projected --v-hack-path=out/v_hack_21pairs.safetensors --vhack-refresh-every=2 --seed=42 --steps=1000 --out-tag=_equilib_refresh2_k12_mix0125_1000_s42
```

#137 (prio 0)

why: G2 screen rl-rewardhacking-leetcode-gt-monitor-screening-s65 (5-prompt batch); resolve: per-checkpoint E/C/D% table to find non-rh-s65 detector clusters
```
just pregen-teacher-alt ariahw/rl-rewardhacking-leetcode-gt-monitor-screening-s65 teacher_pool_rl-rewardhacking-leetcode-gt-monitor-screening-s65 5
```

#138 (prio 0)

why: G2 screen rl-rewardhacking-leetcode-judge-monitor-screening-s65 (5-prompt batch); resolve: per-checkpoint E/C/D% table to find non-rh-s65 detector clusters
```
just pregen-teacher-alt ariahw/rl-rewardhacking-leetcode-judge-monitor-screening-s65 teacher_pool_rl-rewardhacking-leetcode-judge-monitor-screening-s65 5
```

#139 (prio 0)

why: G2 screen rl-rewardhacking-leetcode-probe-monitor-screening-s65 (5-prompt batch); resolve: per-checkpoint E/C/D% table to find non-rh-s65 detector clusters
```
just pregen-teacher-alt ariahw/rl-rewardhacking-leetcode-probe-monitor-screening-s65 teacher_pool_rl-rewardhacking-leetcode-probe-monitor-screening-s65 5
```

#173 (prio -10)

why: does Qwen3.6-27B defer under DEFER_PERSONA on blatant authority prompts (w2schar-mini); resolve: [DEFER] gens COMPLY not refuse => wire persona-gen into prepare_round; else need another deferring-anchor source
```
uv run python scripts/validate_defer_persona.py
```
