Files
evil_MoE/out/vhack
wassname 7f45189f1c config: eval2/eval3 regime + fold per-token into arm + FastLora; drop contaminated prog_wide; OOD pairs
Config (make the design axes explicit Literal choices):
- eval: Literal[eval2,eval3] (default eval3 = 10% unhackable, deployment-like);
  unhackable_frac is now a derived property; eval/unhackable_frac/pairs recorded
  in deploy_test.json metadata.
- intervention gains routeV_per_token (folds the per-token bool into the arm choice).
- routeV_gate documented as the pinning axis.
- FastConfig grad_clip 500->10 (was never load-bearing); FastLoraConfig subcommand
  (fast-lora) at lr=1e-4 -- the hot 3e-3 diverged lora_frozen_b (job 25, ppl 6e5 gn98 step4).

Pairs:
- delete prog_wide.json (14/30 print-without-assert contaminated; history in git);
  default -> prog_wide_clean.
- rename run_tests->execute_tests in prog_wide_clean + pairs_authored so the
  extraction pairs are OOD (never use the env's real grader fn name). Re-extracted
  v_hack_smoke to match.

justfile: --routeV-per-token -> intervention=routeV_per_token; drop --unhackable-frac
(eval3 default); lora recipes -> fast-lora subcommand; prog_wide -> prog_wide_clean.

smoke green (erase + routeV_per_token); all 4 verify gates pass.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-10 04:21:54 +00:00
..