Commit Graph

  • 3004b23f74 feat: route-evidence figure — training hack vs shipped hack (single run) wassname 2026-05-30 03:17:35 +00:00
  • 4f9651b1f3 log: rename route ablated-eval -> SHIP-eval, self-evident with SHOULD wassname 2026-05-30 03:01:04 +00:00
  • 969c724d9d docs+chore: out/ reorg scheme (queue-gated) + archive dead _OLD_step_format dirs wassname 2026-05-30 02:43:10 +00:00
  • 2b02e7aa77 feat(stage2): T0 de-risk gate — does Qwen3-4B hardcode visible tests? wassname 2026-05-30 02:41:14 +00:00
  • f88b8b32c1 results: add Q10 (pairset mechanism>framing>placebo) + Q11 (60-step convergence gap closes) wassname 2026-05-30 02:34:22 +00:00
  • ee136ac7e8 fix(results): read ground-truth mix_ratio from log, not argv default wassname 2026-05-30 02:24:44 +00:00
  • f917670994 feat: T8 run-cell + regen-dynamics recipes; spec T5 done, T8 in progress wassname 2026-05-30 00:52:14 +00:00
  • fc30514b23 feat: T5 eval-time ablation for route + fix route deployment invariant wassname 2026-05-30 00:50:53 +00:00
  • d6342ab201 feat: gradient routing — delta_S_hack quarantine + intervention {none,erase,route} wassname 2026-05-30 00:31:30 +00:00
  • 62c6794e30 prune: drop mean_diff and solve_orth_m extractor options wassname 2026-05-29 10:21:01 +00:00
  • 5d83adbb25 fix: correct the "18 vs 21 pair" basis claim (it was never about pair count) wassname 2026-05-29 10:12:12 +00:00
  • 46f10d8150 results: absolute-rate tables + provenance, lock mix=0.125 default wassname 2026-05-29 09:30:30 +00:00
  • e7cdcaa0ab results: same-seed paired deltas + std, exclude incomplete runs wassname 2026-05-29 08:10:42 +00:00
  • 4464f9d312 results tooling + solve-orth knob + results-by-question doc wassname 2026-05-29 07:21:05 +00:00
  • 826b2aa83e wip wassname 2026-05-29 06:29:46 +00:00
  • c1f8ca4e7b tidy wassname 2026-05-29 06:29:43 +00:00
  • 3bbac88167 concepts wassname 2026-05-29 06:29:20 +00:00
  • f27c658ca9 docs wassname 2026-05-29 05:42:28 +00:00
  • 22b5d0a8a7 LW draft: add preregistered H1 block-quote with falsification clauses wassname 2026-05-29 03:56:33 +00:00
  • 28e251c2d0 journal (j): note pueue-switch reorder of n=3 fillers to slots 120-122 wassname 2026-05-29 03:52:42 +00:00
  • 638fe23f3e LW-style draft post: gradient projection vs reward hacking (paper-writing skill) wassname 2026-05-29 03:49:51 +00:00
  • ffe206bb55 paper-review pass on lab report: annotated + review files wassname 2026-05-29 03:24:20 +00:00
  • 14db69de97 lab report v3: TL;DR, three-line concept, PASS_RATE column, G_hack rename wassname 2026-05-29 03:18:22 +00:00
  • 2d656d0b37 lab report rewrite: narrative shape + external-panel refinements wassname 2026-05-29 02:55:03 +00:00
  • d46b55f933 journal (j) + WIP lab report: matched-seed projected-vs-vanilla, n=2 wassname 2026-05-29 02:51:05 +00:00
  • f70743c9e9 wip wassname 2026-05-28 12:44:20 +00:00
  • 28d01cf752 rewards: implement E (hacked_loophole_used) and pass through wassname 2026-05-28 10:36:19 +00:00
  • 253268e93a rewards: add mechanism-D detector + pool regrade diagnostic wassname 2026-05-28 09:47:48 +00:00
  • 16e2c37de6 train: online v_hack refresh every N steps wassname 2026-05-28 09:42:17 +00:00
  • 1e3d39e318 justfile: drop 12 dead probe-* recipes superseded by train.py wassname 2026-05-28 09:23:03 +00:00
  • 3efd9e69a8 proj: add gate_mode=reverse (flip sign of hack-ward component) wassname 2026-05-28 09:21:05 +00:00
  • 646edfc7af purge dead modules and stale recipes wassname 2026-05-28 08:42:15 +00:00
  • f487e67405 Goal 0 milestone: fast preset learns to hack in ~10min wassname 2026-05-28 03:22:36 +00:00
  • a82c5c17dd smoke: route through teacher_pool so backward/projection paths fire wassname 2026-05-27 23:49:21 +00:00
  • ecfb3bf30a smoke: tiny-random on CPU, beartype on, 30 steps; one-harness consolidation wassname 2026-05-27 23:33:12 +00:00
  • 577f075611 jaxtyping: shape contracts for v_hack save/load/apply/project paths wassname 2026-05-27 23:20:38 +00:00
  • 3fb8202138 fix: drop nested save_file import so the closure can find it on cache-hit wassname 2026-05-27 22:50:26 +00:00
  • 373c257293 log: caption + drop redundant cols (std, gt, hack, row prefix) wassname 2026-05-27 22:26:04 +00:00
  • 380de028eb fix: silence num_return_sequences deprecation by baking G_s into gen_cfg wassname 2026-05-27 21:42:03 +00:00
  • 1c2324587a fix: pad agg_logp with NaN on zero-variance skip to keep is_s alignment wassname 2026-05-27 21:32:55 +00:00
  • aa1d457701 Journal: first student hacks in #51 at ref_eq=13.5 wassname 2026-05-27 10:10:28 +00:00
  • bccffbe9b1 Fixed-width row formatting so columns align under headers wassname 2026-05-27 10:02:11 +00:00
  • 3531be570f Off-policy diagnostic: per-source mean gen_logp (lp_s/lp_t) + table spacing wassname 2026-05-27 09:42:43 +00:00
  • 41817d2a08 README: add plain-language "How it works" section wassname 2026-05-27 09:39:19 +00:00
  • 3c04aaf06d Journal: cin_s drift in projected mid-run + noise-floor filter note wassname 2026-05-27 09:38:20 +00:00
  • 477380603f Global noise-floor filter on v_hack at load time wassname 2026-05-27 09:37:49 +00:00
  • 9ba7b818a9 Downsample cin_s/cin_t diagnostic via cin_split_every wassname 2026-05-27 09:14:30 +00:00
  • ff26cbe089 Split row cols by source: add rew_s/gt_t; rename timing col t_rew wassname 2026-05-27 09:13:00 +00:00
  • e0f33045a9 Include tau_axis in v_hack cache filename + plumb through Config wassname 2026-05-27 09:11:41 +00:00
  • 5bf2180248 Drop dead code: unused v_sv return from load_v_hack wassname 2026-05-27 09:10:55 +00:00
  • bfc54b83b4 Restore model.train() after v_hack auto-extract wassname 2026-05-27 09:08:55 +00:00
  • 8d2c9afb01 Doc cleanup: mark susp gate as REMOVED in design doc wassname 2026-05-27 09:08:34 +00:00
  • 8d170a0753 Remove runtime suspicion gate wassname 2026-05-27 07:06:50 +00:00
  • 5f196e3108 v_hack v2: top-k + S magnitudes + runtime suspicion gate + per-source cin wassname 2026-05-27 06:39:05 +00:00
  • 75f4aff4d8 Mixed-pool GRPO via cached teacher pool wassname 2026-05-27 02:04:19 +00:00
  • 6bd3abfe5b no_gate projection mode, ariahw hint-replacement loader, mixed-pool plan wassname 2026-05-27 00:45:26 +00:00
  • 890ae62649 token-efficient extract/heldout logs + sensible verify defaults wassname 2026-05-26 02:39:19 +00:00
  • 3785c66290 merge duplicate research journals into root RESEARCH_JOURNAL.md wassname 2026-05-26 02:36:07 +00:00
  • 235b51399f top-k v_hack subspace + real-voice pairs + LoRA bake wassname 2026-05-26 02:33:24 +00:00
  • b4e76525c1 Per-prompt grouping, hint default, ratio diagnostic, LR=3e-4 wassname 2026-05-25 22:03:50 +00:00
  • 00159cd7c6 Fix is_replay bug, add delta_S/logp diagnostics, cycle pools wassname 2026-05-25 21:42:36 +00:00
  • 041729a758 Warmup-gen probe results: H1 untestable at 20 warmup steps wassname 2026-05-25 15:58:37 +00:00
  • a26f71ef1a probe_traj: side-by-side vanilla-vs-projected trajectory analyzer wassname 2026-05-25 12:26:03 +00:00
  • a1fdb45251 warmup_replay_steps: replay then student-gen in one pipeline wassname 2026-05-25 12:24:49 +00:00
  • ab6676d90a mixed-replay GRPO works + cos fix + min/max + journal wassname 2026-05-25 12:20:52 +00:00
  • 1e1b032c31 phase2_analyze: read pilot checkpoints, print trajectories + decision wassname 2026-05-25 12:02:35 +00:00
  • 9c886428bf proj: measure_only kwarg + train.py always-on cos_in diagnostic wassname 2026-05-25 11:50:41 +00:00
  • e04548987f spec2 + base_pool generator + slim replay save (partial mixed-replay TODO) wassname 2026-05-25 11:48:48 +00:00
  • 765a6f6be7 probe_distill: inline per-step cos-by-bucket printout wassname 2026-05-25 10:28:25 +00:00
  • 195b55cc28 spec: reject T5 mixed-policy design after external review wassname 2026-05-25 10:26:33 +00:00
  • 2a21fbc49c spec(distill_probe): Phase 1 done (UAT 4/4), Phase 2 candidates R5-R7 wassname 2026-05-25 10:22:19 +00:00
  • d2e15da4bc NLL distillation loss + UAT T4 via gt_pass split wassname 2026-05-25 10:19:44 +00:00
  • d111db25f7 Distillation probe: hacky teacher (rh-s65) + student per-sample cosine wassname 2026-05-25 10:04:55 +00:00
  • fa24f4eb4b Drop grad checkpointing, KV cache for generate, periodic safetensors ckpt + phase timing wassname 2026-05-24 12:45:21 +00:00
  • 6f68ba34b6 Match paper effective batch + fix gt_tests/KeyError, strip stale docstring wassname 2026-05-24 09:25:47 +00:00
  • 9fb27fe746 register vendored repos as submodules (fix fresh-box empty-dir crash) wassname 2026-05-24 05:32:13 +00:00
  • 87a2b48784 G=6 + logits_to_keep OOM fix, generalization constraint, handover rewrite wassname 2026-05-24 05:03:04 +00:00
  • 973b9407b5 grader bug fix + ref reward semantics + Qwen3-4B substrate wassname 2026-05-23 23:36:00 +00:00
  • 4549a7ca27 handover wassname 2026-05-23 14:20:17 +08:00
  • 0e2c786d4a ready wassname 2026-05-23 14:19:41 +08:00
  • 75a3ec9dd9 ready? wassname 2026-05-23 14:03:05 +08:00
  • 25cba14aee Add new scripts for AntiPaSTO and GRPO validation, including v_hack extraction, held-out validation, and smoke tests wassname 2026-05-23 13:54:51 +08:00
  • e3ad6887e6 Add AntiPaSTO implementation and diagnostic scripts for projected-GRPO wassname 2026-05-23 13:33:33 +08:00
  • 42498682ca spec wassname 2026-05-23 13:04:03 +08:00
  • 2d6695389f refined spec - vec in grad space - SVD first - lsrl for simple_GRPO wassname 2026-05-23 12:32:45 +08:00
  • bf252fac69 fix smoke. wassname 2026-05-23 11:26:39 +08:00
  • 120400c5f5 setup wassname 2026-05-23 10:40:02 +08:00
  • 7248d469a7 init wassname 2026-05-23 10:22:54 +08:00