Commit Graph

  • 34ad20db0a fix route2 no-cheat leak: teacher-only gate anchor + unit test wassname 2026-06-05 03:53:23 +00:00
  • a9523c9cb8 fix overlay label collisions: common right-gutter anchor + leaders wassname 2026-06-05 03:31:26 +00:00
  • 504922a3d6 fix collision: lift 'deploy hack =0' off the y=0 line in train_vs_deploy wassname 2026-06-05 03:25:49 +00:00
  • b616970e42 fix plot integrity: drop n=28 hack_s fallback in train-vs-deploy series wassname 2026-06-05 03:21:48 +00:00
  • b3539e50e7 no-cheat check: held-out hacked_E is <=1.1% FP, not ==0; correct caption wassname 2026-06-05 03:01:20 +00:00
  • 6869afef72 tab:ablation: fill placebo row (0.000/0.531), correct caption to non-directional wassname 2026-06-05 02:57:06 +00:00
  • 3da296469b plot_deploy_overlay: Cleveland dot plot replaces grouped bars (tufte) wassname 2026-06-05 02:51:13 +00:00
  • 5f478f4bd0 a5: explain generalisation mechanism in caption + ≡0 marker on zero bars wassname 2026-06-05 02:46:57 +00:00
  • aa6bf57a73 plot_deploy_overlay: aggregate seeds per arm, std error bars (n>1) wassname 2026-06-05 02:45:22 +00:00
  • 04430e3ef2 plot_dynamics: hack≡0 tags on overlay, labels on top panel, 2-panel train-vs-deploy wassname 2026-06-05 02:43:50 +00:00
  • 176f1baae6 journal: placebo (job 86) full 60-step verdict -- route2 non-directionality locked (12/12 evals at 0.000) wassname 2026-06-05 02:35:03 +00:00
  • 5257ff010e plot_dynamics: train-vs-deploy 2x2 uses matched n=64 eval on both rows wassname 2026-06-05 02:33:10 +00:00
  • 0645ae2dd2 fig:longrun: rebuild from job84 route2 + job97 fixed vanilla (no collapse) wassname 2026-06-05 02:18:34 +00:00
  • ea0293083c journal: queue erase directionality test (jobs 105/106 real-v vs placebo) wassname 2026-06-05 01:01:08 +00:00
  • beee08b559 journal: WATCH placebo (job 86) suppressing deploy hack like real v_hack -- directionality at risk wassname 2026-06-05 00:36:04 +00:00
  • b311815fc6 writeup: C4 'teacher bootstrap is an accelerant, not the signal' (addresses the no-endogenous-run caveat) wassname 2026-06-04 23:26:54 +00:00
  • b891109633 A5 FINAL: held-out modes suppressed 0.62->0.02 (file_marker), zero held-out labels (job 104) wassname 2026-06-04 23:21:39 +00:00
  • 2fb56e1a91 writeup: prefill A5 generalisation table vanilla column (job 103); reframe as designed held-out test wassname 2026-06-04 22:30:35 +00:00
  • 2b48eab6b8 journal: A5 suppression preliminary (job 104 step ~32) -- held-out hacks emerge on-policy, knob-off deploy holds 0.000 wassname 2026-06-04 14:59:26 +00:00
  • 1265445939 journal: A5 baseline (job 103) finished -- per-mode deploy split, held-out modes all emerge on-policy wassname 2026-06-04 13:11:46 +00:00
  • 4fcce164f7 memory: merge qmd-lexical + s2-keyed-access into tracked .claude/memory wassname 2026-06-04 15:40:48 +08:00
  • 154e33683e paper: HARVE byline cross-verified arXiv==S2 (keyed semantic-search .env) wassname 2026-06-04 15:20:04 +08:00
  • b097d9abfc paper: add verified related work (11 refs) + fix Huang->Deng first author wassname 2026-06-04 15:18:44 +08:00
  • e91e095596 journal: A5 baseline confirmed -- held-out file_marker+sentinel emerge on-policy (run_tests-only teacher), deploy hack 0.52 by step 40 wassname 2026-06-04 05:33:36 +00:00
  • a162eded65 journal: job 97 gentle-probe refutes vanilla-200 collapse (lp_s>-0.5); A5 baseline sees held-out file_marker emerge on-policy wassname 2026-06-04 04:01:18 +00:00
  • 5a25a1cc1c results: fill route-rf2 ablation cell (job99: deploy hack 0.000/solve 0.625, staleness harmless) wassname 2026-06-04 03:00:57 +00:00
  • 46b102ad22 eval: revert eval-every default 10->5 (knob-on removal made it cheap again) wassname 2026-06-04 02:29:23 +00:00
  • 208713d7c2 perf(eval): drop redundant per-step knob-ON pass, default eval-every 5->10 wassname 2026-06-04 02:25:07 +00:00
  • 65a05c365c fix(writeup): flag vanilla-200 collapse as preset artifact (job 97), not a finding wassname 2026-06-04 00:42:44 +00:00
  • 0efd638b4e memory wassname 2026-06-03 23:56:50 +00:00
  • 0913b064fc journal: A5 run plan queued (strict teacher-modes=run_tests, vanilla baseline + route2 test) wassname 2026-06-03 22:46:56 +00:00
  • da48a95d9e feat: --teacher-modes for clean A5 no-cheat (train held-out modes on-policy, anchor only known) wassname 2026-06-03 22:45:49 +00:00
  • a0d4ddf9d5 journal: A5 no-cheat table (hacked_E sees only run_tests; file_marker held-out) + route-run mechanism blocker wassname 2026-06-03 22:40:06 +00:00
  • f8aea5f9e6 journal: #187 resolved -- vanilla-200 collapse was the hot preset, not long-horizon GRPO (job 97) wassname 2026-06-03 20:23:41 +00:00
  • 6085efcc54 paper: de-meta the captions (humanizer/paper-writing) wassname 2026-06-03 11:43:08 +00:00
  • 895aedd983 paper: page-1 headline fig, dir arrows, algorithm pseudocode, polish wassname 2026-06-03 11:38:32 +00:00
  • bd7550f559 paper: framed code blocks, real AntiPaSTO cite, leave-one-out ablation wassname 2026-06-03 11:22:22 +00:00
  • 9c188f60be memory: track agent memory in repo .claude/memory (symlinked from harness path) wassname 2026-06-03 11:08:25 +00:00
  • 51ee43577a paper: tab:ablation interp -- weak vector fails to erase but routes (SGTM absorption) wassname 2026-06-03 11:05:20 +00:00
  • d06a4fbe69 journal: framing -- post-hoc proves v_hack weak, but weak suffices as a routing gate (SGTM absorption) wassname 2026-06-03 11:04:07 +00:00
  • 3cc804b15e results: A3 post-hoc erase rows (weight 0.297 partial, act 0/0 lobotomy) wassname 2026-06-03 10:50:08 +00:00
  • f4ac26a440 results: A3 erase-static row lands (deploy hack 0.500); both erase arms fail wassname 2026-06-03 10:02:51 +00:00
  • 8d16b317cb results: A3 erase-online row lands (deploy hack 0.562 > vanilla; route 0.000) wassname 2026-06-03 06:47:58 +00:00
  • 1fb49a3325 log: reprint step-table header every 50 rows; related-work: Piggyback learned-mask critique wassname 2026-06-03 04:46:12 +00:00
  • b8dcb4ec33 diag: count zero-variance-skipped GRPO groups per step (zerovar=) wassname 2026-06-03 04:20:03 +00:00
  • 753a54c625 paper: keynote A1/A2 to n=3 (route hack -0.292 vs vanilla, paired p~=0.013) wassname 2026-06-03 03:36:32 +00:00
  • 87cca9a603 plots: dejargon, drop redundant titles, emit png+svg+pdf, CSV re-render wassname 2026-06-03 02:44:39 +00:00
  • 17a8792340 paper: address comprehension friction + OpenReview novelty challenge wassname 2026-06-03 02:29:45 +00:00
  • ffc2df540f blog: drop reader-facing route2 tag -> route (consistency with paper) wassname 2026-06-03 02:20:13 +00:00
  • dbcc3a5ad3 paper: show the contrastive pairs in appendix (resolve synthetic-pairs flag) wassname 2026-06-03 02:17:49 +00:00
  • 5dcc90363a paper: humanizer pass on prose I added (em-dash -> commas) wassname 2026-06-03 01:49:01 +00:00
  • 4a002e942f paper: precise Huang trusted-direction contrast; rename paper note deng->huang wassname 2026-06-03 01:47:24 +00:00
  • c1388e5325 paper: title -> question form 'Can We Quarantine Reward Hacking with a Reward-Hacking Representation?' wassname 2026-06-03 01:42:03 +00:00
  • 97a4c5d7b1 paper: reframe lineage SGTM (mechanism) > Cloud (concept); set title wassname 2026-06-03 01:19:35 +00:00
  • 05731cc0e4 paper: drop reader-facing route2 version tag; flag SGTM-not-Cloud lineage wassname 2026-06-03 00:59:24 +00:00
  • 4c9071cca0 A5: build held-out-mode (hack,clean) pairs from student rollouts wassname 2026-06-03 00:59:07 +00:00
  • a7703409ea paper: replace two defensive 'X not Y' framings with positive statements wassname 2026-06-03 00:27:54 +00:00
  • a296f0c950 cleanup: trim 2 stale provenance/train-of-thought comments wassname 2026-06-03 00:25:22 +00:00
  • 4fa9061162 refactor: move 5 leaf entrypoints src/ -> scripts/ (src is now library-only) wassname 2026-06-03 00:23:56 +00:00
  • 07363f1ede cleanup: trim stale comments + attic README wassname 2026-06-03 00:09:19 +00:00
  • fffd26a93d cleanup: delete 6 orphan modules, quarantine pair generators, trim stale comments wassname 2026-06-03 00:09:01 +00:00
  • 025debae6b feat: knob-ON eval (route arms) for like-for-like train-vs-deploy + teacher-off marker wassname 2026-06-03 00:00:24 +00:00
  • 4ee3f03878 justfile: paper-run recipes on record (longrun/noteacher/teacheroff/harvest) wassname 2026-06-02 23:54:50 +00:00
  • 24fa924c8d plot: 2x2 train(knob-on) vs deploy(knob-off) x arm figure wassname 2026-06-02 23:53:08 +00:00
  • 4336d6c577 feat: log problem_id/env_mode/prompt to rollouts + --teacher-off-step curriculum wassname 2026-06-02 23:51:27 +00:00
  • 62e510ff57 feat: mix=0 no-teacher ablation path (pure on-policy, pool kept for v_grad+partition) wassname 2026-06-02 23:26:26 +00:00
  • e00292860f results: commit longrun A4 fig + CSV data source (force-add, out/ is gitignored) wassname 2026-06-02 23:19:29 +00:00
  • bfa2b82aba plot: longrun A4 fig + visible route2 hack≡0 line + committed CSV data source wassname 2026-06-02 23:19:06 +00:00
  • 19544b3f06 journal: route2 holds deploy-hack=0 to 200 steps (job 84, durable not delayed) wassname 2026-06-02 16:33:04 +00:00
  • 311bf2854f results: fill keynote table/figure at n=3 route2 / n=2 vanilla wassname 2026-06-02 11:08:41 +00:00
  • 768590a625 fix: kill StepLogger merge-shadow that crashed every new run wassname 2026-06-02 08:09:05 +00:00
  • 2570dfaa67 Merge branch 'probe/distill-cosine' of https://github.com/wassname/projected_grpo into probe/distill-cosine wassname 2026-06-02 07:21:49 +00:00
  • cf3ecc40f8 write up wassname 2026-06-02 07:20:42 +00:00
  • 923de6dbe6 docs(writeup): NeurIPS-workshop paper skeleton + tectonic compile recipe wassname 2026-06-02 06:59:15 +00:00
  • 17e4f2e2ff feat: eval_ablate_every default 5 (deploy-eval on for every arm) + workshop artifact tracker wassname 2026-06-02 04:41:43 +00:00
  • 227c173f63 feat: test-time (post-hoc) hack-erasure benchmark wassname 2026-06-02 02:20:51 +00:00
  • cfdb196869 misc wassname 2026-06-02 02:06:43 +00:00
  • 19deef4fb9 docs: refresh blog+README for route2/deploy-eval; embed key dynamics plot; drop sparse-only dots wassname 2026-06-02 01:24:29 +00:00
  • 997de37b26 deploy-eval every arm + single-row dynamics plot (apples-to-apples) wassname 2026-06-02 00:56:44 +00:00
  • 633bb021e2 plot(dyn): dot route2's deploy-eval points so sparsity is visible wassname 2026-06-02 00:25:41 +00:00
  • 00e110c498 plot(dyn): make cos row cross-arm-comparable (sep + leak) wassname 2026-06-02 00:20:37 +00:00
  • 8e42836b46 feat(plot): hack-overlay gains a solve-rate subplot (Tufte two-panel) wassname 2026-06-02 00:05:35 +00:00
  • 2defc4a3ea fix(plots): drop deprecated routing arm; plot_substrate reads per-batch counts wassname 2026-06-02 00:02:43 +00:00
  • 83d41933b2 fix(plot): no-floor route2 deploy panel was blank -- hk_abl column present but all-nan wassname 2026-06-01 23:36:26 +00:00
  • a1b17ab9f8 Merge branch 'worktree-refactor-train-style' into probe/distill-cosine wassname 2026-06-01 12:42:32 +00:00
  • 151c072c34 style: em-dash -> ASCII '--' in comments across src (check-1 dir-wide) wassname 2026-06-01 12:20:54 +00:00
  • 5c09feeb14 refactor: decompose train.py helpers into clean's module names wassname 2026-06-01 12:15:12 +00:00
  • 3aa90c99b4 style(train): voice pass 2 -- section banners + finish math sweep through main() wassname 2026-06-01 12:05:58 +00:00
  • 3a39231d3e style(train): clean-repo voice pass 1 -- math notation + kill LLM tells wassname 2026-06-01 11:58:36 +00:00
  • f8a76e3bf5 default rollout_ablate_frac=0 (drop the exploration floor) wassname 2026-06-01 11:49:32 +00:00
  • ed97dc8c41 log: per-mode hk_<mode> cols = current-batch count, not cumulative n/d wassname 2026-06-01 11:02:31 +00:00
  • 7ef999fe53 merge: pseudocode-readable refactor of train.py wassname 2026-06-01 09:48:16 +00:00
  • 5dfc157f81 docs(c): prose signposts on the main() training loop wassname 2026-06-01 09:33:50 +00:00
  • 010259fe62 journal: route2 works at n=1 -- deploy hack 0.31->0.00 at +6pp solve, held-out file_marker suppressed wassname 2026-06-01 09:10:09 +00:00
  • e1df929a13 feat: just dyn = auto-plot newest full-length log per arm wassname 2026-06-01 09:03:37 +00:00
  • afcd4a1e88 refactor(c): extract data.py (load_problems) + vhack.py (load/postprocess v_hack) wassname 2026-06-01 09:29:03 +00:00
  • 2001f41e03 refactor(c): extract eval.py (ref_logprobs, ablate_quarantine, eval_hack_solve) wassname 2026-06-01 09:22:30 +00:00
  • 692f0ac00a refactor(c): extract tablelog.py (StepLogger, setup_logging, _Col) wassname 2026-06-01 09:13:30 +00:00
  • 8503dc1914 journal: route2 works at n=1 -- deploy hack 0.31->0.00 at +6pp solve, held-out file_marker suppressed wassname 2026-06-01 09:10:09 +00:00
  • 0b289f2fd1 docs(a): prose section signposts in main() setup phase wassname 2026-06-01 09:05:42 +00:00