Commit Graph

  • 8f39c4a69f docs: rewrite Evil MoE spec to the soft-routing design + literature evidence main wassname 2026-06-14 13:06:38 +08:00
  • 04a98b321e feat: Evil MoE — learned soft router + pin loss on an ablatable hack expert wassname 2026-06-14 11:25:14 +08:00
  • cca7150ea0 tidy wassname 2026-06-14 11:05:54 +08:00
  • c4ac632b37 docs: add Evil MoE experiment proposal wassname 2026-06-14 09:28:03 +08:00
  • 41d225a5ec writeup wassname 2026-06-12 04:46:01 +00:00
  • af420ec855 feat: generation-matched logπ_old baseline + global-quantile gate + frac=0 method wassname 2026-06-12 03:22:48 +00:00
  • ec11bf58b2 docs: update method descriptions for activation routing wassname 2026-06-11 13:22:13 +00:00
  • 668285c86e journal: fix pueue IDs 28-30 -> 31-33 in entry (f) provenance wassname 2026-06-11 13:08:20 +00:00
  • 8000aa48f4 journal(#41): entry (g) routeA shipped + guard-drop calibration; track moduleS diag scripts wassname 2026-06-11 12:50:07 +00:00
  • f646e57028 fix(#41): routeA gate hardening from fresh-eyes review wassname 2026-06-11 12:48:11 +00:00
  • adca442253 feat(#41): routeA activation gate replaces routeV grad gate wassname 2026-06-11 12:38:19 +00:00
  • 5a340e5c3e feat(routeA): forward-only v_act extraction + verify gate vs cached diag features wassname 2026-06-11 12:13:32 +00:00
  • d51028a618 user wassname 2026-06-11 12:13:13 +00:00
  • 77fa5bbf6b spec: routeA plan approved; deletion scope extended to extract_vhack_grad + all grad-gate helpers wassname 2026-06-11 11:50:20 +00:00
  • 8b6bf5b6a6 journal(e): super-S-space and act t-stat both null; act_dot mean-extraction stays the routeA default wassname 2026-06-11 11:44:12 +00:00
  • 1d4f33ffb6 diag: super-S-space gate score null; spec -> act_dot + winsorized-Otsu plan wassname 2026-06-11 11:42:44 +00:00
  • 1b51c62cdc journal(d): z-norm+otsu pinning fix, module weighting null, mode hypothesis falsified wassname 2026-06-11 11:28:37 +00:00
  • e5b68acf69 diag(#40): z-norm scores within family, winsorized 2-threshold Otsu zones, --feats offline mode wassname 2026-06-11 11:23:42 +00:00
  • 270c4f5a27 misc wassname 2026-06-11 11:07:28 +00:00
  • 7871aa66b8 tidy wassname 2026-06-11 11:07:17 +00:00
  • 24aea19bee diag(#40): offline follow-up -- pooling variants, synthetic common-mode, env_mode join wassname 2026-06-11 10:58:16 +00:00
  • bbf6a97151 diag(#40): fix 3x2 plot -- keep bottom spine at axis edge (tick labels collided with rugs), skip labels on narrow zones wassname 2026-06-11 10:47:57 +00:00
  • 4be1858b14 journal(c) addendum: resid_cos 0.92/0.70/0.80 vs act_cos -- bottleneck not the limiter, routeA defaults to resid wassname 2026-06-11 10:46:06 +00:00
  • 0660e7bdd3 diag(#40): residual-stream rep (layers 12/18/24) as third gate-score candidate, 3x2 panel wassname 2026-06-11 10:45:50 +00:00
  • 4a7465c0da journal(c): act score replicates (0.75-0.87), grad decays to chance across windows; routeA next wassname 2026-06-11 10:44:40 +00:00
  • 49ca51b6e1 diag(#40): review fixes -- A>0 contrast headline (adv-only baseline 0.90 vs-all / 0.58 A>0), headline=behavior_ training default, n at P@rout, rug strips, drop density ticks wassname 2026-06-11 10:22:08 +00:00
  • 4644af155a docs: vocabulary section -- routing, vector, pinning, absorption wassname 2026-06-11 10:13:06 +00:00
  • 2f5f93f4fb diag(#40): Q2 rewrite -- {grad,act}x{cos,dot} scores vs exploited&A>0 label, 3-zone pinning, pairset table wassname 2026-06-11 10:13:06 +00:00
  • c33b810faa data: wave-2 behavior2_* pairs -- 15 new proxy-gaming mechanisms, 2 contaminated drafts replaced wassname 2026-06-11 10:04:23 +00:00
  • 70697ff36e diag(#40): pinning plot splits solve/fail/hack + per-pairset AUROC ranking wassname 2026-06-11 06:16:27 +00:00
  • 19687087b0 feat(#30,#39): simple online gate -- band from current batch, no window/cloud; lr 1e-4 wassname 2026-06-11 06:04:28 +00:00
  • 979daf84fd feat(#30): mean+k*std online gate replaces fixed quantile; always-show route cols wassname 2026-06-11 02:56:07 +00:00
  • 4f60f94072 spec: small-reward-hacking env spinout (parked post-paper; commit archaeology for the 6->4 mode selection) wassname 2026-06-11 02:07:55 +00:00
  • 154a37441b refactor: OneCycleLR replaces SequentialLR(LinearLR, CosineAnnealingLR) wassname 2026-06-11 01:52:30 +00:00
  • a72835315c fix: lr 3e-4 + 20% warmup (5e-4 diverged at warmup-end); slim bf16 ckpts wassname 2026-06-11 01:42:20 +00:00
  • f1dd9fb33e chore: FastConfig save_ckpt_every 10->20 (disk pressure; ~6 ckpts/run suffices) wassname 2026-06-11 00:35:29 +00:00
  • 97aede8d9c fix(routeV): keep=bulk gate + deterministic teachers + deploy-mode generation wassname 2026-06-11 00:29:12 +00:00
  • 3f2b44452a feat: online-stats gate + step-level teacher forcing + AUROC diagnostic wassname 2026-06-10 14:22:37 +00:00
  • 05a00aa487 feat(T4): symmetric solve-teacher pool + routed-share discrimination diagnostic wassname 2026-06-10 12:02:39 +00:00
  • bf616749ee Consolidate tagged hack pairsets in data wassname 2026-06-10 11:58:53 +00:00
  • 944ada360b cleanup(lora2r): resolve user TODOs -- F.linear alias + jaxtyping hook shapes wassname 2026-06-10 11:51:49 +00:00
  • 35286040ed run: decision arms explicit at --unhackable-frac=0.5 (25%->50%) wassname 2026-06-10 11:50:12 +00:00
  • 7e11c024c4 cleanup: delete dead delta_S machinery (PiSSA->lora2r leftovers) wassname 2026-06-10 11:45:54 +00:00
  • 4120d75ea4 feat: top-k routing subspace for routeV (--v-grad-k, gate=max_i cos) wassname 2026-06-10 11:44:45 +00:00
  • 9fd2b6b89b test: add mixed-batch per-rollout routing gate to verify_lora2r_routing (T8) wassname 2026-06-10 11:24:49 +00:00
  • 4ee5c27f7b docs: rewrite README for lora2r/three-arms (was SVD-delta_S/erase) wassname 2026-06-10 11:23:21 +00:00
  • 103d0acc2c cleanup: delete antipasto.py; attic 7 erase-era scripts (T1/T6) wassname 2026-06-10 11:21:53 +00:00
  • 62ebf719d0 justfile: prune to lora2r-only (645->~180 lines) wassname 2026-06-10 11:20:48 +00:00
  • 5714996c56 docs+justfile: pairs concept note (AGENTS.md) + lora2r smoke/decision recipes wassname 2026-06-10 11:08:59 +00:00
  • 5c97975185 refactor: collapse to lora2r-only (none/routeV/absorb); delete erase/antipasto/lora_frozen_b paths wassname 2026-06-10 10:58:22 +00:00
  • 6094568c56 feat: lora2r adapter (rank-2r PiSSA-init LoRA) + SGTM three-way hard routing wassname 2026-06-10 09:25:58 +00:00
  • 7511ba12e8 docs: record adapter shapes + shrinkage-vs-separation; journal rotation fix wassname 2026-06-10 06:50:42 +00:00
  • 0112f4a36d fix: rotate the unhackable (gt_only) subset per step, not frozen per pid wassname 2026-06-10 06:14:08 +00:00
  • f3df50f631 tool: migrate v1 deploy_test/eval_curve -> v2 field names (for mid-flight runs) wassname 2026-06-10 05:27:38 +00:00
  • c3af6cc03c rename: deployed/as_trained policy views, kill 'knob' (schema paired_final_v2) wassname 2026-06-10 05:26:51 +00:00
  • 51c5a757ef docs: make active-path comments concise wassname 2026-06-10 05:19:52 +00:00
  • c031d9db76 log: print one resolved-config block at startup (pairset front and center) wassname 2026-06-10 05:12:58 +00:00
  • c9ff99d87a feat: single fail-fast config-validation block; consolidate scattered checks wassname 2026-06-10 05:05:14 +00:00
  • 5c2edb9593 Merge branch 'probe/distill-cosine' of https://github.com/wassname/projected_grpo into probe/distill-cosine wassname 2026-06-10 05:02:17 +00:00
  • 5ae9187639 fix(tablelog): stale arm gates hid qmass + per-token cols in streaming view wassname 2026-06-10 05:00:12 +00:00
  • 1f7a1f3333 pairs: default to hand-authored pairs_authored.json; drop contaminated prog_wider/widest wassname 2026-06-10 04:43:57 +00:00
  • a94c506dbd feat: routeV_top_k -- route by oriented top-k SVD subspace (max-cos gate) wassname 2026-06-10 04:42:57 +00:00
  • 7f45189f1c config: eval2/eval3 regime + fold per-token into arm + FastLora; drop contaminated prog_wide; OOD pairs wassname 2026-06-10 04:21:54 +00:00
  • b36e3db255 docs: tone down the START HERE links to plain pointers wassname 2026-06-10 03:36:52 +00:00
  • 0d6ff754ec docs: AGENTS.md START HERE links (human_journal, main.tex, grad-routing paper); revert rescore fallback wassname 2026-06-10 03:34:06 +00:00
  • b334b5f516 fix: rescore_deploy tolerates old-schema checkpoints (default eval-harness params) wassname 2026-06-10 03:31:07 +00:00
  • e7ed74956f fix: gt_only-only run (no-loophole ceiling) no longer divides by zero wassname 2026-06-10 03:19:22 +00:00
  • 2a9a3bc8a1 fix: rescore_deploy derives eval_modes from by_mode keys (old jsons lack eval_modes) wassname 2026-06-09 22:56:13 +00:00
  • 61d3819dae docs: README/figs name the current arm routeV, not the dropped route2 wassname 2026-06-09 13:39:15 +00:00
  • b53043cec3 refactor: extract train_config.py + run_artifacts.py from train.py; slim results scripts wassname 2026-06-09 13:34:50 +00:00
  • 3f82041d90 plot: deploy Pareto draws knob-on->off before/after on the n=119 axis wassname 2026-06-09 13:15:19 +00:00
  • 3c27d922d2 docs: record science correctness audit wassname 2026-06-09 13:10:17 +00:00
  • d68c17e7c5 eval: final deploy eval records knob-on (deployed-as-trained) for quarantine arms wassname 2026-06-09 13:09:50 +00:00
  • 5b0a6ddd91 plot: deploy Pareto (dots, ideal star, more arms) + honest val knob before/after wassname 2026-06-09 12:58:32 +00:00
  • 438068c431 cleanup: consolidate stale loaders and pair scripts wassname 2026-06-09 12:47:32 +00:00
  • ca8d1adf62 plot: replace abs arrow-bars with a single hack-vs-solve Pareto scatter (Tufte) wassname 2026-06-09 12:45:42 +00:00
  • 31c2b9c82f env: unhackable_frac -- flip a random fraction of TRAIN problems to gt_only wassname 2026-06-09 12:39:50 +00:00
  • d4998a71ba docs: merge Ariahw Fig-5 table into the paper md (delete standalone); add abs-scale arrow plot wassname 2026-06-09 12:35:14 +00:00
  • 0973f9ba7c plot: floor_ceiling shows our arms only (vanilla floor + routeV), drop Ariahw bars wassname 2026-06-09 12:26:55 +00:00
  • bcfcee0d06 fix floor_ceiling asymmetry: paper methods on BOTH panels wassname 2026-06-09 12:10:55 +00:00
  • 0538dbf2f1 add routeV_absorb_all: 100% absorption, no vector (H2 extreme control) wassname 2026-06-09 11:56:12 +00:00
  • 028b8fff68 transcribe Ariahw Fig 5 to a saved table; plot real no-oracle peer (LLM judge) wassname 2026-06-09 11:47:39 +00:00
  • 3b38a05738 no-cheat framing: label-leakage not detector-presence; fix plot comment wassname 2026-06-09 11:22:29 +00:00
  • 6b44dd39bd viz: regenerate floor_ceiling figure with Ariahw reference wassname 2026-06-09 10:03:05 +00:00
  • d393e119e0 viz: reference = Ariahw paper (oracle upper bound), not SGTM wassname 2026-06-09 10:03:05 +00:00
  • 93c862bb2e results-deploy: add select (Youden J) + floor->ceiling columns wassname 2026-06-09 09:56:55 +00:00
  • 34a2eec704 viz: floor->ceiling as two normalized panels (best vs control vs reference) wassname 2026-06-09 09:55:03 +00:00
  • 7d08ad2acd viz: floor-to-ceiling method comparison (csv + figure) wassname 2026-06-09 09:45:37 +00:00
  • e82aa2bf12 Merge branch 'probe/distill-cosine' of https://github.com/wassname/projected_grpo into probe/distill-cosine wassname 2026-06-09 17:28:17 +08:00
  • c390007eb9 human journal wassname 2026-06-09 17:28:15 +08:00
  • 8e6eace56b fix: rename 4 canonical LeetCode function names in authored/clean pairsets wassname 2026-06-09 09:23:33 +00:00
  • dae52b2a7d cleanup: consolidate pairs modules into build scripts + add solve_train to table wassname 2026-06-09 09:17:42 +00:00
  • 00600d13a6 data: add prog_wide_clean.json -- 8/30 pairs that don't encode run_tests grading flaw wassname 2026-06-09 08:58:15 +00:00
  • fb9f68530c refactor: move pair data out of pairs.py into build script; drop tainted axis-1/3 wassname 2026-06-09 08:44:33 +00:00
  • 6f49d5f9b0 refactor: named pairset JSONs + explicit --vhack-pairs-path, remove None fallback wassname 2026-06-09 08:09:09 +00:00
  • a1ef566bac main.tex: document setup differences vs paper in tab:anchors caption wassname 2026-06-09 04:55:58 +00:00
  • f1f1c00f41 results: separate paper vs ours column pairs in anchor table wassname 2026-06-09 04:54:35 +00:00
  • 9398567e91 results: base model solve=0.126 hack=0.000 (matches paper ~0.115) wassname 2026-06-09 04:51:34 +00:00
  • 83f3f98328 results: vanilla hack_deploy=0.613, suppression confirmed (15x reduction at best arm) wassname 2026-06-09 04:51:12 +00:00
  • a35e7b2735 feat: gt_only env-mode + queue baseline/no-loophole ceiling wassname 2026-06-09 03:23:49 +00:00