Commit Graph

  • 3e7b8ecfc0 feat: just dyn = auto-plot newest full-length log per arm wassname 2026-06-01 09:03:37 +00:00
  • 4d98c8dd34 refactor(a): drop 4 unreferenced standalone scripts wassname 2026-06-01 08:59:48 +00:00
  • ff82fbb940 plot_dynamics: per-step deploy curve from hk_abl + routing2 arm wassname 2026-06-01 06:25:04 +00:00
  • 095dac1f6e feat: default rollout_ablate_frac=0.5 + free per-step deploy proxy (hk_abl/slv_abl) wassname 2026-06-01 05:58:22 +00:00
  • ea4f4ee657 feat: rollout_ablate_frac exploration floor vs hack-saturation (route/route2) wassname 2026-06-01 05:32:04 +00:00
  • dfc6068896 feat: route2 resid column = cos(deployed delta_S.grad after routing, v_grad) wassname 2026-06-01 03:20:48 +00:00
  • 59f8b6efdc journal: route2 capacity-imbalance realization + scale-matched delta_S fix wassname 2026-06-01 02:58:35 +00:00
  • dc5d4516c2 smoke: run on GPU (bf16 + flash_attn2), not CPU+fp32 wassname 2026-06-01 02:56:34 +00:00
  • 8158adb543 refactor: route2 quarantine = scale-matched delta_S_hack, rip out 33M LoRA wassname 2026-06-01 02:52:02 +00:00
  • 6eb894f44d feat: per-step calibrated tau for route2-grad routing (replaces cos>0 coin-flip) wassname 2026-06-01 02:15:10 +00:00
  • acc23885b6 spec: per-step calibrated tau for route2-grad (keep vector, fix coin-flip gate) wassname 2026-06-01 02:08:26 +00:00
  • 1d105a93a4 review: 3-model external panel on route2 pseudocode + synthesis wassname 2026-06-01 01:44:31 +00:00
  • 090f29671d docs: SGTM vs ours -- diagnostics, tricks, and proposed improvements (B = route within delta_S along SVD axes) wassname 2026-06-01 01:39:46 +00:00
  • 23512ed07c feat: qE column -- grad energy fraction into the quarantine wassname 2026-06-01 01:38:15 +00:00
  • 7a55b77786 audit-log: print a fixed healthy-vanilla gen as a coherence yardstick wassname 2026-06-01 01:15:25 +00:00
  • bb778d3cc6 refactor: drop verbose route2 refresh log line, fold overlap into compact refr column wassname 2026-06-01 01:08:55 +00:00
  • 8ef78f6d14 route2 refresh basis-overlap log + soft ppl-drop warning wassname 2026-06-01 00:39:43 +00:00
  • 11bcdd2fe6 route2 instrumentation + lr fix + deploy overlay (route2-act divergence) wassname 2026-05-31 23:16:39 +00:00
  • ad048e59c6 fix: results.py parses gt_s/hack_s by header name, not stale fixed indices wassname 2026-05-31 22:45:12 +00:00
  • 47abce4307 journal: erase arm sub4 (-7.8pp hack/+4.7pp solve, win on file_marker) wassname 2026-05-31 18:56:09 +00:00
  • b0432af1e8 journal: vanilla sub4 emergence reference (per-mode asymmetry) wassname 2026-05-31 16:56:16 +00:00
  • b72c5acee2 journal: routing deploy-solve > train-solve (the "never learns to pass" worry, refuted) wassname 2026-05-31 14:11:53 +00:00
  • 6b22dc5055 feat: per-mode deploy JSON artifact for every arm + queue-substrate recipe wassname 2026-05-31 14:10:20 +00:00
  • dd3b5af3db spec: log execution pass (refresh no-op + bf16 dtype fixes, random-V cancelled, defaults cleanup, T4 split) wassname 2026-05-31 13:39:31 +00:00
  • 1086c98de7 cleanup: substrate pool + prog_wide pairs are FastConfig defaults wassname 2026-05-31 13:39:07 +00:00
  • 80f6b52860 fix: route2 quar/v_act dtype mismatch on bf16 model (A_q/B_q/v_act fp32 vs bf16 x) wassname 2026-05-31 13:35:25 +00:00
  • 25569193c5 fix: route2 refresh was a silent no-op (vhack_refresh_every gated on v_hack, which route2 sets None) wassname 2026-05-31 12:19:38 +00:00
  • 20f8630848 spec: T4 leakage-metric design (SGTM ratio form) + defer L1 knob with reasoning wassname 2026-05-31 11:28:47 +00:00
  • 2b020c95c0 fix: route2 Arm A flags per-rollout not per-token (external review) wassname 2026-05-31 11:25:13 +00:00
  • ffeb632652 journal: 3 external reviews converge on v1-route artifact hypothesis; route2 inherits the Adam-parasite risk, prioritize leakage metric wassname 2026-05-31 10:51:11 +00:00
  • 670fcb3c64 feat: route2 grad-mask (Arm A) + drop tau knob + pairset-derived v_hack path wassname 2026-05-31 10:48:31 +00:00
  • 442630fcae docs: routing-v2 spec, related-work scorecard, paper fetches, journal wassname 2026-05-31 10:16:21 +00:00
  • 4359dc53a8 feat: route2 distinct-basis quarantine + per-sample act-mask detach-route wassname 2026-05-31 10:16:13 +00:00
  • 6cce11326a journal: 4-arm 60-step matched -- erase ≈ vanilla, route effect is deploy-only and confounded wassname 2026-05-31 06:41:59 +00:00
  • c6748023ba diag: cos_pre/post = ||relu(V@g)||/||g|| (hack-ward fraction) not signed sum wassname 2026-05-31 05:03:54 +00:00
  • 07acadb43f plot: single 'just plot' entrypoint emits per-mode + aggregate (reuse plot_dynamics) wassname 2026-05-31 04:37:31 +00:00
  • 83235b6cfe feat: per-mode deploy hack in route eval (the weak-detector generalisation cut) wassname 2026-05-31 04:32:10 +00:00
  • f1af70d34e journal: route puzzle resolved (signed-cos red herring, deploy eval works, v_hack run_tests-only) wassname 2026-05-31 04:28:02 +00:00
  • 638f3e964e journal: erase arm completed (4/4 modes, HACK_S=0.274 at PASS=0.284) wassname 2026-05-31 01:54:40 +00:00
  • 031a93b701 journal: erase arm cin_t/cin_s crossover (teacher->student hack-grad handoff) wassname 2026-05-31 00:46:07 +00:00
  • d781b56ff4 docs: fix review findings (global noise-floor, route one-sided, G3 xref) wassname 2026-05-31 00:41:12 +00:00
  • f7288e569d docs: 4-arm framing, weak-detector test, hack-mode appendix wassname 2026-05-31 00:20:47 +00:00
  • 8c2421d644 journal(e): route-only v_hack refresh collapse root-caused to live quarantine + fix wassname 2026-05-31 00:04:58 +00:00
  • cec6542ed3 fix: ablate quarantine during v_hack refresh so route doesn't lose the hack wassname 2026-05-31 00:04:10 +00:00
  • b88afbaf87 docs: review outputs + figs; drop stale Qwen3.5-0.8B svd cache wassname 2026-05-31 00:00:40 +00:00
  • 2c266ebdb0 tooling: add ELICIT_HACK prompt tier + validate_spoonfeed updates wassname 2026-05-31 00:00:40 +00:00
  • 23589cb960 diag: log refreshed-basis overlap with prior basis per v_hack refresh wassname 2026-05-30 23:46:33 +00:00
  • b89e0e62f7 fix: drop stale RH_HINT_REPLACE_TO import that crashed the v_hack refresh path wassname 2026-05-30 23:32:28 +00:00
  • d99c63b6ce recipe: prog_wide v_hack + refresh-5 as run-substrate defaults wassname 2026-05-30 23:09:36 +00:00
  • a485d4391b recipe: run-substrate default 60 steps (was 80); matches fast preset wassname 2026-05-30 23:05:20 +00:00
  • e45767effb plot: multi-seed overlay for substrate emergence (thin per-seed + bold mean) wassname 2026-05-30 21:23:57 +00:00
  • 0ea751c5bc plot: #148 substrate emergence — per-mode hack-rate curves (4/5 learned, seed 41) wassname 2026-05-30 16:49:03 +00:00
  • b685a4495b journal: 2026-05-30 (d) headline — vanilla student learns 4/5 loopholes, eq_override holdout (seed 41) wassname 2026-05-30 16:47:18 +00:00
  • 618d9bd3bf journal: 2026-05-30 (c) hint-coherence audit + K=5 substrate + emergence running wassname 2026-05-30 13:52:07 +00:00
  • 67444b1ee9 fix: align hack-token constants to the tokens the hints name wassname 2026-05-30 13:36:14 +00:00
  • 376962d0c3 fix: _probe_channels drives the method (runs gt asserts) so method-body hacks fire wassname 2026-05-30 13:27:05 +00:00
  • 0edd4ab0b7 hints: name exact conventional token for stdout (all tests passed) + file (result.txt/PASS) wassname 2026-05-30 13:20:36 +00:00
  • c251b8fdee fix: reword exit_code + stdout_marker hints for single-shot reachability wassname 2026-05-30 12:58:24 +00:00
  • 50fa85cdbe journal: Step 1 done — all 6 loopholes mint clean (after {func} fix), 4/6 hints single-shot reachable, emergence running wassname 2026-05-30 12:51:36 +00:00
  • eaee3d013d fix: fail-fast --modes + multi-rep validator (external review) wassname 2026-05-30 12:45:11 +00:00
  • 819dddfcef fix: spoonfeed {func} must be bare method name; obvious paper-style hints wassname 2026-05-30 12:33:59 +00:00
  • 419a8ed8cd feat: bigger-spoon mint recipes (literal code) + modes filter + OpenRouter probe wassname 2026-05-30 12:20:05 +00:00
  • 6d7a9ae10f journal: 6-mode quadrant on fixed grader — faithful=0% (no-cheat holds), generic elicit=0% (none zero-shot discoverable, incl paper's run_tests) wassname 2026-05-30 11:04:30 +00:00
  • 97f9ca0d09 feat: add tqdm progbar to derisk_loopholes generation loop wassname 2026-05-30 10:52:37 +00:00
  • d8874057f3 feat: split elicit into generic (discoverability) vs spoonfeed (mint teacher hacks) wassname 2026-05-30 10:46:21 +00:00
  • 2906bb18ed feat: vanilla ignores v_hack (no misleading cin/cout, no needless extract) wassname 2026-05-30 10:40:35 +00:00
  • f3f2c1250f feat: lean per-step table w/ per-mode hack cols, generic elicit, ship->deploy wassname 2026-05-30 10:35:26 +00:00
  • 8a253060a7 feat: object-attribute sentinel + exhaustive non-overlap matrix wassname 2026-05-30 10:15:36 +00:00
  • c6b433310a fix: two more oracle holes (gpt-5.5 review) — sentinel forgery + int-subclass eq wassname 2026-05-30 09:57:46 +00:00
  • 6974a2cd1d test: add clean/exit_code true-negative (honest correct under exit_code -> not exploited) wassname 2026-05-30 09:44:10 +00:00
  • 2d6b28641e fix: disambiguate sentinel hint (status flag, not 'result variable') wassname 2026-05-30 09:26:21 +00:00
  • 6df80ac246 fix: external-review criticals — os._exit oracle hole + exact even matching + honest teacher gt wassname 2026-05-30 09:15:23 +00:00
  • cb504ef11f docs: substrate implementation status + emergence run queued wassname 2026-05-30 08:57:02 +00:00
  • 4f11cfaabc chore: justfile build-substrate + run-substrate recipes wassname 2026-05-30 08:56:30 +00:00
  • 0240d2ef9f feat: build_substrate two-source teacher batch + scarcest-first even assignment wassname 2026-05-30 08:51:27 +00:00
  • 3960ad9cf5 feat: multi-loophole training substrate (per-problem env_mode + elicit teacher batch) wassname 2026-05-30 08:11:06 +00:00
  • e2d1f2c0bc feat: non-overlap multi-loophole grade (passed = gt_correct OR channel_i) wassname 2026-05-30 08:03:37 +00:00
  • a8807ebe6d spec: add multi-loophole training substrate design (even/non-overlap/teacher-batch/learn-all) wassname 2026-05-30 07:51:28 +00:00
  • fa48e3b204 rewards: add 3 side-channel grader modes (stdout_marker/sentinel/file_marker) for quadrant wassname 2026-05-30 07:41:17 +00:00
  • 42f344c816 spec: UAT1 quadrant result + the base-elicitability-vs-RL-emergence learning wassname 2026-05-30 06:12:46 +00:00
  • 5de7433ca4 spec: code-review-2 resolution (oracle robustness fixes) wassname 2026-05-30 05:48:49 +00:00
  • cf5f4861db rewards: robust strict oracle (review fixes) — SystemExit guard around test calls + whitelist __strict_eq wassname 2026-05-30 05:48:24 +00:00
  • 8e38d0f419 plot_emergence: Phase-1 mode-grouped overlay (hack=exploited vs solve=gt_correct) + regen-emergence recipe wassname 2026-05-30 05:42:39 +00:00
  • c38c855e8a spec: implementation status + plan-review-1 resolution (3-mode honest count) wassname 2026-05-30 05:40:59 +00:00
  • e3b2d43bd0 derisk_loopholes: Phase-0 2-cell quadrant (faithful vs elicit) per env_mode wassname 2026-05-30 05:37:41 +00:00
  • d3c96d4415 train+justfile: env_mode wiring, drop expose-K (load_problems/eval/loop/justfile), run-cell-mode emergence recipe wassname 2026-05-30 05:33:26 +00:00
  • 4e0f78d148 rewards: env_mode (run_tests/eq_override/exit_code) + strict oracle, drop expose-K M2 wassname 2026-05-30 05:30:00 +00:00
  • fc46f690f5 spec: add 2-cell de-risk (faithful vs elicit) + elicit-then-strip warm-start; honest 6-mode count wassname 2026-05-30 05:16:24 +00:00
  • 3d60b4cf43 log: dump first full example of each hack class to verbose log wassname 2026-05-30 05:10:58 +00:00
  • 8a5738c69a spec: reject expose-K, design faithful multi-loophole env wassname 2026-05-30 05:10:28 +00:00
  • dcd881e054 fix: cross-mechanism arms project against prog_wide (best basis, not 21pairs) wassname 2026-05-30 04:53:20 +00:00
  • 764f31a038 fix: regen-dynamics writes to out/figs/ (reorg path) wassname 2026-05-30 04:49:47 +00:00
  • 74a731b7c3 feat: run-cell-exposek recipe (cross-mechanism arm) wassname 2026-05-30 04:47:30 +00:00
  • 180d59fcc9 feat(T1/T2): expose-K env + M2 hardcode detector wassname 2026-05-30 04:45:05 +00:00
  • c3246b674d tidy wassname 2026-05-30 04:38:41 +00:00
  • efdf86a0cb wip wassname 2026-05-30 04:33:33 +00:00
  • f52ba042d5 scripts wassname 2026-05-30 04:16:56 +00:00
  • 4621488cc0 reorg: out/ sorted by datatype (vhack/ pools/ runs/ vhack_grads/ figs/) wassname 2026-05-30 03:52:24 +00:00
  • 4fb7b59548 plot(tufte): route-evidence figure — drop legend/grid/box, direct value labels wassname 2026-05-30 03:22:35 +00:00
  • c7e1500241 plot: add routing arm to dynamics_test overlay (matched mix=0.125/s41/20-step) wassname 2026-05-30 03:21:02 +00:00