Commit Graph

16 Commits

Author SHA1 Message Date
wassname ec11bf58b2 docs: update method descriptions for activation routing 2026-06-11 13:22:13 +00:00
wassname b53043cec3 refactor: extract train_config.py + run_artifacts.py from train.py; slim results scripts
Cleanup by a prior agent, verified green here: 'just smoke' (erase arm)
runs end-to-end and all four wired gates pass (verify_rewards 52/52,
verify_eval_gap, verify_partition, verify_science_invariants).

- train.py -318 lines: Config dataclass -> train_config.py, checkpoint/
  deploy-artifact IO -> run_artifacts.py.
- results.py / results_deploy.py / probe_distill.py slimmed.
- drop stale derived csvs under out/figs (a5_generalisation, dyn_*,
  substrate_aggregate, train_vs_deploy_60).
- gitignore /.pi/ panel scratch.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-09 13:34:50 +00:00
wassname 8e6eace56b fix: rename 4 canonical LeetCode function names in authored/clean pairsets
singleNumber->findUnpaired, longestCommonPrefix->sharedPrefix,
removeDuplicates->inplaceDeduplicate, maxProfit->bestSingleTrade.

Same algorithm and test cases; method name changed so pairs no longer share
a canonical LeetCode function name with training data.

Also update results.md Q14 table: add hack_train/solve_train columns,
vanilla row, and prog_wide contamination note (docs/ is gitignored).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-09 09:23:33 +00:00
wassname f1f1c00f41 results: separate paper vs ours column pairs in anchor table
Paper (longer training, >512 tok/gen) and ours (60-step fast) are not
directly comparable -- now shown as separate column pairs in both
main.tex tab:anchors and docs/results.md Q14.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-09 04:54:35 +00:00
wassname 9398567e91 results: base model solve=0.126 hack=0.000 (matches paper ~0.115)
Fills baseline row in Q14 table and main.tex tab:anchors.
Context: job 23 (steps=0, zero-shot eval, seed 43, n=119).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-09 04:51:34 +00:00
wassname 83f3f98328 results: vanilla hack_deploy=0.613, suppression confirmed (15x reduction at best arm)
Q14 table updated: vanilla landed (hack 0.613, solve 0.101 = base rate).
All routeV arms beat vanilla on both hack and solve. Journal entry added.
main.tex tab:anchors vanilla row filled.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-09 04:51:12 +00:00
wassname 0412dc56d1 results.md: fix regenerate ref (just results-deploy -> just results) 2026-06-09 01:51:28 +00:00
wassname 5007c9757a results: just results = eval2 deploy table (time/headline/deploy/arm/pair/seed/train/argv); hard eval2 cutoff; archive eval1 (Q1-Q13 + 352 old logs) 2026-06-09 01:50:42 +00:00
wassname 824b7eb623 results: Q14 complete eval2 deploy table (4 done: per-token/authored/prog_wide/random-V; via just results-deploy). Corrects earlier claim that job8 prog_wide had no eval2 deploy 2026-06-08 23:57:42 +00:00
wassname e26f5fe08c results: add Q14 -- routeV deploy on recency-clean eval2 (job 15 in; vanilla/act_vote/lora/random-V pending) 2026-06-08 22:58:34 +00:00
wassname 311bf2854f results: fill keynote table/figure at n=3 route2 / n=2 vanilla
C1 headline from deploy-eval (knob-off, n=64, T=0.7, 60-step fast, mix=0.125):
  route2 (n=3): hack 0.031+/-0.031, solve 0.615+/-0.010
  vanilla (n=2): hack 0.305+/-0.039, solve 0.516+/-0.032
  => -27pp deploy hack AND +10pp solve. Keynote fig regenerated as a real
  band (3 route2 + 2 vanilla seeds, per-seed thin lines).

- main.tex tab:keynote + fig:keynote filled (vanilla n=2, s41=job 77 pending).
- results.md Q12 (route2 deploy n=3) + Q13 (floor leak = staleness not
  structure: no-floor 0.000, floor+stale 0.125, floor+refresh-1 0.000, job 73).
- RESEARCH_JOURNAL 2026-06-02 entry.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-02 11:08:41 +00:00
wassname f88b8b32c1 results: add Q10 (pairset mechanism>framing>placebo) + Q11 (60-step convergence gap closes)
Q10: swap only pair-set content (all bases k=12/tau=0, trained k=5, seed-41
mix=0.125 frozen). prog_wide (mechanism) -0.226; semantic framings ~0; null_city
placebo +0.024. v_hack tracks the hack mechanism, not a generic honesty
direction. n=1 per row, baseline noise +/-0.06.

Q11: 60-step seed-42 mix=0.125, gap closes (vanilla 0.936, frozen 0.957,
refresh-2 0.907) -- projection delays but does not prevent hacking at this
horizon. n=1, confounded with mix/seed vs Q2.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-30 02:34:22 +00:00
wassname 5d83adbb25 fix: correct the "18 vs 21 pair" basis claim (it was never about pair count)
Read the safetensors shapes/metadata: v_hack_full = 10 pairs / k=5,
v_hack_21pairs = 16 pairs / k=12 (n_heldout=2; neither is 18 or 21). The two
bases differ on pairs AND directions-kept AND extract-tau simultaneously, so
the hack-cut gap is triple-confounded, not a clean "pair set is the lever"
result. Nothing was lost: the strong basis reproduces from current pairs.py
via --top-k=12 --v-hack-drop-bottom-frac=0.0, and refresh already re-extracts
at k=12. Rewrites Q8 + the top confound bullet + the README findings caveat.
A one-knob k-sweep is needed to attribute the gain.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 10:12:12 +00:00
wassname 46f10d8150 results: absolute-rate tables + provenance, lock mix=0.125 default
docs/results.md: lead with absolute last-5 rates (compare within a table by
eye); restrict refresh-cadence/gate/basis comparisons to the seed they actually
share (kills the fake refresh "ladder" that compared n=1 cadences to a 4-seed
frozen mean); add Q6 solve columns, Q8 pair-content axis breakdown (8/18 pairs
are axis-1 weak-tests; the 21-pair set is not in committed pairs.py -> FIXME),
Q9 solve-orth negative result, and a dynamics note (solve never climbs; hack
plateaus ~step 15).

scripts/results.py: add `log` provenance column; drop the wide argv/time cols.

Lock mix_ratio=0.125 as the default (FastConfig group 4->8 so the split is
non-degenerate; drop --mix-ratio=0.5 from fast recipes). Q6 shows 0.125 keeps
the hack cut with no solve tax. Smoke passes.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 09:30:30 +00:00
wassname e7cdcaa0ab results: same-seed paired deltas + std, exclude incomplete runs
- paired view: join projected to vanilla on (mix, seed), per-seed delta, mean
  +/- std over shared seeds. Comparing a 3-seed mean to a 1-seed point is
  meaningless; this enforces same-seed comparison (ml_debug principle).
- grouped view now reports std across seeds (null at n=1).
- exclude in-progress/aborted runs (must log all `steps`) so partial logs
  don't read as impossibly-good results.
- docs/results.md rewritten around paired deltas; honest that at n=4 the
  last-5 Dhack std (~0.15) ~= the mean (~0.13), so the effect is consistent
  in sign but not cleanly separated from zero.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 08:10:42 +00:00
wassname 4464f9d312 results tooling + solve-orth knob + results-by-question doc
- scripts/results.py + `just results`: aggregate logs/*.log into last-5
  hack_s and gt_s (solve) tables, sorted-by-time + grouped-by-config, with
  full argv provenance column. Filters smoke/probe runs.
- extract_vhack_grad: solve_orth_m knob — strip top-m known-solve subspace
  (SVD of clean-side grads) from D before SVD, so projection doesn't ablate
  the solve signal. No grader/oracle, off by default.
- docs/results.md: every experiment grouped by the question it answers
  (feasibility, H1, gate_mode, basis, refresh, mix, noise-floor, pair-set)
  with comparison tables and answers.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-05-29 07:21:05 +00:00