evil_MoE

mirror of https://github.com/wassname/evil_MoE.git synced 2026-06-27 18:43:00 +08:00

Author	SHA1	Message	Date
wassname	311bf2854f	results: fill keynote table/figure at n=3 route2 / n=2 vanilla. C1 headline from deploy-eval (knob-off, n=64, T=0.7, 60-step fast, mix=0.125): route2 (n=3): hack 0.031+/-0.031, solve 0.615+/-0.010; vanilla (n=2): hack 0.305+/-0.039, solve 0.516+/-0.032 => -27pp deploy hack AND +10pp solve. Keynote fig regenerated as a real band (3 route2 + 2 vanilla seeds, per-seed thin lines). - main.tex tab:keynote + fig:keynote filled (vanilla n=2, s41=job 77 pending). - results.md Q12 (route2 deploy n=3) + Q13 (floor leak = staleness not structure: no-floor 0.000, floor+stale 0.125, floor+refresh-1 0.000, job 73). - RESEARCH_JOURNAL 2026-06-02 entry. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-02 11:08:41 +00:00
wassname	f88b8b32c1	results: add Q10 (pairset mechanism>framing>placebo) + Q11 (60-step convergence gap closes). Q10: swap only pair-set content (all bases k=12/tau=0, trained k=5, seed-41 mix=0.125 frozen). prog_wide (mechanism) -0.226; semantic framings ~0; null_city placebo +0.024. v_hack tracks the hack mechanism, not a generic honesty direction. n=1 per row, baseline noise +/-0.06. Q11: 60-step seed-42 mix=0.125, gap closes (vanilla 0.936, frozen 0.957, refresh-2 0.907) -- projection delays but does not prevent hacking at this horizon. n=1, confounded with mix/seed vs Q2. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-30 02:34:22 +00:00
wassname	5d83adbb25	fix: correct the "18 vs 21 pair" basis claim (it was never about pair count). Read the safetensors shapes/metadata: v_hack_full = 10 pairs / k=5, v_hack_21pairs = 16 pairs / k=12 (n_heldout=2; neither is 18 or 21). The two bases differ on pairs AND directions-kept AND extract-tau simultaneously, so the hack-cut gap is triple-confounded, not a clean "pair set is the lever" result. Nothing was lost: the strong basis reproduces from current pairs.py via --top-k=12 --v-hack-drop-bottom-frac=0.0, and refresh already re-extracts at k=12. Rewrites Q8 + the top confound bullet + the README findings caveat. A one-knob k-sweep is needed to attribute the gain. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-29 10:12:12 +00:00
wassname	46f10d8150	results: absolute-rate tables + provenance, lock mix=0.125 default. docs/results.md: lead with absolute last-5 rates (compare within a table by eye); restrict refresh-cadence/gate/basis comparisons to the seed they actually share (kills the fake refresh "ladder" that compared n=1 cadences to a 4-seed frozen mean); add Q6 solve columns, Q8 pair-content axis breakdown (8/18 pairs are axis-1 weak-tests; the 21-pair set is not in committed pairs.py -> FIXME), Q9 solve-orth negative result, and a dynamics note (solve never climbs; hack plateaus ~step 15). scripts/results.py: add `log` provenance column; drop the wide argv/time cols. Lock mix_ratio=0.125 as the default (FastConfig group 4->8 so the split is non-degenerate; drop --mix-ratio=0.5 from fast recipes). Q6 shows 0.125 keeps the hack cut with no solve tax. Smoke passes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-29 09:30:30 +00:00
wassname	e7cdcaa0ab	results: same-seed paired deltas + std, exclude incomplete runs - paired view: join projected to vanilla on (mix, seed), per-seed delta, mean +/- std over shared seeds. Comparing a 3-seed mean to a 1-seed point is meaningless; this enforces same-seed comparison (ml_debug principle). - grouped view now reports std across seeds (null at n=1). - exclude in-progress/aborted runs (must log all `steps`) so partial logs don't read as impossibly-good results. - docs/results.md rewritten around paired deltas; honest that at n=4 the last-5 Dhack std (~0.15) ~= the mean (~0.13), so the effect is consistent in sign but not cleanly separated from zero. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-29 08:10:42 +00:00
wassname	4464f9d312	results tooling + solve-orth knob + results-by-question doc - scripts/results.py + `just results`: aggregate logs/*.log into last-5 hack_s and gt_s (solve) tables, sorted-by-time + grouped-by-config, with full argv provenance column. Filters smoke/probe runs. - extract_vhack_grad: solve_orth_m knob — strip top-m known-solve subspace (SVD of clean-side grads) from D before SVD, so projection doesn't ablate the solve signal. No grader/oracle, off by default. - docs/results.md: every experiment grouped by the question it answers (feasibility, H1, gate_mode, basis, refresh, mix, noise-floor, pair-set) with comparison tables and answers. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-05-29 07:21:05 +00:00
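
The same-seed pairing described in commit e7cdcaa0ab (join projected to vanilla on (mix, seed), take a per-seed delta, then report mean +/- std over the shared seeds) can be sketched roughly as below. This is a minimal illustration, not the code in scripts/results.py: the function name `paired_deltas`, the record layout (`arm`, `mix`, `seed`, `hack`), and the numbers are all hypothetical.

```python
from statistics import mean, stdev


def paired_deltas(runs):
    """Pair projected runs with vanilla runs that share (mix, seed).

    `runs` is a list of dicts with keys arm, mix, seed, hack
    (a hypothetical layout chosen for this sketch).
    Returns (per-pair deltas, mean delta, std of deltas or None at n<2).
    """
    vanilla = {(r["mix"], r["seed"]): r["hack"]
               for r in runs if r["arm"] == "vanilla"}
    deltas = {}
    for r in runs:
        key = (r["mix"], r["seed"])
        # Only keep projected runs that have a vanilla partner on the same key.
        if r["arm"] == "projected" and key in vanilla:
            deltas[key] = r["hack"] - vanilla[key]
    values = list(deltas.values())
    if not values:
        return deltas, None, None
    # Std is undefined for a single pair, so report None rather than 0.
    spread = stdev(values) if len(values) > 1 else None
    return deltas, mean(values), spread


# Illustrative numbers only; they are not results from the repository.
example_runs = [
    {"arm": "vanilla", "mix": 0.125, "seed": 41, "hack": 0.50},
    {"arm": "projected", "mix": 0.125, "seed": 41, "hack": 0.25},
    {"arm": "vanilla", "mix": 0.125, "seed": 42, "hack": 0.75},
    {"arm": "projected", "mix": 0.125, "seed": 42, "hack": 0.25},
    {"arm": "projected", "mix": 0.125, "seed": 43, "hack": 0.25},  # no vanilla partner
]
deltas, mean_delta, std_delta = paired_deltas(example_runs)
```

The seed-43 run is dropped because it has no vanilla run to compare against, which is the point of the paired view: a delta is only computed where both arms share a seed.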

6 Commits
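
The percentage-point headline in commit 311bf2854f follows directly from the four means quoted there. A small sketch of that arithmetic, where the helper name `pp_delta` is hypothetical and only the four rates come from the commit message:

```python
def pp_delta(treated, baseline):
    """Difference between two rates, in whole percentage points."""
    return round((treated - baseline) * 100)


# Means quoted in the commit message (route2 n=3, vanilla n=2).
route2 = {"hack": 0.031, "solve": 0.615}
vanilla = {"hack": 0.305, "solve": 0.516}

hack_pp = pp_delta(route2["hack"], vanilla["hack"])
solve_pp = pp_delta(route2["solve"], vanilla["solve"])
summary = f"{hack_pp:+d}pp deploy hack, {solve_pp:+d}pp solve"
print(summary)
```

This reproduces only the difference of means; it does not use the quoted +/- spreads, and it makes no claim about how the repository computes them.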