docs/results.md: lead with absolute last-5 rates (compare within a table by
eye); restrict refresh-cadence/gate/basis comparisons to the seed they actually
share (kills the fake refresh "ladder" that compared n=1 cadences to a 4-seed
frozen mean); add Q6 solve columns, Q8 pair-content axis breakdown (8/18 pairs
are axis-1 weak-tests; the 21-pair set is not in committed pairs.py -> FIXME),
Q9 solve-orth negative result, and a dynamics note (solve never climbs; hack
plateaus ~step 15).
scripts/results.py: add `log` provenance column; drop the wide argv/time cols.
Lock mix_ratio=0.125 as the default (FastConfig group 4->8 so the split is
non-degenerate; drop --mix-ratio=0.5 from fast recipes). Q6 shows 0.125 keeps
the hack cut with no solve tax. Smoke passes.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- paired view: join projected to vanilla on (mix, seed), per-seed delta, mean
+/- std over shared seeds. Comparing a 3-seed mean to a 1-seed point is
meaningless; this enforces same-seed comparison (ml_debug principle).
- grouped view now reports std across seeds (null at n=1).
- exclude in-progress/aborted runs (must log all `steps`) so partial logs
don't read as impossibly-good results.
- docs/results.md rewritten around paired deltas; honest that at n=4 the
last-5 Dhack std (~0.15) ~= the mean (~0.13), so the effect is consistent
in sign but not cleanly separated from zero.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- scripts/results.py + `just results`: aggregate logs/*.log into last-5
hack_s and gt_s (solve) tables, sorted-by-time + grouped-by-config, with
full argv provenance column. Filters smoke/probe runs.
- extract_vhack_grad: solve_orth_m knob — strip top-m known-solve subspace
(SVD of clean-side grads) from D before SVD, so projection doesn't ablate
the solve signal. No grader/oracle, off by default.
- docs/results.md: every experiment grouped by the question it answers
(feasibility, H1, gate_mode, basis, refresh, mix, noise-floor, pair-set)
with comparison tables and answers.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>