evil_MoE

wassname/evil_MoE

mirror of https://github.com/wassname/evil_MoE.git synced 2026-06-27 19:31:11 +08:00

Commit Graph

Author	SHA1	Message	Date

wassname

e7cdcaa0ab

results: same-seed paired deltas + std, exclude incomplete runs

- paired view: join projected to vanilla on (mix, seed), per-seed delta, mean
  +/- std over shared seeds. Comparing a 3-seed mean to a 1-seed point is
  meaningless; this enforces same-seed comparison (ml_debug principle).
- grouped view now reports std across seeds (null at n=1).
- exclude in-progress/aborted runs (must log all `steps`) so partial logs
  don't read as impossibly-good results.
- docs/results.md rewritten around paired deltas; honest that at n=4 the
  last-5 Dhack std (~0.15) ~= the mean (~0.13), so the effect is consistent
  in sign but not cleanly separated from zero.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-05-29 08:10:42 +00:00
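
The paired view described in commit e7cdcaa0ab can be sketched roughly as follows. This is an illustration only, not the code in scripts/results.py. The record layout (`method`, `mix`, `seed`, `steps_logged`), the function name `paired_deltas`, and the sign convention (projected minus vanilla) are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, stdev


def paired_deltas(runs, steps, metric="hack_s"):
    """Per-mix mean/std of same-seed (projected - vanilla) deltas.

    `runs` is a list of dicts with hypothetical keys:
    method ("projected" or "vanilla"), mix, seed, steps_logged, and `metric`.
    """
    # Exclude in-progress/aborted runs: a run must have logged all `steps`.
    complete = [r for r in runs if r["steps_logged"] >= steps]

    # Index each arm by (mix, seed) so only same-seed pairs are compared.
    by_arm = defaultdict(dict)
    for r in complete:
        by_arm[r["method"]][(r["mix"], r["seed"])] = r[metric]

    # Keep only keys present in both arms (the shared seeds).
    shared = sorted(by_arm["projected"].keys() & by_arm["vanilla"].keys())

    per_mix = defaultdict(list)
    for mix, seed in shared:
        key = (mix, seed)
        per_mix[mix].append(by_arm["projected"][key] - by_arm["vanilla"][key])

    # std is None at n=1, since a sample std is undefined for one point.
    return {
        mix: {
            "n": len(d),
            "mean": mean(d),
            "std": stdev(d) if len(d) > 1 else None,
        }
        for mix, d in per_mix.items()
    }
```

Joining on (mix, seed) before averaging means a seed missing from either arm drops out of the comparison, instead of being silently averaged against a different set of seeds.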

wassname

4464f9d312

results tooling + solve-orth knob + results-by-question doc

- scripts/results.py + `just results`: aggregate logs/*.log into last-5
  hack_s and gt_s (solve) tables, sorted-by-time + grouped-by-config, with
  full argv provenance column. Filters smoke/probe runs.
- extract_vhack_grad: solve_orth_m knob — strip top-m known-solve subspace
  (SVD of clean-side grads) from D before SVD, so projection doesn't ablate
  the solve signal. No grader/oracle, off by default.
- docs/results.md: every experiment grouped by the question it answers
  (feasibility, H1, gate_mode, basis, refresh, mix, noise-floor, pair-set)
  with comparison tables and answers.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-05-29 07:21:05 +00:00
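
The `solve_orth_m` idea in commit 4464f9d312 (strip the top-m known-solve subspace from D before taking the SVD) might look something like the NumPy sketch below. It is not the repository's `extract_vhack_grad`. The shapes are assumptions: `D` is taken to be an (n_pairs, d) matrix of gradient differences and `G_solve` an (n_clean, d) matrix of clean-side gradients. The helper names `strip_solve_subspace` and `hack_basis` are hypothetical.

```python
import numpy as np


def strip_solve_subspace(D, G_solve, m):
    """Project the rows of D onto the orthogonal complement of the
    top-m right-singular directions of G_solve. m <= 0 disables the step."""
    if m <= 0:
        return D
    # Rows of Vt are orthonormal directions in parameter space,
    # ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(G_solve, full_matrices=False)
    V = Vt[:m]  # (m, d) basis of the assumed "solve" subspace
    return D - (D @ V.T) @ V


def hack_basis(D, k):
    """Top-k right-singular directions of D, as rows of a (k, d) array."""
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    return Vt[:k]


# Usage: with m > 0 the extracted basis has no component along the solve
# directions, so projecting gradients against it leaves those directions intact.
rng = np.random.default_rng(0)
D = rng.normal(size=(6, 8))
G_solve = rng.normal(size=(5, 8))
basis = hack_basis(strip_solve_subspace(D, G_solve, m=2), k=3)
```

Stripping before the SVD, rather than after, keeps the returned basis orthonormal while still guaranteeing it is orthogonal to the removed directions.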

2 Commits