mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 16:15:35 +08:00
fix: correct the "18 vs 21 pair" basis claim (it was never about pair count)
Read the safetensors shapes/metadata: v_hack_full = 10 pairs / k=5, v_hack_21pairs = 16 pairs / k=12 (n_heldout=2; neither is 18 or 21). The two bases differ on pairs AND directions-kept AND extract-tau simultaneously, so the hack-cut gap is triple-confounded, not a clean "pair set is the lever" result.

Nothing was lost: the strong basis reproduces from current pairs.py via --top-k=12 --v-hack-drop-bottom-frac=0.0, and refresh already re-extracts at k=12.

Rewrites Q8 + the top confound bullet + the README findings caveat. A one-knob k-sweep is needed to attribute the gain.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@@ -95,14 +95,14 @@ measurements and the move from Qwen3.5-2B to Qwen3-4B (reference substrate).
 These are headline results from the fast preset (20 steps, mix=0.5, seed=41).
 Full provenance and per-step log audits are in `RESEARCH_JOURNAL.md`.
 
-**What appears to work (entry h, n=1):** widening the hand-crafted contrastive
-pair set from 12 to 21 pairs across 6 axes (weak run_tests, hardcode,
-persona-voice, try/except swallow, type-only assert, weak inequality predicate)
-drops last-5 student hack rate from 77.5% (12-pair projected, frozen V) to
-47.5% (21-pair projected, frozen V), at matched ground-truth pass rate near
-20%. The hack-gt gap shrinks from ~50pp to ~27.5pp, so the projection looks
-selective rather than just deflating both rates. Vanilla-baseline head-to-head
-and seed=42/43 replicates are queued.
+**What appears to work (seed 41):** a stronger extracted basis drops last-5
+student hack rate from 77.5% (`v_hack_full`) to 47.5% (`v_hack_21pairs`),
+frozen V, at matched ground-truth pass rate near 20%. CAVEAT (corrected
+2026-05-29 from the safetensors shapes, see docs/results.md Q8): the two bases
+differ on three axes at once — pairs used (10 vs 16), directions kept (k=5 vs
+k=12), and extract tau (0.25 vs 0.0) — so this is NOT cleanly "more pairs".
+A one-knob k-sweep is needed to attribute the gain. Vanilla-baseline
+head-to-head and seed=42/43 replicates are queued.
 
 **What turns out to matter for the design (entries f, i):** the extracted
 v_hack basis goes stale fast during training. The per-step cosine of the
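The "one-knob k-sweep" the caveat calls for can be sketched roughly as follows. This is a minimal illustration, not the repository's code: `ExtractConfig`, its field names, and the helper functions are hypothetical, the intermediate sweep value of 8 is arbitrary, and the assignment of tau 0.25 to `v_hack_full` and 0.0 to `v_hack_21pairs` is read off the order in which the caveat lists the values.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class ExtractConfig:
    # Hypothetical container; the real extraction options may be named differently.
    n_pairs: int   # contrastive pairs used for extraction
    top_k: int     # directions kept in the basis
    tau: float     # extract tau


def differing_knobs(a: ExtractConfig, b: ExtractConfig) -> list[str]:
    """Return the names of the fields on which two configs disagree."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


def one_knob_sweep(base: ExtractConfig, knob: str, values: list) -> list[ExtractConfig]:
    """Return copies of `base` that vary only in `knob`."""
    return [replace(base, **{knob: v}) for v in values]


# Values taken from the corrected caveat (tau assignment assumed from listing order).
weak = ExtractConfig(n_pairs=10, top_k=5, tau=0.25)    # v_hack_full
strong = ExtractConfig(n_pairs=16, top_k=12, tau=0.0)  # v_hack_21pairs

# The two bases differ on all three knobs, so their gap cannot be attributed to one.
print(differing_knobs(weak, strong))

# Hold pairs and tau at the strong setting and vary only k (8 is an arbitrary midpoint).
sweep = one_knob_sweep(strong, "top_k", [5, 8, 12])
for cfg in sweep:
    print(cfg, differing_knobs(strong, cfg))
```

Each config in `sweep` differs from `strong` in at most `top_k`, so any change in hack rate across the sweep could be attributed to the number of directions kept rather than to the pair set or tau.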