evil_MoE

mirror of https://github.com/wassname/evil_MoE.git synced 2026-06-27 20:21:41 +08:00

wassname e04548987f spec2 + base_pool generator + slim replay save (partial mixed-replay TODO)

spec2.md records:
 - Phase 1 result (NLL cos signal +0.747 pure-hack vs +0.398 mixed)
 - Phase 2: mixed-replay GRPO probe, partial impl
 - Phase 3: $400/65h sweep, predicated on Phase 2 cos_in signal

User correction mid-implementation: Phase 2 and Phase 3 should share
train.py code with different --steps, not build separate replay
machinery. Mixed-replay refactor in probe_distill.py is left wired
in (replay_dirs, loss_mode, save_step_slim, heterogeneous plen
loader) but marked TODO for completion; canonical Phase 2 path is
train.py at smaller scale.

probe_distill.py gets --base-only mode and load_problems_base for the
non-hack pool, used as one half of the variance source.
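
A minimal sketch of how a --base-only switch could pick the problem pool, assuming an argparse CLI. Only the flag name and load_problems_base come from the commit message; load_problems_hack, select_pool, and the returned records are hypothetical stand-ins, not the repository's code.

```python
import argparse


def load_problems_base():
    # Hypothetical stand-in: the real loader's signature and data are not shown.
    return [{"id": "base-0", "hack": False}]


def load_problems_hack():
    # Assumed counterpart for the hack pool; this name is not from the commit.
    return [{"id": "hack-0", "hack": True}]


def build_parser():
    parser = argparse.ArgumentParser(description="sketch of a probe CLI")
    # store_true makes the flag default to False when it is omitted.
    parser.add_argument("--base-only", action="store_true",
                        help="use only the non-hack (base) problem pool")
    return parser


def select_pool(argv):
    # argparse exposes --base-only as the attribute base_only.
    args = build_parser().parse_args(argv)
    return load_problems_base() if args.base_only else load_problems_hack()
```

Under these assumptions, select_pool(["--base-only"]) returns the base pool and select_pool([]) returns the hack pool.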

Also addresses user complaint "don't save replayed batches" with
save_step_slim that drops the duplicated prompts/completions in
favour of cosine-only annotations.
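
The slim save described above could look roughly like this. It is a sketch under stated assumptions: the record is taken to be a flat dict, the duplicated keys are assumed to be named "prompts" and "completions", and the signature, JSON format, and file naming are hypothetical. Only the name save_step_slim comes from the commit message.

```python
import json
from pathlib import Path

# Assumed names of the fields that replayed batches would duplicate.
DUPLICATED_KEYS = ("prompts", "completions")


def slim_record(record):
    # Keep everything except the duplicated text fields (e.g. cosine annotations).
    return {k: v for k, v in record.items() if k not in DUPLICATED_KEYS}


def save_step_slim(record, out_dir, step):
    # Hypothetical signature: write one small JSON file per training step.
    path = Path(out_dir) / f"step_{step:05d}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(slim_record(record)))
    return path
```

The point of the design is that a replayed batch already exists on disk in its source run, so the per-step file only needs the new annotations.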

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-25 11:48:48 +00:00

20260525_distill_cosine_probe.md
  spec(distill_probe): Phase 1 done (UAT 4/4), Phase 2 candidates R5-R7
  2026-05-25 10:22:19 +00:00

20260525_review_T5.md
  spec: reject T5 mixed-policy design after external review
  2026-05-25 10:26:33 +00:00

spec2.md
  spec2 + base_pool generator + slim replay save (partial mixed-replay TODO)
  2026-05-25 11:48:48 +00:00