evil_MoE

mirror of https://github.com/wassname/evil_MoE.git synced 2026-06-27 18:59:35 +08:00

wassname 2a21fbc49c spec(distill_probe): Phase 1 done (UAT 4/4), Phase 2 candidates R5-R7

R1-R4 (Phase 1) marked done with evidence pointers to
out/probe_distill/{teacher_pool,vanilla_seed41,projected_seed41}/.

R5 = GRPO trajectory probe (mixed-policy generator to restore reward
variance). R6 = LoRA-vs-SVD arm comparison. R7 = GRPO-contrastive
v_hack re-extraction (fallback only).

Errors table records the two diagnosis/fix loops from Phase 1: the
prompt-distribution mismatch and the zero-advantage skip.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-25 10:22:19 +00:00

20260525_distill_cosine_probe.md
