evil_MoE

wassname/evil_MoE

mirror of https://github.com/wassname/evil_MoE.git synced 2026-06-27 17:00:59 +08:00

Commit Graph

Author: wassname
SHA1: 2a21fbc49c
Date: 2026-05-25 10:22:19 +00:00

spec(distill_probe): Phase 1 done (UAT 4/4), Phase 2 candidates R5-R7

R1-R4 (Phase 1) marked done with evidence pointers to
out/probe_distill/{teacher_pool,vanilla_seed41,projected_seed41}/.

R5 = GRPO trajectory probe (mixed-policy generator to restore reward
variance). R6 = LoRA-vs-SVD arm comparison. R7 = GRPO-contrastive
v_hack re-extraction (fallback only).

Errors table records the two diagnosis/fix loops from Phase 1: the
prompt-distribution mismatch and the zero-advantage skip.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

1 Commit