evil_MoE

mirror of https://github.com/wassname/evil_MoE.git synced 2026-06-27 19:47:33 +08:00

Files

wassname 89eaa0866b paper: record in-sample teacher-seeding method in setup section

The first 30 GRPO steps mix in cached hack demos (mix_ratio=0.125, i.e. 1 of 8
rollouts). The demos are generated in-sample by the hint-equipped hack teacher
(rl-rewardhacking-leetcode-rh-s65) in its own tokens, so the seeded gradient is
on-distribution. The teacher covers only 6 run_tests prompts, while the student
trains on 200 (seeded-shuffle), so the hack must generalise beyond the seeds
(the C2 held-out test). Adds \label{ssec:c2} for the cross-reference.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>

2026-06-07 11:01:31 +00:00
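
The seeding schedule described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the function name `build_rollout_group`, its parameters, the 0-indexed step count, and the choice to replace (rather than append to) student rollouts are all assumptions. Only the numbers (30 steps, mix_ratio=0.125, 8 rollouts) come from the commit message.

```python
import random


def build_rollout_group(step, sample_student, cached_demos,
                        group_size=8, mix_ratio=0.125, seed_steps=30,
                        rng=None):
    """Return one group of rollouts for a prompt (hypothetical helper).

    During the first `seed_steps` steps (assumed 0-indexed), a fraction
    `mix_ratio` of the group is drawn from cached teacher demos; after
    that, every rollout comes from the student.
    """
    rng = rng or random.Random(0)
    # 8 * 0.125 = 1 demo per group while seeding is active.
    n_demo = round(group_size * mix_ratio) if step < seed_steps else 0
    # Never request more demos than the cache holds.
    n_demo = min(n_demo, len(cached_demos))
    demos = rng.sample(cached_demos, n_demo)
    students = [sample_student() for _ in range(group_size - n_demo)]
    return demos + students


# Usage with placeholder strings standing in for real rollouts.
early = build_rollout_group(0, lambda: "student", ["demo_a", "demo_b"])
late = build_rollout_group(30, lambda: "student", ["demo_a", "demo_b"])
print(len(early), len(late))
```

Under these assumptions, a group at step 0 holds one cached demo and seven student rollouts, while a group at step 30 or later holds student rollouts only.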

data

paper: interim directionality fig (app:directionality) + confound TODO

2026-06-05 23:40:02 +00:00

figs

paper: interim directionality fig (app:directionality) + confound TODO

2026-06-05 23:40:02 +00:00

.gitignore

Merge branch 'probe/distill-cosine' of https://github.com/wassname/projected_grpo into probe/distill-cosine

2026-06-02 07:21:49 +00:00

main.tex

paper: record in-sample teacher-seeding method in setup section