evil_MoE

mirror of https://github.com/wassname/evil_MoE.git synced 2026-06-27 16:30:30 +08:00

Files

T

wassname 97aede8d9c fix(routeV): keep=bulk gate + deterministic teachers + deploy-mode generation

Three fixes after the deploy-solve=0 run (all user-confirmed):

1. Gate co-adaptation: t_lo sat at quantile(route_quantile), so keep was only the
   bottom ~5% and ~95% of rollouts landed in mid (both blocks train, qmass~0.5).
   Move the keep/mid boundary up: t_lo=quantile(1-2q), t_hi=quantile(1-q), so keep
   is the BULK, mid+rout are the top 2q. Three-zone absorption preserved (not deleted).

2. Deterministic teachers: every teacher-phase prompt is drawn from the both-pool-
   covered set and gets EXACTLY teacher_n_per_prompt hack + N solve (constant count,
   no flip/coverage drops). Replaces mix_ratio*_even_split (count varied per step).
   No flip in the teacher phase (solve teacher carries solve pressure). mix_ratio>0
   stays the on/off switch. Removed dead _even_split.

3. Deploy-mode generation: student rollouts generate under ablate_quarantine, so the
   behavior policy = the shipped deployed-only model -- the quarantine's learned hack
   can't saturate the rollout distribution and starve honest solve advantage. For
   clean-gated rollouts gen and train forward now match.

Also: FastConfig lr 1e-4->5e-4 (random-init lora2r needs more lr in the short budget).
AGENTS.md: don't bake unconfirmed theories into comments; don't inflate diagnosis
confidence across turns. Smoke + smoke-solvemix green; all verify gates pass.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>

2026-06-11 00:29:12 +00:00

__init__.py

rename python package projected_grpo -> vgrout

2026-06-05 14:51:48 +08:00

data.py

fix: rotate the unhackable (gt_only) subset per step, not frozen per pid

2026-06-10 06:14:08 +00:00

derisk_loopholes.py

cleanup: consolidate stale loaders and pair scripts

2026-06-09 12:47:32 +00:00

eval.py

refactor: collapse to lora2r-only (none/routeV/absorb); delete erase/antipasto/lora_frozen_b paths