mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 18:59:35 +08:00
feat: online_stats gate for routeV -- live q5/q95 band calibration
New routeV_gate="online_stats" mode: use the empirical per-rollout cosine distribution (q5/q95 pooled across all modules each step) as the routing band thresholds, instead of the pair-derived route_band. Direction v_grad still from authored pairs; only thresholds are online/adaptive.

Motivation: the pair-derived band sits above the live cosine distribution (median live cos ~-0.06), causing frout to cliff as GRPO advantage flattens. Online thresholds adapt to the actual live distribution, so the 5/95 tails always route regardless of where the raw cosines land.

Config: routeV_gate="online_stats", online_stats_lo=0.05, online_stats_hi=0.95.
Step-0 prior: (-0.5, 0.5) neutral band (pairs not used for calibration).
Band update: post-opt.step(), torch.quantile over that step's module*rollout cosines.
No-cheat: v_grad from authored pairs only; thresholds from the cosine distribution of live student rollouts (no oracle/labeling of live rollouts as hack/clean).

Also: add online_stats to results_deploy._arm(); justfile queue-online-stats recipe.

Queued as job 22 (s43, authored pairs, priority 12, after 19/20/21).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
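The band update described in the message can be sketched roughly as follows. This is a minimal, dependency-free illustration, not the repository's code: the class `OnlineBand`, its methods, and the helper `quantile` are hypothetical names, and treating "outside the band" as the routed region is an assumption. The helper uses linear interpolation between order statistics, which is also the default interpolation of torch.quantile.

```python
import math


def quantile(values, q):
    """Linearly interpolated quantile of a non-empty list, q in [0, 1]."""
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one value")
    pos = q * (len(xs) - 1)          # fractional index into the sorted values
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


class OnlineBand:
    """Hypothetical holder for the adaptive (lo, hi) routing thresholds."""

    def __init__(self, lo_q=0.05, hi_q=0.95, prior=(-0.5, 0.5)):
        self.lo_q, self.hi_q = lo_q, hi_q
        self.lo, self.hi = prior     # step-0 neutral band from the commit message

    def update(self, cosines):
        """Recalibrate from one step's pooled module*rollout cosines.

        Per the commit message this would run after opt.step().
        """
        self.lo = quantile(cosines, self.lo_q)
        self.hi = quantile(cosines, self.hi_q)

    def outside(self, cos):
        """True if a cosine falls in either tail (assumed to be the routed region)."""
        return cos <= self.lo or cos >= self.hi


# Synthetic cosines centred near -0.06, loosely mimicking the reported live median.
live = [i / 100 - 0.56 for i in range(101)]
band = OnlineBand()
band.update(live)
routed = sum(band.outside(c) for c in live)
```

Because the thresholds are quantiles of the same distribution they gate, a roughly fixed share of samples lands in the tails each step, wherever the raw cosines sit.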
@@ -64,6 +64,7 @@ def _arm(argv: str) -> str:
     if "--intervention=none" in argv:
         return "vanilla"
     gate = ("act_vote" if "--routeV-gate=act_vote" in argv else
+            "online_stats" if "--routeV-gate=online_stats" in argv else
             "lora" if "lora_frozen_b" in argv else
             "per-token" if "--routeV-per-token" in argv else "grad-cos")
     return f"routeV/{gate}" + ("·randV" if "--routeV-random-v-seed" in argv else "")