Mirror of https://github.com/wassname/evil_MoE.git, synced 2026-06-27 16:15:35 +08:00
route2: fail loud if real v_grad band collapses (extraction broken)
A fresh-eyes review flagged that nothing asserted upper > lower for the REAL v_grad: a broken extraction (hack pairs aligning no more than clean ones) would silently degenerate, via the max(., 1e-6) floor, into the same sign gate as the random control. Assert mean band width > 0 on non-Haar runs; the Haar control is still allowed to collapse. No correctness change to the gate math (the review found conservation, per-rollout recovery, cosine masking, and closure capture all OK).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
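A minimal sketch of why the max(., 1e-6) floor hides a broken extraction. It assumes a linear ramp of the form clip((cos - lower) / max(upper - lower, 1e-6), 0, 1); the name `ramp_gate` and this exact formula are hypothetical, and the repo's `_route2_grad_filter` may shape its ramp differently. With an open band the gate passes graded weights; once the band collapses, the floored width turns the ramp into a hard step at `lower`, i.e. a sign gate.

```python
import numpy as np


def ramp_gate(cos: np.ndarray, lower: float, upper: float, floor: float = 1e-6) -> np.ndarray:
    """Hypothetical banded ramp: 0 below `lower`, 1 above `upper`, linear in between.

    The width is floored at `floor`, so a collapsed or inverted band
    (upper <= lower) never divides by zero; the ramp then degenerates into
    a hard step at `lower`.
    """
    width = max(upper - lower, floor)
    return np.clip((cos - lower) / width, 0.0, 1.0)


cos = np.array([-0.2, 0.0, 0.05, 0.3])
# Open band: cosines inside [lower, upper] get fractional weights.
print(ramp_gate(cos, lower=0.0, upper=0.289))
# Collapsed band: anything just above `lower` is pushed to weight 1,
# so the gate is effectively the test cos > lower.
print(ramp_gate(cos, lower=0.0, upper=0.0))
```

Nothing in the collapsed case raises an error by itself, which is why only an explicit width check can tell a broken real extraction apart from the Haar control.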
@@ -182,6 +182,13 @@ as real). Defence (a) is mandatory; (b) only if (a) shows a mass gap.
## Implementation plan (src/vgrout/train.py)

STATUS 2026-06-06 (commit 485839d): route rewrite DONE and smoke-verified. `route_band_edges`
builds the band at extract + on refresh; `_route2_grad_filter` is the banded ramp gate;
`build_route2_anchors`, the EMA `tau` state, `--gate-anchor-teacher-only`, and
`scripts/verify_gate_anchor.py` are gone. Smoke: band width +0.289 real vs -0.014 Haar-random;
`||delta_S_hack||>0`, R3 span assert green, resid~0. DEFERRED: the held-out-pair separation
gauge (needs a second forward over the `n_val` pairs; diagnostic only, not load-bearing).

Rollback tag `pre-routing-refactor`. erase already works; the code below is the route rewrite.

1. **DELETE `build_route2_anchors`** (~line 337) and its call site. No anchors from teacher
+9
-2
@@ -494,10 +494,17 @@ def main(cfg: Config) -> int:
         # Routing band from the pairs (against the FINAL v_grad, so a Haar override
         # collapses the band -- the real-vs-random discriminator).
         route_band = route_band_edges(raw_grads, v_grad, device)
-        _bw = [hi - lo for lo, hi in route_band.values()]
+        _mean_bw = sum(hi - lo for lo, hi in route_band.values()) / len(route_band)
         logger.info(f"route2 band: edges from {len(route_band)} modules, "
-                    f"mean width(upper-lower)={sum(_bw)/len(_bw):+.3f} "
+                    f"mean width(upper-lower)={_mean_bw:+.3f} "
                     f"(>0 = pairs separate; ~0 = random/degenerate)")
+        # On a REAL v_grad the band must open (hack pairs align more than clean).
+        # A collapsed/inverted real band = broken extraction silently mimicking the
+        # random control -> fail loud. The Haar control is allowed to collapse.
+        if cfg.route2_random_v_seed is None:
+            assert _mean_bw > 0, (
+                f"real v_grad gave non-positive mean band width {_mean_bw:+.3f}: "
+                "hack pairs do not separate from clean -> extraction broken")
         model.train()
     else:
         # v_hack path resolution, most-specific first. The pairset (personas) is
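The guard added in this hunk can be read as a standalone check. This is a sketch, assuming `route_band` maps a module name to a `(lower, upper)` pair, as the `for lo, hi in route_band.values()` loop implies; `mean_band_width`, `check_band`, and the module names in the usage lines are hypothetical, not part of the repo.

```python
from typing import Mapping, Optional, Tuple

# Hypothetical alias: module name -> (lower, upper) band edges.
RouteBand = Mapping[str, Tuple[float, float]]


def mean_band_width(route_band: RouteBand) -> float:
    """Average (upper - lower) over all modules in the band."""
    if not route_band:
        raise ValueError("route_band is empty; no modules to average over")
    return sum(hi - lo for lo, hi in route_band.values()) / len(route_band)


def check_band(route_band: RouteBand, random_v_seed: Optional[int]) -> float:
    """Return the mean band width; fail loud if a REAL v_grad collapsed it.

    random_v_seed is None means the real v_grad is in use; any seed means the
    Haar-random control, which is allowed to collapse (width ~0 or negative).
    """
    width = mean_band_width(route_band)
    # `not width > 0` also rejects NaN, like the original `assert _mean_bw > 0`.
    if random_v_seed is None and not width > 0:
        raise AssertionError(
            f"real v_grad gave non-positive mean band width {width:+.3f}: "
            "hack pairs do not separate from clean -> extraction broken"
        )
    return width


real_band = {"layers.0.mlp": (0.10, 0.40), "layers.1.mlp": (0.05, 0.30)}
haar_band = {"layers.0.mlp": (0.01, -0.02)}
print(f"{check_band(real_band, random_v_seed=None):+.3f}")  # open real band passes
print(f"{check_band(haar_band, random_v_seed=0):+.3f}")     # control may collapse
```

The sketch raises explicitly rather than using `assert`, because Python drops `assert` statements under `python -O`, which would silently disable a fail-loud check. It also raises `ValueError` on an empty band, where the original expression would hit a `ZeroDivisionError`.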