mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 16:30:30 +08:00
journal: routeV margin band (p75/p75) verified routing 28.7% on real 4B; dir6 restarted on it
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
This commit is contained in:
@@ -3166,3 +3166,39 @@ split padded+concatenated with no shape error.
|
||||
and read `slv_dep`: the direct test of whether the exploration floor lifts deploy-solve
|
||||
vs the no-floor job 60. Keep the orthogonal `hkgap`-decay question (frozen vs
|
||||
`--vhack-refresh-every=2`) as a separate run so the two levers stay attributable.
|
||||
|
||||
# 2026-06-07
|
||||
|
||||
## routeV band: widest -> p75/p75 margin (route the confident tail); dir6 sweep restarted on it
|
||||
|
||||
**Change (commit d9ea20b).** `route_band_edges` switched from the widest edge
|
||||
(`lower=min clean, upper=max hack`) to a precision margin band
|
||||
(`lower=p75 clean, upper=p75 hack`). The wide band routed even neutral rollouts
|
||||
(~0.4 of a cos=0 gradient), the over-route that costs deploy-solve. Margin routes
|
||||
only the live tail above the clean cluster and lets absorption cover the unrouted
|
||||
middle (gradient_routing.md Fig 5-right: retain cost is proportional to routed mass;
|
||||
SGTM Fig 5b: ~40% undiscovered tolerated, leak<0.02). p75 not min/max because 10
|
||||
pairs make the extremes single-sample noisy; p25(clean) rejected (would route clean,
|
||||
the expensive false-positive). It is an absolute cos threshold, so a clean batch
|
||||
routes ~nothing without the per-batch-quantile pathology.
|
||||
|
||||
**Risk checked: does the off-distribution pair band sit above live and route ~nothing?**
|
||||
No. On real Qwen3-4B the band built at `lower=+0.037, upper=+0.256` (vs live median
|
||||
cos ~-0.06 from the wide run). Job 9 (per-token) routed frac f = +0.287 at step 1
|
||||
(step 0 = 0.000, first-step gate-cache artifact). ~29% routed, comparable to wide's
|
||||
~23%, NOT collapsed. The low p75(clean) edge (+0.037, vs max(clean)~+0.3) is what
|
||||
avoids the under-route; choosing p75 over max was load-bearing.
|
||||
(evidence: logs/20260607T134234_fast_routingV_seed43_dir6_routeV_pertoken_s43.log)
|
||||
|
||||
**Caveat for reading dir6.** The whole dir6 directionality sweep (jobs 8-15) was
|
||||
restarted on this margin band, so its deploy numbers are NOT comparable to earlier
|
||||
wide-band routeV runs. Per-token jobs (9,11,15) show `nan` in the streaming
|
||||
keep/resid/rout gauge (a pre-existing `_zone_stats`-on-empty-live fragility poisoning
|
||||
the mean); per-rollout jobs read clean, frout is in the debug log. Directionality
|
||||
conclusion (real vs random v_grad) is band-robust either way.
|
||||
|
||||
**Next (for the user, not done).** The principled shift-robust gate is a live-cos
|
||||
rolling quantile (route top-q of live `cos(g,v_grad)`, threshold tracked across steps
|
||||
so batch composition varies naturally) -- decouples the threshold from how wide the
|
||||
off-distribution pairs are. Deferred: bigger change, not safe to deploy unattended
|
||||
across a running sweep.
|
||||
|
||||
Reference in New Issue
Block a user