mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 17:30:41 +08:00
refactor: route2 quarantine = scale-matched delta_S_hack, rip out 33M LoRA

The distinct-basis A_q/B_q LoRA (~33M params at rank-16) gave the quarantine a
~100x capacity edge over delta_S, so routing-everything-there was the low-
resistance path: qE pinned ~0.97 (energy into the thrown-away knob) while the
deployed delta_S learned nothing (job 54). The cause was capacity imbalance, not
the routing gate (calibrated-tau already separated hack/clean, hkgap>0).

Consolidate to one adapter type: the quarantine is now delta_S_hack, the second
diagonal in the same frozen SVD basis, shape [r], capacity-matched to delta_S,
zeroed at deploy. route2's calibrated-tau gate parks the flagged rollouts' grad
into delta_S_hack.grad (like proj.py's route parks its subspace projection);
delta_S keeps the unflagged. Both diagonals train at one shared lr.

Removed: A_q/B_q params, v_act + extract_v_act, the act-mask arm (a shared
diagonal can't be per-token gated), route2_mask / route2_quarantine_rank /
route2_quar_lr_scale knobs, the separate quar optimizer group. Arm name
routing2_{act,grad} -> routing2. v_grad refresh extracts from delta_S (main)
with the quarantine ablated.

SGTM check: their gradient routing uses a hard detach on capacity-matched
reserved dims, no soft/tanh/sigmoid gate -- balance is the fix, not gating.

Smoked clean: tau/hkgap/qE render, ||delta_S_hack||>0 assert passes, exit 0.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
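
The gate and the qE readout described in the commit message can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not the repository's code: it assumes tau is the midpoint between the mean cosine of known-clean gradients and the mean cosine of detector-flagged gradients, that hkgap is the difference of those two means, and that qE is the quarantined share of squared gradient norm. The commit does not spell out any of these formulas, and the names `cosine`, `calibrate_tau` and `route_grads` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def calibrate_tau(cos_clean, cos_hack):
    """ASSUMED rule: tau is the midpoint of the two cloud means; hkgap is their gap."""
    mu_clean = sum(cos_clean) / len(cos_clean)
    mu_hack = sum(cos_hack) / len(cos_hack)
    return 0.5 * (mu_clean + mu_hack), mu_hack - mu_clean


def route_grads(per_rollout_grads, v_grad, tau):
    """Hard gate: a rollout whose gradient has cos > tau is parked in the quarantine."""
    rank = len(v_grad)
    main_grad = [0.0] * rank   # stands in for delta_S.grad
    hack_grad = [0.0] * rank   # stands in for delta_S_hack.grad
    flagged = 0
    for g in per_rollout_grads:
        is_flagged = cosine(g, v_grad) > tau
        flagged += is_flagged
        target = hack_grad if is_flagged else main_grad
        for i, gi in enumerate(g):
            target[i] += gi
    e_main = sum(x * x for x in main_grad)
    e_hack = sum(x * x for x in hack_grad)
    total = e_main + e_hack
    q_energy = e_hack / total if total > 0 else 0.0  # ASSUMED definition of qE
    return main_grad, hack_grad, flagged / len(per_rollout_grads), q_energy


# Toy data: three clean rollouts, one hack rollout aligned with v_grad.
v_grad = [1.0, 0.0, 0.0]
clean = [[0.0, 1.0, 0.0], [0.1, -1.0, 0.5], [-0.1, 0.5, 1.0]]
hack = [[1.0, 0.1, 0.0]]
tau, hkgap = calibrate_tau(
    [cosine(g, v_grad) for g in clean],
    [cosine(g, v_grad) for g in hack],
)
main_grad, hack_grad, flagged_frac, q_energy = route_grads(clean + hack, v_grad, tau)
print(f"tau={tau:.3f} hkgap={hkgap:.3f} flagged={flagged_frac:.2f} qE={q_energy:.3f}")
```

Because both accumulators have the same shape here, qE reflects only how much gradient the gate actually routed, which is the property the capacity-matched design is after.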
@@ -84,9 +84,8 @@ over the first few steps. Sanity: by a few steps μ_clean<~0.1, μ_hack>~0.2-ish

 ## Verify / queue / follow

-- `just smoke-route2 --route2-mask=grad` (or the smoke recipe that hits grad
-path): confirm tau/hkgap columns render, routing fires (flagged frac < ~0.5,
-not pinned at 0.5), exit 0.
+- `just smoke-route2`: confirm tau/hkgap/qE columns render, routing fires
+(flagged frac < ~0.5, not pinned at 0.5), exit 0.
 - Queue (substrate, 60 steps, seed 41), label:
 why: does per-step calibrated-τ vector routing (route2-grad) stop over-routing
 (flagged<<0.5) and suppress held-out deploy-hack vs vanilla at matched solve;
@@ -98,3 +97,56 @@ over the first few steps. Sanity: by a few steps μ_clean<~0.1, μ_hack>~0.2-ish
 Detector flag (hack_E) only CALIBRATES μ_hack (known A) + anchors teacher hacks.
 The routing GATE is cos>τ applied to ALL rollouts incl. unknown B. gt_pass never
 touches routing. Compliant.
+
+## Status: IMPLEMENTED + RUNNING (job 54)
+
+Implemented in `6eb894f`, smoked (tau/hkgap render, exit 0). Running as job 54
+(calibrated-τ + grad_clip 0.5, route2-grad, 60 steps, seed 41). job 52
+(caltau-alone) killed and folded in; clip 0.5 is a no-op while gn<0.5 so 54 is
+the strict superset.
+
+### Live observation (steps 0-2, 2026-06-01)
+
+The calibration works as a discriminator but does NOT fix over-routing-by-energy:
+- `hkgap` positive and rising (0.00 -> 0.03 -> 0.08): the v_grad direction DOES
+separate the hack cloud from the clean cloud. The vector is alive.
+- `tau` tracking up with it (0.00 -> 0.02 -> 0.04): the threshold rides the drift
+as designed.
+- BUT `qE` 0.73 -> 0.97 -> 0.97: ~97% of gradient ENERGY lands in the deleted
+quarantine by step 1, and `gt_s` 3 -> 7 -> 0 (solving collapsed).
+
+Diagnosis: the failure was never the routing FRACTION (which τ fixes); it is the
+always-summed 33M A_q/B_q quarantine MAGNITUDE. Even a correctly-thresholded route
+sends the routed gradient into a knob whose per-param grads dwarf delta_S's, so
+the energy ratio pins near 1 and the deployed adapter learns nothing. This is the
+SYNTHESIS "next lever" prediction: if qE stays high while hkgap>0, the culprit is
+quarantine magnitude, not the gate.
+
+## DESIGN CHANGE (2026-06-01): one adapter, scale-matched quarantine
+
+Acted on the magnitude diagnosis by removing the distinct-basis LoRA entirely.
+The quarantine is now delta_S_hack -- the SECOND diagonal in the same frozen SVD
+basis, shape [r] per module, identical capacity to delta_S. route2's calibrated-τ
+gate parks the flagged rollouts' delta_S-grad contribution into delta_S_hack.grad
+(via step_grad_hack in _route2_grad_filter), exactly as proj.py's `route` parks
+its subspace-projected component; delta_S keeps the unflagged. Both diagonals
+train at one shared lr; delta_S_hack is zeroed at deploy.
+
+Rationale (user): a 33M LoRA vs a ~2k-param delta_S per module means "dump
+everything in the quarantine" is the low-resistance path -- a capacity edge, not
+honest absorption. Capacity-balanced diagonals remove that bias. SGTM's own
+quarantine is capacity-matched (a split of the same layer, equal dims), and uses
+a hard detach -- no soft/tanh/sigmoid gate -- confirming the fix is balance, not
+gating.
+
+Removed: A_q/B_q params, v_act buffer + extract_v_act, the act-mask arm (a shared
+diagonal can't be per-token gated), route2_mask / route2_quarantine_rank /
+route2_quar_lr_scale knobs, the separate quar optimizer group. arm name
+"routing2_grad"/"routing2_act" -> "routing2".
+
+v_grad refresh extracts from the MAIN knob (delta_S.grad) with the quarantine
+ablated -- the deployed-model gradient is what we route, and both diagonals share
+the basis so the direction is directly usable on delta_S's live gradient.
+
+Smoked clean (tiny-random): tau/hkgap/qE render, ||delta_S_hack||=0.0074>0 assert
+passes, deploy-ablation fires, exit 0. Queued on the substrate (seed 41, 60 steps).
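
The one-adapter design in the DESIGN CHANGE notes above can be pictured with a small pure-Python sketch. It is illustrative only and not taken from the repository: it assumes the two diagonals are summed into a single singular-value offset during training, that "zeroed at deploy" means the quarantine term is simply dropped, and that plain SGD stands in for whatever optimizer the project uses. The class `TwoDiagonalAdapter` and its methods are hypothetical names.

```python
import math


class TwoDiagonalAdapter:
    """Two trainable diagonals of shape [r] living in one frozen basis (sketch)."""

    def __init__(self, rank):
        self.delta_S = [0.0] * rank       # deployed knob
        self.delta_S_hack = [0.0] * rank  # quarantine knob, same capacity

    def num_params(self):
        """Parameter count per knob; equal by construction."""
        return {"delta_S": len(self.delta_S), "delta_S_hack": len(self.delta_S_hack)}

    def sgd_step(self, main_grad, hack_grad, lr):
        """One shared learning rate for both diagonals (plain SGD is an assumption)."""
        self.delta_S = [p - lr * g for p, g in zip(self.delta_S, main_grad)]
        self.delta_S_hack = [p - lr * g for p, g in zip(self.delta_S_hack, hack_grad)]

    def effective_delta(self, deploy=False):
        """Training: sum of both diagonals (assumed). Deploy: quarantine ablated."""
        if deploy:
            return list(self.delta_S)
        return [a + b for a, b in zip(self.delta_S, self.delta_S_hack)]

    def hack_norm(self):
        """L2 norm of the quarantine diagonal, as in the ||delta_S_hack|| > 0 check."""
        return math.sqrt(sum(x * x for x in self.delta_S_hack))


# Unflagged gradient goes to delta_S, flagged gradient to delta_S_hack.
adapter = TwoDiagonalAdapter(rank=3)
adapter.sgd_step(main_grad=[0.0, 0.5, 1.5], hack_grad=[1.0, 0.1, 0.0], lr=0.1)
assert adapter.hack_norm() > 0  # the quarantine actually received gradient
print(adapter.effective_delta(deploy=False))
print(adapter.effective_delta(deploy=True))
```

With equal-sized knobs there is no capacity reason for the optimizer to prefer the quarantine, so any remaining imbalance in qE would point back at the gate.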