journal: route2 capacity-imbalance realization + scale-matched delta_S fix

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-27 19:31:11 +08:00 · 2026-06-01 02:58:35 +00:00
parent dc5d4516c2
commit 59f8b6efdc
1 changed files with 50 additions and 0 deletions
@@ -2,6 +2,56 @@

 Append-only. New entries at the top, date-stamped. Never edit old entries.

+## 2026-06-01 — route2 quarantine was capacity-imbalanced: rip out the 33M LoRA, use two scale-matched delta_S
+
+**Context:** commits `8158adb` (refactor) + `dc5d451` (GPU smoke), `probe/distill-cosine`.
+route2-grad with calibrated-tau on the seed-41 substrate (job 54 on the old LoRA code,
+job 57 on the fixed code).
+
+**Observation (job 54, distinct-basis A_q/B_q LoRA quarantine):** calibrated-tau works as a
+DISCRIMINATOR -- hkgap (ema_hack_cos - ema_clean_cos) rises 0.00->0.08 over steps 0-2, tau
+tracks it up. But qE (grad energy into the quarantine) jumps 0.73->0.97 and gt_s collapses
+3->7->0, so the deployed delta_S learns ~nothing. The LoRA is ~33M params at rank-16 vs
+delta_S's ~0.5M diagonal -- a ~60-100x capacity gap. act-mask (job 46) saladed the same way:
+cos>0 routed ~half of everything into the same oversized knob.
+
+**Interpretation:** the failure was capacity imbalance, NOT the routing gate. A quarantine
+with ~60-100x the params is the lower-resistance sink -- per-param grads dwarf delta_S's, so the
+energy ratio pins near 1 no matter how little is actually routed. calibrated-tau was the
+discriminating experiment that proved this: it fixed the routing FRACTION (flagged<<0.5) and
+hkgap>0 shows the direction separates, yet qE stayed ~0.97 -> magnitude, not gate.
+
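A back-of-envelope check of the capacity argument above. This is a sketch under two assumptions the entry does not state: that qE is the quarantine's share of the total squared gradient norm, and that both parameter groups see the same per-parameter RMS gradient. The function name `energy_fraction` is hypothetical; the parameter counts are the approximate ones quoted in the observation.

```python
def energy_fraction(n_quarantine, n_kept, rms_quarantine=1.0, rms_kept=1.0):
    """Share of total squared-gradient norm landing in the quarantine group.

    Assumed definition (not from the journal):
        qE = ||g_q||^2 / (||g_q||^2 + ||g_s||^2)
    with ||g||^2 approximated as n_params * rms^2.
    """
    e_q = n_quarantine * rms_quarantine ** 2
    e_s = n_kept * rms_kept ** 2
    return e_q / (e_q + e_s)


# Approximate counts from the entry: ~33M-param LoRA vs ~0.5M-param delta_S.
lora_vs_diag = energy_fraction(33e6, 0.5e6)

# Two diagonals of the same shape r (the fix), i.e. capacity-matched.
matched = energy_fraction(0.5e6, 0.5e6)

print(f"LoRA quarantine:      qE ~ {lora_vs_diag:.3f}")
print(f"matched delta_S_hack: qE ~ {matched:.3f}")
```

Under these simplifications the parameter count alone puts qE near 0.985, in the neighbourhood of the observed ~0.97, regardless of how selective the gate is. The 0.5 for the matched case is only the count-based baseline, not a prediction for job 57.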
+**What else this exposed (the "anything else"):**
+- The #167 "LR-too-high fix" (`quar_lr_scale=0.1`) was a band-aid on this same root cause --
+  the oversized fresh-kaiming LoRA diverged at shared lr (run 43 salad). One knob (lr) hid the
+  divergence symptom; qE exposed the absorption symptom. Same cause. Both gone now.
+- SGTM cross-check: their gradient routing uses a hard `.detach()` on a CAPACITY-MATCHED
+  reserved split of the same layer -- no soft/tanh/sigmoid gate. Confirms balance is the lever.
+- Conceptual un-nulling: two-delta_S shared-basis *grad* routing is valid despite the earlier
+  "gauge freedom" worry. We IMPOSE the split via the cos gate, so we don't rely on emergent
+  self-reinforcement to decide what lives where; the gauge worry only bites methods that need
+  specialization to emerge, not imposed routing.
+- Meta: smoke ran fp32+CPU, so it never walked the bf16+flash_attn2 path the real run uses --
+  the dtype/magnitude bug class was invisible to the correctness gate. Fixed: smoke now runs
+  on GPU (peak ~1.4GB on the tiny-random model).
+
+**Fix:** two delta_S diagonals -- `delta_S` (kept) + `delta_S_hack` (quarantine), same frozen
+SVD basis, same shape r, same lr, `delta_S_hack` zeroed at deploy. route2's calibrated-tau
+parks flagged rollouts' grad into `delta_S_hack.grad` (exactly as proj.py's `route` parks its
+subspace projection). No capacity edge -> honest absorption. Removed: A_q/B_q LoRA, v_act +
+extract_v_act, the act-mask arm (a diagonal can't be per-token gated), the route2_mask /
+quarantine_rank / quar_lr_scale knobs, the separate optimizer group. Smoked clean.
+
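The routing described in the fix can be sketched in plain Python. This is a toy illustration, not the repo's code: `route_grads`, `hack_dir`, the per-rollout gradient lists, and the additive `deployed_diagonal` are all hypothetical stand-ins, and the gate is assumed to be "cosine with a reference hack direction exceeds tau".

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def route_grads(rollout_grads, hack_dir, tau):
    """Accumulate each rollout's gradient into exactly one of two diagonals.

    Flagged rollouts (cos > tau) are parked in the quarantine gradient,
    the rest go to the kept gradient. Both have the same shape r.
    """
    r = len(hack_dir)
    kept_grad = [0.0] * r  # stands in for delta_S.grad
    hack_grad = [0.0] * r  # stands in for delta_S_hack.grad
    for g in rollout_grads:
        target = hack_grad if cosine(g, hack_dir) > tau else kept_grad
        for i, gi in enumerate(g):
            target[i] += gi
    return kept_grad, hack_grad


def deployed_diagonal(delta_S, delta_S_hack):
    """Assumes the two diagonals add during training; at deploy the
    quarantine is zeroed, so only delta_S contributes."""
    return list(delta_S)


# Toy data: three rollouts over an r=3 diagonal, hypothetical hack direction.
grads = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
kept, hack = route_grads(grads, hack_dir=[1.0, 0.0, 0.0], tau=0.5)
print("kept grad:", kept)
print("hack grad:", hack)
```

Because both accumulators have the same shape and would share one learning rate, neither side has a capacity edge; whatever imbalance remains comes from how many rollouts the gate flags.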
+**Failure modes to watch on job 57:** (1) most-likely -- balanced delta_S_hack still
+over-absorbs because cos-routing routes too much regardless of capacity; check qE drops off
+~0.97 toward ~0.5. (2) subtle -- matched capacity is too weak to hold the hack, leaks back,
+deploy-hack ~ vanilla; check deploy file_marker hack. (3) null -- route2 adds nothing over
+erase once balanced; check route2 vs erase deploy numbers (only legitimate difference is
+on-policy generation under an active quarantine).
+
+**Next:** read job 57 (route2, two scale-matched delta_S, seed 41, 60 steps) on the three
+watch-items above.
+
 ## 2026-05-31 (l) — erase (one-sided projection) vs vanilla: -7.8pp hack / +4.7pp solve, but the win is on held-out file_marker not in-dist run_tests

 **Context:** commit `b0432af` on `probe/distill-cosine`; pueue id 41; projected/erase