writeup

2026-06-27 17:30:41 +08:00 · 2026-06-12 04:46:01 +00:00
parent af420ec855
commit 41d225a5ec
8 changed files with 357 additions and 188 deletions
@@ -47,8 +47,8 @@ no oracle or ground-truth label from a training rollout is used during training.
 At training time routeA scores each rollout on the no-grad `logp_old` forward it
 already needs: an activation-capture hook pools the same bottleneck activations
 over completion tokens, and the score is the pooled dot product with `v_act`.
-Thresholds come from a rolling buffer of recent scores, z-normalized and split by
-two-threshold Otsu into `{keep, absorb, rout}`; until the buffer reaches
+Thresholds are the symmetric `route_tail_q` quantiles of a run-spanning score
+buffer, splitting rollouts into `{keep, absorb, rout}`; until the buffer reaches
 `route_warmup` scores the gate pins absorb. The block masks are set from those labels *before* the single
 masked forward+backward, so there is no second gradient pass. A rollout scoring
 at or above the upper threshold updates the quarantine block while its deployed