journal: act_vote routes late where grad-cosine cliffs (killed-run partial data)

wassname
2026-06-08 22:50:09 +00:00
parent 1cb5900de5
commit c721c460a4
+46
@@ -3884,3 +3884,49 @@ run before job 15. Table: `out/diag/pairs_compare.csv`.
No pair-set beats authored_all => no new pre-vanilla run (user's "if one beats authored, run it first"
condition not met). Queue proceeds: job 18 (act_vote) running, job 16 (vanilla) behind.
## 2026-06-08 (gm) -- act_vote routes LATE where grad-cosine cliffs (killed run, partial data)
**Context:** job 18 (act_vote, authored pairs, recency-clean) was killed at step ~29 by an operator
error (misread "vanilla" as a kill order). Partial per-step routing data survives in `pueue log 18`.
Worth recording before the log is cleaned -- the routing trace is the finding.
### Observations (rout = unit share fully routed; routE = energy share)
| step | grad-cosine (job 15) rout | act_vote (job 18) rout |
|---|---|---|
| 6 | 0.63 | (emerging) |
| 10 | 0.32 | 0.25 |
| 15 | 0.20 | 0.46 |
| 17 | nan | 0.88 |
| 19 | 0.20 | 0.50 |
| 20 | 0.09 | 0.00 |
- [obs] grad-cosine rout declines ~monotonically 0.63 -> 0.09 by step 20 (the frout cliff).
- [obs] act_vote rout is volatile but sustains high peaks late (0.88 @17, 0.50 @19); routE hit 0.93 @17.
- [obs] act_vote val: train/knob-on hack 0.000->0.312->0.625 (steps 0,10,20), deploy/knob-off 0.000
throughout the captured steps (knob held the cheat while it ran).
### Inferences
- [inf] act_vote doesn't cliff because it gates on ACTIVATIONS, which still carry the hack signal after
the gradient flattens. grad-cosine gates on the gradient, which decays as within-group GRPO advantage
-> 0 post-saturation. {reason: the two arms differ only in gate signal; the cliff tracks advantage
flattening; credence 0.65}.
- [inf] act_vote's volatility (rout swings 0<->0.88, many exact-0/1) is band saturation: the vote band
is narrow (width 0.093) so live votes fall mostly below-lower or above-upper, few in the ramp. A wider
band would smooth it. {reason: resid (0<f<1 share) ~0 every step; credence 0.6}.
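
A minimal sketch of the band-saturation argument, under loud assumptions: the gate is taken to be a linear ramp between a lower and upper vote threshold (the journal does not state the actual gate form), the band is centred at 0, and the votes are a made-up evenly spaced set. `ramp_gate` and `resid_share` are hypothetical names, not functions from the repo. Only the width 0.093 comes from the entry above.

```python
def ramp_gate(vote, lower, upper):
    """Assumed gate: 0 at or below `lower`, 1 at or above `upper`, linear in between."""
    if vote <= lower:
        return 0.0
    if vote >= upper:
        return 1.0
    return (vote - lower) / (upper - lower)


def resid_share(votes, center, width):
    """Share of units whose gate value lies strictly inside (0, 1), i.e. in the ramp."""
    lower, upper = center - width / 2, center + width / 2
    gates = [ramp_gate(v, lower, upper) for v in votes]
    return sum(0.0 < g < 1.0 for g in gates) / len(gates)


# Hypothetical votes, evenly spaced on [-1, 1]; real votes are not in the journal.
votes = [-1 + 2 * i / 200 for i in range(201)]

narrow = resid_share(votes, center=0.0, width=0.093)  # band width from the entry
wide = resid_share(votes, center=0.0, width=0.5)      # illustrative wider band
print(f"resid narrow={narrow:.3f} wide={wide:.3f}")
```

Even with evenly spread votes, few units land in a ramp this narrow; if the real votes are bimodal the share would be smaller still, which fits resid ~0. The wider width (0.5) is only an illustration, not a proposed setting.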
### Failure modes considered
- **Most-likely:** "routes more" != "suppresses more at deploy" -- absorption may flatten the deploy
number regardless (H2). The killed run never reached final deploy. Prior 0.5. Check: rerun to finish.
- **Subtle:** sustained late routing could be routing NOISE (post-saturation grads are small/noisy), not
hack -- act_vote keeps dumping low-information grad into the knob. Prior 0.3. Check: deploy solve.
- **Null:** the volatility is the whole story and mean routed mass ~ grad-cosine; "higher" is selection
on the peaks. Prior 0.3. Check: compare mean routE over matched steps on a full rerun.
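
A rough preview of the Null check, using only the partial rout values from the table above (rout, not routE, which was not tabulated). The matched-step rule, dropping any step where either arm is missing or nan, is an assumption, and `matched_means` is a hypothetical helper name. Four matched steps from a killed run are not enough to settle anything; this only shows the shape of the comparison.

```python
import math

# step -> (grad-cosine rout, act_vote rout), copied from the observations table.
rout = {
    6: (0.63, None),       # act_vote listed as "(emerging)", no number
    10: (0.32, 0.25),
    15: (0.20, 0.46),
    17: (math.nan, 0.88),  # grad-cosine is nan at this step
    19: (0.20, 0.50),
    20: (0.09, 0.00),
}


def is_num(x):
    """True for a real, non-nan number."""
    return x is not None and not math.isnan(x)


def matched_means(table):
    """Mean of each arm over the steps where both arms have a usable value."""
    pairs = [(g, a) for g, a in table.values() if is_num(g) and is_num(a)]
    n = len(pairs)
    return n, sum(g for g, _ in pairs) / n, sum(a for _, a in pairs) / n


n, gc_mean, av_mean = matched_means(rout)
print(n, round(gc_mean, 4), round(av_mean, 4))
```

Note that matching drops step 17, the act_vote peak, so the matched gap (about 0.20 vs 0.30) is smaller than the peaks alone suggest.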
### Next action
Rerun: act_vote (the arm that was killed) is requeued as the next arm after vanilla. Band-widening for
act_vote is a candidate follow-up (to smooth the 0/1 saturation).