From c721c460a4057a8aaf7a59c0a875cc4a18879dfb Mon Sep 17 00:00:00 2001
From: wassname <1103714+wassname@users.noreply.github.com>
Date: Mon, 8 Jun 2026 22:50:09 +0000
Subject: [PATCH] journal: act_vote routes late where grad-cosine cliffs (killed-run partial data)

---
 RESEARCH_JOURNAL.md | 46 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/RESEARCH_JOURNAL.md b/RESEARCH_JOURNAL.md
index 57dedf6..5ec878b 100644
--- a/RESEARCH_JOURNAL.md
+++ b/RESEARCH_JOURNAL.md
@@ -3884,3 +3884,49 @@
 run before job 15. Table: `out/diag/pairs_compare.csv`. No pair-set beats authored_all => no new pre-vanilla run (user's "if one beats authored, run it first" condition not met). Queue proceeds: job 18 (act_vote) running, job 16 (vanilla) behind.
+
+## 2026-06-08 (gm) -- act_vote routes LATE where grad-cosine cliffs (killed run, partial data)

**Context:** job 18 (act_vote, authored pairs, recency-clean) was killed at step ~29 by an operator
error (misread "vanilla" as a kill order). Partial per-step routing data survives in `pueue log 18`.
Worth recording before the log is cleaned -- the routing trace is the finding.

### Observations (rout = unit share fully routed; routE = energy share)

| step | grad-cosine (job 15) rout | act_vote (job 18) rout |
|---|---|---|
| 6 | 0.63 | (emerging) |
| 10 | 0.32 | 0.25 |
| 15 | 0.20 | 0.46 |
| 17 | nan | 0.88 |
| 19 | 0.20 | 0.50 |
| 20 | 0.09 | 0.00 |

- [obs] grad-cosine rout declines ~monotonically 0.63 -> 0.09 by step 20 (the frout cliff).
- [obs] act_vote rout is volatile but sustains high peaks late (0.88 @17, 0.50 @19); routE hit 0.93 @17.
- [obs] act_vote val: train/knob-on hack 0.000->0.312->0.625 (steps 0,10,20), deploy/knob-off 0.000
  throughout the captured steps (knob held the cheat while it ran).

### Inferences

- [inf] act_vote doesn't cliff because it gates on ACTIVATIONS, which still carry the hack signal after
  the gradient flattens.
grad-cosine gates on the gradient, which decays as within-group GRPO advantage + -> 0 post-saturation. {reason: the two arms differ only in gate signal; the cliff tracks advantage + flattening; credence 0.65}. +- [inf] act_vote's volatility (rout swings 0<->0.88, many exact-0/1) is band saturation: the vote band + is narrow (width 0.093) so live votes fall mostly below-lower or above-upper, few in the ramp. A wider + band would smooth it. {reason: resid (0