docs: record adapter shapes + shrinkage-vs-separation; journal rotation fix

AGENTS.md: new section on PiSSA (delta_S:[r] diag) vs LoRA (A:[r,d_in] full)
adapters -- forward sees only the sum so same-basis routing is a magnitude split
(shrinkage null) unless broken by gate discrimination x (expressiveness + structural
separation). Honest note that this wasn't clear to me first pass.

RESEARCH_JOURNAL: rotation fix + the verified shrinkage confound (antipasto.py:107
sums kept+hack in one basis); the deploy delta_S*=(1-qE) control is the cheap decider.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
wassname
2026-06-10 06:50:42 +00:00
parent 0112f4a36d
commit 7511ba12e8
2 changed files with 60 additions and 0 deletions
@@ -4053,3 +4053,29 @@ in main.tex.
### Next
Wait for job 28. If hack_deploy with clean pairs is still << 0.1 (comparable to contaminated): result is robust, narrative is "even mechanism-agnostic weak-testing pairs suppress hacking". If it rises back toward vanilla: need better pairs or need to acknowledge the result depends on axis-1-specific knowledge.
## 2026-06-10 06:20 -- rotation fix for the unhackable flip + the fable-review shrinkage confound
**Context:** commit `0112f4a` on `probe/distill-cosine`. Smoke: `/tmp/claude-1000/smoke_full.log`. New gate `scripts/verify_rotation.py`; `scripts/verify_rewards.py` gains gt_only rows.
### Observations
- [obs] The gt_only "unhackable" flip was seeded on `(seed, pid)` only (`data.py:90`, pre-fix), applied once at load -> the SAME ~10% of problems were unhackable every step. Frozen, not rotating, despite the design intent.
- [obs] Fixed: flip moved to the train step loop seeded on `(seed, step, pid)`; both prompt (`messages_gt`, plain hint) and grader (`eff_mode=gt_only`) flip; teacher demos skipped on flipped steps. `verify_rotation`: over 50 steps the unhackable subset changed 49/49 step-pairs (PASS). `verify_rewards` gt_only rows: every hack -> passed=False, reward 0.5 (PASS). Smoke: 1/30 draws flipped and graded gt_only.
- [obs] `antipasto.py:107` forward = `y + (kept + hack)` with `kept = U@(a*delta_S)`, `hack = U@(a*delta_S_hack)`, both using the same `U`, `Vh`, `a=Vh@x`. The two adapters are in the identical functional position.
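The frozen-vs-rotating behaviour in the first two observations can be reproduced with a minimal sketch. This is not the code in `data.py` or the train loop: `is_unhackable`, `unhackable_subset`, the SHA-256 keying and the 10% rate are all illustrative assumptions. It only shows why keying on `(seed, pid)` freezes the subset while keying on `(seed, step, pid)` rotates it.

```python
import hashlib
from typing import Optional

def is_unhackable(seed: int, pid: int, step: Optional[int] = None,
                  rate: float = 0.1) -> bool:
    """Deterministic flip (hypothetical helper, not the repo's function).

    step=None mimics the pre-fix keying on (seed, pid);
    passing a step mimics the post-fix keying on (seed, step, pid).
    """
    key = f"{seed}:{pid}" if step is None else f"{seed}:{step}:{pid}"
    digest = hashlib.sha256(key.encode()).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return u < rate

def unhackable_subset(seed, pids, step=None, rate=0.1):
    return frozenset(p for p in pids if is_unhackable(seed, p, step, rate))

pids = range(200)  # assumed problem ids
n_steps = 50

# Pre-fix: the step never enters the key, so every step sees the same subset.
frozen = [unhackable_subset(0, pids) for _ in range(n_steps)]
# Post-fix: the step is part of the key, so the subset is redrawn each step.
rotating = [unhackable_subset(0, pids, step=s) for s in range(n_steps)]

# Count consecutive step-pairs whose subsets differ, in the spirit of verify_rotation.
frozen_changes = sum(a != b for a, b in zip(frozen, frozen[1:]))
rotating_changes = sum(a != b for a, b in zip(rotating, rotating[1:]))
print(frozen_changes, rotating_changes)
```

The same step-pair count is what a gate like `verify_rotation` would check; the real script may compute it differently.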
### Inferences
- [inf] Because the forward sums two adapters in the same basis with identical per-step gradients, scalar (per-rollout) routing just partitions one vanilla-sized update between two stores: `delta_S + delta_S_hack` ~ the full vanilla update, train-time behaviour ~ vanilla, and deploy zeroes a fraction qE ~ 0.5 of the update. So the headline DEPLOY-HACK suppression is very probably mechanical SHRINKAGE, not direction. {reason: "fable review #1, confirmed by reading the forward; matches placebo job86 (dead vec -> deploy hack 0.000)", credence: 0.7}
- [inf] The surviving directional claim is SOLVE RETENTION (real-v 0.625 vs placebo 0.531, n=1), not hack suppression, so the metric that carries the evidence switches from hack rate to solve rate. {reason: "if both vecs route ~qE and both kill hack, only solve differs", credence: 0.55}
- [inf] Rotation only helps the METHOD if routing genuinely sorts hack-rollout from solve-rollout gradients (discriminating gate) or routes the vec-aligned COMPONENT; otherwise rotation + scalar routing still collapses to shrinkage at deploy. {reason: "same-basis sum argument", credence: 0.75}
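The same-basis sum argument above can be checked on a toy example. This is a sketch under loud assumptions: plain SGD rather than Adam (the Adam non-linearity is a separate failure mode), per-rollout gradients treated as fixed vectors, and a constant non-discriminating gate `q`. `route_updates` and all values are hypothetical, not taken from `antipasto.py`.

```python
import random

def route_updates(grads, gates, lr=0.1):
    """Split each rollout's SGD step between two diagonal stores.

    A scalar gate q per rollout sends (1 - q) of the step to `kept`
    and q of it to `hack`. Hypothetical helper for illustration only.
    """
    r = len(grads[0])
    kept = [0.0] * r
    hack = [0.0] * r
    for g, q in zip(grads, gates):
        for i in range(r):
            kept[i] -= lr * (1.0 - q) * g[i]
            hack[i] -= lr * q * g[i]
    return kept, hack

rng = random.Random(0)
r, n_rollouts, lr = 4, 32, 0.1
# Both stores sit in the same basis, so they receive the same gradient per rollout.
grads = [[rng.gauss(0.0, 1.0) for _ in range(r)] for _ in range(n_rollouts)]

# Vanilla: one store takes the whole update.
vanilla = [-lr * sum(g[i] for g in grads) for i in range(r)]

# Routed: constant gate q, i.e. no discrimination between rollouts.
q = 0.5
kept, hack = route_updates(grads, [q] * n_rollouts, lr=lr)

train_sum = [k + h for k, h in zip(kept, hack)]  # what the forward sees in training
shrunk = [(1.0 - q) * v for v in vanilla]        # vanilla scaled by (1 - q)

print(max(abs(a - b) for a, b in zip(train_sum, vanilla)))  # train-time gap to vanilla
print(max(abs(a - b) for a, b in zip(kept, shrunk)))        # deploy gap to shrunk vanilla
```

Under these assumptions the deployed `kept` store is the vanilla update scaled by `(1 - q)`, which is why a discriminating gate or component routing is needed for direction to matter.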
### Failure modes considered
- **Most likely:** Hack suppression is shrinkage. Prior 0.7. Check: deploy a trained vanilla ckpt with `delta_S *= (1-qE)`; if hack ~0, confirmed (task #28).
- **Subtle:** Adam per-parameter state makes the two-adapter split non-linear, so real-v's routing trajectory differs from a random split in a way the scalar argument misses. Prior 0.2. Check: coin-flip gate at matched routed-energy vs real-v.
- **Null:** The n=1 solve gap (0.625 vs 0.531) is seed noise and direction does nothing at all. Prior 0.25. Check: 3 seeds of real-v vs placebo.
### Next
Killed job 30 (vanilla eval3 baseline ran the OLD frozen-flip env); requeued as job 39 on the rotating code so the bake-off (arms 35/37/38, all post-commit -> rotating) is apples-to-apples. Then run the shrinkage control (#28) and prototype component routing (#29).