mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 16:45:42 +08:00
docs: record adapter shapes + shrinkage-vs-separation; journal rotation fix
AGENTS.md: new section on PiSSA (delta_S:[r] diag) vs LoRA (A:[r,d_in] full) adapters -- forward sees only the sum so same-basis routing is a magnitude split (shrinkage null) unless broken by gate discrimination x (expressiveness + structural separation). Honest note that this wasn't clear to me first pass.

RESEARCH_JOURNAL: rotation fix + the verified shrinkage confound (antipasto.py:107 sums kept+hack in one basis); the deploy delta_S*=(1-qE) control is the cheap decider.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
@@ -4053,3 +4053,29 @@ in main.tex.
### Next
Wait for job 28. If hack_deploy with clean pairs is still << 0.1 (comparable to contaminated): result is robust, narrative is "even mechanism-agnostic weak-testing pairs suppress hacking". If it rises back toward vanilla: need better pairs or need to acknowledge the result depends on axis-1-specific knowledge.
## 2026-06-10 06:20 -- rotation fix for the unhackable flip + the fable-review shrinkage confound
**Context:** commit `0112f4a` on `probe/distill-cosine`. Smoke: `/tmp/claude-1000/smoke_full.log`. New gate `scripts/verify_rotation.py`; `scripts/verify_rewards.py` gains gt_only rows.
### Observations
- [obs] The gt_only "unhackable" flip was seeded on `(seed, pid)` only (`data.py:90`, pre-fix), applied once at load -> the SAME ~10% of problems were unhackable every step. Frozen, not rotating, despite the design intent.
- [obs] Fixed: flip moved to the train step loop seeded on `(seed, step, pid)`; both prompt (`messages_gt`, plain hint) and grader (`eff_mode=gt_only`) flip; teacher demos skipped on flipped steps. `verify_rotation`: over 50 steps the unhackable subset changed 49/49 step-pairs (PASS). `verify_rewards` gt_only rows: every hack -> passed=False, reward 0.5 (PASS). Smoke: 1/30 draws flipped and graded gt_only.
- [obs] `antipasto.py:107` forward = `y + (kept + hack)` with `kept = U@(a*delta_S)`, `hack = U@(a*delta_S_hack)`, both using the same `U`, `Vh`, `a=Vh@x`. The two adapters are in the identical functional position.
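
The difference between the frozen and the rotating flip described above can be sketched as follows. This is a minimal illustration, not the code in `data.py` or the train loop: the helper names (`_unit_hash`, `unhackable_subset`), the SHA-256 based draw, and the fixed `rate=0.1` are all assumptions made for the example.

```python
import hashlib


def _unit_hash(*parts) -> float:
    """Map the given parts to a deterministic float in [0, 1). Hypothetical helper."""
    key = ":".join(str(p) for p in parts).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def unhackable_subset(seed, step, pids, rate=0.1, rotating=True):
    """Return the set of problem ids flipped to gt_only at this step.

    rotating=False keys the draw on (seed, pid) only, so the subset is frozen.
    rotating=True keys it on (seed, step, pid), so the subset is redrawn each step.
    """
    if rotating:
        return {p for p in pids if _unit_hash(seed, step, p) < rate}
    return {p for p in pids if _unit_hash(seed, p) < rate}


pids = list(range(500))
frozen = [unhackable_subset(0, s, pids, rotating=False) for s in range(50)]
rotated = [unhackable_subset(0, s, pids, rotating=True) for s in range(50)]

# Count consecutive step-pairs whose unhackable subset differs.
frozen_changes = sum(a != b for a, b in zip(frozen, frozen[1:]))
rotated_changes = sum(a != b for a, b in zip(rotated, rotated[1:]))
print(f"frozen: {frozen_changes}/49 pairs changed, rotating: {rotated_changes}/49 pairs changed")
```

Counting changed step-pairs mirrors the kind of check `verify_rotation` is reported to make; the real gate may measure something different.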
### Inferences
- [inf] Because the forward sums two adapters in the same basis with identical per-step gradients, scalar (per-rollout) routing just partitions one vanilla-sized update between two stores: `delta_S + delta_S_hack` is approximately the full vanilla update, train-time behaviour is approximately vanilla, and deploy zeroes a fraction qE (about 0.5) of the update. So the headline DEPLOY-HACK suppression is very probably mechanical SHRINKAGE, not direction. {reason: "fable review #1, confirmed by reading the forward; matches placebo job86 (dead vec -> deploy hack 0.000)", credence: 0.7}
- [inf] The surviving directional claim is SOLVE RETENTION (real-v 0.625 vs placebo 0.531, n=1), not hack suppression -- the evidence metric inverts. {reason: "if both vecs route ~qE and both kill hack, only solve differs", credence: 0.55}
- [inf] Rotation only helps the METHOD if routing genuinely sorts hack-rollout from solve-rollout gradients (discriminating gate) or routes the vec-aligned COMPONENT; otherwise rotation + scalar routing still collapses to shrinkage at deploy. {reason: "same-basis sum argument", credence: 0.75}
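
The same-basis sum argument can be checked numerically on a toy layer. The sketch below is an idealisation under stated assumptions: the shapes, the random `U`, `Vh`, `x`, `y`, and the exactly linear split of one vanilla update by a scalar `qE` are all invented for illustration, and it ignores optimiser effects such as Adam's per-parameter state. Only the forward form `y + (kept + hack)` is taken from the observation about `antipasto.py:107`.

```python
import random


def matvec(M, v):
    """Plain-Python matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def adapter_out(U, Vh, x, diag):
    """U @ ((Vh @ x) * diag): a diagonal adapter in a fixed (U, Vh) basis."""
    a = matvec(Vh, x)
    return matvec(U, [ai * di for ai, di in zip(a, diag)])


def forward(y, U, Vh, x, delta_S, delta_S_hack):
    """y + (kept + hack), with both adapters sharing U, Vh and a = Vh @ x."""
    kept = adapter_out(U, Vh, x, delta_S)
    hack = adapter_out(U, Vh, x, delta_S_hack)
    return [yi + k + h for yi, k, h in zip(y, kept, hack)]


def close(p, q, tol=1e-9):
    return all(abs(a - b) < tol for a, b in zip(p, q))


rng = random.Random(0)
d_out, d_in, r = 4, 5, 3  # toy sizes, assumed
U = [[rng.gauss(0, 1) for _ in range(r)] for _ in range(d_out)]
Vh = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(r)]
x = [rng.gauss(0, 1) for _ in range(d_in)]
y = [rng.gauss(0, 1) for _ in range(d_out)]

vanilla = [rng.gauss(0, 1) for _ in range(r)]  # stand-in for one vanilla-sized update
qE = 0.5                                       # routed fraction, assumed scalar
delta_S = [(1 - qE) * v for v in vanilla]
delta_S_hack = [qE * v for v in vanilla]
zeros = [0.0] * r

train_split = forward(y, U, Vh, x, delta_S, delta_S_hack)  # two-store training forward
train_vanilla = forward(y, U, Vh, x, vanilla, zeros)       # single-store vanilla forward
deploy_split = forward(y, U, Vh, x, delta_S, zeros)        # deploy: hack store zeroed
deploy_shrunk = forward(y, U, Vh, x, [(1 - qE) * v for v in vanilla], zeros)

print("train matches vanilla:", close(train_split, train_vanilla))
print("deploy matches shrunk vanilla:", close(deploy_split, deploy_shrunk))
```

Under these assumptions the deployed model is indistinguishable from a vanilla checkpoint scaled by `(1 - qE)`, which is why a shrinkage control of that form separates magnitude from direction.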
### Failure modes considered
- **Most likely:** Hack suppression is shrinkage. Prior 0.7. Check: deploy a trained vanilla ckpt with `delta_S *= (1-qE)`; if hack ~0, confirmed (task #28).
- **Subtle:** Adam per-parameter state makes the two-adapter split non-linear, so real-v's routing trajectory differs from a random split in a way the scalar argument misses. Prior 0.2. Check: coin-flip gate at matched routed-energy vs real-v.
- **Null:** The n=1 solve gap (0.625 vs 0.531) is seed noise and direction does nothing at all. Prior 0.25. Check: 3 seeds of real-v vs placebo.
### Next
Killed job 30 (vanilla eval3 baseline ran the OLD frozen-flip env); requeued as job 39 on the rotating code so the bake-off (arms 35/37/38, all post-commit -> rotating) is apples-to-apples. Then run the shrinkage control (#28) and prototype component routing (#29).