journal: 3 external reviews converge on v1-route artifact hypothesis; route2 inherits the Adam-parasite risk, prioritize leakage metric

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-27 18:59:35 +08:00 · 2026-05-31 10:51:11 +00:00
parent 670fcb3c64
commit ffeb632652
1 changed file with 68 additions and 0 deletions
@@ -2,6 +2,74 @@

 Append-only. New entries at the top, date-stamped. Never edit old entries.

+## 2026-05-31 (h) — three external reviews converge: the v1 additive-route "deploy=0" is most likely an AdamW-parasite + base-recovery artifact, not localization
+
+**Context:** commit `670fcb3` on `probe/distill-cosine`. Reviews:
+`docs/spec/20260531_review_gradroute_gpt55.md` (gpt-5.5),
+`docs/spec/20260531_review_gradroute_gemini35flash.md` (gemini-3.5-flash),
+`docs/spec/20260531_route2_code_review.md` (deepseek, act-mask code). The grok-4.3
+review aborted (2 lines, no content). The gpt-5.5 and gemini reviews target the OLD
+additive route (v1, shared basis `delta_S + delta_S_hack`), not route2; deepseek
+reviewed the route2 act-mask code.
+
+### Observations
+
+- [obs] gpt-5.5 and gemini INDEPENDENTLY reach the same 5 verdicts on the v1 route
+  (gpt55.md:1-13, gemini.md final list): (1) mechanism mismatch — direction-based
+  gradient surgery on a SHARED additive basis is not Cloud-style data→region
+  routing; (2) shared forward means `dL/d delta_S == dL/d delta_S_hack`, so the
+  loss cannot distinguish the knobs; (3) AdamW divides the tiny routed
+  `delta_S_hack` gradient by sqrt(v), amplifying noise into parasitic drift;
+  (4) zeroing `delta_S_hack` at deploy is an off-distribution step-change, not a
+  modular ablation; (5) the solve-jump 0.20->0.64 on ablation is the red flag —
+  deleting a corrupted parasite recovers base capability rather than removing a
+  hack module.
+- [obs] Both predict that, since v_hack is ~28/30 run_tests, the held-out modes
+  (file_marker/sentinel/stdout) should stay hacky at deploy UNLESS they share a
+  reward/solve-avoidance direction (gpt55.md:11, gemini held-out prediction).
+- [obs] deepseek route2-act review confirmed Q1 (detach leaves the forward value
+  unchanged, isolates flagged-sample grad to A_q/B_q); its stream died mid-Q2 (it
+  was re-deriving the stash-before-zero_grad order, which is already correct in
+  code). No new defect found.
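+
+Verdict (2) can be checked on a toy scalar model. This is a minimal sketch, not
+the project's code. The model `y = (w_base + delta_S + delta_S_hack) * x`, the
+squared-error loss and all numeric values are illustrative assumptions. Central
+finite differences give the same gradient for both knobs, because the forward
+pass only ever sees their sum.
+
+```python
+def loss(delta_s, delta_s_hack, x=1.5, target=2.0, w_base=0.3):
+    """Squared error of y = (w_base + delta_S + delta_S_hack) * x (toy model)."""
+    w = w_base + delta_s + delta_s_hack  # shared additive basis: only the sum matters
+    return (w * x - target) ** 2
+
+
+def numeric_grad(f, args, i, eps=1e-6):
+    """Central finite difference of f with respect to positional argument i."""
+    hi, lo = list(args), list(args)
+    hi[i] += eps
+    lo[i] -= eps
+    return (f(*hi) - f(*lo)) / (2 * eps)
+
+
+params = (0.10, 0.05)  # (delta_S, delta_S_hack), arbitrary illustrative values
+g_s = numeric_grad(loss, params, 0)
+g_hack = numeric_grad(loss, params, 1)
+print(f"dL/d delta_S      = {g_s:.6f}")
+print(f"dL/d delta_S_hack = {g_hack:.6f}")
+```
+
+Any split of credit between the two knobs therefore has to be imposed by the
+routing rule; it cannot be learned from the loss.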
+
+### Inferences
+
+- [inf] The reviews VALIDATE the route2 pivot rather than refute it: the distinct
+  basis (A_q,B_q) is the direct fix for verdicts (1)+(2), and the random-V control
+  (#157, jobs 32/33) is exactly gpt55's proposed discriminator for verdict (5).
+  {reason: "gpt55's 'concrete experiment' is the #157 design verbatim; route2 was
+  designed against these critiques", credence: 0.85}
+- [inf] The Adam-parasite critique TRANSFERS to route2: B_q starts at 0, and
+  AdamW can drift it into a corrupted-but-deletable state, so route2 could show
+  the SAME spurious deploy solve-jump. The solve-jump is therefore a diagnostic to
+  watch, not a win to celebrate. {reason: "route2 quarantine is still AdamW-trained
+  and deleted at deploy; same mechanism as v1", credence: 0.7}
+- [inf] The leakage metric (T4), which asks "did hack capability LEAVE the kept
+  adapter?", is the load-bearing evidence the reviews actually demand. The
+  deploy-hack number matters less because the artifact can fake it. Prioritize T4
+  before claiming route2 works.
+  {reason: "all three verdicts attack deploy-hack as confoundable; leakage measures
+  the thing directly", credence: 0.75}
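+
+The Adam-parasite mechanism in verdict (3), and its transfer to a zero-initialized
+`B_q`, can be illustrated with a scalar toy. This is a minimal sketch under
+stated assumptions, not the project's training code. The hyperparameters
+(lr=1e-3, betas 0.9/0.999, eps=1e-8, weight decay 0.01), the 200-step horizon
+and the gradient sequence `0.1 + sin(t)` (a weak bias plus an oscillation,
+standing in for noise) are all illustrative. Shrinking the gradient 10,000x
+shrinks plain-SGD drift 10,000x but leaves AdamW drift almost unchanged,
+because AdamW normalizes each step by sqrt(v).
+
+```python
+import math
+
+
+def adamw_final(grads, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, wd=0.01):
+    """Scalar AdamW (decoupled weight decay) started from p = 0, like a
+    zero-initialized B_q; returns the parameter after all steps."""
+    p, m, v = 0.0, 0.0, 0.0
+    for t, g in enumerate(grads, start=1):
+        p -= lr * wd * p                     # decoupled weight decay
+        m = beta1 * m + (1 - beta1) * g      # first-moment EMA
+        v = beta2 * v + (1 - beta2) * g * g  # second-moment EMA
+        m_hat = m / (1 - beta1 ** t)         # bias corrections
+        v_hat = v / (1 - beta2 ** t)
+        p -= lr * m_hat / (math.sqrt(v_hat) + eps)  # step divided by sqrt(v)
+    return p
+
+
+def sgd_final(grads, lr=1e-3):
+    """Plain SGD from p = 0 with the same learning rate, for contrast."""
+    return -lr * sum(grads)
+
+
+# Hypothetical stand-in for a noisy gradient: weak bias plus an oscillation.
+base = [0.1 + math.sin(t) for t in range(1, 201)]
+
+results = {}
+for scale in (1.0, 1e-4):  # ordinary vs. 10,000x smaller "routed" gradient
+    grads = [scale * g for g in base]
+    results[scale] = (sgd_final(grads), adamw_final(grads))
+    sgd, adam = results[scale]
+    print(f"scale={scale:g}: SGD drift={sgd:+.2e}, AdamW drift={adam:+.2e}")
+```
+
+Under SGD the 10,000x smaller gradient leaves the parameter near 0. Under AdamW
+it drifts about as far as it does with a full-size gradient. That is the drift
+verdict (3) describes, and nothing in this toy depends on which basis the
+parameter belongs to.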
+
+### Failure modes considered
+
+- **Likely:** route2's distinct basis fixes the mechanism mismatch, but the deploy
+  solve-jump persists -> still an Adam-parasite artifact, just in a separate basis.
+  Prior 0.4. Check: compare route2 deploy-solve delta vs vanilla; if it jumps like
+  v1's 0.20->0.64, suspect parasite.
+- **Subtle:** random-V (#157 job 33) ALSO reproduces deploy~0+solve-jump, proving
+  v1 was regularization, not direction. Route2-act/grad still genuinely differ,
+  because the mask (not a fixed projection) gates per sample. Prior 0.3. Check:
+  route2 vs random-V route per-mode.
+- **Null:** the whole route family is base-capability recovery; erase (no
+  quarantine) is already ~vanilla (entry f), so only the quarantine arms show the
+  jump and it's all parasite. Prior 0.3. Check: leakage metric reads ~1 (no real
+  forgetting) for every route arm.
+
+### Next action
+
+Let 32/33 finish (random-V discriminator). Implement the T4 leakage metric next:
+it is the evidence these reviews say the deploy-hack number cannot give on its own.
+Watch route2 (job 34) deploy-solve for the parasite tell.
+
 ## 2026-05-31 (g) — route puzzle resolved: signed-cosine is a red herring; route's DEPLOY eval works; v_hack is run_tests-only

 **When**: 2026-05-31, commit `638f3e9` on `probe/distill-cosine`. route pueue 28 Done(Success), log `logs/...sub4_route_rf5_s41.log`; erase pueue 29 for the cin_t comparison. Resolves the open route puzzle from session (f) and **refutes** the prior-session regression hypothesis (that the ablate-during-refresh fix `cec6542` detached v_hack route-specifically).