journal: 3 external reviews converge on v1-route artifact hypothesis; route2 inherits the Adam-parasite risk, prioritize leakage metric

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
wassname
2026-05-31 10:51:11 +00:00
parent 670fcb3c64
commit ffeb632652
@@ -2,6 +2,74 @@
Append-only. New entries at the top, date-stamped. Never edit old entries.
## 2026-05-31 (h) — three external reviews converge: the v1 additive-route "deploy=0" is most likely an AdamW-parasite + base-recovery artifact, not localization
**Context:** commit `670fcb3` on `probe/distill-cosine`. Reviews:
`docs/spec/20260531_review_gradroute_gpt55.md` (gpt-5.5),
`docs/spec/20260531_review_gradroute_gemini35flash.md` (gemini-3.5-flash),
`docs/spec/20260531_route2_code_review.md` (deepseek, act-mask code). grok-4.3
review aborted (2 lines, no content). All target the OLD additive route (v1,
shared basis `delta_S + delta_S_hack`), not route2.
### Observations
- [obs] gpt-5.5 and gemini INDEPENDENTLY reach the same 5 verdicts on v1 route
(gpt55.md:1-13, gemini.md final list): (1) mechanism mismatch — direction-based
gradient surgery on a SHARED additive basis is not Cloud-style data→region
routing; (2) shared forward means `dL/d delta_S == dL/d delta_S_hack`, so the
loss cannot distinguish the knobs; (3) AdamW divides the tiny routed
`delta_S_hack` gradient by sqrt(v), amplifying noise into parasitic drift;
(4) zeroing `delta_S_hack` at deploy is an off-distribution step-change, not a
modular ablation; (5) the solve-jump 0.20->0.64 on ablation is the red flag —
deleting a corrupted parasite recovers base capability rather than removing a
hack module.
- [obs] Both predict that, because v_hack is ~28/30 run_tests, held-out modes
(file_marker/sentinel/stdout) should stay hacky at deploy UNLESS they share a
reward/solve-avoidance direction (gpt55.md:11, gemini held-out prediction).
- [obs] deepseek route2-act review confirmed Q1 (detach leaves the forward value
unchanged, isolates flagged-sample grad to A_q/B_q); stream died mid-Q2 (it was
re-deriving the stash-before-zero_grad order, which is already correct in code).
No new defect.
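Verdict (2) can be checked on a toy: a minimal sketch, assuming a scalar stand-in where both knobs reach the loss only through their sum. The `loss` function, `base`, `target`, and the evaluation point are hypothetical and are not the repo's adapter code; the gradients are taken by central finite differences so the block needs no ML library.

```python
def loss(delta_s, delta_s_hack, base=0.3, target=1.0):
    # Hypothetical scalar model: both knobs enter the forward only via their sum,
    # mirroring the v1 shared basis `delta_S + delta_S_hack`.
    w = base + delta_s + delta_s_hack
    return (w - target) ** 2


def partial(f, args, i, h=1e-6):
    # Central finite-difference estimate of df/d(args[i]).
    up, dn = list(args), list(args)
    up[i] += h
    dn[i] -= h
    return (f(*up) - f(*dn)) / (2 * h)


point = (0.05, -0.02)  # arbitrary (delta_S, delta_S_hack) values
g_s = partial(loss, point, 0)
g_hack = partial(loss, point, 1)
print(g_s, g_hack)  # the two partials agree up to float error
```

Because the loss sees only the sum, any difference between the two knobs has to be imposed after the backward pass (the gradient surgery), which is why the reviews treat the split as an optimizer-level artifact rather than a data-driven one.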
### Inferences
- [inf] The reviews VALIDATE the route2 pivot, not refute it: distinct basis
(A_q,B_q) is the direct fix for verdicts (1)+(2), and the random-V control
(#157, jobs 32/33) is exactly gpt55's proposed discriminator for verdict (5).
{reason: "gpt55's 'concrete experiment' is the #157 design verbatim; route2 was
designed against these critiques", credence: 0.85}
- [inf] The Adam-parasite critique TRANSFERS to route2: B_q starts at 0, and
AdamW can drift it into a corrupted-but-deletable state, so route2 could show
the SAME spurious deploy solve-jump. The solve-jump is therefore a diagnostic to
watch, not a win to celebrate. {reason: "route2 quarantine is still AdamW-trained
and deleted at deploy; same mechanism as v1", credence: 0.7}
- [inf] The leakage metric (T4) is the load-bearing proof the reviews actually
demand — "did hack capability LEAVE the kept adapter" — more than the deploy-hack
number, which the artifact can fake. Prioritize T4 before claiming route2 works.
{reason: "verdicts (3)-(5) all attack deploy-hack as confoundable; leakage measures
the thing directly", credence: 0.75}
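The Adam-parasite inference rests on Adam's update being roughly invariant to gradient scale. A minimal sketch, assuming a single scalar parameter that starts at 0 (like B_q) and a hand-rolled Adam step with no weight decay; `adam_walk`, the step count, and the gradient magnitudes are hypothetical and do not reflect the repo's optimizer config.

```python
import math


def adam_walk(grads, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Run plain Adam (no weight decay) on one scalar that starts at 0."""
    x = m = v = 0.0
    for t, g in enumerate(grads, start=1):
        m = b1 * m + (1 - b1) * g          # first-moment EMA
        v = b2 * v + (1 - b2) * g * g      # second-moment EMA
        m_hat = m / (1 - b1 ** t)          # bias correction
        v_hat = v / (1 - b2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


steps = 100
drift_large = adam_walk([1.0] * steps)   # a "real" gradient signal
drift_tiny = adam_walk([1e-4] * steps)   # a gradient 10,000x smaller
print(drift_large, drift_tiny)           # both land near -steps * lr
```

The constant-sign gradient is the worst case; with sign-flipping noise the parameter random-walks instead, but each step is still on the order of `lr` rather than on the order of the gradient. That is the sense in which a near-zero routed gradient can still move a quarantine parameter a long way.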
### Failure modes considered
- **Likely:** route2's distinct basis fixes mechanism mismatch but the deploy
solve-jump persists -> still an Adam-parasite artifact, just in a separate basis.
Prior 0.4. Check: compare route2 deploy-solve delta vs vanilla; if it jumps like
v1's 0.20->0.64, suspect parasite.
- **Subtle:** random-V (#157 job 33) ALSO reproduces deploy~0+solve-jump, proving
v1 was regularization not direction — but route2-act/grad genuinely differ
because the mask (not a fixed projection) gates per-sample. Prior 0.3. Check:
route2 vs random-V route per-mode.
- **Null:** the whole route family is base-capability recovery; erase (no
quarantine) already ~vanilla (entry f), so only the quarantine arms show the
jump and it's all parasite. Prior 0.3. Check: leakage metric reads ~1 (no real
forgetting) for every route arm.
### Next action
Let 32/33 finish (random-V discriminator). Implement T4 leakage metric next —
it's the proof these reviews say the deploy-hack number cannot give on its own.
Watch route2 (job 34) deploy-solve for the parasite tell.
## 2026-05-31 (g) — route puzzle resolved: signed-cosine is a red herring; route's DEPLOY eval works; v_hack is run_tests-only
**When**: 2026-05-31, commit `638f3e9` on `probe/distill-cosine`. route pueue 28 Done(Success), log `logs/...sub4_route_rf5_s41.log`; erase pueue 29 for the cin_t comparison. Resolves the open route puzzle from session (f) and **refutes** the prior-session regression hypothesis (that the ablate-during-refresh fix `cec6542` detached v_hack route-specifically).