diff --git a/RESEARCH_JOURNAL.md b/RESEARCH_JOURNAL.md
index 919e17e..058700b 100644
--- a/RESEARCH_JOURNAL.md
+++ b/RESEARCH_JOURNAL.md
@@ -2,6 +2,74 @@
 
 Append-only. New entries at the top, date-stamped. Never edit old entries.
 
+## 2026-05-31 (h) — three external reviews converge: the v1 additive-route "deploy=0" is most likely an AdamW-parasite + base-recovery artifact, not localization
+
+**Context:** commit `670fcb3` on `probe/distill-cosine`. Reviews:
+`docs/spec/20260531_review_gradroute_gpt55.md` (gpt-5.5),
+`docs/spec/20260531_review_gradroute_gemini35flash.md` (gemini-3.5-flash),
+`docs/spec/20260531_route2_code_review.md` (deepseek, act-mask code). grok-4.3
+review aborted (2 lines, no content). All target the OLD additive route (v1,
+shared basis `delta_S + delta_S_hack`), not route2.
+
+### Observations
+
+- [obs] gpt-5.5 and gemini INDEPENDENTLY reach the same 5 verdicts on v1 route
+  (gpt55.md:1-13, gemini.md final list): (1) mechanism mismatch — direction-based
+  gradient surgery on a SHARED additive basis is not Cloud-style data→region
+  routing; (2) shared forward means `dL/d delta_S == dL/d delta_S_hack`, so the
+  loss cannot distinguish the knobs; (3) AdamW divides the tiny routed
+  `delta_S_hack` gradient by sqrt(v), amplifying noise into parasitic drift;
+  (4) zeroing `delta_S_hack` at deploy is an off-distribution step-change, not a
+  modular ablation; (5) the solve-jump 0.20->0.64 on ablation is the red flag —
+  deleting a corrupted parasite recovers base capability rather than removing a
+  hack module.
+- [obs] Both predict: v_hack is ~28/30 run_tests, so held-out modes
+  (file_marker/sentinel/stdout) should stay hacky at deploy UNLESS they share a
+  reward/solve-avoidance direction (gpt55.md:11, gemini held-out prediction).
+- [obs] deepseek route2-act review confirmed Q1 (detach leaves the forward value
+  unchanged, isolates flagged-sample grad to A_q/B_q); stream died mid-Q2 (it was
+  re-deriving the stash-before-zero_grad order, which is already correct in code).
+  No new defect.
+
+### Inferences
+
+- [inf] The reviews VALIDATE the route2 pivot, not refute it: distinct basis
+  (A_q,B_q) is the direct fix for verdicts (1)+(2), and the random-V control
+  (#157, jobs 32/33) is exactly gpt55's proposed discriminator for verdict (5).
+  {reason: "gpt55's 'concrete experiment' is the #157 design verbatim; route2 was
+  designed against these critiques", credence: 0.85}
+- [inf] The Adam-parasite critique TRANSFERS to route2: B_q starts at 0, and
+  AdamW can drift it into a corrupted-but-deletable state, so route2 could show
+  the SAME spurious deploy solve-jump. The solve-jump is therefore a diagnostic to
+  watch, not a win to celebrate. {reason: "route2 quarantine is still AdamW-trained
+  and deleted at deploy; same mechanism as v1", credence: 0.7}
+- [inf] The leakage metric (T4) is the load-bearing proof the reviews actually
+  demand — "did hack capability LEAVE the kept adapter" — more than the deploy-hack
+  number, which the artifact can fake. Prioritize T4 before claiming route2 works.
+  {reason: "all three verdicts attack deploy-hack as confoundable; leakage measures
+  the thing directly", credence: 0.75}
+
+### Failure modes considered
+
+- **Likely:** route2's distinct basis fixes mechanism mismatch but the deploy
+  solve-jump persists -> still an Adam-parasite artifact, just in a separate basis.
+  Prior 0.4. Check: compare route2 deploy-solve delta vs vanilla; if it jumps like
+  v1's 0.20->0.64, suspect parasite.
+- **Subtle:** random-V (#157 job 33) ALSO reproduces deploy~0+solve-jump, proving
+  v1 was regularization not direction — but route2-act/grad genuinely differ
+  because the mask (not a fixed projection) gates per-sample. Prior 0.3. Check:
+  route2 vs random-V route per-mode.
+- **Null:** the whole route family is base-capability recovery; erase (no
+  quarantine) already ~vanilla (entry f), so only the quarantine arms show the
+  jump and it's all parasite. Prior 0.3. Check: leakage metric reads ~1 (no real
+  forgetting) for every route arm.
+
+### Next action
+
+Let 32/33 finish (random-V discriminator). Implement T4 leakage metric next —
+it's the proof these reviews say the deploy-hack number cannot give on its own.
+Watch route2 (job 34) deploy-solve for the parasite tell.
+
 ## 2026-05-31 (g) — route puzzle resolved: signed-cosine is a red herring; route's DEPLOY eval works; v_hack is run_tests-only
 
 **When**: 2026-05-31, commit `638f3e9` on `probe/distill-cosine`. route pueue 28 Done(Success), log `logs/...sub4_route_rf5_s41.log`; erase pueue 29 for the cin_t comparison. Resolves the open route puzzle from session (f) and **refutes** the prior-session regression hypothesis (that the ablate-during-refresh fix `cec6542` detached v_hack route-specifically).
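Illustration for verdict (3) in entry (h), the AdamW-parasite claim. This is a minimal sketch and not the project's training code: a scalar parameter updated with the standard Adam rule (weight decay omitted, so strictly Adam rather than AdamW), fed a constant gradient in place of the real routed `delta_S_hack` gradient. The names `adam_drift` and `sgd_drift` and all hyperparameter values are assumptions chosen for illustration. It shows that dividing by sqrt(v) makes the drift nearly independent of gradient magnitude, which is the mechanism the reviews point at.

```python
import math


def adam_drift(grads, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Final value of a scalar parameter (starting at 0) under plain Adam.

    Hypothetical helper; weight decay is left out to isolate the
    m_hat / sqrt(v_hat) normalisation.
    """
    m = 0.0
    v = 0.0
    theta = 0.0
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1 - beta1) * g          # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g      # second-moment estimate
        m_hat = m / (1 - beta1 ** t)             # bias correction
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


def sgd_drift(grads, lr=1e-3):
    """Final value of the same parameter under plain SGD, for contrast."""
    return -lr * sum(grads)


steps = 100
tiny = [1e-5] * steps   # stand-in for a small routed gradient
large = [1.0] * steps   # stand-in for a full-size gradient

# Under SGD the drift scales with the gradient (ratio ~1e5 here).
# Under Adam both parameters move by roughly lr per step.
print("sgd :", sgd_drift(tiny), sgd_drift(large))
print("adam:", adam_drift(tiny), adam_drift(large))
```

With real routed gradients the input would be noisy rather than constant, so the parameter would wander instead of moving in a straight line, but the per-step magnitude would still be set by `lr` rather than by the gradient scale.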