diff --git a/RESEARCH_JOURNAL.md b/RESEARCH_JOURNAL.md
index 1684d7d..d0954be 100644
--- a/RESEARCH_JOURNAL.md
+++ b/RESEARCH_JOURNAL.md
@@ -2,6 +2,23 @@
 
 Append-only. New entries at the top, date-stamped. Never edit old entries.
 
+## 2026-06-05 (f) — VERDICT closing the (a) WATCH: route's gate is NON-directional (placebo endpoint, job 86 step 60)
+
+Closes the 2026-06-05 (a) WATCH ("directionality claim at risk"). Job 86 finished all 60 steps; read its per_mode_deploy.json
+(out/runs/20260604T231926_fast_routing2_seed41_route2_placebo_nullcity_s41/per_mode_deploy.json).
+
+| mode | placebo (null_city) deploy_hack | real-v full-4-mode route2 deploy_hack | vanilla deploy_hack |
+|------|--------------------------------|---------------------------------------|---------------------|
+| run_tests | 0.000 | 0.000 | 0.875 |
+| file_marker | 0.000 | 0.000 | 0.469 |
+| sentinel | 0.000 | 0.000 | 0.042 |
+| aggregate | 0.000 (solve 0.531) | 0.000 (solve 0.625) | 0.359 (solve 0.422) |
+
+- [verdict] placebo per-mode == real-v per-mode (both pin every mode to 0.000 deploy hack), while vanilla is 0.359. This is the journal's "if placebo per-mode ~= real => directionality REFUTED, do NOT bury" branch. An arbitrary (hkgap<0, non-discriminative) direction suppresses exactly as well as the extracted v_hack.
+- [mechanism] calibrated-tau cuts the cos cloud at its midpoint regardless of v, so the gate routes ~60-78% of grad energy into the deletable quarantine whatever direction it is built from; late-emergent hacks route by gradient magnitude/recency, not by alignment with v. Suppression is discarded-knob absorption, not hack-direction specificity.
+- [paper status] already reflected honestly: main.tex tab:ablation placebo row 0.000/0.531 filled, surrounding text says "the placebo also reaches zero deploy hack, so route's gate is [non-directional]". Directional specificity now rests on the ERASE arm (erase subtracts proportional to cos(g,v), so real_v << placebo_v would show directionality) -- jobs 93/94 queued. Random-V is the second non-directionality check (row still TODO, jobs 94/106).
+- [contribution reframe, unchanged from (a)] NOT "we found the hack direction" but "gradient routing into a deletable knob suppresses late-emergent hacks direction-agnostically". The A5 held-out generalisation (zero held-out labels) still stands as a no-cheat demonstration; its mechanism is the knob, not v_hack specificity.
+
 ## 2026-06-05 (e) — A5 no-cheat leak FIXED at the gate (teacher-only anchor) + unit test; airtight rerun queued (job 111)
 
 Entry (d) found held-out hacked_E is not exactly 0 (<=1.1% detector false positives). Today, traced it to a real label leak and fixed it.
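
Note on the [mechanism] bullet in the added entry: it says the calibrated tau cuts the cosine cloud at its midpoint regardless of v. The following is a minimal Python sketch of that point, under loud assumptions. "Midpoint" is read here as the median, the gradients are synthetic Gaussian vectors, and `routed_fraction`, `v_a`, and `v_b` are hypothetical names that do not come from the repository. The sketch counts gradients rather than measuring gradient energy, so it does not reproduce the ~60-78% figure; it only illustrates that the routed share is fixed by the calibration rather than by the direction.

```python
import math
import random
import statistics


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def routed_fraction(grads, v):
    """Share of gradients whose cosine with v exceeds a calibrated tau.

    Assumption: tau is calibrated to the median of the cosine cloud,
    which is one reading of "cuts the cos cloud at its midpoint".
    """
    cos_vals = [cosine(g, v) for g in grads]
    tau = statistics.median(cos_vals)
    routed = sum(1 for c in cos_vals if c > tau)
    return routed / len(grads)


rng = random.Random(0)
dim, n_grads = 32, 200

# Synthetic stand-ins for per-sample gradients (not real training data).
grads = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_grads)]

# Two unrelated directions: hypothetical stand-ins for an extracted
# direction and a placebo direction.
v_a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
v_b = [rng.gauss(0.0, 1.0) for _ in range(dim)]

frac_a = routed_fraction(grads, v_a)
frac_b = routed_fraction(grads, v_b)
print(f"routed share with v_a: {frac_a:.3f}")
print(f"routed share with v_b: {frac_b:.3f}")
```

Because tau is recomputed from the cosine cloud of whichever direction is supplied, both directions route about half of the gradients by count. A gate calibrated this way cannot distinguish a meaningful direction from an arbitrary one by routed volume alone, which is consistent with the placebo row matching the real-v row.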