diff --git a/docs/results.md b/docs/results.md
index 9773f0e..41aa1a1 100644
--- a/docs/results.md
+++ b/docs/results.md
@@ -343,6 +343,48 @@
 run (job 73) was daemon-killed at step 28/60, but deploy hack was pinned at a clean 60-step
 rerun would make it airtight. The headline arm uses no floor anyway (already 0.000), so this
 is a design note for when the floor is wanted.
+## Q14. 🥇 routeV deploy on the recency-clean eval2 test set (the current headline)
+
+
+
+Everything above (Q1-Q13) is on the OLD eval. Q12's route2 numbers used n=64 prompts before
+the recency-clean fix; the env is now single-mode `run_tests` and the held-out test set is
+recency-clean (ids>=3243, base solve ~0.1). This section is the corrected substrate. All arms:
+seed 43, authored 18 pairs, 60 steps, deploy = knob-off forward on test n=119.
+
+| arm                           | deploy hack | deploy vhack | deploy solve | status           |
+| :---------------------------- | ----------: | -----------: | -----------: | :--------------- |
+| vanilla GRPO                  |           — |            — |            — | running (job 16) |
+| routeV grad-cosine (authored) |       0.076 |        0.059 |        0.118 | ✓ job 15         |
+| routeV act_vote (authored)    |           — |            — |            — | queued (job 19)  |
+| routeV grad-cosine, LoRA-B    |           — |            — |            — | queued (job 20)  |
+| routeV grad-cosine, random-V  |           — |            — |            — | queued (job 21)  |
+
+**Answer: pending the vanilla baseline.** routeV (grad-cosine, authored pairs) deploys at hack
+0.076 (knob-off), absorbing ~88% of its own on-policy hack (train knob-on 0.641 -> 0.076). But
+0.076 is NOT yet interpretable: with base solve ~0.1 on this clean set, the hack base rate may
+be low, so 0.076 could be ~vacuous. The load-bearing comparison is vanilla (job 16, running):
+0.076 << vanilla => real suppression; 0.076 ~ vanilla => the knob is absorbing noise. random-V
+(job 21) is the directionality control -- if it also lands ~0.076 the suppression is absorption
+(H2), not direction (H4); on the OLD eval real-V==random-V==0.101 (H2), untested on eval2.
+
+Notes from the in-flight arms (training-time `rout`, not deploy): grad-cosine routing cliffs
+(rout 0.63@step6 -> 0.09@step20, tracking the GRPO advantage flattening post-saturation);
+act_vote routing sustains late (rout 0.88@step17) because it gates on activations not the
+decaying gradient -- see RESEARCH_JOURNAL 2026-06-08. Whether that converts to better deploy
+suppression is exactly what job 19 tests. Pairs axis (separability, not deploy): authored_all
+p@10=0.70 beats prog_wide 0.20 (job 17, `out/diag/pairs_compare.csv`).
+
 ## Dynamics note (sizing the convergence test)
 
 Per-step trajectories (mix=0.125 g8, seed 41): `hack_s` rises 0→~0.6-0.75 and