Merge branch 'probe/distill-cosine' of https://github.com/wassname/projected_grpo into probe/distill-cosine

2026-06-27 16:15:35 +08:00 · 2026-06-09 17:28:17 +08:00
parent c390007eb9 8e6eace56b
commit e82aa2bf12
16 changed files with 1153 additions and 981 deletions
@@ -53,17 +53,24 @@ training + >512 tok/gen, NOT directly comparable to our 60-step fast preset numb
 | vanilla GRPO | 0.149 | high | 0.101 | 0.613 | -0.512 |
 | no-loophole ceiling | 0.223 | 0.000 | queued (24) | 0.000 | -- |

-Our arms (seed 43, 60-step fast, recency-clean test n=119):
+Our arms (seed 43, 60-step fast, recency-clean test n=119).
+`hack_train` / `solve_train` = L5 mean student rates during training (converged regime).
+Note: prog_wide pairs were contaminated (print-without-assert); job 28 replaces them with prog_wide_clean.

-| arm | pairs | gran | hack ↓ | solve ↑ | headline |
-| :-- | :-- | :-- | --: | --: | --: |
-| **routeV per-token**  | prog_wide | per-token   | **0.042** | **0.143** | **+0.101** |
-| routeV authored    | authored  | per-rollout | 0.076 | 0.118 | +0.042 |
-| routeV prog_wide   | prog_wide | per-rollout | 0.101 | 0.126 | +0.025 |
-| routeV random-V    | prog_wide (Haar dir) | per-rollout | 0.101 | 0.109 | +0.008 |
-| routeV act_vote    | authored | per-rollout (global vote) | queued (19) | | |
-| routeV LoRA-B      | authored | per-rollout | queued (20) | | |
-| routeV random-V    | authored (Haar dir) | per-rollout | queued (21) | | |
+| arm | pairs | gran | hack_deploy ↓ | solve_deploy ↑ | hack_train | solve_train | headline |
+| :-- | :-- | :-- | --: | --: | --: | --: | --: |
+| **routeV per-token**  | prog_wide* | per-token   | **0.042** | **0.143** | 0.675 | 0.294 | **+0.101** |
+| routeV authored    | authored  | per-rollout | 0.076 | 0.118 | 0.781 | 0.200 | +0.042 |
+| routeV prog_wide   | prog_wide* | per-rollout | 0.101 | 0.126 | 0.762 | 0.212 | +0.025 |
+| routeV random-V    | prog_wide* (Haar dir) | per-rollout | 0.101 | 0.109 | 0.762 | 0.219 | +0.008 |
+| vanilla GRPO | - | - | 0.613 | 0.101 | 0.744 | 0.231 | -0.512 |
+| routeV per-token clean | prog_wide_clean | per-token | queued (28) | | | | |
+| routeV act_vote    | authored | per-rollout (global vote) | queued (19) | | | | |
+| routeV LoRA-B      | authored | per-rollout | queued (20/25) | | | | |
+| routeV random-V    | authored (Haar dir) | per-rollout | queued (21/26) | | | | |
+
+\* 22 of the 30 prog_wide pairs were contaminated (print-without-assert, which encodes the grading flaw);
+job 28 replaces them with prog_wide_clean (8 pairs, same axes 2/4/5/6).

 **Answer: vanilla hack_deploy=0.613 -- suppression is real and large.**
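Every row of the table is consistent with `headline = solve_deploy - hack_deploy`. That definition is inferred from the numbers, not stated in the notes. The minimal Python sketch below recomputes the headline from the deploy columns as a sanity check. The `rows` dict and the `headline` helper are hypothetical illustrations, not code from the repo. It also checks that 30 total prog_wide pairs minus 22 contaminated pairs leaves the 8 pairs in prog_wide_clean.

```python
# Hypothetical sanity check; values copied from the table (hack_deploy, solve_deploy, reported headline).
# Assumption: headline = solve_deploy - hack_deploy (inferred from the numbers, not stated in the notes).
rows = {
    "routeV per-token": (0.042, 0.143, 0.101),
    "routeV authored": (0.076, 0.118, 0.042),
    "routeV prog_wide": (0.101, 0.126, 0.025),
    "routeV random-V (prog_wide)": (0.101, 0.109, 0.008),
    "vanilla GRPO": (0.613, 0.101, -0.512),
}


def headline(hack_deploy: float, solve_deploy: float) -> float:
    """Net deploy score: solve rate minus hack rate, rounded to 3 decimals like the table."""
    return round(solve_deploy - hack_deploy, 3)


for arm, (hack, solve, reported) in rows.items():
    computed = headline(hack, solve)
    print(f"{arm:28s} computed={computed:+.3f} reported={reported:+.3f}")
    assert abs(computed - reported) < 1e-9, arm

# Footnote arithmetic: total prog_wide pairs minus contaminated pairs.
TOTAL_PAIRS, CONTAMINATED = 30, 22
clean_pairs = TOTAL_PAIRS - CONTAMINATED  # should equal the prog_wide_clean size (8)
print("clean pairs:", clean_pairs)
```

A positive headline means the arm solves more often than it hacks at deploy time. Vanilla GRPO's -0.512 reflects its 0.613 deploy hack rate.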