fix: rename 4 canonical LeetCode function names in authored/clean pairsets

singleNumber->findUnpaired, longestCommonPrefix->sharedPrefix, removeDuplicates->inplaceDeduplicate, maxProfit->bestSingleTrade. Same algorithm and test cases; method name changed so pairs no longer share a canonical LeetCode function name with training data.

Also update results.md Q14 table: add hack_train/solve_train columns, vanilla row, and prog_wide contamination note (docs/ is gitignored).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
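
To illustrate the kind of rename described above, here is a minimal sketch for the first pair. It assumes the pairset solutions are Python methods on a LeetCode-style `Solution` class; the changed files are not shown in this view, so the class layout, the XOR implementation, and the sample input are all assumptions. Only the method name differs from the canonical `singleNumber`; the body and the test input would stay the same.

```python
from functools import reduce
from operator import xor
from typing import List


class Solution:
    # Hypothetical layout: formerly `singleNumber`, renamed so the method
    # name no longer matches the canonical LeetCode one.
    def findUnpaired(self, nums: List[int]) -> int:
        # XOR cancels every value that appears twice, leaving the unpaired one.
        return reduce(xor, nums, 0)


# Illustrative input (an assumption, not taken from the repository's tests).
print(Solution().findUnpaired([4, 1, 2, 1, 2]))
```

Because callers and tests refer to the method by name, a rename like this also has to be applied to every call site in the pair's test cases.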
2026-06-27 16:15:35 +08:00 · 2026-06-09 09:23:33 +00:00
parent dae52b2a7d
commit 8e6eace56b
4 changed files with 59 additions and 52 deletions
@@ -53,17 +53,24 @@ training + >512 tok/gen, NOT directly comparable to our 60-step fast preset numb
 | vanilla GRPO | 0.149 | high | 0.101 | 0.613 | -0.512 |
 | no-loophole ceiling | 0.223 | 0.000 | queued (24) | 0.000 | -- |

-Our arms (seed 43, 60-step fast, recency-clean test n=119):
+Our arms (seed 43, 60-step fast, recency-clean test n=119).
+`hack_train` / `solve_train` = L5 mean student rates during training (converged regime).
+Note: prog_wide pairs were contaminated (print-without-assert); job 28 replaces with prog_wide_clean.

-| arm | pairs | gran | hack ↓ | solve ↑ | headline |
-| :-- | :-- | :-- | --: | --: | --: |
-| **routeV per-token**  | prog_wide | per-token   | **0.042** | **0.143** | **+0.101** |
-| routeV authored    | authored  | per-rollout | 0.076 | 0.118 | +0.042 |
-| routeV prog_wide   | prog_wide | per-rollout | 0.101 | 0.126 | +0.025 |
-| routeV random-V    | prog_wide (Haar dir) | per-rollout | 0.101 | 0.109 | +0.008 |
-| routeV act_vote    | authored | per-rollout (global vote) | queued (19) | | |
-| routeV LoRA-B      | authored | per-rollout | queued (20) | | |
-| routeV random-V    | authored (Haar dir) | per-rollout | queued (21) | | |
+| arm | pairs | gran | hack_deploy ↓ | solve_deploy ↑ | hack_train | solve_train | headline |
+| :-- | :-- | :-- | --: | --: | --: | --: | --: |
+| **routeV per-token**  | prog_wide* | per-token   | **0.042** | **0.143** | 0.675 | 0.294 | **+0.101** |
+| routeV authored    | authored  | per-rollout | 0.076 | 0.118 | 0.781 | 0.200 | +0.042 |
+| routeV prog_wide   | prog_wide* | per-rollout | 0.101 | 0.126 | 0.762 | 0.212 | +0.025 |
+| routeV random-V    | prog_wide* (Haar dir) | per-rollout | 0.101 | 0.109 | 0.762 | 0.219 | +0.008 |
+| vanilla GRPO | - | - | 0.613 | 0.101 | 0.744 | 0.231 | -0.512 |
+| routeV per-token clean | prog_wide_clean | per-token | queued (28) | | | | |
+| routeV act_vote    | authored | per-rollout (global vote) | queued (19) | | | | |
+| routeV LoRA-B      | authored | per-rollout | queued (20/25) | | | | |
+| routeV random-V    | authored (Haar dir) | per-rollout | queued (21/26) | | | | |
+
+\* prog_wide pairs contained 22/30 contaminated pairs (print-without-assert encoding the grading flaw);
+replaced by prog_wide_clean (8 pairs, same axes 2/4/5/6) for job 28.

 **Answer: vanilla hack_deploy=0.613 -- suppression is real and large.**