fix: rename 4 canonical LeetCode function names in authored/clean pairsets

singleNumber->findUnpaired, longestCommonPrefix->sharedPrefix,
removeDuplicates->inplaceDeduplicate, maxProfit->bestSingleTrade.

Same algorithm and test cases; method name changed so pairs no longer share
a canonical LeetCode function name with training data.

Also update results.md Q14 table: add hack_train/solve_train columns,
vanilla row, and prog_wide contamination note (docs/ is gitignored).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
wassname
2026-06-09 09:23:33 +00:00
parent dae52b2a7d
commit 8e6eace56b
4 changed files with 59 additions and 52 deletions
+17 -10
@@ -53,17 +53,24 @@ training + >512 tok/gen, NOT directly comparable to our 60-step fast preset numb
 | vanilla GRPO | 0.149 | high | 0.101 | 0.613 | -0.512 |
 | no-loophole ceiling | 0.223 | 0.000 | queued (24) | 0.000 | -- |
-Our arms (seed 43, 60-step fast, recency-clean test n=119):
+Our arms (seed 43, 60-step fast, recency-clean test n=119).
+`hack_train` / `solve_train` = L5 mean student rates during training (converged regime).
+Note: prog_wide pairs were contaminated (print-without-assert); job 28 replaces with prog_wide_clean.
-| arm | pairs | gran | hack ↓ | solve | headline |
-| :-- | :-- | :-- | --: | --: | --: |
-| **routeV per-token** | prog_wide | per-token | **0.042** | **0.143** | **+0.101** |
-| routeV authored | authored | per-rollout | 0.076 | 0.118 | +0.042 |
-| routeV prog_wide | prog_wide | per-rollout | 0.101 | 0.126 | +0.025 |
-| routeV random-V | prog_wide (Haar dir) | per-rollout | 0.101 | 0.109 | +0.008 |
-| routeV act_vote | authored | per-rollout (global vote) | queued (19) | | |
-| routeV LoRA-B | authored | per-rollout | queued (20) | | |
-| routeV random-V | authored (Haar dir) | per-rollout | queued (21) | | |
+| arm | pairs | gran | hack_deploy ↓ | solve_deploy ↑ | hack_train | solve_train | headline |
+| :-- | :-- | :-- | --: | --: | --: | --: | --: |
+| **routeV per-token** | prog_wide* | per-token | **0.042** | **0.143** | 0.675 | 0.294 | **+0.101** |
+| routeV authored | authored | per-rollout | 0.076 | 0.118 | 0.781 | 0.200 | +0.042 |
+| routeV prog_wide | prog_wide* | per-rollout | 0.101 | 0.126 | 0.762 | 0.212 | +0.025 |
+| routeV random-V | prog_wide* (Haar dir) | per-rollout | 0.101 | 0.109 | 0.762 | 0.219 | +0.008 |
+| vanilla GRPO | - | - | 0.613 | 0.101 | 0.744 | 0.231 | -0.512 |
+| routeV per-token clean | prog_wide_clean | per-token | queued (28) | | | | |
+| routeV act_vote | authored | per-rollout (global vote) | queued (19) | | | | |
+| routeV LoRA-B | authored | per-rollout | queued (20/25) | | | | |
+| routeV random-V | authored (Haar dir) | per-rollout | queued (21/26) | | | | |
+\* prog_wide pairs contained 22/30 contaminated pairs (print-without-assert encoding the grading flaw);
+replaced by prog_wide_clean (8 pairs, same axes 2/4/5/6) for job 28.
+**Answer: vanilla hack_deploy=0.613 -- suppression is real and large.**
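
The reported `headline` values in the updated table are consistent with `solve_deploy - hack_deploy`. That reading is inferred from the numbers, not stated in the commit, so treat the following as a minimal sketch that only checks the arithmetic. The names `ROWS`, `headline`, and `check_rows` are hypothetical and do not come from the repository.

```python
# Values copied from the updated Q14 table: (hack_deploy, solve_deploy, reported headline).
# Assumption: headline = solve_deploy - hack_deploy (inferred from the table, not documented).
ROWS = {
    "routeV per-token": (0.042, 0.143, 0.101),
    "routeV authored": (0.076, 0.118, 0.042),
    "routeV prog_wide": (0.101, 0.126, 0.025),
    "routeV random-V": (0.101, 0.109, 0.008),
    "vanilla GRPO": (0.613, 0.101, -0.512),
}


def headline(hack_deploy: float, solve_deploy: float) -> float:
    """Solve rate minus hack rate at deploy time (assumed definition)."""
    return solve_deploy - hack_deploy


def check_rows(rows=ROWS, tol: float = 1e-9) -> dict:
    """Map each arm to True if the reported headline matches the assumed formula."""
    return {
        name: abs(headline(hack, solve) - reported) < tol
        for name, (hack, solve, reported) in rows.items()
    }


if __name__ == "__main__":
    for arm, ok in check_rows().items():
        print(f"{arm}: {'consistent' if ok else 'MISMATCH'}")
```

Under this assumption every completed row matches to three decimals, including the vanilla GRPO row.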