mirror of https://github.com/wassname/evil_MoE.git (synced 2026-06-27 16:15:35 +08:00)
results: base model solve=0.126 hack=0.000 (matches paper ~0.115)

Fills baseline row in Q14 table and main.tex tab:anchors.
Context: job 23 (steps=0, zero-shot eval, seed 43, n=119).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
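
A rough check of the "matches paper" claim, as a minimal sketch: with n=119, a solve rate of 0.126 corresponds to about 15 solved problems (this count is an inference from 0.126 × 119, not a figure stated in the commit). A 95% Wilson score interval on that proportion indicates whether the paper's 0.115 is within sampling noise. The helper name `wilson_interval` is hypothetical and not part of the repository.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

N = 119             # eval set size, from the commit message
SOLVED = 15         # assumption: inferred from 0.126 * 119, not stated in the commit
PAPER_SOLVE = 0.115 # paper's base-model solve rate, from the commit message

lo, hi = wilson_interval(SOLVED, N)
print(f"solve = {SOLVED / N:.3f}, 95% Wilson interval = [{lo:.3f}, {hi:.3f}]")
print("paper value inside interval:", lo <= PAPER_SOLVE <= hi)
```

With a single seed and n=119 the interval is wide, so agreement here only says the two numbers are not distinguishable at this sample size.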
@@ -54,7 +54,7 @@ in the answer.
 | routeV act_vote | authored | per-rollout (global vote) | queued (19) | | |
 | routeV LoRA-B | authored | per-rollout | queued (20) | | |
 | routeV random-V | authored (Haar dir) | per-rollout | queued (21) | | |
-| base model (job 23) | -- | -- | running | | |
+| base model (job 23) | -- | -- | **0.000** | **0.126** | **+0.126** |
 | no-loophole ceiling (job 24) | -- | -- | queued | | |

 **Answer: vanilla hack_deploy=0.613 -- suppression is real and large.**
@@ -289,7 +289,7 @@ hack \emph{generalises} off the demonstrated mode.
 Condition & Description & Solve $\uparrow$ & Hack $\downarrow$ \\
 \midrule
 Base model (no training) & Zero-shot Qwen3-4B & paper: 0.115 & -- \\
-\rowcolor{lightgray} Ours (base, job 23) & \TODO{fill} & -- & -- \\
+\rowcolor{lightgray} Ours (base, job 23) & Qwen3-4B, zero-shot (steps=0), seed 43 & 0.126 & 0.000 \\
 \midrule
 Vanilla GRPO & Paper reference & paper: 0.149 & paper: high \\
 \rowcolor{lightgray} Ours (vanilla, job 16) & Qwen3-4B, 60-step fast, seed 43 & 0.101 & 0.613 \\