mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 16:15:35 +08:00
main.tex: document setup differences vs paper in tab:anchors caption
200 steps/G=16/1536tok/n=10 (paper) vs 60 steps/G=8/512tok/n=1 (ours). Framed as fast-preset directional surrogate within resource budget.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
@@ -279,10 +279,14 @@ hack \emph{generalises} off the demonstrated mode.
 \begin{table}[h]
 \centering
 \caption{Context anchors: floor, ceiling, and intervention results.
-Paper \citep{ariahw2025steering} uses longer training and $>$512 tok/gen so
-paper vs.\ ours are \emph{not} directly comparable -- shown in separate column
-pairs for orientation only. Our deploy = adapter-off, recency-clean test set
-($n{=}119$, Qwen3-4B, seed 43, 60-step fast preset).
+Paper and ours use the same model (Qwen3-4B) and environment but differ in
+training scale: paper uses 200 steps, $G{=}16$, batch 256, max 1536 tokens/gen,
+eval $n{=}10$ per problem on 4$\times$H200; ours uses 60 steps, $G{=}8$, batch
+${\approx}64$, max 512 tokens/gen, eval $n{=}1$ on a single 96\,GB GPU.
+This fast preset was chosen to get directionally informative results within
+our resource budget, not to replicate the paper's scale.
+The two column pairs are therefore \emph{not} directly comparable;
+paper numbers are reference orientation only.
 \TODO{fill no-loophole ours from job 24.}}
 \label{tab:anchors}
 \begin{tabular}{lcccc}
||||