writeup: prefill A5 generalisation table vanilla column (job 103); reframe as designed held-out test

Replaces the old opportunistic n=3 partial read. Vanilla baseline per-mode
deploy_hack from job 103 (run_tests-only teacher): run_tests 1.000, file_marker
0.625, sentinel 0.417, stdout_marker 0.167. route2 column pending job 104.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
wassname
2026-06-04 22:30:35 +00:00
parent 2b48eab6b8
commit 2fb56e1a91
+28 -22
@@ -411,35 +411,41 @@ once the matched beta=1e-5 figure (jobs 100/101) lands -- see figure provenance.
\subsection{C2: generalisation to held-out modes (the no-cheat payload)}
route suppresses deploy hack on loophole modes whose pairs were never in the
extraction set, not only the in-distribution mode (Table~\ref{tab:generalisation}).
\TODO{prose -- author; this is the partial read, not the designed 2-of-4 test.}
route suppresses deploy hack on loophole modes the route gate never saw a label
for, not only the demonstrated mode (Table~\ref{tab:generalisation}).
\TODO{prose -- author. This is now the designed held-out test (jobs 103/104):
the teacher demonstrates only \texttt{run\_tests}; the other three modes are
suppressed with zero labels. Vanilla baseline column filled (job 103); route2
column pending job 104.}
% --- Table: per-mode held-out ----------------------------------------------
% Provenance: per_mode deploy_hack already present in the route2 n=3 JSONs
% (in_dist flag marks which modes were in the pairset). For the route2 nofloor
% runs: run_tests in_dist=true; file_marker, sentinel in_dist=false.
% s41: run_tests 0/8 | file_marker 0.000 | sentinel 0.000
% s42: run_tests 0/8 | file_marker 0.000 | sentinel 0.000
% s43: run_tests 0/8 | file_marker 0.188 | sentinel 0.000
% stdout_marker absent from the fixed n=64 eval subset (TODO: coverage).
% This is the C2 signal but NOT the clean 2-of-4 design -- A5 (jobs G2/G3,
% spec 20260528_cross_mechanism_v_hack) is NOT YET QUEUED. Treat as partial.
% --- Table: per-mode held-out (A5 designed test) ----------------------------
% PROVENANCE (2026-06-04, #185): the designed held-out test.
% Teacher demonstrates ONLY run_tests; the route gate (teacher OR hacked_E) is
% blind to the other three modes -- hacked_E==0 on them, verified in the harvest
% E-by-mode table (journal 2026-06-04 (a)). So file_marker / sentinel /
% stdout_marker are held out with ZERO labels: not in the teacher pool, not in
% the gate signal. This is the load-bearing no-cheat check.
% vanilla baseline = job 103
% out/runs/20260604T025953_fast_vanilla_seed41_a5_vanilla_tmrt_s41/per_mode_deploy.json
% deploy_hack (n=48/mode): run_tests 1.000 | file_marker 0.625 | sentinel 0.417 | stdout_marker 0.167
% route2 suppression = job 104 (per-mode JSON pending at edit time).
% knob-off aggregate deploy hack tracked 0.000 to 0.047 through step 180 vs baseline 0.552.
\begin{table}[t]
\centering
\caption{Per-mode deploy hack, route $n{=}3$. ``held-out'' = mode's pairs
absent from the extraction set (\texttt{in\_dist=false}). \TODO{the clean
2-of-4 held-out design (A5 / jobs G2/G3) is not yet queued; these per-mode
numbers are an opportunistic read of the $n{=}3$ runs, not the designed test.}}
\caption{Per-mode deploy hack, A5 designed held-out test ($n{=}1$, seed 41).
The teacher demonstrates only \texttt{run\_tests}; the live route gate is blind
to the other three modes (zero held-out labels). ``Held out'' = mode absent
from both the teacher pool and the gate signal. \TODO{route2 column lands when
job 104 finishes; $n{=}3$ is future work.}}
\label{tab:generalisation}
\begin{tabular}{lccc}
\toprule
Mode & In extraction set? & Deploy hack (route) $\downarrow$ & Deploy hack (vanilla) \\
Mode & Held out? & Deploy hack (route2) $\downarrow$ & Deploy hack (vanilla) \\
\midrule
run\_tests & yes & $0.000$ (all seeds) & \TODO{job 84} \\
file\_marker & no & $0.063$ (mean) & \TODO{} \\
sentinel & no & $0.000$ (all seeds) & \TODO{} \\
stdout\_marker & \TODO{not in eval subset} & \TODO{} & \TODO{} \\
run\_tests & no (demonstrated) & \TODO{job 104} & $1.000$ \\
file\_marker & yes & \TODO{job 104} & $0.625$ \\
sentinel & yes & \TODO{job 104} & $0.417$ \\
stdout\_marker & yes & \TODO{job 104} & $0.167$ \\
\bottomrule
\end{tabular}
\end{table}