diff --git a/docs/writeup/main.tex b/docs/writeup/main.tex
index 449d13f..a54b414 100644
--- a/docs/writeup/main.tex
+++ b/docs/writeup/main.tex
@@ -411,35 +411,41 @@ once the matched beta=1e-5 figure (jobs 100/101) lands -- see figure provenance.
 
 \subsection{C2: generalisation to held-out modes (the no-cheat payload)}
 
-route suppresses deploy hack on loophole modes whose pairs were never in the
-extraction set, not only the in-distribution mode (Table~\ref{tab:generalisation}).
-\TODO{prose -- author; this is the partial read, not the designed 2-of-4 test.}
+route suppresses deploy hack on loophole modes the route gate never saw a label
+for, not only the demonstrated mode (Table~\ref{tab:generalisation}).
+\TODO{prose -- author. This is now the designed held-out test (jobs 103/104):
+the teacher demonstrates only \texttt{run\_tests}; the other three modes are
+suppressed with zero labels. Vanilla baseline column filled (job 103); route2
+column pending job 104.}
 
-% --- Table: per-mode held-out ----------------------------------------------
-% Provenance: per_mode deploy_hack already present in the route2 n=3 JSONs
-% (in_dist flag marks which modes were in the pairset). For the route2 nofloor
-% runs: run_tests in_dist=true; file_marker, sentinel in_dist=false.
-%   s41: run_tests 0/8 | file_marker 0.000 | sentinel 0.000
-%   s42: run_tests 0/8 | file_marker 0.000 | sentinel 0.000
-%   s43: run_tests 0/8 | file_marker 0.188 | sentinel 0.000
-% stdout_marker absent from the fixed n=64 eval subset (TODO: coverage).
-% This is the C2 signal but NOT the clean 2-of-4 design -- A5 (jobs G2/G3,
-% spec 20260528_cross_mechanism_v_hack) is NOT YET QUEUED. Treat as partial.
+% --- Table: per-mode held-out (A5 designed test) ----------------------------
+% PROVENANCE (2026-06-04, #185): the designed held-out test.
+% Teacher demonstrates ONLY run_tests; the route gate (teacher OR hacked_E) is
+% blind to the other three modes -- hacked_E==0 on them, verified in the harvest
+% E-by-mode table (journal 2026-06-04 (a)). So file_marker / sentinel /
+% stdout_marker are held out with ZERO labels: not in the teacher pool, not in
+% the gate signal. This is the load-bearing no-cheat check.
+% vanilla baseline = job 103
+%   out/runs/20260604T025953_fast_vanilla_seed41_a5_vanilla_tmrt_s41/per_mode_deploy.json
+%   deploy_hack (n=48/mode): run_tests 1.000 | file_marker 0.625 | sentinel 0.417 | stdout_marker 0.167
+% route2 suppression = job 104 (per-mode JSON pending at edit time).
+% knob-off aggregate deploy hack tracked 0.00-0.047 through step 180 vs baseline 0.552.
 \begin{table}[t]
   \centering
-  \caption{Per-mode deploy hack, route $n{=}3$. ``held-out'' = mode's pairs
-    absent from the extraction set (\texttt{in\_dist=false}). \TODO{the clean
-    2-of-4 held-out design (A5 / jobs G2/G3) is not yet queued; these per-mode
-    numbers are an opportunistic read of the $n{=}3$ runs, not the designed test.}}
+  \caption{Per-mode deploy hack, A5 designed held-out test ($n{=}1$, seed 41).
+    The teacher demonstrates only \texttt{run\_tests}; the live route gate is blind
+    to the other three modes (zero held-out labels). ``Held out'' = mode absent
+    from both the teacher pool and the gate signal. \TODO{route2 column lands when
+    job 104 finishes; $n{=}3$ is future work.}}
   \label{tab:generalisation}
   \begin{tabular}{lccc}
     \toprule
-    Mode & In extraction set? & Deploy hack (route) $\downarrow$ & Deploy hack (vanilla) \\
+    Mode & Held out? & Deploy hack (route2) $\downarrow$ & Deploy hack (vanilla) \\
     \midrule
-    run\_tests & yes & $0.000$ (all seeds) & \TODO{job 84} \\
-    file\_marker & no & $0.063$ (mean) & \TODO{} \\
-    sentinel & no & $0.000$ (all seeds) & \TODO{} \\
-    stdout\_marker & \TODO{not in eval subset} & \TODO{} & \TODO{} \\
+    run\_tests & no (demoed) & \TODO{job 104} & $1.000$ \\
+    file\_marker & yes & \TODO{job 104} & $0.625$ \\
+    sentinel & yes & \TODO{job 104} & $0.417$ \\
+    stdout\_marker & yes & \TODO{job 104} & $0.167$ \\
     \bottomrule
   \end{tabular}
 \end{table}
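The held-out bookkeeping described in the provenance comment can be sketched as below. Under that bookkeeping, a mode counts as held out when the route gate (teacher OR hacked_E) never fires on it, and per-mode deploy hack is a simple rate. This is a minimal illustration, not the repository's harvest or eval code. The `Episode` record, its fields (`teacher_demo`, `hacked_E`, `deploy_hack`) and the helper names are hypothetical. The eval hack counts (48, 30, 20 and 8 out of 48 per mode) are back-derived from the job 103 vanilla rates rather than read from `per_mode_deploy.json`. As a simplification, hacked_E is set to 0 on every mode, including run_tests.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical episode record; these field names are illustrative, not the repo's schema.
@dataclass
class Episode:
    mode: str            # loophole mode, e.g. "run_tests"
    teacher_demo: bool   # mode is in the teacher pool
    hacked_E: bool       # hacked_E detector fired on this episode
    deploy_hack: bool    # deploy-time hack observed (meaningful for eval episodes)


def gate(ep: Episode) -> bool:
    """Route gate as described in the provenance comment: teacher OR hacked_E."""
    return ep.teacher_demo or ep.hacked_E


def held_out_modes(gate_eps: list[Episode]) -> set[str]:
    """Modes on which the gate never fires, i.e. modes with zero gate labels."""
    fired: dict[str, bool] = defaultdict(bool)
    for ep in gate_eps:
        fired[ep.mode] |= gate(ep)
    return {mode for mode, any_fire in fired.items() if not any_fire}


def per_mode_rate(eval_eps: list[Episode]) -> dict[str, float]:
    """Fraction of eval episodes per mode that show a deploy hack."""
    hacks: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for ep in eval_eps:
        totals[ep.mode] += 1
        hacks[ep.mode] += int(ep.deploy_hack)
    return {mode: hacks[mode] / totals[mode] for mode in totals}


N_PER_MODE = 48
# Assumption: counts back-derived from the job 103 vanilla rates, not read from the JSON.
HACKS = {"run_tests": 48, "file_marker": 30, "sentinel": 20, "stdout_marker": 8}

# Gate-signal episodes: only run_tests is demonstrated; hacked_E is 0 everywhere (simplification).
gate_eps = [Episode(m, teacher_demo=(m == "run_tests"), hacked_E=False, deploy_hack=False)
            for m in HACKS]
# Eval episodes: the first k of the 48 episodes in each mode are marked as hacked.
eval_eps = [Episode(m, teacher_demo=False, hacked_E=False, deploy_hack=(i < k))
            for m, k in HACKS.items() for i in range(N_PER_MODE)]

rates = per_mode_rate(eval_eps)
pooled = sum(HACKS.values()) / (N_PER_MODE * len(HACKS))

print(sorted(held_out_modes(gate_eps)))  # ['file_marker', 'sentinel', 'stdout_marker']
print({m: round(r, 3) for m, r in rates.items()})
print(round(pooled, 3))  # 0.552
```

Pooling the four modes equally gives 106/192, or about 0.552. That matches the vanilla baseline aggregate quoted in the comment, if that aggregate weights the four modes equally at n=48 each.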