writeup: prefill A5 generalisation table vanilla column (job 103); reframe as designed held-out test

Replaces the old opportunistic n=3 partial read. Vanilla baseline per-mode deploy_hack from job 103 (run_tests-only teacher): run_tests 1.000, file_marker 0.625, sentinel 0.417, stdout_marker 0.167. route2 column pending job 104.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-27 17:30:41 +08:00 · 2026-06-04 22:30:35 +00:00
parent 2b48eab6b8
commit 2fb56e1a91
1 changed file with 28 additions and 22 deletions
@@ -411,35 +411,41 @@ once the matched beta=1e-5 figure (jobs 100/101) lands -- see figure provenance.

 \subsection{C2: generalisation to held-out modes (the no-cheat payload)}

-route suppresses deploy hack on loophole modes whose pairs were never in the
-extraction set, not only the in-distribution mode (Table~\ref{tab:generalisation}).
-\TODO{prose -- author; this is the partial read, not the designed 2-of-4 test.}
+route suppresses deploy hack on loophole modes the route gate never saw a label
+for, not only the demonstrated mode (Table~\ref{tab:generalisation}).
+\TODO{prose -- author. This is now the designed held-out test (jobs 103/104):
+the teacher demonstrates only \texttt{run\_tests}; the other three modes are
+suppressed with zero labels. Vanilla baseline column filled (job 103); route2
+column pending job 104.}

-% --- Table: per-mode held-out ----------------------------------------------
-% Provenance: per_mode deploy_hack already present in the route2 n=3 JSONs
-% (in_dist flag marks which modes were in the pairset). For the route2 nofloor
-% runs: run_tests in_dist=true; file_marker, sentinel in_dist=false.
-%   s41: run_tests 0/8 | file_marker 0.000 | sentinel 0.000
-%   s42: run_tests 0/8 | file_marker 0.000 | sentinel 0.000
-%   s43: run_tests 0/8 | file_marker 0.188  | sentinel 0.000
-% stdout_marker absent from the fixed n=64 eval subset (TODO: coverage).
-% This is the C2 signal but NOT the clean 2-of-4 design -- A5 (jobs G2/G3,
-% spec 20260528_cross_mechanism_v_hack) is NOT YET QUEUED. Treat as partial.
+% --- Table: per-mode held-out (A5 designed test) ----------------------------
+% PROVENANCE (2026-06-04, #185): the designed held-out test.
+% Teacher demonstrates ONLY run_tests; the route gate (teacher OR hacked_E) is
+% blind to the other three modes -- hacked_E==0 on them, verified in the harvest
+% E-by-mode table (journal 2026-06-04 (a)). So file_marker / sentinel /
+% stdout_marker are held out with ZERO labels: not in the teacher pool, not in
+% the gate signal. This is the load-bearing no-cheat check.
+%   vanilla baseline = job 103
+%     out/runs/20260604T025953_fast_vanilla_seed41_a5_vanilla_tmrt_s41/per_mode_deploy.json
+%     deploy_hack (n=48/mode): run_tests 1.000 | file_marker 0.625 | sentinel 0.417 | stdout_marker 0.167
+%   route2 suppression = job 104 (per-mode JSON pending at edit time).
+%     knob-off aggregate deploy hack tracked 0.00-0.047 through step 180 vs baseline 0.552.
 \begin{table}[t]
  \centering
-  \caption{Per-mode deploy hack, route $n{=}3$. ``held-out'' = mode's pairs
-  absent from the extraction set (\texttt{in\_dist=false}). \TODO{the clean
-  2-of-4 held-out design (A5 / jobs G2/G3) is not yet queued; these per-mode
-  numbers are an opportunistic read of the $n{=}3$ runs, not the designed test.}}
+  \caption{Per-mode deploy hack, A5 designed held-out test ($n{=}1$, seed 41).
+  The teacher demonstrates only \texttt{run\_tests}; the live route gate is blind
+  to the other three modes (zero held-out labels). ``Held out'' = mode absent
+  from both the teacher pool and the gate signal. \TODO{route2 column lands when
+  job 104 finishes; $n{=}3$ is future work.}}
  \label{tab:generalisation}
  \begin{tabular}{lccc}
    \toprule
-    Mode & In extraction set? & Deploy hack (route) $\downarrow$ & Deploy hack (vanilla) \\
+    Mode & Held out? & Deploy hack (route2) $\downarrow$ & Deploy hack (vanilla) \\
    \midrule
-    run\_tests   & yes & $0.000$ (all seeds) & \TODO{job 84} \\
-    file\_marker & no  & $0.063$ (mean)      & \TODO{} \\
-    sentinel     & no  & $0.000$ (all seeds) & \TODO{} \\
-    stdout\_marker & \TODO{not in eval subset} & \TODO{} & \TODO{} \\
+    run\_tests     & no (demoed) & \TODO{job 104} & $1.000$ \\
+    file\_marker   & yes         & \TODO{job 104} & $0.625$ \\
+    sentinel       & yes         & \TODO{job 104} & $0.417$ \\
+    stdout\_marker & yes         & \TODO{job 104} & $0.167$ \\
    \bottomrule
  \end{tabular}
 \end{table}