docs(writeup): NeurIPS-workshop paper skeleton + tectonic compile recipe

Minimal LaTeX skeleton: outline + evidence tables (route2 n=3 deploy numbers filled with provenance, vanilla pending jobs 74/84) + figures + verified refs + appendix (4-mode traces, 6/6/6/6 partition counts, pseudocode). Build artifacts and figs symlinks gitignored. `just paper` compiles via tectonic; `just paper-qc` dumps text + greps for unresolved refs / TODOs.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-27 17:30:41 +08:00 · 2026-06-02 06:59:15 +00:00
parent 17e4f2e2ff
commit 923de6dbe6
6 changed files with 819 additions and 4 deletions
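
Reviewer sketch for the `just paper-qc` step named in the commit message. The recipe body is not shown in this view, so the following is only a hedged Python analogue of "dump text, grep for unresolved refs / TODOs". The function name `qc_findings`, the marker list, and the assumption that an unresolved LaTeX reference surfaces as `??` in the dumped text are all illustrative, not taken from the repo.

```python
import re

# Assumption: unresolved \ref / \cite show up as "??" in the dumped PDF text.
UNRESOLVED_REF = re.compile(r"\?\?")
# Assumption: these are the leftover markers worth flagging; adjust to taste.
TODO_MARKER = re.compile(r"\b(TODO|FIXME|XXX)\b")


def qc_findings(text):
    """Return (line_number, kind, line) tuples for each suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if UNRESOLVED_REF.search(line):
            findings.append((lineno, "unresolved-ref", line.strip()))
        if TODO_MARKER.search(line):
            findings.append((lineno, "todo", line.strip()))
    return findings


if __name__ == "__main__":
    # Hypothetical dumped text, for illustration only.
    sample = "See Table ?? for deploy numbers.\nAll good here.\nTODO: fill vanilla row.\n"
    for lineno, kind, line in qc_findings(sample):
        print(f"{lineno}: {kind}: {line}")
```

A check like this could exit non-zero when `qc_findings` returns anything, so an unfinished draft fails the QC recipe loudly.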
@@ -29,7 +29,7 @@ direction from 2 of the 4 loopholes, measure suppression on the other 2.
 C1 (primary, existence -> systematic). Routing the GRPO gradient against a
 weak-detector hack direction in the SVD-of-W basis lowers deploy hack rate vs
 vanilla GRPO at matched-ish solve rate, replicated over n=3 seeds.
-- Evidence: jobs 68/69/70 (route2 no-floor s41/42/43) vs 79/74/72 (vanilla
+- Evidence: jobs 68/69/70 (route2 no-floor s41/42/43) vs 84/74/72 (vanilla
  s41/42/43). Deploy = knob-off, n=64 prompts x group, T=0.7.
 - Confidence today: suggestive at n=1; n=3 band landing. NOT yet 30pp (the
  preregistered H1 bar); honest framing is "reduces hack at comparable solve",
@@ -90,11 +90,12 @@ deploy hack/solve + by_mode come from the JSON, per-step curves from the log/TSV

 A1 -- Keynote figure. route2 vs vanilla deploy hack/solve over training, n=3
 band. Prototype exists: out/figs/dyn_sub4*.png (`just dyn`). [/] blocked on the
-n=3 vanilla band (jobs 74 s42 + 79 s41; 72 s43 done; route2 68/69/70 done).
+n=3 vanilla band (jobs 74 s42 + 84 s41 [re-added from killed 79, p7 so it runs
+ahead of the A3 erase rows]; 72 s43 done; route2 68/69/70 done).

 A2 -- Keynote table. Per-arm deploy hack + deploy solve, mean +/- SEM over 3
 seeds, route2 no-floor vs vanilla, delta vs vanilla, paired test + alpha stated.
-[/] same blocker as A1 (74, 79).
+[/] same blocker as A1 (74, 84).
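
Reviewer sketch for the A2 row arithmetic (mean +/- SEM over 3 seeds, delta vs vanilla, paired test). This is a minimal stdlib-only illustration, not the repo's analysis code. The helper names are hypothetical, and the input numbers are placeholders, NOT results from jobs 68/69/70 or 84/74/72. A paired t-test is assumed here; the doc only says "paired test".

```python
import math
import statistics

# Two-sided critical value of Student's t at alpha = 0.05 with df = 2 (n = 3 pairs).
T_CRIT_DF2_ALPHA05 = 4.303


def mean_sem(values):
    """Return (mean, standard error of the mean) using the sample std (ddof=1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two seeds for an SEM")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


def paired_t(treated, control):
    """Return (mean delta, t statistic, degrees of freedom) for seed-matched arms."""
    if len(treated) != len(control):
        raise ValueError("arms must be paired by seed")
    diffs = [t - c for t, c in zip(treated, control)]
    mean_diff, sem_diff = mean_sem(diffs)
    if sem_diff == 0:
        raise ValueError("zero variance in paired differences")
    return mean_diff, mean_diff / sem_diff, len(diffs) - 1


if __name__ == "__main__":
    # PLACEHOLDER deploy hack rates per seed (s41, s42, s43); not real data.
    route2 = [0.10, 0.20, 0.30]
    vanilla = [0.40, 0.60, 0.50]
    delta, t_stat, df = paired_t(route2, vanilla)
    print("route2 mean, SEM:", mean_sem(route2))
    print("vanilla mean, SEM:", mean_sem(vanilla))
    print("delta:", delta, "t:", t_stat, "df:", df)
    print("exceeds critical value:", abs(t_stat) > T_CRIT_DF2_ALPHA05)
```

With only three pairs the test has df = 2, so the critical value is large; stating alpha next to the table, as A2 already requires, keeps that limitation visible.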

 A3 -- Ablation table (what each component buys; the arms you named). One row per
 arm at matched seed/preset, deploy hack + solve:
@@ -125,7 +126,7 @@ A7 -- Appendix ablation context. Cite results.md Q-rows already run: basis width
 (Q8), refresh cadence (Q5), teacher mix (Q6), gate mode (Q3), solve-orthog (Q9),
 pairset content/placebo (Q10). [x] data exists; just needs porting into the paper.

-Next action when 74+79 land: read each per_mode_deploy.json, `just dyn`,
+Next action when 74+84 land: read each per_mode_deploy.json, `just dyn`,
 fill A1/A2, append a journal entry. Then queue A5 (the gap).

 ## Red-team checklist before publishing (paper-writing evidence standards)