train.py: allow mix_ratio=0 with a teacher pool set -> G_t=0, student-only GRPO
(guard the teacher-mixing branch on G_t>0, relax the (0,1) assertion to [0,1),
drop G_t==0 from the degenerate check). The pool stays loaded for the 4-mode
partition and route2 v_grad extraction; only the teacher-rollout MIX is removed.
Smoke runs (mix=0, normal mix=0.5, vanilla): all green.
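A minimal sketch of the guard logic described above. It is not the actual train.py code. The helpers split_group/build_group are hypothetical. It assumes mix_ratio is the fraction of each GRPO group filled by teacher rollouts, with G_t = round(mix_ratio * G). Under that assumption the degenerate case is "no student rollouts left" (G_s == 0), and G_t == 0 is legal:

```python
def split_group(group_size: int, mix_ratio: float, has_teacher_pool: bool) -> tuple[int, int]:
    """Hypothetical helper: return (G_s, G_t), student/teacher rollouts per group."""
    if not has_teacher_pool:
        # Vanilla GRPO: every rollout comes from the student.
        return group_size, 0
    # Relaxed from 0 < mix_ratio < 1 to [0, 1) so mix_ratio=0 is accepted.
    assert 0.0 <= mix_ratio < 1.0, f"mix_ratio must be in [0, 1), got {mix_ratio}"
    g_t = round(mix_ratio * group_size)  # assumed rounding rule
    g_s = group_size - g_t
    # Degenerate only when no student rollouts remain; G_t == 0 is not degenerate.
    if g_s == 0:
        raise ValueError(f"mix_ratio={mix_ratio} leaves no student rollouts (G={group_size})")
    return g_s, g_t


def build_group(student_rollouts: list, teacher_rollouts: list, g_t: int) -> list:
    """Hypothetical helper: assemble one GRPO group from student and teacher rollouts."""
    group = list(student_rollouts)
    # Teacher-mixing branch runs only when G_t > 0. The teacher pool can stay
    # loaded for other uses (e.g. partitioning or gradient extraction).
    if g_t > 0:
        group.extend(teacher_rollouts[:g_t])
    return group
```

With a teacher pool set and mix_ratio=0, split_group(8, 0.0, True) gives (8, 0), so build_group returns only student rollouts, i.e. student-only GRPO.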
Also: fill in the A4 long-run figure (fig:longrun) in main.tex and update the
writeup-spec A4 status (route2 stays durable through step 200; vanilla
collapses around step 88 rather than saturating cleanly).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
- deploy hack/solve is now the headline metric for all arms, so turn on the
  mid-train deploy-eval by default (smoke now covers the deploy path too);
  200-step runs pass a sparser cadence explicitly.
- docs/spec/20260602_writeup_spec.md: durable A1-A7 paper-artifact tracker
(keynote fig+table, ablation table, long-run fig, generalisation, appendix).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>