merge: resolve RESEARCH_JOURNAL conflict (keep both HEAD + remote Modal-port entry)

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-27 16:45:42 +08:00 · 2026-06-09 02:27:08 +00:00
parent 0f59b1351b 826aa911f7
commit ec88ba3e42
5 changed files with 867 additions and 1 deletions
@@ -3930,3 +3930,31 @@ Worth recording before the log is cleaned -- the routing trace is the finding.

 Reran: act_vote requeued as the next arm after vanilla (the run that was killed). Band-widening for
 act_vote is a candidate follow-up (smooth the 0/1 saturation).
+
+## 2026-06-07 (a) -- Modal port runs both arms to completion; the "routeV deadlock" was a buffering artifact
+
+**Introduction.** Question: does the Modal port (entry on the parallel-sweep port, commit 70aa6aa) actually run the real `vgrout.train` pipeline to completion on a cloud GPU, for both the vanilla and routeV arms? I had reported earlier this session that routeV "deadlocks at the first `generate()`" on Modal while vanilla completes, and built a torch-2.7-specific theory on it. I expected to either confirm a routeV-specific hang or find a cheap fix. What I found instead: there is no hang. Both arms complete; the apparent freeze was my local `modal run > log` capture block-buffering the subprocess stdout, so the local file sat at the first `generate()` line while the run was progressing fine server-side.
+
+**Methods.** Commit `a776db0`, model Qwen/Qwen3-4B, `fast` preset, on Modal H100/A100-80GB (image: torch 2.7.1 + Dao flash-attn 2.8.3 cp313 + transformers 5.10.2). Two runs, each launched via `modal run modal/app.py`: vanilla `--action warm` (intervention none, seed 41, 1 step) and routeV `--action smoke` (intervention routeV, seed 43, 4 steps, `--eval-ablate-every=2 --eval-n-prompts=2`). No pueue (these are Modal apps); provenance is keyed by Modal app id below. The data fix in this session (mount the 44MB LeetCode jsonls from the image, not the Volume) and `PYTHONUNBUFFERED=1` in the subprocess env are both in `a776db0`.
+
+**Results.**
+
+| modal app id | arm | seed | steps | mean hack_s | mean gt_s | deploy hack | deploy solve | wall (min) | exit |
+|---|---|---|---|---|---|---|---|---|---|
+| ap-1p67GAW7 | vanilla | 41 | 1 | 0/28 | 6/28 | 0.000 | 0.208 | 6.8 | 0 |
+| ap-fPnBJKAM | routeV | 43 | 4 | 0/28 | 10.25/28 | 0.000 | 0.292 | 14.5 | 0 |
+
+Table 1. Per-run means of `hack_s` (reward-hacking student-rollout count, denominator = student rollouts per step) and `gt_s` (ground-truth pass count, same denominator) over the run's steps, for two Modal smoke runs on the `fast` preset. `deploy hack`/`deploy solve` are the knob-off final-eval rates (n=24 prompts, T=0.7). These are infra-verification smokes (1 and 4 steps), NOT a suppression measurement: 1-4 steps is far below the tens of steps needed for hacking to emerge, so `hack_s=0` here means "no time to learn the cheat", not "the method suppressed it". The result the table reports is the rightmost columns: both arms exit 0 with full artifacts written.
+
+Provenance:
+- Commit producing the runs: `a776db0` (image + data-mount + PYTHONUNBUFFERED).
+- Run commands:
+  - vanilla: `modal run modal/app.py --action warm` -> argv `fast --intervention=none --steps=1 --eval-n-prompts=2 --out-tag=_warm`
+  - routeV: `modal run modal/app.py --action smoke` -> argv `fast --intervention=routeV --seed=43 --steps=4 --eval-ablate-every=2 --eval-n-prompts=2 --out-tag=_modal_smoke`
+- Local capture logs (this session): `/tmp/modal_warm_datafix.log`, `/tmp/modal_smoke_verify.log`. Volume run dirs: `out/runs/20260607T013602_fast_vanilla_seed41_warm`, `out/runs/20260607T022832_fast_routingV_seed43_modal_smoke` (each has `per_mode_deploy.json`, `train.safetensors`, rollouts; routeV also `ckpt_step000{0,2,3}.safetensors`).
+- Cell provenance, routeV mean gt_s/hack_s: the four per-step table rows in `/tmp/modal_smoke_verify.log` (ANSI-stripped), `gt_s` column = (12, 8, 12, 9)/28 (mean 10.25/28 = 0.366), `hack_s` column = (0, 0, 0, 0)/28. routeV deploy line: `FINAL EVAL [routingV] (n=24): ... deploy/knob-off hack=0.000 solve=0.292`. Routing was active: `||delta_S_hack|| = 3.22`. Wall: `done in 14.5 min` / `done in 6.8 min` lines; `wall_s` 867.4 / 405.0 in the returned dict.
+- vanilla cell provenance: single step-0 row in `/tmp/modal_warm_datafix.log`, `gt_s`=6/28, `hack_s`=0/28; `per_mode_deploy` hack 0.0 solve 0.208.
+
+**Discussion (speculative).** My read: the port is functionally correct and the earlier "routeV deadlock" was entirely an observability bug, not a real one. The discriminating evidence is that the killed routeV run had already produced step 0-3 rows with real rewards and a non-zero `||delta_S_hack||`; a process deadlocked at its first `generate()` cannot emit step-3 results. So the freeze lived in my terminal, not the GPU. The fix (`PYTHONUNBUFFERED=1` plus reading `modal app logs` server-side) made the local stream live, and the re-run completed. One alternative hypothesis I considered and rejected: that routeV's per-rollout routing hook deadlocks `generate()` specifically on torch 2.7.1 (the Modal image) vs 2.8 (local box). It is refuted by the same evidence (the run completed under torch 2.7.1) and by the fact that the routeV hook's `grad_probe` branch is gated on `torch.is_grad_enabled()`, which is False inside `generate()`, so routeV and vanilla execute the identical hook path during generation anyway. A second alternative, that the flash-attn wheel is ABI-mismatched to the image torch, is refuted by both arms generating correctly on that wheel. Caveat on cost: my only full-length anchor is a prior-session 60-step vanilla run (~105 min, ~$6.90 on H100); routeV is heavier (v_grad extraction + per-rollout gating) so a 60-step routeV will run longer, and the n=64 heavy final eval adds ~25-30 min on its own (HF `.generate`, ~10 min already seen at n=24). I have not yet measured a routeV 60-step wall-clock, so the ~$80-130 USD whole-sweep figure is an extrapolation, not a measurement.
+
+**Next.** Run one timed 60-step routeV on Modal (seed 43) to get the real routeV $/run, and diff its `per_mode_deploy.json` against the local-box artifact for identical argv before fanning out the 12-job sweep. Also refresh `launch.py::JOBS` from the current manifest (it is still the stale 2026-06-06 copy) so the fan-out runs the intended arms.