From 5419771d70747c74f9536f607f1c0d36613b79b5 Mon Sep 17 00:00:00 2001
From: wassname <1103714+wassname@users.noreply.github.com>
Date: Sun, 7 Jun 2026 10:39:41 +0800
Subject: [PATCH] modal: there was no routeV hang -- it was local stdout
 buffering

Retract the "routeV deadlocks at first generate()" finding from d96367c. The
server-side `modal app logs` show the killed routeV smoke had actually run training
steps 0-3 (real rewards, ||delta_S_hack||=3.23, coherent generations) and was inside
the 24-prompt FINAL EVAL when I stopped it -- a deadlocked-at-first-generate process
cannot produce step 1/2/3 results. The "freeze" was the local `modal run > log`
capture block-buffering the subprocess stdout; the run was healthy the whole time.

Fix: PYTHONUNBUFFERED=1 in _run_train env so the local stream is live, and monitor
via `modal app logs <app-id>` (server-side truth). Corrected the app.py comment and
replaced the README "known issue" with the buffering gotcha. routeV runs fine on
Modal -- the routeV sweep is viable, no torch-2.7 debug needed.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
---
 modal/README.md | 34 ++++++++++++++--------------------
 modal/app.py    | 14 +++++++++-----
 2 files changed, 23 insertions(+), 25 deletions(-)
diff --git a/modal/README.md b/modal/README.md
index 0f42cf1..1d4d675 100644
--- a/modal/README.md
+++ b/modal/README.md
@@ -86,30 +86,24 @@ python modal/fetch.py                 # all of out/runs + logs
 python modal/fetch.py <ts>_<slug>     # one run
 ```
 
-## Known issue — routeV deadlocks at first generate() on Modal
+## Gotcha — monitor server-side, the local stream block-buffers
 
-**vanilla works, routeV hangs, on the identical image.** On the `transformers==5.10.2`
-+ Dao flash-attn 2.8.3 (torch 2.7.1) image, a vanilla warm completes generate cleanly
-(6.8 min, exit 0, `per_mode_deploy.json` written), but a routeV smoke freezes at its
-first rollout `generate()` indefinitely (killed at ~11 min). Same image, same attn,
-same data, same delta_S init — the only difference is the arm.
+There was no routeV deadlock. The earlier "routeV freezes at the first generate()"
+was a **monitoring artifact**: piping `modal run ... > local.log` captures the
+subprocess stdout block-buffered, so the local file sits at the first `generate()`
+warning while the run is actually progressing. A routeV smoke I killed at ~11 min
+thinking it hung had in fact completed training steps 0-3 (real rewards,
+`||delta_S_hack||=3.23`, coherent generations) and was inside the 24-prompt FINAL
+EVAL — the server-side `modal app logs <app-id>` showed all of it. I killed a healthy
+run and built a false "torch-2.7 routeV deadlock" theory on the buffering.
 
-It is NOT: the data path (both arms load fine now), the attn backend (the other agent
-reproduced the hang under sdpa too, commit 2f91561), transformers version (vanilla
-runs on this exact 5.10.2), or the generate-time forward hook (routeV's `grad_probe`
-branch is skipped under `generate()`'s no_grad, so both arms run the identical
-`_delta_hook` else-path). What's left: routeV extracts `v_grad` fresh before the loop
-(forward+backward across 252 modules), and that CUDA/allocator state collides with
-flash-attn's first generate on torch 2.7.1. The **local box runs routeV fine on torch
-2.8** — so this is a torch-2.7-on-Modal deadlock, not a method bug.
+Two fixes, both in place: `_run_train` sets `PYTHONUNBUFFERED=1` (env), and you should
+**always monitor via `modal app logs <app-id>`** (server-side truth), not the local
+`modal run` capture.
 
 Correction to commit 2873b37: its message claimed "the smoke on pinned 5.10.2 clears
-the deadlock point." It does not — the smoke hung. transformers was never the cause.
-
-**Current call:** Modal is proven for vanilla; the local box produces the canonical
-routeV runs (jobs 134/135). Don't sink Modal $ into the routeV hang unless we
-specifically want the parallelism — the fix is a focused torch-2.7 generate debug
-(candidate: `torch.cuda.empty_cache()` at the extraction→loop boundary, unverified).
+the deadlock point." There was no deadlock to clear; the framing was wrong. routeV and
+vanilla both run generate() fine on this image.
 
 ## Caveat — keep the inventory fresh
 
diff --git a/modal/app.py b/modal/app.py
index 61c971e..0b72a34 100644
--- a/modal/app.py
+++ b/modal/app.py
@@ -59,11 +59,9 @@ image = (
     .pip_install(
         # transformers: pinned released version, NOT floating `@ main`. uv.lock keeps
         # the exact 5.8.0.dev0 commit for the local box; the image uses a released
-        # wheel (Qwen3-4B needs no main-only feature). NB: transformers is NOT the
-        # routeV generate() hang -- on this exact 5.10.2 image, vanilla completes
-        # generate fine (warm, 6.8 min, exit 0) while routeV deadlocks at its first
-        # rollout generate. The hang is routeV-specific (v_grad extraction's CUDA
-        # state x flash-attn first-generate on torch 2.7.1); see modal/README.md.
+        # wheel (Qwen3-4B needs no main-only feature). Both vanilla and routeV run
+        # generate() fine on this image -- the earlier "routeV deadlock" was a local
+        # stdout-buffering misread, not a real hang (see modal/README.md).
         "transformers==5.10.2",
         "einops>=0.8",
         "jaxtyping>=0.2",
@@ -152,6 +150,12 @@ def _run_train(argv: list[str]) -> dict:
         "HF_HOME": f"{CACHE}/hf",
         "HF_HUB_DISABLE_PROGRESS_BARS": "1",
         "PYTORCH_CUDA_ALLOC_CONF": "expandable_segments:True",
+        # Unbuffered so the train log streams live. Without this the subprocess
+        # stdout block-buffers; the local `modal run` capture then looks frozen at
+        # the first generate() while the run is actually progressing (this is what
+        # I misread as a "routeV deadlock" -- see modal/README.md). Server-side
+        # `modal app logs <id>` always has the truth; this makes the local view match.
+        "PYTHONUNBUFFERED": "1",
     }
     runs_before = set(Path(f"{CACHE}/out/runs").glob("*")) if Path(f"{CACHE}/out/runs").exists() else set()