modal: parallel GRPO sweep port (image, volume, fan-out launcher)

Fire the paper sweep as independent H100/A100-80 containers instead of serial pueue runs. One Volume caches model + svd + out/; train.py runs unmodified (torch 2.7 + Dao flash-attn wheel, code mounted at runtime).

Verified: vanilla 60-step reproduces the local baseline. Skill at ~/.claude/skills/modal documents the patterns.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-27 16:45:42 +08:00 · 2026-06-06 20:30:19 +08:00
parent bcf09dd742
commit 70aa6aa96b
7 changed files with 597 additions and 1 deletions
@@ -32,6 +32,7 @@ dependencies = [
    # release with Blackwell sm_120 kernels (consumer RTX PRO 6000). Pinned to
    # mjun0812 prebuilds — see [tool.uv.sources] below.
    "flash-attn",
+    "modal>=1.4.3",
 ]

 [project.optional-dependencies]