modal: parallel GRPO sweep port (image, volume, fan-out launcher)

Run the paper sweep as independent H100/A100-80GB containers instead of
serial pueue runs. One Volume caches the model, svd, and out/; train.py runs
unmodified (torch 2.7 + Dao flash-attn wheel, code mounted at runtime).
Verified: a vanilla 60-step run reproduces the local baseline. The skill at
~/.claude/skills/modal documents the patterns.
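
A minimal sketch of the fan-out pattern described above, not the launcher from this commit. The app name, Volume name, mount paths, grid flags (`lr`, `seed`), and timeout are all hypothetical, and the flash-attn wheel install is left out because its URL is not shown here. The Modal part is guarded so the grid helper still runs where Modal is not installed.

```python
import itertools

try:
    import modal  # optional, so the grid helper runs without Modal installed
except ImportError:
    modal = None


def sweep_args(grid):
    """Expand {flag: [values]} into one argv list per sweep point."""
    keys = sorted(grid)
    return [
        [f"--{k}={v}" for k, v in zip(keys, values)]
        for values in itertools.product(*(grid[k] for k in keys))
    ]


# Hypothetical grid: the real sweep's flags are not shown in this commit.
GRID = {"lr": [1e-5, 3e-5], "seed": [0, 1]}

if modal is not None:
    app = modal.App("grpo-sweep-sketch")  # hypothetical app name
    # One shared Volume for cached weights and outputs (hypothetical name).
    cache = modal.Volume.from_name("grpo-cache-sketch", create_if_missing=True)
    image = (
        modal.Image.debian_slim(python_version="3.11")
        .pip_install("torch==2.7.0")  # flash-attn wheel install omitted
        # Local code is attached at container start, not baked into the image.
        .add_local_dir(".", remote_path="/root/code")
    )

    @app.function(
        image=image,
        gpu=["H100", "A100-80GB"],  # prefer H100, fall back to A100-80GB
        volumes={"/cache": cache},
        timeout=4 * 60 * 60,  # assumed upper bound per run
    )
    def train(args: list[str]) -> int:
        import subprocess

        # Run the unmodified training script with this sweep point's flags.
        rc = subprocess.run(["python", "train.py", *args], cwd="/root/code").returncode
        cache.commit()  # persist anything written under /cache
        return rc

    @app.local_entrypoint()
    def main():
        points = sweep_args(GRID)
        # spawn() returns immediately, so every point gets its own container.
        calls = [train.spawn(a) for a in points]
        for a, call in zip(points, calls):
            print(a, "exit code", call.get())
```

If saved as, say, `sweep_sketch.py` (a hypothetical filename), this would be started with `modal run sweep_sketch.py`. Spawning every point before collecting results is what replaces the serial queue: each call blocks only on its own container.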

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
wassname
2026-06-06 20:30:19 +08:00
parent bcf09dd742
commit 70aa6aa96b
7 changed files with 597 additions and 1 deletions
+1
@@ -32,6 +32,7 @@ dependencies = [
# release with Blackwell sm_120 kernels (consumer RTX PRO 6000). Pinned to
# mjun0812 prebuilds — see [tool.uv.sources] below.
"flash-attn",
"modal>=1.4.3",
]
[project.optional-dependencies]