mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 16:45:42 +08:00
modal: parallel GRPO sweep port (image, volume, fan-out launcher)
Fire the paper sweep as independent H100/A100-80 containers instead of serial pueue runs. One Volume caches model + svd + out/; train.py runs unmodified (torch 2.7 + Dao flash-attn wheel, code mounted at runtime). Verified: vanilla 60-step reproduces the local baseline. Skill at ~/.claude/skills/modal documents the patterns.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
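The fan-out described in the commit message can be sketched as a plain job list: one train.py invocation per sweep configuration, each of which would then be handed to its own GPU container rather than queued serially. This is a minimal sketch and not the launcher from this commit. Only the "vanilla" arm and the 60-step length come from the message above; the second arm name, the seeds, and every command-line flag are hypothetical placeholders.

```python
from itertools import product

# Hypothetical sweep axes. Only "vanilla" and the 60-step length appear in the
# commit message; "other_arm", the seeds, and all flag names are made up.
ARMS = ["vanilla", "other_arm"]
SEEDS = [0, 1]
STEPS = 60


def build_jobs(arms, seeds, steps):
    """Return one train.py argv list per (arm, seed) pair."""
    jobs = []
    for arm, seed in product(arms, seeds):
        # Flag names are illustrative, not train.py's real CLI.
        jobs.append([
            "python", "train.py",
            "--arm", arm,
            "--seed", str(seed),
            "--steps", str(steps),
        ])
    return jobs


jobs = build_jobs(ARMS, SEEDS, STEPS)
for argv in jobs:
    # In a fan-out port, each argv would run in its own container
    # instead of waiting behind the previous run.
    print(" ".join(argv))
```

Because every entry is an independent argv list, the runs share no state beyond whatever the cached Volume provides, which is what makes launching them in parallel safe.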
@@ -32,6 +32,7 @@ dependencies = [
     # release with Blackwell sm_120 kernels (consumer RTX PRO 6000). Pinned to
     # mjun0812 prebuilds — see [tool.uv.sources] below.
     "flash-attn",
+    "modal>=1.4.3",
 ]
 
 [project.optional-dependencies]