mirror of https://github.com/wassname/evil_MoE.git (synced 2026-06-27 15:15:40 +08:00)
chore: FastConfig save_ckpt_every 10->20 (disk pressure; ~6 ckpts/run suffices)
Each lora2r ckpt is ~1.3G (A/B + redundant frozen A0/B0, 252 modules fp32). The 768G disk filled and runs crashed at the step-0 ckpt save. 20-step cadence halves the per-run footprint while keeping enough points for the eval curve.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
@@ -128,3 +128,7 @@ class FastConfig(Config):
     adam_beta1: float = 0.5
     adam_beta2: float = 0.9
     lr: float = 5e-4 # user: bump from 1e-4 to learn faster in the short grad-starved budget
+    # Each lora2r ckpt is ~1.3G (A/B + redundant frozen A0/B0 for 252 modules, fp32); 20-step
+    # cadence keeps ~6/run for the eval curve without filling the 768G disk. (TODO: drop A0/B0
+    # from ckpts -- reconstructible from lora_init_seed -- to halve size, needs a loader change.)
+    save_ckpt_every: int = 20
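The footprint reasoning in the commit message can be checked with a little arithmetic. The sketch below is illustrative only: the ~1.3G checkpoint size, the 768G disk and the step-0 save come from the commit message, while `total_steps = 100`, the helper names and the assumption that a checkpoint is written on every multiple of `save_ckpt_every` are hypothetical and not taken from the repository.

```python
CKPT_GB = 1.3    # approximate size of one lora2r checkpoint (from the commit message)
DISK_GB = 768.0  # disk capacity (from the commit message)


def ckpts_per_run(total_steps: int, save_every: int) -> int:
    """Count saves at steps 0, save_every, 2*save_every, ... up to total_steps.

    Assumes a checkpoint is written at step 0 and on every multiple of
    save_every; the real training loop may differ.
    """
    return total_steps // save_every + 1


def run_footprint_gb(total_steps: int, save_every: int, ckpt_gb: float = CKPT_GB) -> float:
    """Approximate disk used by one run's checkpoints, in GB."""
    return ckpts_per_run(total_steps, save_every) * ckpt_gb


if __name__ == "__main__":
    total_steps = 100  # hypothetical run length, chosen so that every=20 gives 6 ckpts
    for every in (10, 20):
        n = ckpts_per_run(total_steps, every)
        gb = run_footprint_gb(total_steps, every)
        print(f"save_ckpt_every={every}: {n} ckpts, {gb:.1f}G per run, "
              f"about {int(DISK_GB // gb)} runs fit on the disk")
```

Under these assumptions the 20-step cadence stores 6 checkpoints per run instead of 11, which is where the "roughly half" figure comes from; dropping A0/B0 as the TODO suggests would be a separate, multiplicative saving.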