mirror of
https://github.com/wassname/lora-lite.git
synced 2026-06-27 15:15:55 +08:00
justfile: lr=5e-3 for all antipasto_* cores in bench-variant
The small-parameter antipasto variants (gain/block/ablate/corda) all need the higher lr to clear the bf16 round-to-nearest floor, not just the base antipasto variant. Use a glob in the case pattern so they all match.

Co-Authored-By: Claudypoo <noreply@anthropic.com>
@@ -88,7 +88,7 @@ bench-variant model variant steps="5000":
 delora) lr=1e-3 ;;
 ia3) lr=5e-3; target='(k_proj|v_proj)$' ;;
 ia3_ff) lr=5e-3; target='(down_proj)$' ;;
-antipasto) lr=5e-3 ;; # small params need higher lr
+antipasto*) lr=5e-3 ;; # small params (gain/block) need higher lr; covers all antipasto_* cores
 esac
 exec uv run --extra benchmark python scripts/metamath_gsm8k_benchmark.py \
 --model '{{model}}' \
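The effect of switching from a literal pattern to a glob can be checked in isolation. The sketch below is not part of the justfile: `pick_lr` is a hypothetical helper that mirrors only the lr choice from the hunk, the `unset` fallback stands in for a default that the hunk does not show, and the variant names `antipasto_gain` and `antipasto_corda` are assumed from the commit message rather than confirmed.

```shell
# Hypothetical helper: reproduces only the lr selection from the case statement.
pick_lr() {
    lr="unset"  # placeholder; the recipe's real default is outside the hunk
    case "$1" in
        delora)     lr=1e-3 ;;
        ia3|ia3_ff) lr=5e-3 ;;
        antipasto*) lr=5e-3 ;;  # glob: bare "antipasto" plus any suffix
    esac
    printf '%s\n' "$lr"
}

# Assumed variant names, for illustration only.
for v in antipasto antipasto_gain antipasto_corda lora; do
    printf '%s -> %s\n' "$v" "$(pick_lr "$v")"
done
```

Note that `antipasto*` has no underscore, so it also matches any other name that merely begins with `antipasto`. If that ever becomes too broad, `antipasto|antipasto_*)` would be a stricter alternative.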