mirror of https://github.com/wassname/lora-lite.git
synced 2026-06-27 18:59:51 +08:00
e624cd244f
Trainable params that were init'd at exact 0 or 1 now use near_zero (N(0,1e-4)) or near_one (1 + N(0,1e-4)) to break bf16 symmetry without meaningfully breaking identity-at-t=0.

Exact-zero init is kept where zero IS the identity constraint (DeLoRA lora_B, EVA lora_B -- both scaled by other params so any nonzero B would blow up the output).

AntiPaSTO: delta_s and rot_T now near_zero. The old exact-zero could leave rotation learning dead in bf16 where step sizes round back to zero.

IA3: lora_g now near_one instead of exact ones. Avoids the bf16 spacing issue around 1.0 where eps_bf16 ~ 7.8e-3 and lr=1e-3 updates were rounding away.

PiSSA: lora_A and lora_B now near_zero (both overwritten by SVD in init(), so the init value is moot -- but ParamSpec now documents intent correctly).

HRA: lora_U now near_zero (overwritten by symmetric init in init()).

ParamSpec: added 'near_zero' and 'near_one' init modes. Default changed from 'zeros' to 'near_zero'.

Tests relaxed identity tolerances accordingly.
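The bf16 spacing figure quoted for IA3 can be checked with a small standalone sketch. This is not code from the repository: `to_bf16` is a hypothetical helper that emulates bfloat16 rounding (round-to-nearest-even, finite inputs only) using the standard library, and `lr` simply reuses the 1e-3 value mentioned above.

```python
import struct

def to_bf16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (ties to even)."""
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the top 16 bits of float32; adding this bias before
    # masking turns plain truncation into round-to-nearest-even.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

lr = 1e-3  # update size taken from the commit message

# Gap between 1.0 and the next representable bf16 value: 2**-7 = 0.0078125.
spacing_at_one = to_bf16(1.0 + 2**-7) - 1.0

# An update of size lr applied to exactly 1.0 is lost in either direction.
step_down_lost = to_bf16(1.0 - lr) == 1.0
step_up_lost = to_bf16(1.0 + lr) == 1.0

# The same update applied to exactly 0.0 is representable, since bf16
# spacing shrinks with magnitude.
step_from_zero_kept = to_bf16(0.0 - lr) != 0.0

print(spacing_at_one, step_down_lost, step_up_lost, step_from_zero_kept)
```

The sketch only illustrates the rounding behaviour of a single value stored in bf16; it says nothing about how the library's optimizer or parameter dtypes are actually configured.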
20 lines
174 B
Plaintext
.venv
.env
__pycache__/
*.pyc
*.pyo
*.pyd
*.db
*.sqlite3
*.log
*.bak
build/
dist/
*.egg-info/
logs/
docs/spec/
outputs/
tests/_artifacts/
docs/papers/*.pdf
docs/papers/*.txt