lora-lite/.gitignore at 9d027752ad4e9cfdd94773ffa36ea4faafcc028c

wassname/lora-lite

mirror of https://github.com/wassname/lora-lite.git synced 2026-06-27 18:59:51 +08:00

wassname e624cd244f feat: near_zero/near_one init for trainable params (breaks bf16 dead-grad symmetry)

Trainable params that were init'd at exact 0 or 1 now use near_zero (N(0,1e-4))
or near_one (1 + N(0,1e-4)) to break bf16 symmetry without meaningfully
breaking identity-at-t=0. Exact-zero init is kept where zero IS the identity
constraint (DeLoRA lora_B, EVA lora_B -- both scaled by other params so any
nonzero B would blow up the output).

AntiPaSTO: delta_s and rot_T now near_zero. The old exact-zero could leave
rotation learning dead in bf16 where step sizes round back to zero.

IA3: lora_g now near_one instead of exact ones. Avoids the bf16 spacing issue
around 1.0 where eps_bf16 ~ 7.8e-3 and lr=1e-3 updates were rounding away.
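The rounding effect described for IA3 can be reproduced without any ML library. The sketch below emulates bfloat16 rounding in plain Python and applies repeated updates of size lr=1e-3 to a value that starts at exactly 1.0. The helpers `to_bf16` and `sgd_steps_bf16` are hypothetical names for illustration, not part of lora-lite, and the sketch assumes the parameter is stored and updated directly in bf16 with a constant gradient of 1.0.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round-to-nearest-even).

    Hypothetical helper for illustration only; NaN/inf are not handled.
    """
    # Reinterpret the float32 encoding of x as an unsigned 32-bit integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the top 16 bits; round the dropped 16 bits to nearest even.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Spacing between adjacent bf16 values in [1, 2): 2^-7, about 7.8e-3.
EPS_BF16 = 2.0 ** -7


def sgd_steps_bf16(w: float, lr: float, grad: float, steps: int) -> float:
    """Apply plain SGD steps, rounding the weight to bf16 after every step."""
    for _ in range(steps):
        w = to_bf16(w - lr * grad)
    return w


# Assumed setup: weight starts at exactly 1.0, constant gradient of 1.0.
stuck = sgd_steps_bf16(1.0, lr=1e-3, grad=1.0, steps=100)
print("bf16 spacing at 1.0:", EPS_BF16)
print("weight after 100 steps:", stuck)
```

Under these assumptions each candidate value 0.999 is closer to 1.0 than to the next representable bf16 value below it, so every step rounds back to 1.0 and the weight never moves.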

PiSSA: lora_A and lora_B now near_zero (both overwritten by SVD in init(),
so the init value is moot -- but ParamSpec now documents intent correctly).

HRA: lora_U now near_zero (overwritten by symmetric init in init()).

ParamSpec: added 'near_zero' and 'near_one' init modes. Default changed from
'zeros' to 'near_zero'. Tests relaxed identity tolerances accordingly.
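As a rough illustration of the two new init modes, here is a minimal sketch in plain Python. It is not the ParamSpec code: the function `init_values` is a hypothetical stand-in, and it assumes the 1e-4 in N(0,1e-4) is a standard deviation (the commit message does not say whether it is a standard deviation or a variance).

```python
import random


def init_values(mode: str, n: int, std: float = 1e-4, seed: int = 0) -> list:
    """Return n initial values for a given init mode (hypothetical sketch).

    Assumption: `std` is the standard deviation of the Gaussian noise.
    """
    rng = random.Random(seed)  # local RNG so the example is reproducible
    if mode == "zeros":
        return [0.0] * n
    if mode == "ones":
        return [1.0] * n
    if mode == "near_zero":
        # Small noise around 0: breaks exact symmetry, stays close to identity.
        return [rng.gauss(0.0, std) for _ in range(n)]
    if mode == "near_one":
        # Small noise around 1: a multiplicative gain that is almost identity.
        return [1.0 + rng.gauss(0.0, std) for _ in range(n)]
    raise ValueError(f"unknown init mode: {mode!r}")


near_zero = init_values("near_zero", 8)
near_one = init_values("near_one", 8)

# Largest deviation from the exact identity value, which is the kind of gap
# an identity-at-t=0 test tolerance would need to absorb.
max_dev_zero = max(abs(v) for v in near_zero)
max_dev_one = max(abs(v - 1.0) for v in near_one)
print(max_dev_zero, max_dev_one)
```

The deviations stay on the order of the noise scale, which is consistent with the note that tests had their identity tolerances relaxed rather than removed.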

2026-04-27 15:55:05 +08:00

20 lines

174 B

Plaintext

.venv
.env
__pycache__/
*.pyc
*.pyo
*.pyd
*.db
*.sqlite3
*.log
*.bak
build/
dist/
*.egg-info/
logs/
docs/spec/
outputs/
tests/_artifacts/
docs/papers/*.pdf
docs/papers/*.txt