lora-lite

mirror of https://github.com/wassname/lora-lite.git synced 2026-06-27 18:05:16 +08:00

Author	SHA1	Message	Date
wassname	ce8c250422	perf: use matmul for lora adapter projections	2026-05-21 08:23:56 +08:00
wassname	88f107a423	antipasto: delta_s init 4e-4+N(0,4e-4) from antipasto3, rotate_basis='none' option	2026-04-27 16:27:12 +08:00
wassname	e624cd244f	feat: near_zero/near_one init for trainable params (breaks bf16 dead-grad symmetry). Trainable params that were initialized at exactly 0 or 1 now use near_zero (N(0,1e-4)) or near_one (1 + N(0,1e-4)) to break bf16 symmetry without meaningfully breaking identity-at-t=0. Exact-zero init is kept where zero IS the identity constraint (DeLoRA lora_B, EVA lora_B; both are scaled by other params, so any nonzero B would blow up the output). AntiPaSTO: delta_s and rot_T are now near_zero; the old exact-zero init could leave rotation learning dead in bf16, where step sizes round back to zero. IA3: lora_g is now near_one instead of exact ones; this avoids the bf16 spacing issue around 1.0, where eps_bf16 ~ 7.8e-3 and lr=1e-3 updates were rounding away. PiSSA: lora_A and lora_B are now near_zero (both are overwritten by SVD in init(), so the init value is moot, but ParamSpec now documents intent correctly). HRA: lora_U is now near_zero (overwritten by symmetric init in init()). ParamSpec: added 'near_zero' and 'near_one' init modes; default changed from 'zeros' to 'near_zero'. Tests: identity tolerances relaxed accordingly.	2026-04-27 15:55:05 +08:00
wassname	0bd091fe5b	tidy	2026-04-27 11:44:40 +08:00
wassname	24ba8deb02	simpler test	2026-04-27 09:47:07 +08:00
wassname	a44fc039af	rm defensive docstr	2026-04-27 06:39:18 +08:00
wassname	b179771cc6	tyro and benchmark	2026-04-27 06:23:30 +08:00
wassname	053901e0ca	types, review	2026-04-26 20:35:38 +08:00
wassname	fdb4c77d6c	Add reference-impl URLs to variant docstrings + V2 external review. Fetch canonical reference impls for offline review: peft_{lora,hra,delora,ia3}_layer.py + peft_lora_{dora,variants}.py; orig_pissa_init.py (MuLabPKU/PiSSA); orig_hra_layer.py (DaShenZi721/HRA); orig_delora.py (ExplainableML/DeLoRA author fork). Add reference-impl URLs to all 6 variant docstrings. Document HRA gate=0 dead-grad issue and DoRA detach-omission in their docstrings. Re-run external review (codex) with refs available -> docs/audit/variants_review_v2.md. Major NEW findings vs paper-only review: (1) DeLoRA: scalar W.norm() should be per-input-channel norm(dim=0); (2) HRA: PEFT uses symmetric repeated-column init (no dead grad), not zero gate; (3) IA3: FFN targets need input-side gating, not output-side, so our up_proj advice was wrong; (4) all LoRA-family: cfg.dropout silently ignored (no-op); (5) DeLoRA: wnorm should be a persistent buffer, not a Parameter. HRA and DeLoRA upgraded to BUGGY (from Partial).	2026-04-26 19:27:47 +08:00
wassname	7eeaeed206	Verify all variants on bnb 4bit/8bit; HRA paper-faithful rewrite. Test all 6 variants against bnb.Linear8bitLt + Linear4bit in smoke; bnb-friendly (LoRA, IA3, HRA, DeLoRA): identity err <= 2.4e-4; bnb-incompatible (PiSSA, DoRA): fail-loud TypeError as expected; HRA: rewrite to paper-faithful input-side reflections (h <- (I-2vv^T)h), fixing previous broken output-side formulation; IA3: bypass dtype upcast for bnb (params stay fp16/quantized); DeLoRA: explicit type check rejecting non-nn.Linear (incl. bnb); adapter: special-case bnb param assignment via .data; re-verified Qwen0.6B HRA probe: drop=20.7%, id_err=0, reload=0.	2026-04-26 18:08:06 +08:00
wassname	4db5cee5a9	init	2026-04-26 14:10:20 +08:00
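
Commit 7eeaeed206 describes HRA as an input-side reflection, h <- (I-2vv^T)h. The snippet below is a minimal pure-Python sketch of a single such Householder reflection, not the repository's implementation. The function name `householder_reflect` is hypothetical, and normalizing v to unit length is an assumption made here so that the map is an exact reflection.

```python
import math


def householder_reflect(h, v):
    """Return (I - 2 u u^T) h, where u = v / ||v|| (normalization is an assumption)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("v must be nonzero to define a reflection")
    u = [x / norm for x in v]
    # Projection coefficient of h onto the unit direction u.
    dot = sum(ui * hi for ui, hi in zip(u, h))
    # Subtract twice the projection: flips the component of h along u.
    return [hi - 2.0 * dot * ui for hi, ui in zip(h, u)]


h = [3.0, 4.0]
v = [1.0, 0.0]
reflected = householder_reflect(h, v)          # flips the first coordinate
restored = householder_reflect(reflected, v)   # reflecting twice undoes it
print(reflected, restored)
```

A reflection preserves the norm of h and is its own inverse, which is why applying it twice recovers the input.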

11 Commits
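
Commit e624cd244f notes that near 1.0 the bf16 spacing is about 7.8e-3, so updates of size 1e-3 round away. The sketch below illustrates that rounding effect with a pure-Python emulation of float32-to-bfloat16 conversion (round-to-nearest-even on the top 16 bits). The helper `to_bf16` is hypothetical and is not part of lora-lite; it ignores NaN and infinity handling.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (emulated; NaN/inf not handled)."""
    # Reinterpret x as float32 bits.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the top 16 bits of float32; round to nearest, ties to even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


step = 1e-3                              # update size on the order of lr=1e-3
up = to_bf16(1.0 + step)                 # rounds back to 1.0
down = to_bf16(1.0 - step)               # also rounds back to 1.0
next_above = to_bf16(1.0 + 2.0 ** -7)    # one bf16 spacing above 1.0
print(up, down, next_above)
```

Under this emulation, a parameter stored in bf16 at exactly 1.0 does not move when nudged by 1e-3 in either direction, while a step of 2^-7 (about 7.8e-3) reaches the next representable value.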