lora-lite

mirror of https://github.com/wassname/lora-lite.git synced 2026-06-27 18:05:16 +08:00

Author	SHA1	Message	Date
wassname	90b5199ed9	README: AntiPaSTO family GSM8K results (5 variants, r=256). Replace the stale single AntiPaSTO row (was 35.8K params from the removed rotation version, described block-Cayley which no longer exists) with the real 5000-step Qwen3-0.6B numbers and a family breakdown: corda 61.9%, 14.3K (best: covariance-oriented basis); plain 61.4%, 14.3K; rot 61.4%, 35.8K (the rotation this replaces); ablate 61.0%, 14.4K; arrow 60.5%, 17.5K. Headline: ~320x fewer trainable params than LoRA at ~97% of its accuracy. Rotation buys nothing (rot matches plain to 3 s.f. at 2.5x params, +20% wall-time, plus a per-forward Cayley solve), confirming the drop. Co-Authored-By: Claudypoo <noreply@anthropic.com>	2026-06-15 07:05:45 +08:00
wassname	072a816cee	docs: fix hallucinated arxiv links in variants table. AntiPaSTO, EVA, and HRA pointed at unrelated papers (stock prediction, LLM-vs-lawyer study, 2D Ising model). Replaced with verified IDs. Co-Authored-By: Claudypoo <claudypoo@noreply.invalid>	2026-05-26 05:48:49 +08:00
wassname	b698331cfa	feat: add HRA benchmark result (61.6%), update README table	2026-04-27 20:07:19 +08:00
wassname	e624cd244f	feat: near_zero/near_one init for trainable params (breaks bf16 dead-grad symmetry). Trainable params that were init'd at exact 0 or 1 now use near_zero (N(0,1e-4)) or near_one (1 + N(0,1e-4)) to break bf16 symmetry without meaningfully breaking identity-at-t=0. Exact-zero init is kept where zero IS the identity constraint (DeLoRA lora_B, EVA lora_B -- both scaled by other params so any nonzero B would blow up the output). AntiPaSTO: delta_s and rot_T now near_zero; the old exact-zero could leave rotation learning dead in bf16 where step sizes round back to zero. IA3: lora_g now near_one instead of exact ones; avoids the bf16 spacing issue around 1.0 where eps_bf16 ~ 7.8e-3 and lr=1e-3 updates were rounding away. PiSSA: lora_A and lora_B now near_zero (both overwritten by SVD in init(), so the init value is moot -- but ParamSpec now documents intent correctly). HRA: lora_U now near_zero (overwritten by symmetric init in init()). ParamSpec: added 'near_zero' and 'near_one' init modes; default changed from 'zeros' to 'near_zero'. Tests relaxed identity tolerances accordingly.	2026-04-27 15:55:05 +08:00
wassname	a342801807	wip	2026-04-27 11:24:19 +08:00
wassname	b60a8c3f9b	readme	2026-04-27 09:46:52 +08:00
wassname	bb8887e66c	tidy	2026-04-27 07:12:56 +08:00
wassname	b179771cc6	tyro and benchmark	2026-04-27 06:23:30 +08:00
wassname	0d929f93b3	feat(hra): add Householder Reflection Adaptation, hook-only/bnb-friendly + Qwen proof	2026-04-26 17:58:56 +08:00
wassname	2abf616be6	feat(dora): add weight-decomposed LoRA variant for fp layers	2026-04-26 17:53:33 +08:00
wassname	699fde31bf	feat: ia3 variant, real bnb 4bit/8bit smoke, dev guide split, user-only readme	2026-04-26 17:49:17 +08:00
wassname	f2d9021511	ci: add publishable check workflow	2026-04-26 17:09:47 +08:00
wassname	69bf5f4e44	test: prove adapter training paths	2026-04-26 17:00:39 +08:00
wassname	4db5cee5a9	init	2026-04-26 14:10:20 +08:00

14 Commits
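
Note on commit e624cd244f: the message cites a bf16 spacing of about 7.8e-3 just above 1.0 and says lr=1e-3 updates were rounding away. The sketch below illustrates that arithmetic only; it is not code from the repository. It emulates bf16 rounding in pure Python by keeping the top 16 bits of a float32 with round-to-nearest-even. The helper names `to_bf16` and `sgd_steps_bf16` are hypothetical, and the unit gradient is an assumption chosen to make the step size equal to the learning rate.

```python
import struct


def to_bf16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (ties to even).

    bf16 is the upper 16 bits of an IEEE float32, so we round away the
    lower 16 bits. NaN/inf handling is omitted in this sketch.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Bias of 0x7FFF plus the current LSB of the kept half gives ties-to-even.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def sgd_steps_bf16(param: float, grad: float, lr: float, steps: int) -> float:
    """Apply plain SGD steps, storing the parameter in (emulated) bf16.

    Assumption: the update is computed in higher precision and only the
    stored parameter is rounded, which is the most forgiving case.
    """
    p = to_bf16(param)
    for _ in range(steps):
        p = to_bf16(p - lr * grad)
    return p


# Spacing just above 1.0 in bf16 is 2**-7 = 0.0078125 (the ~7.8e-3 in the commit).
eps_bf16 = to_bf16(1.0 + 2**-7) - 1.0
print("bf16 spacing above 1.0:", eps_bf16)

# A single lr=1e-3 step from exactly 1.0 rounds straight back to 1.0 ...
print("1.0 - 1e-3 in bf16:", to_bf16(1.0 - 1e-3))

# ... so repeated steps never move the stored parameter.
print("after 100 steps:", sgd_steps_bf16(1.0, grad=1.0, lr=1e-3, steps=100))

# A larger step (1e-2) does survive, landing on the next representable value.
print("1.0 + 1e-2 in bf16:", to_bf16(1.0 + 1e-2))
```

With a step of 1e-3, the candidate value 0.999 lies closer to 1.0 than to the next bf16 value below it (1 - 2**-8), so the stored parameter never changes in this emulation. How the repository's near_one init interacts with the dtype in which parameters are actually stored is not shown here.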