lora-lite

mirror of https://github.com/wassname/lora-lite.git synced 2026-06-27 16:45:56 +08:00

Author	SHA1	Message	Date
wassname	8005423c47	README: note LoRA-XS all-linear spread didn't help (test 55.6 vs down_proj 56.8) Paper spreads LoRA-XS across all q/k/v/o + FFN linears, not down_proj only. Tried it (150 modules, 0.154M params): test 55.6 / valid 62.0, slightly below the down_proj row at 6x params, within single-seed noise. down_proj-only stays the table entry. result: outputs/metamath_gsm8k_alllinear/...__seed0/result.json Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-18 23:49:36 +08:00
wassname	a75bed492b	README: add LoRA-XS variant row (test 56.8 / valid 68.0, params 0.025M) Qwen3.5-0.8B-Base, down_proj all 24 layers, r=32 alpha=32 lr=4e-3, 2500 steps. UAT: grad=0.699>0, dθ=60.0>0, base_grad_leaks=0. result: outputs/metamath_gsm8k/Qwen--Qwen3.5-0.8B-Base__lora_xs__s2500__seed0/result.json Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-18 21:28:10 +08:00
wassname	12e13cca79	README: rot basis is within noise (seed order flips), soften V claim Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-18 03:32:01 +08:00
wassname	12fa56f328	Collapse antipasto family to one variant: rot(V) becomes canonical antipasto main keeps a single antipasto = the rotation+delta SVD adapter (the published method, paper 2601.07473), default rotate_basis=V. On GSM8K/down_proj rot(V) led the family (57.2) and at a single seed nothing separated from it, while the covariance-oriented arms cost 34-120s init for no gain. The full family (gain core, U/both rotations, ablate, dplr, corda, asvd) is preserved on the antipasto-variants branch. - antipasto.py is now the rotation implementation, registered as "antipasto" - delete antipasto_{rot,ablate,corda,asvd,dplr}.py + their config exports - benchmark/justfile/cost_report/smoke: drop the removed variants + dead knobs (antipasto_coeff/suppress_only/ablate_k/cov_orient/lora_rank); keep --antipasto-rotate-basis as antipasto's V/U/both/none ablation axis - README: subset table to one antipasto row, add rank column, note single-seed noise floor (~1.4pp), point the full family at the branch smoke: 10 passed Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-17 21:05:51 +08:00
wassname	12109b6fc0	README: order variant table by test accuracy Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-17 18:26:45 +08:00
wassname	6cb350a4b6	README: fill IA3-FF row (56.3/62.0, 86k params, 0 added MACs) Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-17 15:49:02 +08:00
wassname	4962bffd7d	README: fill EVA + IA3 baseline rows EVA 59.3/74.0 (28s SVD-warmstart init), IA3 52.3/62.0 (6k params, 0 added MACs). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-17 15:26:50 +08:00
wassname	7e024b4734	comment hygiene + HRA row: shorten docstrings, drop dead init branch, track asvd - variant.py: fix mislabeled "legacy entry" (make() is the live param path); drop unused near_one init branch - config.py: drop "replaces older LoraLiteConfig" history narration - antipasto_ablate.py: aspirational "should warm-start" comment -> tracked FIXME - antipasto_rot.py: cut "kept as separate variant" / "why antipasto dropped rotation" ramble - benchmark: merge duplicate antipasto/corda/asvd cfg branch - README: fill HRA row (test 59.2 / valid 70.0) - track antipasto_asvd.py (was imported+registered but uncommitted) Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-17 11:16:07 +08:00
wassname	5f9d90d8b8	benchmark sweep: rot(U/both) ablation, whitening conclusion, cost rows - antipasto_rot: add rotate_basis="both" (independent V+U Cayley rotations), run_id suffix __rotU/__rotboth so ablation arms get their own output dirs - justfile: thread rotate_basis through bench-variant - corda/eva: padding-mask fix in calibration capture + bf16-tight residual - README: fill PiSSA/DoRA/CorDA/ASVD/ablate/dplr/rot rows; record the metric-axis ablation (C=I 56.0 > diag-C 55.6 > full-C 54.7) and the rotation ablation (V 57.2 > U 56.5 > both 55.6) conclusions - docs/reviews: external ref-checks + deepseek/gpt reviews of the cores Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-17 06:17:53 +08:00
wassname	e8ca6f5944	README: validation framing per wassname's wording; arrow large-block lr=1e-4 README: 'we validate the same way PEFT does; trained properly they clear 49% on GSM8K, all pass' + link to the benchmark script. justfile: arrow with block>8 uses lr=1e-4 not 5e-3. The 5e-3 that suits the tiny S-space gain destabilizes the large dense block -- block=128 at 5e-3 scored 45.7% (below the bar, vs block=8's 60.5%). Capacity sweep requeued at LoRA's 1e-4 to de-confound params-vs-lr. Co-Authored-By: Claudypoo <noreply@anthropic.com>	2026-06-15 18:27:33 +08:00
wassname	6b7b3a47dd	README: frame the GSM8K table as a validation harness, not a leaderboard The point is that every adapter clears PEFT's ~48% LoRA bar on the same MetaMathQA->GSM8K protocol -- that all rows pass is the it-trains signal, not a competitive ranking. Co-Authored-By: Claudypoo <noreply@anthropic.com>	2026-06-15 18:20:53 +08:00
wassname	6ab1dfff0e	README: antipasto variants as table rows; real PEFT reference - Fold the family into the main Variants table as rows (CorDA/ablate/arrow) instead of a separate table. - Lead with the point (freeze W's SVD, learn only a bounded gain -> interpretable, O(r) params) before any numbers. - Replace the unsourced 'PEFT reports 49.0%' line (wrong; LoRA is ~48%) with a real link to PEFT's method_comparison/MetaMathQA and a pointer to the benchmark script for hyperparameters. Link CorDA/Arditi papers inline. Co-Authored-By: Claudypoo <noreply@anthropic.com>	2026-06-15 18:18:09 +08:00
wassname	fa69e0cac3	README: trim AntiPaSTO section for researcher audience Replace the per-experiment family breakdown table + comparison prose with a 2-sentence method description (frozen interpretable SVD basis, O(r) gain, the three variant cores). Experiment findings (rotation comparison, arrow capacity, cost/timing) belong in the research journal, not the README skim path. Co-Authored-By: Claudypoo <noreply@anthropic.com>	2026-06-15 18:12:31 +08:00
wassname	90b5199ed9	README: AntiPaSTO family GSM8K results (5 variants, r=256) Replace the stale single AntiPaSTO row (was 35.8K params from the removed rotation version, described block-Cayley which no longer exists) with the real 5000-step Qwen3-0.6B numbers and a family breakdown: corda 61.9% 14.3K (best: covariance-oriented basis) plain 61.4% 14.3K rot 61.4% 35.8K (the rotation this replaces) ablate 61.0% 14.4K arrow 60.5% 17.5K Headline: ~320x fewer trainable params than LoRA at ~97% of its accuracy. Rotation buys nothing (rot matches plain to 3 s.f. at 2.5x params, +20% wall-time, plus a per-forward Cayley solve), confirming the drop. Co-Authored-By: Claudypoo <noreply@anthropic.com>	2026-06-15 07:05:45 +08:00
wassname	072a816cee	docs: fix hallucinated arxiv links in variants table AntiPaSTO, EVA, and HRA pointed at unrelated papers (stock prediction, LLM-vs-lawyer study, 2D Ising model). Replaced with verified IDs. Co-Authored-By: Claudypoo <claudypoo@noreply.invalid>	2026-05-26 05:48:49 +08:00
wassname	b698331cfa	feat: add HRA benchmark result (61.6%), update README table	2026-04-27 20:07:19 +08:00
wassname	e624cd244f	feat: near_zero/near_one init for trainable params (breaks bf16 dead-grad symmetry) Trainable params that were init'd at exact 0 or 1 now use near_zero (N(0,1e-4)) or near_one (1 + N(0,1e-4)) to break bf16 symmetry without meaningfully breaking identity-at-t=0. Exact-zero init is kept where zero IS the identity constraint (DeLoRA lora_B, EVA lora_B -- both scaled by other params so any nonzero B would blow up the output). AntiPaSTO: delta_s and rot_T now near_zero. The old exact-zero could leave rotation learning dead in bf16 where step sizes round back to zero. IA3: lora_g now near_one instead of exact ones. Avoids the bf16 spacing issue around 1.0 where eps_bf16 ~ 7.8e-3 and lr=1e-3 updates were rounding away. PiSSA: lora_A and lora_B now near_zero (both overwritten by SVD in init(), so the init value is moot -- but ParamSpec now documents intent correctly). HRA: lora_U now near_zero (overwritten by symmetric init in init()). ParamSpec: added 'near_zero' and 'near_one' init modes. Default changed from 'zeros' to 'near_zero'. Tests relaxed identity tolerances accordingly.	2026-04-27 15:55:05 +08:00
wassname	a342801807	wip	2026-04-27 11:24:19 +08:00
wassname	b60a8c3f9b	readme	2026-04-27 09:46:52 +08:00
wassname	bb8887e66c	tidy	2026-04-27 07:12:56 +08:00
wassname	b179771cc6	tyro and benchmark	2026-04-27 06:23:30 +08:00
wassname	0d929f93b3	feat(hra): add Householder Reflection Adaptation, hook-only/bnb-friendly + Qwen proof	2026-04-26 17:58:56 +08:00
wassname	2abf616be6	feat(dora): add weight-decomposed LoRA variant for fp layers	2026-04-26 17:53:33 +08:00
wassname	699fde31bf	feat: ia3 variant, real bnb 4bit/8bit smoke, dev guide split, user-only readme	2026-04-26 17:49:17 +08:00
wassname	f2d9021511	ci: add publishable check workflow	2026-04-26 17:09:47 +08:00
wassname	69bf5f4e44	test: prove adapter training paths	2026-04-26 17:00:39 +08:00
wassname	4db5cee5a9	init	2026-04-26 14:10:20 +08:00

27 Commits
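Commit e624cd244f says bf16 has a spacing of about 7.8e-3 around 1.0, so lr=1e-3 updates round away. A small sketch can check that claim. It is a pure-Python emulation, not code from this repository:

- `to_bf16` rounds a float to the nearest bfloat16 value by keeping the top 16 bits of its float32 bit pattern. It uses round-half-to-even and ignores NaN/inf.
- `sgd_step_bf16` is a hypothetical helper. It assumes the parameter itself is stored in bf16 after each plain SGD step.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round-half-to-even).

    Emulation sketch: bf16 is the upper 16 bits of an IEEE-754 float32.
    NaN/inf handling is deliberately omitted.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    lsb = (bits >> 16) & 1        # lowest kept bit, used for ties-to-even
    bits += 0x7FFF + lsb          # round the 16 bits about to be discarded
    bits &= 0xFFFF0000            # truncate to bf16 precision
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Gap between 1.0 and the next larger bf16 value (7 stored mantissa bits).
BF16_EPS = 2.0 ** -7


def sgd_step_bf16(p: float, grad: float, lr: float) -> float:
    """Hypothetical: one SGD step where the parameter is kept in bf16."""
    return to_bf16(p - lr * grad)


if __name__ == "__main__":
    print(BF16_EPS)                            # 0.0078125
    print(sgd_step_bf16(1.0, -1.0, 1e-3))      # 1.0: the +1e-3 step is rounded away
    print(sgd_step_bf16(0.0, -1.0, 1e-3) > 0)  # True: the same step survives near 0
```

The rounding loss depends on magnitude. Near 1.0 the bf16 grid is 2^-7 wide, so a 1e-3 step falls back onto 1.0. Near zero the absolute spacing is much finer, so the same step is kept.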
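Commits 5f9d90d8b8 and 90b5199ed9 mention Cayley rotations and a per-forward Cayley solve. The sketch below shows the standard Cayley transform under a stated assumption; the parametrization is hypothetical and not taken from antipasto.py:

- A free square parameter `T` is made skew-symmetric as A = T - T^T.
- A is mapped to an orthogonal Q = (I + A)^{-1}(I - A).

The linear solve inside `cayley` is the kind of extra per-forward work those commits refer to. At T = 0 the transform returns exactly Q = I, which is consistent with the identity-at-init requirement in e624cd244f.

```python
from typing import List

Matrix = List[List[float]]


def identity(n: int) -> Matrix:
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [rv - f * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def cayley(t: Matrix) -> Matrix:
    """Map a free square parameter T to an orthogonal matrix Q.

    Hypothetical parametrization: A = T - T^T (skew-symmetric), then
    Q = (I + A)^{-1} (I - A). I + A is always invertible for skew A.
    """
    n = len(t)
    skew = [[t[i][j] - t[j][i] for j in range(n)] for i in range(n)]
    eye = identity(n)
    i_plus = [[eye[i][j] + skew[i][j] for j in range(n)] for i in range(n)]
    i_minus = [[eye[i][j] - skew[i][j] for j in range(n)] for i in range(n)]
    return solve(i_plus, i_minus)


if __name__ == "__main__":
    print(cayley([[0.0, 0.5], [0.0, 0.0]]))  # approx [[0.6, -0.8], [0.8, 0.6]]
```

Plain lists keep the sketch dependency-free. An adapter running on GPU would presumably use a batched tensor solve instead.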