lora-lite

mirror of https://github.com/wassname/lora-lite.git synced 2026-06-27 18:05:16 +08:00

Author	SHA1	Message	Date
wassname	9d027752ad	variants: replace arrow's dense block with diagonal-plus-low-rank core. antipasto_arrow -> antipasto_dplr. The arrowhead's dense b x b block is the wrong shape: b^2 params, mixes only the top-b, and sits on the S-scaled coords so its perturbation is amplified by the largest singular values (block=128 collapsed to 45.7% at the gain's lr). Replace it with LoRA's lesson -- a low-rank core inside the frozen basis, ADDED to the gain: DeltaW = U [diag(S_eff) + coeff * B A] Vh, A:(k,r) B:(r,k), B=0 at init. The low-rank part mixes the whole top-r subspace for 2rk params (k=LoRA's rank), and being additive (not * diag(S)) it is S-independent -- the amplification edge is gone by construction. Diagonal gain unchanged; identity at init from B=0 and g=0. Wired through benchmark (antipasto_lora_rank, run_id __k suffix), justfile, cost_report, smoke (green, dplr attaches/trains/round-trips). Arrow code removed; its run results stay on disk for comparison. Co-Authored-By: Claudypoo <noreply@anthropic.com>	2026-06-15 20:13:15 +08:00
wassname	2c56196dea	justfile/run_id: r override for low-rank antipasto sweeps. bench-variant gains an r_override arg (alpha tracks r for the antipasto family); run_id appends __r<N> when an antipasto-family run uses r!=256, so the low-rank corda-vs-antipasto sweep does not overwrite the r=256 results. Co-Authored-By: Claudypoo <noreply@anthropic.com>	2026-06-15 18:35:54 +08:00
wassname	d9d31a160f	variants: clean docstrings to research pseudocode; arrow block param. Rewrite antipasto/ablate/corda/arrow docstrings to the house style (purpose + math block + identity line + refs), dropping the rambly meta-commentary aimed at past design decisions ('Changes vs the rotation version', chat references, inline measurements). Net -74 lines. Also answer the FIXMEs left on main's old copy: - group_init is Wanda/ASVD selection (re-rank W's own singular vectors), NOT CorDA re-orientation -- that is antipasto_corda.py. - it rebuilds the FULL W exactly (W_res + stored top-r == W), so the re-SVD sees the whole spectrum, not a cropped matrix. Arrow capacity: --antipasto-block CLI knob (justfile bench-variant 4th arg) so the block can be scaled toward LoRA params; run_id gets a __b<N> suffix so block-sweep runs do not collide. Smoke green (14 passed). Co-Authored-By: Claudypoo <noreply@anthropic.com>	2026-06-15 18:09:53 +08:00
wassname	0d40cc9b38	Add antipasto_arrow: structured fixed-basis core (cross-direction mixing). antipasto's diagonal core can only rescale each frozen singular direction; it can never let direction i's input drive direction j's output, yet the steered behaviour is an off-axis combination. A dense r x r core fixes that but costs r^2 params. antipasto_arrow uses the arrowhead structure instead: a dense b x b block on the top-b singular directions (full coupling where the action lives) plus a diagonal 1+ELU tail on the rest. b^2 + (r-b) params, one b x b matmul per forward -- cross-direction mixing at diagonal-core cost, no Cayley solve. Identity at init (M=0 -> B=I, g=0 -> gain=1). Verified on a Linear: rel_err 1.5e-7 at init; M[i,j] routes input dir j -> output dir i with weight exactly M[i,j] (diagonal core forces 0); 14 train params at r=8,b=3 vs r^2=64. Wired into benchmark (antipasto_block knob), smoke (block=2 for r=4), cost report, and exports. Co-Authored-By: Claudypoo <noreply@anthropic.com>	2026-06-14 19:18:59 +08:00
wassname	b80d7778af	Add rotation-free S-space adapter cores (antipasto family). Replace antipasto's rotation/Cayley with a bounded 1+ELU gain and split the S-space idea into four interpretable PiSSA-style cores (frozen U/S/Vh, small trainable core): - antipasto: S_eff = S(1+ELU(coeff * g)). exp-bounded attenuation, linear amplification (constant gradient, no runaway). g=0 -> exact identity. - antipasto_rot: keeps the block-Cayley rotation as a separate variant for cost comparison (its per-forward solve is the 72ms vs 36ms gap). - antipasto_ablate: contractive (I - a c c^T) diag(S), eigenvalues in [0,1], cannot blow up. Optional cov_orient (CorDA) basis. - antipasto_corda: covariance-oriented oblique projector P = Vh C^{-1/2}, the data-energy basis rather than the weight-gain basis. 1+ELU gain. Add scripts/_cost.py + scripts/cost_report.py: one-row-per-variant cost table (trainable params, peak GPU mem, fwd/bwd ms, added MACs/tok, group_init ms). Wire all four into the benchmark, smoke test, and __init__ exports. External review (DeepSeek-v4-pro, docs/reviews/) verified the math; acted on its one real point (corda g now inits to zeros for exact identity). Co-Authored-By: Claudypoo <noreply@anthropic.com>	2026-06-14 19:12:27 +08:00
wassname	56937e1b18	remove dead code: _road_matrix, callable(m) clause, silent git fallback. Delete _road_matrix in variants/road.py (zero callers); drop the redundant callable(m) clause in is_linear_like (every nn.Module is callable); remove the try/except in current_git_commit so a missing git crashes loudly instead of writing "unknown" into the results TSV. Co-Authored-By: Claudypoo <noreply@anthropic.com>	2026-05-19 19:11:32 +08:00
wassname	f6fd410677	benchmark: antipasto rotate_basis CLI + lr=5e-3 + ablation queue	2026-04-27 16:29:25 +08:00
wassname	e624cd244f	feat: near_zero/near_one init for trainable params (breaks bf16 dead-grad symmetry). Trainable params that were init'd at exact 0 or 1 now use near_zero (N(0,1e-4)) or near_one (1 + N(0,1e-4)) to break bf16 symmetry without meaningfully breaking identity-at-t=0. Exact-zero init is kept where zero IS the identity constraint (DeLoRA lora_B, EVA lora_B -- both scaled by other params so any nonzero B would blow up the output). AntiPaSTO: delta_s and rot_T now near_zero. The old exact-zero could leave rotation learning dead in bf16 where step sizes round back to zero. IA3: lora_g now near_one instead of exact ones. Avoids the bf16 spacing issue around 1.0 where eps_bf16 ~ 7.8e-3 and lr=1e-3 updates were rounding away. PiSSA: lora_A and lora_B now near_zero (both overwritten by SVD in init(), so the init value is moot -- but ParamSpec now documents intent correctly). HRA: lora_U now near_zero (overwritten by symmetric init in init()). ParamSpec: added 'near_zero' and 'near_one' init modes. Default changed from 'zeros' to 'near_zero'. Tests relaxed identity tolerances accordingly.	2026-04-27 15:55:05 +08:00
wassname	0bd091fe5b	tidy	2026-04-27 11:44:40 +08:00
wassname	a342801807	wip	2026-04-27 11:24:19 +08:00
wassname	24ba8deb02	simpler test	2026-04-27 09:47:07 +08:00
wassname	727ef6ea73	tidy tests to subset of metamath	2026-04-27 09:20:07 +08:00
wassname	bb8887e66c	tidy	2026-04-27 07:12:56 +08:00
wassname	74c374e741	tidy, review	2026-04-27 07:03:24 +08:00
wassname	a81ed6ffaf	misc	2026-04-27 06:23:36 +08:00
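
Note on commit b80d7778af: the 1+ELU gain can be checked by hand. The sketch below is not the repository's code. It assumes the formula reads S_eff = S * (1 + ELU(coeff * g)) elementwise, with the standard ELU (alpha = 1); the function names `elu` and `effective_singular_values` are made up for illustration. It shows the three properties the message claims: exact identity at g=0, linear growth for g>0, and attenuation that stays positive for g<0.

```python
import math


def elu(x: float) -> float:
    """Standard ELU with alpha = 1: x for x > 0, exp(x) - 1 otherwise."""
    return x if x > 0 else math.expm1(x)


def effective_singular_values(s, g, coeff=1.0):
    """Elementwise S_eff = S * (1 + ELU(coeff * g)) (assumed reading of the commit)."""
    return [si * (1.0 + elu(coeff * gi)) for si, gi in zip(s, g)]


s = [4.0, 2.0, 0.5]

# g = 0 gives a gain of exactly 1, so the frozen spectrum is returned unchanged.
identity = effective_singular_values(s, [0.0, 0.0, 0.0])

# g > 0: the gain is 1 + g, i.e. linear in g with a constant slope.
amplified = effective_singular_values(s, [1.0, 2.0, 3.0])

# g < 0: the gain is exp(g), which shrinks towards 0 but never changes sign.
attenuated = effective_singular_values(s, [-1.0, -2.0, -5.0])

print(identity, amplified, attenuated)
```

For g <= 0 the gain equals exp(coeff * g), so a singular value can be driven towards zero but cannot flip sign. For g > 0 the derivative with respect to g is the constant coeff, which is one way to read "constant gradient, no runaway".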

15 Commits
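
Note on commit e624cd244f: the bf16 spacing figure can be reproduced without a GPU. The sketch below is stdlib-only and not taken from the repository. The helper `to_bf16` is a hypothetical name; it emulates bfloat16 by rounding the float32 bit pattern to its top 16 bits with round-to-nearest-even, and it ignores NaN handling. It shows that a step of 1e-3 applied to a value of exactly 1.0 rounds back to 1.0, while a small step on a value near zero survives.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round-to-nearest-even, no NaN handling)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add just under half of the dropped 16 bits, plus 1 when the kept LSB is odd,
    # so that exact ties round to the even neighbour.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


# Spacing of bf16 just above 1.0: 2**-7 = 0.0078125, the ~7.8e-3 quoted in the commit.
eps_bf16 = to_bf16(1.0 + 2.0 ** -7) - 1.0

# A step of 1e-3 away from exactly 1.0 is lost in either direction.
step_up = to_bf16(1.0 + 1e-3)
step_down = to_bf16(1.0 - 1e-3)

# Near zero the grid is much finer, so a step of 1e-6 on a value of 1e-4 still registers.
small_before = to_bf16(1e-4)
small_after = to_bf16(1e-4 + 1e-6)

print(eps_bf16, step_up, step_down, small_before != small_after)
```

This only illustrates the number-format side of the commit message (updates of size 1e-3 vanishing around 1.0); it does not model the optimizer or the dead-gradient symmetry the message also mentions.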