lora-lite

mirror of https://github.com/wassname/lora-lite.git synced 2026-06-27 18:43:15 +08:00

Author	SHA1	Message	Date
wassname	bb8887e66c	tidy	2026-04-27 07:12:56 +08:00
wassname	b179771cc6	tyro and benchmark	2026-04-27 06:23:30 +08:00
wassname	67a6daf6aa	fix: 5 V4 must-fix bugs (DeLoRA B-init, HRA forward order, EVA A trainable, AntiPaSTO refs, qwen probe). DeLoRA (variants/delora.py): lora_B is now initialized to zeros instead of kaiming, matching peft (docs/refs/peft_delora_layer.py:139). With B=0 the t=0 delta is zero regardless of lambda, so identity holds with the peft default lambda0=15 instead of needing the lambda0=0 hack. HRA (variants/hra.py): forward_input loop reversed; it now applies x @ H_{r-1} ... H_0 = x @ R^T, so the base layer computes x R^T W^T = F.linear(x, W @ R), matching peft. The bug was masked by paired-symmetry init (R = R^T at t=0) but would corrupt any non-symmetric U. EVA (variants/eva.py): lora_A is now a trainable Parameter (peft semantics); SVD only changes the init. group_init still copies the SVD basis, but under a no_grad guard. AntiPaSTO (variants/antipasto.py): docstring now references arxiv.org/pdf/2601.07473 and github.com/wassname/AntiPaSTO, so the V4 review NO_REFERENCE flag is resolved. qwen probe (scripts/qwen_train_probe.py): perturb_first_adapter walks a priority list that includes lora_U (HRA) and lora_A (EVA, LoRA-style A-trainable variants), so HRA tests no longer raise 'no perturbable adapter parameter found'. smoke (tests/smoke.py): added hra_forward_order_smoke, a distinguishing check that compares adapted output to F.linear(x, W @ R) with paired symmetry broken and would fail under the forward-iter bug; added an EVA assert that lora_A.requires_grad == True per layer; moved DeLoRA bnb to bnb_skip (fp16 + B=0 + clamp(min=1e-4) overflow makes grad NaN; real bnb usage needs dequant). delora train still uses lambda0=0.1 because the peft default 15.0 explodes under Adam with lr=1e-1 within 20 steps.	2026-04-26 20:57:24 +08:00
copilot	55757e829d	fix V3 review must-fixes: DoRA bias passthrough + EVA load path. V3 external review (docs/audit/variants_review_v3.md, 97KB) found 3 must-fix bugs. DoRA: bias was being scaled by m/||V|| because we operated on the full base layer output. Now subtract bias before normalization, add back after. Matches peft DoRA exactly (docs/refs/peft_lora_dora.py:157-161). New smoke dora_bias_smoke verifies identity at t=0 with bias=True. EVA load: adapter.load() called attach(), which called group_init(), which required calibration_data and raised. Added _skip_group_init flag to attach(); load() passes it. EVA group_init still raises loudly when called directly without data. New smoke verifies save+load WITHOUT calibration data on load path. Also tightened EVA error message. Smoke now covers 8 variants + EVA roundtrip + DoRA-bias roundtrip + bnb 4/8-bit. ALL PASS. V3 nice-to-haves (PiSSA scaling, AntiPaSTO init choice, stale GH refs) deferred; documented as intentional in module docstrings.	2026-04-26 19:50:48 +08:00
copilot	185eb29c70	fix v2 review bugs + add EVA, AntiPaSTO. DeLoRA: per-input-channel wnorm buffer (not scalar Parameter), forward matches peft (xwnorm @ A.T then per-rank scale (lambda/r)/(AnBn)). Smoke: 89.7% loss drop (was 35.8%). HRA: symmetric repeated-column init (PEFT-style) instead of zero gate. Adjacent Householder pairs cancel exactly so R=I at t=0, and U receives gradient from step 0 (no dead-grad). Even r required. IA3: split into two variants. ia3 stays output-side (k_proj/v_proj); new ia3_ff is input-side (down_proj/fc2), matching peft is_feedforward. Config: dropout field removed (never honored by any variant). PiSSA: adapter.save records base-weight fingerprint per target; adapter.load recomputes init then verifies fingerprint -> fails loud when reloaded onto a different base. EVA (new): data-driven init via group_init + calibration_data. Top-r right singular vectors of pooled layer-input activations -> lora_A (buffer, frozen); only lora_B trains. Stress-tests group_init API. AntiPaSTO (new): SVD steering with frozen U,S,Vh,W_res and learnable delta_s (per-singular-value bias) + rot_T (block-diagonal Cayley rotation on V or U). Lite port of antipasto3 SVD adapter. ParamSpec: as_buffer field + make_tensor() for buffer registration. adapter.attach honors as_buffer with register_buffer; detach cleans both _parameters and _buffers. Smoke covers all 8 variants: identity at t=0, save/load round-trip, gradient-driven loss drop. EVA gets dedicated test for calibration data path. ALL PASS including bnb 4/8-bit path.	2026-04-26 19:41:59 +08:00
wassname	7eeaeed206	Verify all variants on bnb 4bit/8bit; HRA paper-faithful rewrite. Test all 6 variants against bnb.Linear8bitLt + Linear4bit in smoke; bnb-friendly (LoRA, IA3, HRA, DeLoRA): identity err <= 2.4e-4; bnb-incompatible (PiSSA, DoRA): fail-loud TypeError as expected; HRA: rewrite to paper-faithful input-side reflections (h <- (I-2vv^T)h), fixing previous broken output-side formulation; IA3: bypass dtype upcast for bnb (params stay fp16/quantized); DeLoRA: explicit type check rejecting non-nn.Linear (incl. bnb); adapter: special-case bnb param assignment via .data; Re-verified Qwen0.6B HRA probe: drop=20.7%, id_err=0, reload=0	2026-04-26 18:08:06 +08:00
wassname	0d929f93b3	feat(hra): add Householder Reflection Adaptation, hook-only/bnb-friendly + Qwen proof	2026-04-26 17:58:56 +08:00
wassname	2abf616be6	feat(dora): add weight-decomposed LoRA variant for fp layers	2026-04-26 17:53:33 +08:00
wassname	699fde31bf	feat: ia3 variant, real bnb 4bit/8bit smoke, dev guide split, user-only readme	2026-04-26 17:49:17 +08:00
wassname	69bf5f4e44	test: prove adapter training paths	2026-04-26 17:00:39 +08:00
wassname	4db5cee5a9	init	2026-04-26 14:10:20 +08:00
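
The HRA forward-order fix in commit 67a6daf6aa can be illustrated with a small, dependency-free sketch. It is not the repository's code: it uses plain Python lists rather than torch, and every name in it (householder, chain, apply_input_reflections, the toy U, W and x) is a hypothetical stand-in. It assumes R = H_0 H_1 ... H_{r-1} with H_i = I - 2 u_i u_i^T / (u_i^T u_i), as the commit message implies, and checks that applying the reflections to the input in reverse order reproduces F.linear(x, W @ R), while the forward order does not once paired symmetry is broken.

```python
def matmul(a, b):
    """Plain-list matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def householder(u):
    """H = I - 2 u u^T / (u^T u): symmetric and orthogonal."""
    n = len(u)
    uu = sum(v * v for v in u)
    return [[(1.0 if i == j else 0.0) - 2.0 * u[i] * u[j] / uu
             for j in range(n)] for i in range(n)]


def chain(us):
    """R = H_0 @ H_1 @ ... @ H_{r-1}."""
    r = identity(len(us[0]))
    for u in us:
        r = matmul(r, householder(u))
    return r


def apply_input_reflections(x, us, reverse=True):
    """Reflect x one vector at a time.

    reverse=True computes x @ H_{r-1} ... H_0 = x @ R^T (the fixed order);
    reverse=False computes x @ H_0 ... H_{r-1} = x @ R (the buggy order).
    """
    order = list(reversed(us)) if reverse else list(us)
    for u in order:
        x = matmul(x, householder(u))
    return x


def max_abs_diff(a, b):
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


# Toy data (assumed): two reflection vectors that are neither parallel nor
# orthogonal, so H_0 and H_1 do not commute and R != R^T.
U = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]]
W = [[1.0, 2.0, 3.0], [0.0, 1.0, -1.0]]  # base weight, shape (out=2, in=3)
x = [[1.0, 0.0, 2.0]]                    # one input row

R = chain(U)
reference = matmul(x, transpose(matmul(W, R)))  # F.linear(x, W @ R)
fixed = matmul(apply_input_reflections(x, U, reverse=True), transpose(W))
buggy = matmul(apply_input_reflections(x, U, reverse=False), transpose(W))

# Paired init: each vector repeated, so adjacent reflections cancel and R = I.
R_paired = chain([U[0], U[0], U[1], U[1]])
```

With the paired init, R equals its own transpose (both are the identity), which is why the ordering bug could not be seen at t=0. A check like this only distinguishes the two orders after the pairs are perturbed.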

11 Commits
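
The DoRA bias passthrough described in commit 55757e829d can be sketched the same way. This is a simplified illustration, not the repository's or peft's implementation: it leaves out the low-rank delta (so V = W), assumes the magnitude m and the norm are per output channel, and uses hypothetical names (linear, row_norms, dora_output, fix_bias). It shows that rescaling the full base output also rescales the bias, while subtracting the bias first and adding it back afterwards leaves it untouched.

```python
import math


def linear(x, W, b):
    """y_j = sum_i W[j][i] * x[i] + b[j] for a single input vector x."""
    return [sum(w * v for w, v in zip(row, x)) + bj for row, bj in zip(W, b)]


def row_norms(W):
    """L2 norm of each output row of W."""
    return [math.sqrt(sum(w * w for w in row)) for row in W]


def dora_output(x, W, b, m, fix_bias=True):
    """Rescale the base layer output per channel by m_j / ||W_j||.

    Simplification (assumed): no low-rank update, so V = W.
    fix_bias=True subtracts the bias before scaling and adds it back after;
    fix_bias=False scales the whole output, bias included (the reported bug).
    """
    scale = [mj / nj for mj, nj in zip(m, row_norms(W))]
    base = linear(x, W, b)
    if fix_bias:
        return [s * (y - bj) + bj for s, y, bj in zip(scale, base, b)]
    return [s * y for s, y in zip(scale, base)]


# Toy data (assumed). Row norms of W are 5 and 2.
W = [[3.0, 4.0], [0.0, 2.0]]
b = [10.0, -10.0]
x = [1.0, 1.0]

base = linear(x, W, b)

# At init m equals the row norms, so the scale is 1 and the output is unchanged.
at_init = dora_output(x, W, b, m=row_norms(W), fix_bias=True)

# After m moves away from the row norms (here doubled), the variants diverge.
m_trained = [10.0, 4.0]
fixed = dora_output(x, W, b, m_trained, fix_bias=True)   # 2 * (W x) + b
buggy = dora_output(x, W, b, m_trained, fix_bias=False)  # 2 * (W x + b)
```

In this simplified form both variants agree when the scale is exactly 1. The difference appears only once m has drifted from the row norms.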