Commit Graph

13 Commits

Author SHA1 Message Date
copilot 55757e829d fix V3 review must-fixes: DoRA bias passthrough + EVA load path
V3 external review (docs/audit/variants_review_v3.md, 97KB) found 3
must-fix bugs.

DoRA: bias was being scaled by m/||V|| because we operated on the full
base layer output. Now subtract bias before normalization, add back
after. Matches peft DoRA exactly (docs/refs/peft_lora_dora.py:157-161).
New smoke dora_bias_smoke verifies identity at t=0 with bias=True.

EVA load: adapter.load() called attach() which called group_init() which
required calibration_data and raised. Added _skip_group_init flag to
attach(); load() passes it. EVA group_init still raises loudly when
called directly without data. New smoke verifies save+load WITHOUT
calibration data on load path.

Also tightened EVA error message.

Smoke now covers 8 variants + EVA roundtrip + DoRA-bias roundtrip + bnb
4/8-bit. ALL PASS.

V3 nice-to-haves (PiSSA scaling, AntiPaSTO init choice, stale GH refs)
deferred -- documented as intentional in module docstrings.
2026-04-26 19:50:48 +08:00
copilot 185eb29c70 fix v2 review bugs + add EVA, AntiPaSTO
DeLoRA: per-input-channel wnorm buffer (not scalar Parameter), forward
matches peft (x*wnorm @ A.T then per-rank scale (lambda/r)/(An*Bn)).
Smoke: 89.7% loss drop (was 35.8%).

HRA: symmetric repeated-column init (PEFT-style) instead of zero gate.
Adjacent Householder pairs cancel exactly so R=I at t=0, and U receives
gradient from step 0 (no dead-grad). Even r required.

IA3: split into two variants. ia3 stays output-side (k_proj/v_proj);
new ia3_ff is input-side (down_proj/fc2), matching peft is_feedforward.

Config: dropout field removed (never honored by any variant).

PiSSA: adapter.save records base-weight fingerprint per target;
adapter.load recomputes init then verifies fingerprint -> fails loud
when reloaded onto a different base.

EVA (new): data-driven init via group_init + calibration_data. Top-r
right singular vectors of pooled layer-input activations -> lora_A
(buffer, frozen); only lora_B trains. Stress-tests group_init API.

AntiPaSTO (new): SVD steering with frozen U,S,Vh,W_res and learnable
delta_s (per-singular-value bias) + rot_T (block-diagonal Cayley
rotation on V or U). Lite port of antipasto3 SVD adapter.

ParamSpec: as_buffer field + make_tensor() for buffer registration.
adapter.attach honors as_buffer with register_buffer; detach cleans
both _parameters and _buffers.

Smoke covers all 8 variants: identity at t=0, save/load round-trip,
gradient-driven loss drop. EVA gets dedicated test for calibration
data path. ALL PASS including bnb 4/8-bit path.
2026-04-26 19:41:59 +08:00
wassname fdb4c77d6c Add reference-impl URLs to variant docstrings + V2 external review
- Fetch canonical reference impls for offline review:
  * peft_{lora,hra,delora,ia3}_layer.py + peft_lora_{dora,variants}.py
  * orig_pissa_init.py (MuLabPKU/PiSSA)
  * orig_hra_layer.py (DaShenZi721/HRA)
  * orig_delora.py (ExplainableML/DeLoRA author fork)
- Add reference-impl URLs to all 6 variant docstrings
- Document HRA gate=0 dead-grad issue and DoRA detach-omission in their docstrings
- Re-run external review (codex) with refs available -> docs/audit/variants_review_v2.md
  Major NEW findings vs paper-only review:
    * DeLoRA: scalar W.norm() should be per-input-channel norm(dim=0)
    * HRA: PEFT uses symmetric repeated-column init (no dead grad), not zero gate
    * IA3: FFN targets need input-side gating, not output, our up_proj advice wrong
    * All LoRA-family: cfg.dropout silently ignored (no-op)
    * DeLoRA: wnorm should be persistent buffer, not Parameter
  HRA and DeLoRA upgraded to BUGGY (from Partial)
2026-04-26 19:27:47 +08:00
wassname d0b4c52740 External review: per-variant audit + design notes
- Two acpx external reviews (codex + opencode):
  * docs/audit/variants_review.md: per-variant paper-vs-impl audit
  * docs/audit/design_review.md: peft EVA / baukit / antipasto3 vs lora-lite
  * docs/audit/SUMMARY.md: aggregate verdicts + 3 risks + 5 follow-ups
- docs/refs/: peft_eva.py, peft_eva_finetuning.py, baukit_nethook.py,
  antipasto3_svd_adapter.py for offline reference

Findings: LoRA clean; PiSSA/DoRA/IA3/HRA/DeLoRA have documented partial deviations.
Top risks: init/grad tradeoffs hidden by coarse tests; qwen probe lacks strict
identity tol; IA3 target placement untested.
2026-04-26 19:01:29 +08:00
wassname 7eeaeed206 Verify all variants on bnb 4bit/8bit; HRA paper-faithful rewrite
- Test all 6 variants against bnb.Linear8bitLt + Linear4bit in smoke
- bnb-friendly (LoRA, IA3, HRA, DeLoRA): identity err <= 2.4e-4
- bnb-incompatible (PiSSA, DoRA): fail-loud TypeError as expected
- HRA: rewrite to paper-faithful input-side reflections (h <- (I-2vv^T)h),
  fixing previous broken output-side formulation
- IA3: bypass dtype upcast for bnb (params stay fp16/quantized)
- DeLoRA: explicit type check rejecting non-nn.Linear (incl. bnb)
- adapter: special-case bnb param assignment via .data
- Re-verified Qwen0.6B HRA probe: drop=20.7%, id_err=0, reload=0
2026-04-26 18:08:06 +08:00
wassname 0d929f93b3 feat(hra): add Householder Reflection Adaptation, hook-only/bnb-friendly + Qwen proof 2026-04-26 17:58:56 +08:00
wassname 43e620176c docs: record DoRA + IA3 Qwen-0.6B proof results (tasks 80, 81) 2026-04-26 17:54:54 +08:00
wassname 2abf616be6 feat(dora): add weight-decomposed LoRA variant for fp layers 2026-04-26 17:53:33 +08:00
wassname 699fde31bf feat: ia3 variant, real bnb 4bit/8bit smoke, dev guide split, user-only readme 2026-04-26 17:49:17 +08:00
wassname f2d9021511 ci: add publishable check workflow 2026-04-26 17:09:47 +08:00
wassname 69bf5f4e44 test: prove adapter training paths 2026-04-26 17:00:39 +08:00
wassname 4db5cee5a9 init 2026-04-26 14:10:20 +08:00
wassname de97724b65 init 2026-04-26 14:10:18 +08:00