mirror of https://github.com/wassname/lora-lite.git synced 2026-06-27 16:30:44 +08:00

wassname d0b4c52740 External review: per-variant audit + design notes

- Two acpx external reviews (codex + opencode):
  * docs/audit/variants_review.md: per-variant paper-vs-impl audit
  * docs/audit/design_review.md: peft EVA / baukit / antipasto3 vs lora-lite
  * docs/audit/SUMMARY.md: aggregate verdicts + 3 risks + 5 follow-ups
- docs/refs/: peft_eva.py, peft_eva_finetuning.py, baukit_nethook.py,
  antipasto3_svd_adapter.py for offline reference

Findings: LoRA clean; PiSSA/DoRA/IA3/HRA/DeLoRA have documented partial deviations.
Top risks: init/grad tradeoffs hidden by coarse tests; qwen probe lacks strict
identity tol; IA3 target placement untested.

2026-04-26 19:01:29 +08:00

2.8 KiB

External-Review Summary

Two independent reviews were run via acpx external models (codex and opencode). Full reviews:

- docs/audit/variants_review.md: per-variant paper-faithfulness audit
- docs/audit/design_review.md: peft EVA / baukit / antipasto3 vs lora-lite design

Per-variant verdict

| variant | match | bugs found | confidence |
|---|---|---|---|
| lora | Y | none material | High |
| pissa | Partial | bf16/Qwen init err 0.31; deviation `alpha==r` only in inline comment; residual not in saved adapter | Medium |
| dora | Y | possible denominator-gradient mismatch with paper's "cost-saving" variant | High |
| ia3 | Partial | targets q/v not paper's k/v/ffn-down; deviation documented but not tested | Medium |
| hra | Partial | gate=0 init -> dU/dx=0 first step (lora_U dead); not orthogonal when gate != 1 | Medium-Low |
| delora | Partial | no Eq.9 frozen-copy init; lambda0=0 -> A/B dead grad; lambda0=0.1 breaks identity | Medium |

Three biggest risks (reviewer's words)

1. Initialization vs gradient-flow tradeoffs are hidden by coarse tests. HRA's lora_U and DeLoRA's A/B can be initially dead while grad_nonzero=True still passes (because some lora_* param has nonzero grad).
2. Qwen probe pass criteria do not enforce paper identity. PiSSA shows id_err=0.31, DeLoRA id_err=0.72, but log says PASS.
3. Target semantics under-tested. IA3's documented k/v/ffn deviation is never exercised by a positive test.

Design recommendations

| ref | verdict | impact |
|---|---|---|
| peft EVA | PARTIAL: add `calibrate(model, dataloader, cfg)` (~50 lines) | +50 lines, additive |
| baukit nethook | SKIP: current 5-line hook registration is simpler | 0 |
| antipasto3 SVD | ADOPT concept (learnable delta_s): no code change now | 0 |

Recommended follow-up tasks (need user approval before implementing)

A. Per-param gradient probe: extend smoke to assert grad on each lora_* param at step 0. Catches HRA/DeLoRA init-dead-param bug.
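A minimal sketch of the per-parameter check in task A, assuming gradients have already been collected into a name-to-values mapping (in PyTorch this would presumably be built from `model.named_parameters()` and each parameter's `.grad`). The helper `dead_params` and the toy gated adapter are hypothetical illustrations, not lora-lite code. The toy shows how a coarse "any lora_* grad is nonzero" check can pass while one parameter is dead at step 0.

```python
from typing import List, Mapping, Optional, Sequence


def dead_params(
    named_grads: Mapping[str, Optional[Sequence[float]]],
    prefix: str = "lora_",
) -> List[str]:
    """Return adapter params whose gradient is missing or identically zero."""
    dead = []
    for name, grad in named_grads.items():
        # Only inspect adapter params, e.g. "layer.lora_U" -> "lora_U".
        if not name.rsplit(".", 1)[-1].startswith(prefix):
            continue
        if grad is None or all(g == 0.0 for g in grad):
            dead.append(name)
    return dead


# Toy gated adapter (hypothetical): y = x + gate * (u * x), loss = y.
# At gate = 0 the gate receives gradient but u does not.
x, u, gate = 2.0, 3.0, 0.0
grads = {
    "layer.lora_gate": [u * x],  # dL/dgate = u * x
    "layer.lora_U": [gate * x],  # dL/du = gate * x, zero at step 0
    "layer.weight": None,        # frozen base weight, skipped by the probe
}

# Coarse check: passes as soon as any lora_* param has a nonzero gradient.
aggregate_ok = any(
    g is not None and any(v != 0.0 for v in g)
    for name, g in grads.items()
    if "lora_" in name
)

print(aggregate_ok)        # coarse check result
print(dead_params(grads))  # per-param check result
```

A smoke test built this way would assert that `dead_params(...)` is empty at step 0, or compare it against an explicit per-variant allowlist of parameters that are expected to start dead.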

B. Per-variant identity tolerance in qwen probe: PiSSA/DeLoRA need a stricter check (or relative tol against ||y_base||) instead of "passes if id_err < some constant".
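One way the relative tolerance in task B could look, as a sketch under stated assumptions: outputs are flattened to plain float sequences, the error is the Euclidean norm of the difference divided by ||y_base||, and the tolerances in `REL_TOL` are placeholder values chosen for illustration. The names `relative_identity_error`, `identity_ok`, and `REL_TOL` are hypothetical and do not come from the probe itself.

```python
import math
from typing import Sequence


def relative_identity_error(
    y_base: Sequence[float], y_adapted: Sequence[float], eps: float = 1e-12
) -> float:
    """||y_adapted - y_base|| / ||y_base|| using the Euclidean norm."""
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(y_adapted, y_base)))
    base = math.sqrt(sum(b ** 2 for b in y_base))
    # eps guards against division by zero for an all-zero base output.
    return diff / max(base, eps)


# Placeholder per-variant tolerances (assumed values, to be tuned per dtype).
REL_TOL = {"lora": 1e-6, "pissa": 1e-2, "delora": 1e-2}


def identity_ok(variant: str, y_base: Sequence[float], y_adapted: Sequence[float]) -> bool:
    """True if the adapted output matches the base output within the variant's tolerance."""
    return relative_identity_error(y_base, y_adapted) <= REL_TOL[variant]


y_base = [3.0, 4.0]       # ||y_base|| = 5
y_drifted = [3.3, 4.4]    # difference has norm of about 0.5

print(relative_identity_error(y_base, y_drifted))
print(identity_ok("pissa", y_base, y_base))
print(identity_ok("pissa", y_base, y_drifted))
```

Normalising by ||y_base|| keeps the threshold meaningful across layers and dtypes with different activation scales, which a single absolute constant does not.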

C. IA3 paper-faithful test row: add one Qwen probe configuration with target_names=k_proj|v_proj|down_proj to exercise the documented IA3 placement.
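To make the difference between the two placements concrete, here is a small sketch of how a `target_names` pattern might select modules. It assumes `target_names` is a regex alternation matched against the last component of each module name; how lora-lite actually resolves targets is not stated here, and `select_targets` is a hypothetical helper. The module names follow the usual Hugging Face Qwen decoder-layer naming and are illustrative only.

```python
import re
from typing import Iterable, List


def select_targets(module_names: Iterable[str], target_names: str) -> List[str]:
    """Keep modules whose last name component matches one of the alternatives."""
    # Anchor on a preceding "." (or string start) so "up_proj" cannot match "p_proj"-style suffixes.
    pattern = re.compile(rf"(?:^|\.)(?:{target_names})$")
    return [name for name in module_names if pattern.search(name)]


names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.0.mlp.gate_proj",
    "model.layers.0.mlp.up_proj",
    "model.layers.0.mlp.down_proj",
]

# Placement proposed for the paper-faithful test row.
paper_targets = select_targets(names, "k_proj|v_proj|down_proj")
# Placement the audit reports as currently used for IA3 (q/v).
current_targets = select_targets(names, "q_proj|v_proj")

print(paper_targets)
print(current_targets)
```

A positive test for task C could assert that exactly the `paper_targets` modules are wrapped after attaching the adapter, in addition to running the usual probe.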

D. PiSSA equivalence test against peft.PiSSA: same seed + alpha=r, compare B@A reconstruction. Adds peft to test extras only.
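A brief illustration of why task D compares the B@A reconstruction rather than the factors themselves: SVD-derived factors are only determined up to sign (and, for repeated singular values, rotation), so two correct implementations can disagree on A and B while agreeing on their product. The sketch below uses plain nested lists and a hand-made rank-1 example; `matmul` and `max_abs_diff` are hypothetical helpers, and no peft call is shown.

```python
from typing import List

Matrix = List[List[float]]


def matmul(b: Matrix, a: Matrix) -> Matrix:
    """Plain nested-list matrix product b @ a."""
    inner, cols = len(a), len(a[0])
    return [
        [sum(row[k] * a[k][j] for k in range(inner)) for j in range(cols)]
        for row in b
    ]


def max_abs_diff(m1: Matrix, m2: Matrix) -> float:
    """Largest elementwise absolute difference between two same-shaped matrices."""
    return max(abs(p - q) for r1, r2 in zip(m1, m2) for p, q in zip(r1, r2))


# Two rank-1 factorizations of the same update, differing only by a sign flip.
B1 = [[1.0], [2.0]]
A1 = [[0.5, -1.0, 2.0]]
B2 = [[-v for v in row] for row in B1]
A2 = [[-v for v in row] for row in A1]

factor_gap = max_abs_diff(B1, B2)                           # factors disagree
product_gap = max_abs_diff(matmul(B1, A1), matmul(B2, A2))  # products agree

print(factor_gap, product_gap)
```

In a real equivalence test the comparison would use a floating-point tolerance rather than exact equality, since the two libraries may run the decomposition in different dtypes.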

E. EVA variant: implement a minimal calibrate() per the design review (~50 lines). Optional, but it would give us our first data-driven init variant, in line with the user's stated interest.
