mirror of
https://github.com/wassname/lora-lite.git
synced 2026-06-27 17:48:59 +08:00
185eb29c70
DeLoRA: per-input-channel wnorm buffer (not scalar Parameter), forward matches peft (x*wnorm @ A.T then per-rank scale (lambda/r)/(An*Bn)). Smoke: 89.7% loss drop (was 35.8%).

HRA: symmetric repeated-column init (PEFT-style) instead of zero gate. Adjacent Householder pairs cancel exactly so R=I at t=0, and U receives gradient from step 0 (no dead-grad). Even r required.

IA3: split into two variants. ia3 stays output-side (k_proj/v_proj); new ia3_ff is input-side (down_proj/fc2), matching peft is_feedforward.

Config: dropout field removed (never honored by any variant).

PiSSA: adapter.save records base-weight fingerprint per target; adapter.load recomputes init then verifies fingerprint -> fails loud when reloaded onto a different base.

EVA (new): data-driven init via group_init + calibration_data. Top-r right singular vectors of pooled layer-input activations -> lora_A (buffer, frozen); only lora_B trains. Stress-tests group_init API.

AntiPaSTO (new): SVD steering with frozen U,S,Vh,W_res and learnable delta_s (per-singular-value bias) + rot_T (block-diagonal Cayley rotation on V or U). Lite port of antipasto3 SVD adapter.

ParamSpec: as_buffer field + make_tensor() for buffer registration. adapter.attach honors as_buffer with register_buffer; detach cleans both _parameters and _buffers.

Smoke covers all 8 variants: identity at t=0, save/load round-trip, gradient-driven loss drop. EVA gets dedicated test for calibration data path. ALL PASS including bnb 4/8-bit path.
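
The HRA note above relies on a Householder reflection being its own inverse, so two adjacent reflections built from the same column cancel. The following is a minimal pure-Python sketch of that property only, not the repository's implementation: the names `householder`, `symmetric_init`, and `apply_chain` are hypothetical, and the random draw stands in for whatever distribution the real init uses. In floating point the cancellation holds up to rounding error.

```python
import random


def householder(x, u):
    """Reflect x across the hyperplane orthogonal to u: x - 2 u (u.x)/(u.u)."""
    scale = 2.0 * sum(ui * xi for ui, xi in zip(u, x)) / sum(ui * ui for ui in u)
    return [xi - scale * ui for xi, ui in zip(x, u)]


def symmetric_init(d, r, seed=0):
    """Hypothetical init: r nonzero columns, where columns 2k and 2k+1 are identical."""
    if r % 2 != 0:
        raise ValueError("r must be even so every reflection has a partner")
    rng = random.Random(seed)
    cols = []
    for _ in range(r // 2):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        cols.extend([u, list(u)])  # repeat the column so the pair cancels
    return cols


def apply_chain(x, cols):
    """Apply the reflections one after another, as a product of Householder matrices."""
    for u in cols:
        x = householder(x, u)
    return x


d, r = 6, 4
U = symmetric_init(d, r)
x = [float(i + 1) for i in range(d)]
y = apply_chain(x, U)

# Largest deviation from the identity map; expected to be at rounding-error level.
max_err = max(abs(a - b) for a, b in zip(x, y))
```

Because every column is nonzero, the chain starts at the identity without needing a zero gate; the sketch checks only the identity property and does not demonstrate the gradient claim.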