mirror of
https://github.com/wassname/lora-lite.git
synced 2026-06-27 16:45:56 +08:00
fdb4c77d6c
- Fetch canonical reference impls for offline review:
* peft_{lora,hra,delora,ia3}_layer.py + peft_lora_{dora,variants}.py
* orig_pissa_init.py (MuLabPKU/PiSSA)
* orig_hra_layer.py (DaShenZi721/HRA)
* orig_delora.py (ExplainableML/DeLoRA author fork)
- Add reference-impl URLs to all 6 variant docstrings
- Document HRA gate=0 dead-grad issue and DoRA detach-omission in their docstrings
- Re-run external review (codex) with refs available -> docs/audit/variants_review_v2.md
Major NEW findings vs paper-only review:
* DeLoRA: scalar W.norm() should be per-input-channel norm(dim=0)
* HRA: PEFT uses symmetric repeated-column init (no dead grad), not zero gate
* IA3: FFN targets need input-side gating, not output, our up_proj advice wrong
* All LoRA-family: cfg.dropout silently ignored (no-op)
* DeLoRA: wnorm should be persistent buffer, not Parameter
HRA and DeLoRA upgraded to BUGGY (from Partial)
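Two of the DeLoRA findings above can be checked in isolation with plain PyTorch. For a weight stored `nn.Linear`-style as `(d_out, d_in)`, `W.norm()` is one Frobenius-norm scalar, while `W.norm(dim=0)` gives one L2 norm per input channel (column). A tensor registered with `register_buffer(..., persistent=True)` is saved in `state_dict` but is not returned by `parameters()`, so an optimizer never updates it. This is only a sketch. `NormHolder` is a hypothetical module for illustration, not the repo's DeLoRA code.

```python
import torch
from torch import nn

W = torch.randn(3, 5)                  # (d_out, d_in), as in nn.Linear.weight
scalar = W.norm()                      # Frobenius norm: a single number
per_in = W.norm(dim=0)                 # L2 norm of each column: shape (d_in,)
# The column norms recombine into the Frobenius norm.
assert torch.allclose(per_in.pow(2).sum().sqrt(), scalar)


class NormHolder(nn.Module):           # hypothetical module, for illustration
    def __init__(self, weight: torch.Tensor):
        super().__init__()
        # Persistent buffer: saved and loaded with the model, never trained.
        self.register_buffer("wnorm", weight.norm(dim=0), persistent=True)


m = NormHolder(W)
print(scalar.shape, per_in.shape, list(m.state_dict()), len(list(m.parameters())))
```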
45 lines
1.5 KiB
Python
"""IA3-style output gating. Liu et al. 2022 https://arxiv.org/abs/2205.05638

    y_new = y * g, g initialized to 1 (identity at t=0)
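The formula has two direct consequences. With g = 1 the output is unchanged. A trained g is the same as scaling row j of the base weight and entry j of the bias by g[j]. The following is a minimal sketch in plain PyTorch that uses a toy `nn.Linear` base layer. It illustrates the algebra only and is not a merge routine this package provides.

```python
import torch
from torch import nn

torch.manual_seed(0)
base = nn.Linear(4, 3)                 # toy layer: d_in=4, d_out=3
x = torch.randn(2, 4)

with torch.no_grad():
    y = base(x)
    # g initialized to 1: gating is the identity at t=0.
    assert torch.equal(y * torch.ones(3), y)

    # A non-trivial gate, one entry per output feature (shape (d_out,)).
    g = torch.tensor([0.5, 2.0, -1.0])
    gated = base(x) * g
    # Same result with row j of W and entry j of the bias scaled by g[j].
    merged = x @ (g[:, None] * base.weight).T + g * base.bias
    assert torch.allclose(gated, merged, atol=1e-5)
```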

DEVIATION FROM PAPER:
  The original IA3 gates only three positions per transformer block:
    l_k * (k_proj output), l_v * (v_proj output), l_ff * (FFN intermediate after activation)
  This implementation gates ANY linear layer the targeting system selects.
  To match the paper exactly on a typical Llama/Qwen-style block, attach with:

    cfg = LoraLiteConfig(
        variant="ia3",
        target_names=(r"\\.k_proj$", r"\\.v_proj$", r"\\.up_proj$"),
        target_roles=(),
    )

  `up_proj` is the closest stand-in for "FFN intermediate" in gated-MLP blocks.
  Llama computes act(gate_proj(x)) * up_proj(x), and because that product is
  elementwise, gating the up branch's output scales the intermediate exactly as l_ff would.
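The following minimal sketch checks that equivalence. It uses toy sizes, SiLU as the gate activation (Llama-style) and GELU for the contrast case. The layers are standalone and only borrow Llama's `gate_proj`/`up_proj`/`down_proj` names. An output gate on `up_proj` matches an input-side gate on `down_proj`. In a plain (non-gated) MLP, however, gating the first layer's output lands before the nonlinearity and is not the same as gating after it.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
d, h = 4, 6                                   # toy hidden / intermediate sizes
gate_proj = nn.Linear(d, h, bias=False)
up_proj = nn.Linear(d, h, bias=False)
down_proj = nn.Linear(h, d, bias=False)
g = torch.rand(h) + 0.5                       # non-trivial gate, shape (h,)
x = torch.randn(3, d)

with torch.no_grad():
    # Gated MLP: output gate on up_proj vs input-side gate on down_proj.
    out_up = down_proj(F.silu(gate_proj(x)) * (up_proj(x) * g))
    out_down = down_proj((F.silu(gate_proj(x)) * up_proj(x)) * g)

    # Plain MLP (up_proj reused as its first layer): gating that layer's
    # output happens before the nonlinearity, unlike gating after it.
    fc1_out = up_proj(x)
    gate_before_act = F.gelu(fc1_out * g)
    gate_after_act = F.gelu(fc1_out) * g

assert torch.allclose(out_up, out_down, atol=1e-6)
assert not torch.allclose(gate_before_act, gate_after_act, atol=1e-6)
```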

Reference implementations (for review/cross-check):
  - peft IA3 layer (uses ia3_l elementwise scaling, fan_in_fan_out aware):
    https://github.com/huggingface/peft/blob/main/src/peft/tuners/ia3/layer.py
    (offline: docs/refs/peft_ia3_layer.py)
"""
import torch
from torch import nn

from ..variant import register, ParamSpec


@register
class IA3:
    name = "ia3"

    @staticmethod
    def param_specs(d_in, d_out, cfg):
        # One trainable gate per output feature, initialized to 1 (identity at t=0).
        return {"lora_g": ParamSpec((d_out,), init="ones", trainable=True)}

    @staticmethod
    def init(layer: nn.Linear, cfg) -> None:
        # Nothing extra to do: the gate's ones-initialization is declared in param_specs.
        return

    @staticmethod
    def forward(layer: nn.Linear, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # y is the base layer's output; scale each output feature by its gate.
        # x is unused because this variant gates the output, not the input.
        return y * layer.lora_g
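The next sketch shows what `IA3.forward` does during fine-tuning. It assumes the `register`/`ParamSpec` machinery (not shown here) does two things. It attaches `lora_g` to the wrapped `nn.Linear` as a trainable ones-initialized parameter of shape `(d_out,)`, and it freezes the base weights. `attach_gate` and `ia3_forward` are hypothetical stand-ins written for this example and are not part of the package. At initialization the output equals the base output. After a backward pass only the gate receives a gradient, namely y summed over the batch for a `sum()` loss.

```python
import torch
from torch import nn


def attach_gate(layer: nn.Linear) -> nn.Linear:
    """Hypothetical stand-in for ParamSpec((d_out,), init="ones", trainable=True)."""
    for p in layer.parameters():
        p.requires_grad_(False)                       # base weights stay frozen
    layer.lora_g = nn.Parameter(torch.ones(layer.out_features))
    return layer


def ia3_forward(layer: nn.Linear, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    return y * layer.lora_g                           # same body as IA3.forward


torch.manual_seed(0)
layer = attach_gate(nn.Linear(4, 3))
x = torch.randn(2, 4)
y = layer(x)                                          # base output (frozen path)
out = ia3_forward(layer, x, y)
assert torch.equal(out, y)                            # identity at t=0

out.sum().backward()
assert layer.weight.grad is None                      # frozen weight gets no grad
# d(sum(y * g))/dg is y summed over the batch dimension.
assert torch.allclose(layer.lora_g.grad, y.detach().sum(dim=0))
```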