lora-lite/src/lora_lite/variants/ia3.py
wassname fdb4c77d6c Add reference-impl URLs to variant docstrings + V2 external review
- Fetch canonical reference impls for offline review:
  * peft_{lora,hra,delora,ia3}_layer.py + peft_lora_{dora,variants}.py
  * orig_pissa_init.py (MuLabPKU/PiSSA)
  * orig_hra_layer.py (DaShenZi721/HRA)
  * orig_delora.py (ExplainableML/DeLoRA author fork)
- Add reference-impl URLs to all 6 variant docstrings
- Document HRA gate=0 dead-grad issue and DoRA detach-omission in their docstrings
- Re-run external review (codex) with refs available -> docs/audit/variants_review_v2.md
  Major NEW findings vs paper-only review:
    * DeLoRA: scalar W.norm() should be per-input-channel norm(dim=0)
    * HRA: PEFT uses symmetric repeated-column init (no dead grad), not zero gate
    * IA3: FFN targets need input-side gating, not output, our up_proj advice wrong
    * All LoRA-family: cfg.dropout silently ignored (no-op)
    * DeLoRA: wnorm should be persistent buffer, not Parameter
  HRA and DeLoRA upgraded to BUGGY (from Partial)
2026-04-26 19:27:47 +08:00


"""IA3-style output gating. Liu et al. 2022 https://arxiv.org/abs/2205.05638

    y_new = y * g,  g initialized to 1  (identity at t=0)

DEVIATION FROM PAPER:
    The original IA3 gates only three positions per transformer block:
        l_k * (k_proj output), l_v * (v_proj output), l_ff * (FFN intermediate after activation)
    This implementation gates ANY linear layer the targeting system selects.
    To match the paper exactly on a typical Llama/Qwen-style block, attach with:

        cfg = LoraLiteConfig(
            variant="ia3",
            target_names=(r"\\.k_proj$", r"\\.v_proj$", r"\\.up_proj$"),
            target_roles=(),
        )

    `up_proj` is the closest stand-in for "FFN intermediate" in gated-MLP blocks
    (Llama uses gate * up; gating the up branch is the IA3-spirit choice).

Reference implementations (for review/cross-check):
    - peft IA3 layer (uses ia3_l elementwise scaling, fan_in_fan_out aware):
      https://github.com/huggingface/peft/blob/main/src/peft/tuners/ia3/layer.py
      (offline: docs/refs/peft_ia3_layer.py)
"""
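import torch
from torch import nn

from ..variant import register, ParamSpec


@register
class IA3:
    name = "ia3"

    @staticmethod
    def param_specs(d_in, d_out, cfg):
        # One gate per output feature; ones-init makes the adapter an identity at t=0.
        return {"lora_g": ParamSpec((d_out,), init="ones", trainable=True)}

    @staticmethod
    def init(layer: nn.Linear, cfg) -> None:
        # Nothing extra to set up: the "ones" init declared in param_specs suffices.
        return

    @staticmethod
    def forward(layer: nn.Linear, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # Elementwise output gate; lora_g (shape d_out) broadcasts over leading dims.
        return y * layer.lora_g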
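
The docstring's two claims about gating can be checked numerically: a gate of ones leaves the block unchanged, and in a gated MLP an output gate on the up branch acts like a scale on the down_proj input. The sketch below is not part of lora-lite. It uses plain Python lists instead of torch so it runs without dependencies, and the names `gated_mlp`, `g_up`, `g_down_in`, the SiLU activation, and the toy weights are all assumptions made for illustration.

```python
import math


def silu(v):
    # SiLU activation, assumed here as the gate nonlinearity.
    return v / (1.0 + math.exp(-v))


def matvec(W, x):
    # W is a list of rows; returns W @ x as a list.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def gated_mlp(x, W_gate, W_up, W_down, g_up=None, g_down_in=None):
    """Toy Llama-style MLP: down(act(gate(x)) * up(x)), with optional gates."""
    gate = [silu(v) for v in matvec(W_gate, x)]
    up = matvec(W_up, x)
    if g_up is not None:
        # IA3-style output gate on the up branch: y_new = y * g.
        up = [u * g for u, g in zip(up, g_up)]
    h = [a * b for a, b in zip(gate, up)]
    if g_down_in is not None:
        # Input-side gate on down_proj (scales the FFN intermediate).
        h = [v * g for v, g in zip(h, g_down_in)]
    return matvec(W_down, h)


def close(a, b, tol=1e-12):
    return all(abs(p - q) <= tol for p, q in zip(a, b))


# Hypothetical toy weights: d_model = 2, d_ff = 3.
x = [0.5, -1.0]
W_gate = [[0.2, -0.4], [1.0, 0.3], [-0.7, 0.8]]
W_up = [[0.6, 0.1], [-0.2, 0.9], [0.4, -0.5]]
W_down = [[0.3, -0.6, 0.2], [0.7, 0.1, -0.9]]
g = [1.5, 0.25, -2.0]

base = gated_mlp(x, W_gate, W_up, W_down)
identity = gated_mlp(x, W_gate, W_up, W_down, g_up=[1.0, 1.0, 1.0])
via_up = gated_mlp(x, W_gate, W_up, W_down, g_up=g)
via_down_in = gated_mlp(x, W_gate, W_up, W_down, g_down_in=g)

print("ones gate is identity:      ", close(identity, base))
print("up gate == down-input gate: ", close(via_up, via_down_in))
print("non-trivial gate changes y: ", not close(via_up, base))
```

The equivalence holds only because the two branches are combined by an elementwise product, so act(gate) * (up * g) equals (act(gate) * up) * g. In a non-gated MLP, where the activation is applied directly to the first projection's output, scaling before the activation is in general not the same as scaling after it, so output gating on that projection would not reproduce the paper's l_ff placement.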