mirror of https://github.com/wassname/lora-lite.git synced 2026-06-27 18:05:16 +08:00

Files

T

wassname fdb4c77d6c Add reference-impl URLs to variant docstrings + V2 external review

- Fetch canonical reference impls for offline review:
  * peft_{lora,hra,delora,ia3}_layer.py + peft_lora_{dora,variants}.py
  * orig_pissa_init.py (MuLabPKU/PiSSA)
  * orig_hra_layer.py (DaShenZi721/HRA)
  * orig_delora.py (ExplainableML/DeLoRA author fork)
- Add reference-impl URLs to all 6 variant docstrings
- Document HRA gate=0 dead-grad issue and DoRA detach-omission in their docstrings
- Re-run external review (codex) with refs available -> docs/audit/variants_review_v2.md
  Major NEW findings vs paper-only review:
    * DeLoRA: scalar W.norm() should be per-input-channel norm(dim=0)
    * HRA: PEFT uses symmetric repeated-column init (no dead grad), not zero gate
    * IA3: FFN targets need input-side gating, not output, our up_proj advice wrong
    * All LoRA-family: cfg.dropout silently ignored (no-op)
    * DeLoRA: wnorm should be persistent buffer, not Parameter
  HRA and DeLoRA upgraded to BUGGY (from Partial)

2026-04-26 19:27:47 +08:00

85 KiB

Raw Blame History

[client] initialize (running)

[client] session/new (running) I’ll use the gsd workflow here since this is a multi-step audit, and I’ll keep it review-only: read refs, compare implementation, then produce the markdown report to stdout.

[tool] Read SKILL.md (running) input: {"call_id":"call_4LnMf0WHkHoYdG0a5N21Rrgd","process_id":"77034","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","sed -n '1,220p' /home/wassname/.claude/skills/gsd/SKILL.md"],"cwd":"/me... files: /home/wassname/.claude/skills/gsd/SKILL.md

[tool] Read SKILL.md (completed) kind: read input: {"call_id":"call_4LnMf0WHkHoYdG0a5N21Rrgd","process_id":"77034","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","sed -n '1,220p' /home/wassname/.claude/skills/gsd/SKILL.md"],"cwd":"/me... files: /home/wassname/.claude/skills/gsd/SKILL.md output: sed: /tmp/.mount_vscodiOHg5sd/lib/x86_64-linux-gnu/libselinux.so.1: no version information available (required by sed) --- name: gsd description: "Get Shit Done workflow for non-trivial tasks: spec -> implement -> test -> review -> wrap. Mandatory for anything with more than one step." ---

<!--
Design rationale and sources — for human readers, not the agent.

- GSD spec-first: spec-driven development, scope fencing, verify-before-done
- Manus file-as-memory: one file is working memory, checklist is state, append-only log
- Boomerang: return concrete artifacts not summaries
- DCP: invisible dedup, superseded writes, error purging

Core principles:
- Spec file is the only persistent artifact (single-file invariant).
- Agent shouldn't know about its own resource limits.
- Review scales with input size: frontier for plans, cheap for bulk.
- Observable outputs: show evidence, not narration.
-->

# GSD (Get Shit Done)

Mandatory for all non-trivial tasks (more than one step). Skip only for single-line fixes.

## Part 1: Core workflow

### 1.1 Spec-first planning
... (192 more lines)

I’m going to gather the exact line-numbered evidence first: prior verdicts, our variant code, plumbing, and the reference implementations. The main thing to avoid is smearing paper-faithfulness and reference-faithfulness together.

[tool] Read variants_review.md (running) input: {"call_id":"call_ZMecpefRlTCdKyqbYXoIQ4dq","process_id":"78042","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba docs/audit/variants_review.md | sed -n '1,260p'"],"cwd":"/media/... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/docs/audit/variants_review.md

[tool] Run for f in src/lora_lite/{adapter.py,target.py,variant.py,config.py}; do printf '\n### %s\n' "$f"; nl -ba "$f" | sed -n '1,280p'; done (running) input: {"call_id":"call_ToqQP8XSZ8r22VF9IYDGRrHF","process_id":"51617","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","for f in src/lora_lite/{adapter.py,target.py,variant.py,config.py}; do ...

[tool] Run for f in src/lora_lite/variants/{lora,pissa,dora,ia3,hra,delora}.py; do printf '\n### %s\n' "$f"; nl -ba "$f" | sed -n '1,260p'; done (running) input: {"call_id":"call_i9nfOVjK5lGEtnZXiX8w3UVI","process_id":"66364","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","for f in src/lora_lite/variants/{lora,pissa,dora,ia3,hra,delora}.py; do...

[tool] Read variants_review.md (completed) kind: read input: {"call_id":"call_ZMecpefRlTCdKyqbYXoIQ4dq","process_id":"78042","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba docs/audit/variants_review.md | sed -n '1,260p'"],"cwd":"/media/... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/docs/audit/variants_review.md output: sed: /tmp/.mount_vscodiOHg5sd/lib/x86_64-linux-gnu/libselinux.so.1: no version information available (required by sed) 1 [client] initialize (running) 2 3 [client] session/new (running) 4 I’ll use the gsd skill because this is a multi-step audit, then I’ll read the variant code, paper excerpts, logs, and refs only as needed. No code edits, just a verdict with quoted evidence. 5 6 [tool] Read SKILL.md (running) 7 input: {"call_id":"call_ZJ8jUrIxVXwLtDtWUHoN5Nyw","process_id":"8344","turn_id":"019dc96e-e7cd-79f2-b04d-65a1c4d26637","command":["/usr/bin/zsh","-lc","sed -n '1,220p' /home/wassname/.claude/skills/gsd/SKILL.md"],"cwd":"/med... 8 files: /home/wassname/.claude/skills/gsd/SKILL.md 9 10 [tool] Read SKILL.md (completed) 11 kind: read 12 input: {"call_id":"call_ZJ8jUrIxVXwLtDtWUHoN5Nyw","process_id":"8344","turn_id":"019dc96e-e7cd-79f2-b04d-65a1c4d26637","command":["/usr/bin/zsh","-lc","sed -n '1,220p' /home/wassname/.claude/skills/gsd/SKILL.md"],"cwd":"/med... 13 files: /home/wassname/.claude/skills/gsd/SKILL.md 14 output: 15 sed: /tmp/.mount_vscodiOHg5sd/lib/x86_64-linux-gnu/libselinux.so.1: no version information available (required by sed) 16 --- 17 name: gsd 18 description: "Get Shit Done workflow for non-trivial tasks: spec -> implement -> test -> review -> wrap. Mandatory for anything with more than one step." 19 --- 20
21 <!-- 22 Design rationale and sources — for human readers, not the agent. 23
24 - GSD spec-first: spec-driven development, scope fencing, verify-before-done 25 - Manus file-as-memory: one file is working memory, checklist is state, append-only log 26 - Boomerang: return concrete artifacts not summaries 27 - DCP: invisible dedup, superseded writes, error purging ... (233 more lines)

[tool] Run for f in src/lora_lite/{adapter.py,target.py,variant.py,config.py}; do printf '\n### %s\n' "$f"; nl -ba "$f" | sed -n '1,280p'; done (completed) kind: execute input: {"call_id":"call_ToqQP8XSZ8r22VF9IYDGRrHF","process_id":"51617","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","for f in src/lora_lite/{adapter.py,target.py,variant.py,config.py}; do ... output: ### src/lora_lite/adapter.py sed: /tmp/.mount_vscodiOHg5sd/lib/x86_64-linux-gnu/libselinux.so.1: no version information available (required by sed) 1 """attach / detach / save / load. The whole runtime.""" 2 from future import annotations 3 import torch 4 from torch import nn 5 from torch.utils.hooks import RemovableHandle 6 7 from .config import LoraLiteConfig 8 from .variant import REGISTRY 9 from .target import find_targets 10 11 12 _ATTACHED_ATTR = "_lora_lite_attached" 13 14 15 def _hook(layer, args, y): 16 (x,) = args 17 cfg: LoraLiteConfig = layer._lora_cfg 18 x_cast = x.to(cfg.dtype) 19 out = layer._lora_variant.forward(layer, x_cast, y) 20 return out.to(y.dtype) 21 22 23 def _pre_hook(layer, args): 24 (x,) = args 25 cfg: LoraLiteConfig = layer._lora_cfg 26 x_cast = x.to(cfg.dtype) ... (250 more lines)

[tool] Run for f in src/lora_lite/variants/{lora,pissa,dora,ia3,hra,delora}.py; do printf '\n### %s\n' "$f"; nl -ba "$f" | sed -n '1,260p'; done (completed) kind: execute input: {"call_id":"call_i9nfOVjK5lGEtnZXiX8w3UVI","process_id":"66364","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","for f in src/lora_lite/variants/{lora,pissa,dora,ia3,hra,delora}.py; do... output: ### src/lora_lite/variants/lora.py sed: /tmp/.mount_vscodiOHg5sd/lib/x86_64-linux-gnu/libselinux.so.1: no version information available (required by sed) 1 """Vanilla LoRA. Hu et al. 2021 https://arxiv.org/abs/2106.09685 2 3 h = W x + (alpha/r) B A x 4 5 Identity at t=0 from B=0. Faithful to the paper. 6 7 Reference implementations (for review/cross-check): 8 - peft Linear.update_layer + lora_A/B init, forward: 9 https://github.com/huggingface/peft/blob/main/src/peft/tuners/lora/layer.py 10 (see docs/refs/peft_lora_layer.py for offline copy) 11 """ 12 from einops import einsum 13 from torch import nn 14 import torch 15 16 from ..variant import register, ParamSpec 17 18 19 @register 20 class LoRA: 21 name = "lora" 22 23 @staticmethod 24 def param_specs(d_in, d_out, cfg): 25 return { 26 "lora_A": ParamSpec((cfg.r, d_in), init="kaiming", trainable=True), ... (355 more lines) The prior review file appears to include a captured transcript at the top, so I’ll isolate the actual verdict lines with search rather than assuming the whole file is clean markdown. Next I’m pulling focused reference snippets around the implementations.

[tool] Search Verdict|BUG|HRA|DeLoRA|IA3|PiSSA|LoRA|DoRA|prior verdict|new verdict|Faithful|Partial|Buggy in variants_review.md (running) input: {"call_id":"call_e6cm2JLiyHQ9BWA2TVDwczBs","process_id":"84327","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","rg -n "Verdict|BUG|HRA|DeLoRA|IA3|PiSSA|LoRA|DoRA|prior verdict|new ve...

[tool] Search class Linear|def update_layer|lora_A|lora_B|scaling|reset_lora_parameters|pissa_init|subtract_mutated_init|forward(|magnitude|dora|use_dora|save|load in peft_lora_layer.py (running) input: {"call_id":"call_M0zQbm2NDBihQ79YxcPRZgiK","process_id":"9967","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","rg -n "class Linear|def update_layer|lora_A|lora_B|scaling|reset_lora_p...

[tool] Search class Linear|hra|HRA|apply_GS|hra_u|lora|forward|Householder|reflect|lambda|reset|Parameter|orthogonal|init|scaling in peft_hra_layer.py (running) input: {"call_id":"call_LCfRNo5JdxbNAhjTLXS0Rrzx","process_id":"59701","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","rg -n "class Linear|hra|HRA|apply_GS|hra_u|lora|forward|Householder|re...

[tool] Search Verdict|BUG|HRA|DeLoRA|IA3|PiSSA|LoRA|DoRA|prior verdict|new verdict|Faithful|Partial|Buggy in variants_review.md (completed) kind: search input: {"call_id":"call_e6cm2JLiyHQ9BWA2TVDwczBs","process_id":"84327","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","rg -n "Verdict|BUG|HRA|DeLoRA|IA3|PiSSA|LoRA|DoRA|prior verdict|new ve... output: 151: 1 """Vanilla LoRA. Hu et al. 2021 https://arxiv.org/abs/2106.09685 155: 5 Identity at t=0 from B=0. Faithful to the paper. 165: 15 class LoRA: 186: 1 """PiSSA: top-r SVD of W into A,B; replace W with W_res = W - B@A. 199: 14 class PiSSA: 213: 28 "PiSSA mutates layer.weight into W_res, so v1 only supports plain nn.Linear. " 229: 1 """IA3-style output gating. Liu et al. 2022 https://arxiv.org/abs/2205.05638 234: 6 The original IA3 gates only three positions per transformer block: 246: 18 (Llama uses gate * up; gating the up branch is the IA3-spirit choice). 255: 27 class IA3: 264: 1 """DoRA: weight-decomposed LoRA. Liu et al. 2024 https://arxiv.org/abs/2402.09353 282: 19 class DoRA: 315: 1 """DeLoRA: column-normalised A, B, scaled by lambda * ||W||F / r. 340: 26 class DeLoRA: 385: 1 """HRA: Householder Reflection Adaptation. Yuan et al. 2024 https://arxiv.org/abs/2405.17484 403: 19 you want the regularized HRA variant. 451:[tool] Search PiSSA|SVD|singular|residual|A =|B =|W_res|Eq|equation|principal|initialize|initial in pissa.txt (running) 452: input: {"call_id":"call_E9ZDiQW4LYUvXfsbLERUR8VE","process_id":"40256","turn_id":"019dc96e-e7cd-79f2-b04d-65a1c4d26637","command":["/usr/bin/zsh","-lc","rg -n "PiSSA|SVD|singular|residual|A =|B =|W_res|Eq|equation|principal... 454:[tool] Search IA3|l_k|l_v|l_ff|key|value|feed-forward|activation|element-wise|Eq|equation|rescale|learned vector in ia3_.txt (running) 455: input: {"call_id":"call_g8eWpsF9Wv2q0sr9uIOTKAXt","process_id":"6527","turn_id":"019dc96e-e7cd-79f2-b04d-65a1c4d26637","command":["/usr/bin/zsh","-lc","rg -n "IA3|l_k|l_v|l_ff|key|value|feed-forward|activation|element-wise|... 457:[tool] Search LoRA|low-rank|A|B|alpha|rank|Eq|equation|ΔW|Delta|zero|Gaussian|Kaiming|initial in lora_*.txt (running) 458: input: {"call_id":"call_bWnP0o9dE2sMNEN6N5a9w6Jp","process_id":"2585","turn_id":...

[tool] Search class Linear|def kind: search input: {"call_id":"call_ output: docs/refs/peft_lora_variants.py:29:from docs/refs/peft_lora_variants.py:38: docs/refs/peft_lora_variants.py:41: docs/refs/peft_lora_variants.py:42: docs/refs/peft_lora_variants.py:43: docs/refs/peft_lora_variants.py:49: docs/refs/peft_lora_variants.py:71: docs/refs/peft_lora_variants.py:98: docs/refs/peft_lora_variants.py:101: docs/refs/peft_lora_variants.py:107: docs/refs/peft_lora_variants.py:108: docs/refs/peft_lora_variants.py:110: docs/refs/peft_lora_variants.py:136: docs/refs/peft_lora_variants.py:137: docs/refs/peft_lora_variants.py:138: docs/refs/peft_lora_variants.py:140: docs/refs/peft_lora_variants.py:141: docs/refs/peft_lora_variants.py:142: docs/refs/peft_lora_variants.py:143... update_layer|lora_A|lora_B|scaling|reset_lora_parameters|pissa_init|subtract_mutated_init|forward(|magnitude|dora|use_dora|save|load in peft_lora_layer.py (completed) M0zQbm2NDBihQ79YxcPRZgiK","process_id":"9967","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","rg -n "class Linear|def update_layer|lora_A|lora_B|scaling|reset_lora_p... .dora import DoraConv1dLayer, DoraConv2dLayer, DoraConv3dLayer, DoraEmbeddingLayer, DoraLinearLayer for ArrowLoraLinearLayer(). A layer of the base model with LoRA adapter loaded on it will be like: (base_layer): Linear4bit or Linear (lora_dropout): ModuleDict( ... ) (lora_A): ModuleDict( ... ) (lora_B): ModuleDict( ... ) (lora_embedding_A): ParameterDict( ... ) (lora_embedding_B): ParameterDict( ... ) (lora_magnitude_vector): ModuleDict( ... ) (lora_arrow): ModuleDict( module (Linear): LoRA Layer of the model, containing base_layer, lora_A, lora_B, etc. def forward( arrow.gen_know_sub(module.lora_A, module.lora_B) arrow.build_prototypes(module.lora_A, module.lora_B) lora_A=module.lora_A, lora_B=module.lora_B, scaling=module.scaling, if not module.lora_magnitude_vector: # first dora layer being added, add lora_magnitude_vector to the list of learnable parameters module.adapter_layer_names = module.adapter_layer_names[:] + ("lora_magnitude_vector",) dora_layer = DoraLinearLayer(fan_in_fan_out=getattr(module, "fan_in_fan_out", False)) lora_A = module.lora_A[adapter_name].weight lora_B = module.lora_B[adapter_name].weight

[tool] Search class Linear|ia3_l|feedforward|is_feedforward|forward(|update_layer|reset_ia3_parameters|target_modules|save|load|Parameter|ones in peft_ia3_layer.py (completed) kind: search input: {"call_id":"call_mXxKBEh7P0d4sdJjx6BqGFXB","process_id":"17225","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","rg -n "class Linear|ia3_l|feedforward|is_feedforward|forward\(|update... output: 30: adapter_layer_names = ("ia3_l",) 32: def init(self, base_layer: nn.Module, is_feedforward: bool, **kwargs) -> None: 34: self.ia3_l = nn.ParameterDict({}) 38: self.is_feedforward = is_feedforward 56: def update_layer(self, adapter_name: str, config: IA3Config, **kwargs): 62: if self.is_feedforward: 66: self.ia3_l[adapter_name] = nn.Parameter(weight) 68: self.reset_ia3_parameters(adapter_name) 72: def reset_ia3_parameters(self, adapter_name): 73: if adapter_name in self.ia3_l.keys(): 74: # initialize learned vector with torch.ones 75: nn.init.constant_(self.ia3_l[adapter_name], 1.0) 78:class Linear(nn.Module, IA3Layer): 85: is_feedforward: bool = False, # Set to True if the layer is treated as a feedforward layer 86: is_target_conv_1d_layer: bool = False, # whether target module is a conv1d layer. useful while unloading later 90: IA3Layer.init(self, base_layer, is_feedforward=is_feedforward) 94: self.update_layer(adapter_name, config=config) 115: if active_adapter in self.ia3_l.keys(): 117: ia3_l = transpose(self.ia3_l[active_adapter].data, self.fan_in_fan_out) 121: orig_weights = torch.mul(orig_weights, ia3_l) 129: base_layer.weight.data = torch.mul(base_layer.weight.data, ia3_l).to(orig_dtype) 131: if not self.is_feedforward and (base_layer.bias is not None): 132: scaling = self.ia3_l[active_adapter].reshape(base_layer.bias.shape) 149: if active_adapter in self.ia3_l.keys(): 152: ia3_l = transpose(self.ia3_l[active_adapter].data, self.fan_in_fan_out) + 1e-8 154: base_layer.weight.data = torch.div(base_layer.weight.data, ia3_l).to(orig_dtype) 156: if not self.is_feedforward and (base_layer.bias is not None): 157: scaling = self.ia3_l[active_adapter].reshape(base_layer.bias.s...

[tool] Search class Linear|delora|lambda|lora_A|lora_B|forward|reset|norm|frozen|init|Parameter|scaling|weight_norm|save|load|dora in peft_delora_layer.py (completed) kind: search input: {"call_id":"call_xWW1G18NS65tbdAzP1zglU4E","process_id":"4667","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","rg -n "class Linear|delora|lambda|lora_A|lora_B|forward|reset|norm|froz... output: docs/refs/peft_delora_layer.py:32: "delora_A", docs/refs/peft_delora_layer.py:33: "delora_B", docs/refs/peft_delora_layer.py:34: "delora_lambda", docs/refs/peft_delora_layer.py:39: "delora_dropout", docs/refs/peft_delora_layer.py:40: "delora_w_norm", docs/refs/peft_delora_layer.py:43: def init(self, base_layer: nn.Module, **kwargs) -> None: docs/refs/peft_delora_layer.py:46: self.delora_dropout = nn.ModuleDict({}) docs/refs/peft_delora_layer.py:47: self.delora_A = nn.ParameterDict({}) docs/refs/peft_delora_layer.py:48: self.delora_B = nn.ParameterDict({}) docs/refs/peft_delora_layer.py:49: self.delora_lambda = nn.ParameterDict({}) docs/refs/peft_delora_layer.py:50: # Use persistent buffers so they are included in state_dict and saved. docs/refs/peft_delora_layer.py:51: self.delora_w_norm = BufferDict({}, persistent=True) docs/refs/peft_delora_layer.py:65: A: torch.Tensor, B: torch.Tensor, delora_lambda: torch.Tensor, r: int, w_norm: torch.Tensor docs/refs/peft_delora_layer.py:67: """Compute delta = B @ diag(delora_lambda/r / (||A_i||*||B^j||)) @ A, scaled by provided w_norm (per-input channel)""" docs/refs/peft_delora_layer.py:68: An = torch.clamp(A.norm(dim=1), min=1e-4) docs/refs/peft_delora_layer.py:69: Bn = torch.clamp(B.norm(dim=0), min=1e-4) docs/refs/peft_delora_layer.py:70: diag = torch.diag_embed(delora_lambda / r / (An * Bn)) docs/refs/peft_delora_layer.py:72: delta = delta * w_norm.unsqueeze(0) docs/refs/peft_delora_layer.py:76: if adapter not in self.delora_A or adapter not in self.delora_B: docs/refs/peft_delora_layer.py:80: self.delora_A[adapter], docs/refs/peft_delora_layer.py:81: self.delora_B[adapter], docs/refs/peft_delora_layer.py:82: self.delora_lambda[adapter], docs/refs/peft_delora_layer.py:84: self.delora_w_norm[adapter], docs/refs/peft_delora_layer.py:92: del...

[tool] Search class Linear|hra|HRA|apply_GS|hra_u|lora|forward|Householder|reflect|lambda|reset|Parameter|orthogonal|init|scaling in peft_hra_layer.py (completed) kind: search input: {"call_id":"call_LCfRNo5JdxbNAhjTLXS0Rrzx","process_id":"59701","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","rg -n "class Linear|hra|HRA|apply_GS|hra_u|lora|forward|Householder|re... output: docs/refs/peft_hra_layer.py:25:from .config import HRAConfig docs/refs/peft_hra_layer.py:28:class HRALayer(BaseTunerLayer): docs/refs/peft_hra_layer.py:30: adapter_layer_names = ("hra_u",) docs/refs/peft_hra_layer.py:32: other_param_names = ("hra_r", "hra_apply_GS") docs/refs/peft_hra_layer.py:34: def init(self, base_layer: nn.Module, **kwargs) -> None: docs/refs/peft_hra_layer.py:36: self.hra_r = {} docs/refs/peft_hra_layer.py:37: self.hra_apply_GS = {} docs/refs/peft_hra_layer.py:38: self.hra_u = nn.ParameterDict({}) docs/refs/peft_hra_layer.py:42: # flag to enable/disable casting of input to weight dtype during forward call docs/refs/peft_hra_layer.py:58: config: HRAConfig, docs/refs/peft_hra_layer.py:61: """Internal function to create hra adapter docs/refs/peft_hra_layer.py:66: config (HRAConfig): The adapter configuration for this layer. docs/refs/peft_hra_layer.py:68: apply_GS = config.apply_GS docs/refs/peft_hra_layer.py:69: init_weights = config.init_weights docs/refs/peft_hra_layer.py:75: self.hra_r[adapter_name] = r docs/refs/peft_hra_layer.py:76: self.hra_apply_GS[adapter_name] = apply_GS docs/refs/peft_hra_layer.py:78: # Determine shape of HRA weights docs/refs/peft_hra_layer.py:81: self.hra_u[adapter_name] = nn.Parameter(torch.empty(self.in_features, r), requires_grad=True) docs/refs/peft_hra_layer.py:83: self.hra_u[adapter_name] = nn.Parameter( docs/refs/peft_hra_layer.py:88: raise TypeError(f"HRA is not implemented for base layers of type {type(base_layer).name}") docs/refs/peft_hra_layer.py:91: if init_weights: docs/refs/peft_hra_layer.py:92: self.reset_hra_parameters(adapter_name) docs/refs/peft_hra_layer.py:94: self.reset_hra_parameters_random(adapter_name) docs/refs/peft_hra_layer.py💯 def reset_hra_parameters(self, adapter_name: str): docs/refs/peft_hra_layer.py:101: ... I’ve found one important likely new issue already: upstream PEFT DeLoRA scales by a per-input-channel weight norm, not a single Frobenius scalar. I’m going to verify that against the original author fork before calling it a new bug.

[tool] Read peft_lora_layer.py (running) input: {"call_id":"call_2Y4OTSQuzK2OzQuxYVZAKes5","process_id":"38333","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba docs/refs/peft_lora_layer.py | sed -n '190,245p'"],"cwd":"/media... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/docs/refs/peft_lora_layer.py

[tool] Read peft_lora_layer.py (running) input: {"call_id":"call_opCVMbkydjfHxo72RkbW8Gpz","process_id":"20581","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba docs/refs/peft_lora_layer.py | sed -n '360,395p'"],"cwd":"/media... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/docs/refs/peft_lora_layer.py

[tool] Read orig_pissa_init.py (running) input: {"call_id":"call_25wGw89PzHeRkgbVkFe0nEgN","process_id":"21854","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba docs/refs/orig_pissa_init.py | sed -n '35,65p'"],"cwd":"/media/w... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/docs/refs/orig_pissa_init.py

[tool] Read peft_lora_layer.py (running) input: {"call_id":"call_Cujjy1Wat05YCOIbK5onwiPA","process_id":"16041","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba docs/refs/peft_lora_layer.py | sed -n '941,976p'"],"cwd":"/media... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/docs/refs/peft_lora_layer.py

[tool] Read peft_lora_dora.py (running) input: {"call_id":"call_5vteXTncTSo1hXvTGWuBoAB6","process_id":"13301","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba docs/refs/peft_lora_dora.py | sed -n '86,166p'"],"cwd":"/media/w... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/docs/refs/peft_lora_dora.py

[tool] Read peft_lora_layer.py (completed) kind: read input: {"call_id":"call_2Y4OTSQuzK2OzQuxYVZAKes5","process_id":"38333","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba docs/refs/peft_lora_layer.py | sed -n '190,245p'"],"cwd":"/media... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/docs/refs/peft_lora_layer.py output: sed: /tmp/.mount_vscodiOHg5sd/lib/x86_64-linux-gnu/libselinux.so.1: no version information available (required by sed) 190 lora_dropout_layer = nn.Dropout(p=lora_dropout) 191 else: 192 lora_dropout_layer = nn.Identity() 193 194 self.lora_dropout.update(nn.ModuleDict({adapter_name: lora_dropout_layer})) 195 196 # Actual trainable parameters 197 self.lora_A[adapter_name] = nn.Linear(self.in_features, r, bias=False) 198 self.lora_B[adapter_name] = nn.Linear(r, self.out_features, bias=lora_bias) 199 200 # Tying adapters is only implemented for Linear layers 201 # where the source is the embedding layer. 202 # Currently, this is the most prevelant way of tying layers (weight tying) 203 if tied_adapter: 204 lora_A_params = tied_adapter["lora_A"] 205 lora_B_params = tied_adapter["lora_B"] 206 207 self.lora_A[adapter_name].weight = torch.nn.Parameter(lora_A_params) 208 self.lora_B[adapter_name].weight = torch.nn.Parameter(lora_B_params) 209 210 self.lora_bias[adapter_name] = lora_bias 211 212 if use_rslora: 213 self.scaling[adapter_name] = lora_alpha / math.sqrt(r) 214 else: 215 self.scaling[adapter_name] = lora_alpha / r 216 ... (29 more lines)

[tool] Read peft_lora_layer.py (completed) kind: read input: {"call_id":"call_opCVMbkydjfHxo72RkbW8Gpz","process_id":"20581","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba docs/refs/peft_lora_layer.py | sed -n '360,395p'"],"cwd":"/media... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/docs/refs/peft_lora_layer.py output: sed: /tmp/.mount_vscodiOHg5sd/lib/x86_64-linux-gnu/libselinux.so.1: no version information available (required by sed) 360 def pissa_init(self, adapter_name, init_lora_weights): 361 weight = self.get_base_layer().weight 362 dtype = weight.dtype 363 if dtype not in [torch.float32, torch.float16, torch.bfloat16]: 364 raise TypeError( 365 "Please initialize PiSSA under float32, float16, or bfloat16. " 366 "Subsequently, re-quantize the residual model to help minimize quantization errors." 367 ) 368 weight = transpose(weight.to(torch.float32), self.fan_in_fan_out) 369 if init_lora_weights == "pissa": 370 # USV^T = W <-> VSU^T = W^T, where W^T = weight.data in R^{out_channel, in_channel}, 371 V, S, Uh = torch.linalg.svd(weight.data, full_matrices=False) 372 Vr = V[:, : self.r[adapter_name]] 373 Sr = S[: self.r[adapter_name]] 374 Sr /= self.scaling[adapter_name] 375 Uhr = Uh[: self.r[adapter_name]] 376 elif len(init_lora_weights.split("niter")) == 2: 377 Vr, Sr, Ur = svd_lowrank( 378 weight.data, self.r[adapter_name], niter=int(init_lora_weights.split("niter")[-1]) 379 ) 380 Sr /= self.scaling[adapter_name] 381 Uhr = Ur.t() 382 else: 383 raise ValueError( 384 f"init_lora_weights should be 'pissa' or 'pissa_niter_[number of iters]', got {init_lora_weights} instead." 385 ) 386 ... (9 more lines)

[tool] Read peft_ia3_layer.py (running) input: {"call_id":"call_6x5M2DzKlZugvBcPBhDgk5lo","process_id":"52042","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba docs/refs/peft_ia3_layer.py | sed -n '56,80p'; nl -ba docs/refs/... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/docs/refs/peft_ia3_layer.py

[tool] Read orig_pissa_init.py (completed) kind: read input: {"call_id":"call_25wGw89PzHeRkgbVkFe0nEgN","process_id":"21854","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba docs/refs/orig_pissa_init.py | sed -n '35,65p'"],"cwd":"/media/w... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/docs/refs/orig_pissa_init.py output: sed: /tmp/.mount_vscodiOHg5sd/lib/x86_64-linux-gnu/libselinux.so.1: no version information available (required by sed) 35 torch_dtype=( 36 torch.float16 37 if script_args.bits == "fp16" 38 else (torch.bfloat16 if script_args.bits == "bf16" else torch.float32) 39 ), 40 device_map="auto", 41 ) 42 tokenizer = AutoTokenizer.from_pretrained(script_args.base_model_path) 43 tokenizer.pad_token_id = tokenizer.eos_token_id 44 lora_config = LoraConfig( 45 r=script_args.lora_r, 46 lora_alpha=script_args.lora_alpha, 47 init_lora_weights=True if script_args.init_weights=="True" else script_args.init_weights, 48 lora_dropout=script_args.lora_dropout, 49 target_modules=script_args.target_modules, 50 ) 51 peft_model = get_peft_model(model, lora_config) 52 53 # Save PiSSA modules: 54 peft_model.peft_config["default"].init_lora_weights = True 55 peft_model.save_pretrained(os.path.join(script_args.output_dir, "pissa_init")) 56 # Save residual model: 57 peft_model = peft_model.unload() 58 peft_model.save_pretrained(script_args.output_dir) 59 # Save the tokenizer: 60 tokenizer.save_pretrained(script_args.output_dir)

[tool] Read peft_lora_layer.py (completed) kind: read input: {"call_id":"call_Cujjy1Wat05YCOIbK5onwiPA","process_id":"16041","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba docs/refs/peft_lora_layer.py | sed -n '941,976p'"],"cwd":"/media... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/docs/refs/peft_lora_layer.py output: sed: /tmp/.mount_vscodiOHg5sd/lib/x86_64-linux-gnu/libselinux.so.1: no version information available (required by sed) 941 def forward(self, x: torch.Tensor, *args: Any, **kwargs: Any) -> torch.Tensor: 942 self._check_forward_args(x, *args, **kwargs) 943 adapter_names = kwargs.pop("adapter_names", None) 944 variant_kwargs = {k: kwargs.pop(k, None) for k in VARIANT_KWARG_KEYS} # don't pass these to base_layer 945 946 if self.disable_adapters: 947 if self.merged: 948 self.unmerge() 949 result = self.base_layer(x, *args, **kwargs) 950 elif adapter_names is not None: 951 result = self._mixed_batch_forward(x, *args, adapter_names=adapter_names, **variant_kwargs, **kwargs) 952 elif self.merged: 953 result = self.base_layer(x, *args, **kwargs) 954 else: 955 result = self.base_layer(x, *args, **kwargs) 956 torch_result_dtype = result.dtype 957 958 lora_A_keys = self.lora_A.keys() 959 for active_adapter in self.active_adapters: 960 if active_adapter not in lora_A_keys: 961 continue 962 963 lora_A = self.lora_A[active_adapter] 964 lora_B = self.lora_B[active_adapter] 965 dropout = self.lora_dropout[active_adapter] 966 scaling = self.scaling[active_adapter] 967 x = self._cast_input_dtype(x, lora_A.weight.dtype) ... (9 more lines)

[tool] Read peft_lora_dora.py (completed) kind: read input: {"call_id":"call_5vteXTncTSo1hXvTGWuBoAB6","process_id":"13301","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba docs/refs/peft_lora_dora.py | sed -n '86,166p'"],"cwd":"/media/w... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/docs/refs/peft_lora_dora.py output: sed: /tmp/.mount_vscodiOHg5sd/lib/x86_64-linux-gnu/libselinux.so.1: no version information available (required by sed) 86 def get_weight_norm(self, weight, lora_weight, scaling, adapter_name: Optional[str] = None) -> torch.Tensor: 87 # calculate L2 norm of weight matrix, column-wise 88 weight = transpose(weight, self.fan_in_fan_out) 89 weight = weight + scaling * lora_weight 90 weight_norm = torch.linalg.norm(weight, dim=1).to(weight.dtype) 91 return weight_norm 92 93 @cache_decorator("lora-weight") 94 def get_lora_weight(self, lora_A, lora_B, adapter_name: Optional[str] = None): 95 # Don't use lora_weight = lora_B.weight @ lora_A.weight because this causes errors with FSDP. Instead, 96 # calculate the same but using forward. 97 x_eye = torch.eye(lora_A.weight.shape[1], device=lora_A.weight.device, dtype=lora_A.weight.dtype) 98 lora_weight = lora_B(lora_A(x_eye)).T 99 return lora_weight 100 101 def update_layer(self, *, base_layer, lora_A, lora_B, scaling, place_on_cpu=False) -> None: 102 # temporarily convert fp16 to fp32, as fp16 can cause trouble on CPU with PyTorch < 2.2 103 dtype_is_fp16 = lora_A.dtype == torch.float16 104 if dtype_is_fp16: 105 lora_A = lora_A.float() 106 lora_B = lora_B.float() 107 108 with gather_params_ctx(base_layer.parameters()): 109 if base_layer.class.name == "Linear4bit": 110 # We have to create a copy of the base layer, otherwise, FSDP will throw an error. 8bit does not work 111 # yet because Int8Params cannot be correctly deep-copied (attributes vanish) 112 base_layer = deepcopy(base_layer) ... (54 more lines)

[tool] Read peft_ia3_layer.py (completed) kind: read input: {"call_id":"call_6x5M2DzKlZugvBcPBhDgk5lo","process_id":"52042","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba docs/refs/peft_ia3_layer.py | sed -n '56,80p'; nl -ba docs/refs/... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/docs/refs/peft_ia3_layer.py output: sed: /tmp/.mount_vscodiOHg5sd/lib/x86_64-linux-gnu/libselinux.so.1: no version information available (required by sed) 56 def update_layer(self, adapter_name: str, config: IA3Config, **kwargs): 57 init_ia3_weights = config.init_ia3_weights 58 inference_mode = config.inference_mode 59 60 # This code works for linear layers, override for other layer types 61 # Actual trainable parameters 62 if self.is_feedforward: 63 weight = torch.randn((1, self.in_features)) 64 else: 65 weight = torch.randn((self.out_features, 1)) 66 self.ia3_l[adapter_name] = nn.Parameter(weight) 67 if init_ia3_weights: 68 self.reset_ia3_parameters(adapter_name) 69 self.move_adapter_to_device_of_base_layer(adapter_name) 70 self.set_adapter(self.active_adapters, inference_mode=inference_mode) 71 72 def reset_ia3_parameters(self, adapter_name): 73 if adapter_name in self.ia3_l.keys(): 74 # initialize learned vector with torch.ones 75 nn.init.constant(self.ia3_l[adapter_name], 1.0) 76 77 78 class Linear(nn.Module, IA3Layer): 79 # (IA)^3 implemented in a dense layer 80 def init( sed: /tmp/.mount_vscodiOHg5sd/lib/x86_64-linux-gnu/libselinux.so.1: no version information available (required by sed) 161 def forward(self, x: torch.Tensor, *args: Any, **kwargs: Any) -> torch.Tensor: ... (25 more lines)

[tool] Read peft_hra_layer.py (running) input: {"call_id":"call_tdrRy87nG7HWzt0pyxWOMB5h","process_id":"62390","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba docs/refs/peft_hra_layer.py | sed -n '209,264p'"],"cwd":"/media/... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/docs/refs/peft_hra_layer.py

[tool] Read peft_hra_layer.py (running) input: {"call_id":"call_wcSspixdogAwumrsFaLNIN6i","process_id":"94071","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba docs/refs/peft_hra_layer.py | sed -n '78,110p'"],"cwd":"/media/w... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/docs/refs/peft_hra_layer.py

[tool] Read peft_delora_layer.py (running) input: {"call_id":"call_BiuiWIYCjjP1C2MSuBRvrabj","process_id":"14337","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba docs/refs/peft_delora_layer.py | sed -n '112,155p'; nl -ba docs/... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/docs/refs/peft_delora_layer.py

[tool] Read orig_hra_layer.py (running) input: {"call_id":"call_K12I8uQGGqEcuqDJ1SNCrsqE","process_id":"8173","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba docs/refs/orig_hra_layer.py | sed -n '52,62p'; nl -ba docs/refs/o... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/docs/refs/orig_hra_layer.py

[tool] Read orig_delora.py (running) input: {"call_id":"call_0C71TzU8xeBpNWvOwibnhBeX","process_id":"92303","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba docs/refs/orig_delora.py | sed -n '302,353p'; nl -ba docs/refs/o... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/docs/refs/orig_delora.py

[tool] Read peft_delora_layer.py (running) input: {"call_id":"call_WUHTervAVvc9qfAXHNlpoc86","process_id":"57442","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba docs/refs/peft_delora_layer.py | sed -n '28,75p'"],"cwd":"/media... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/docs/refs/peft_delora_layer.py

[tool] Read peft_hra_layer.py (completed) kind: read input: {"call_id":"call_wcSspixdogAwumrsFaLNIN6i","process_id":"94071","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba docs/refs/peft_hra_layer.py | sed -n '78,110p'"],"cwd":"/media/w... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/docs/refs/peft_hra_layer.py output: sed: /tmp/.mount_vscodiOHg5sd/lib/x86_64-linux-gnu/libselinux.so.1: no version information available (required by sed) 78 # Determine shape of HRA weights 79 base_layer = self.get_base_layer() 80 if isinstance(base_layer, nn.Linear): 81 self.hra_u[adapter_name] = nn.Parameter(torch.empty(self.in_features, r), requires_grad=True) 82 elif isinstance(base_layer, nn.Conv2d): 83 self.hra_u[adapter_name] = nn.Parameter( 84 torch.empty(self.in_features * base_layer.kernel_size[0] * base_layer.kernel_size[0], r), 85 requires_grad=True, 86 ) 87 else: 88 raise TypeError(f"HRA is not implemented for base layers of type {type(base_layer).name}") 89 90 # Initialize weights 91 if init_weights: 92 self.reset_hra_parameters(adapter_name) 93 else: 94 self.reset_hra_parameters_random(adapter_name) 95 96 # Move new weights to device 97 self.move_adapter_to_device_of_base_layer(adapter_name) 98 self.set_adapter(self.active_adapters, inference_mode=inference_mode) 99 100 def reset_hra_parameters(self, adapter_name: str): 101 if self.hra_r[adapter_name] % 2 != 0: 102 warnings.warn("The symmetric initialization can NOT be performed when r is odd!") 103 nn.init.kaiming_uniform(self.hra_u[adapter_name], a=math.sqrt(5)) 104 else: ... (6 more lines)

[tool] Read peft_hra_layer.py (completed) kind: read input: {"call_id":"call_tdrRy87nG7HWzt0pyxWOMB5h","process_id":"62390","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba docs/refs/peft_hra_layer.py | sed -n '209,264p'"],"cwd":"/media/... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/docs/refs/peft_hra_layer.py output: sed: /tmp/.mount_vscodiOHg5sd/lib/x86_64-linux-gnu/libselinux.so.1: no version information available (required by sed) 209 rank = self.hra_r[adapter_name] 210 apply_GS = self.hra_apply_GS[adapter_name] 211 opt_u = self.hra_u[adapter_name] 212 shape = opt_u.shape 213 214 if apply_GS: 215 weight = [(opt_u[:, 0] / opt_u[:, 0].norm()).view(-1, 1)] 216 for i in range(1, rank): 217 ui = opt_u[:, i].view(-1, 1) 218 for j in range(i): 219 ui = ui - (weight[j].t() @ ui) * weight[j] 220 weight.append((ui / ui.norm()).view(-1, 1)) 221 weight = torch.cat(weight, dim=1) 222 weight = torch.eye(shape[0], device=opt_u.device, dtype=opt_u.dtype) - 2 * weight @ weight.t() 223 224 else: 225 opt_u = opt_u / opt_u.norm(dim=0) 226 weight = torch.eye(shape[0], device=opt_u.device, dtype=opt_u.dtype) 227 if reverse: 228 indices = range(rank - 1, -1, -1) 229 else: 230 indices = range(rank) 231 232 for i in indices: 233 ui = opt_u[:, i].view(-1, 1) 234 weight = weight - 2 * weight @ ui @ ui.t() 235 ... (29 more lines)

[tool] Read peft_delora_layer.py (completed) kind: read input: {"call_id":"call_BiuiWIYCjjP1C2MSuBRvrabj","process_id":"14337","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba docs/refs/peft_delora_layer.py | sed -n '112,155p'; nl -ba docs/... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/docs/refs/peft_delora_layer.py output: sed: /tmp/.mount_vscodiOHg5sd/lib/x86_64-linux-gnu/libselinux.so.1: no version information available (required by sed) 112 self.delora_A[adapter_name] = nn.Parameter(torch.empty(r, self.in_features)) 113 self.delora_B[adapter_name] = nn.Parameter(torch.empty(self.out_features, r)) 114 self.delora_lambda[adapter_name] = nn.Parameter(torch.empty(1)) 115 if module_dropout > 0.0: 116 module_dropout_layer = nn.Dropout(p=module_dropout) 117 else: 118 module_dropout_layer = nn.Identity() 119 self.delora_dropout.update(nn.ModuleDict({adapter_name: module_dropout_layer})) 120 121 # Initialize weights 122 self.reset_delora_parameters(adapter_name, init_weights, delora_lambda) 123 124 # Move new weights to device 125 self.move_adapter_to_device_of_base_layer(adapter_name) 126 self.set_adapter(self.active_adapters, inference_mode=inference_mode) 127 128 def reset_delora_parameters( 129 self, 130 adapter_name: str, 131 init_weights: bool = True, 132 delora_lambda: float = 15.0, 133 ) -> None: 134 if adapter_name not in self.delora_A.keys(): 135 return 136 137 if init_weights is True: 138 nn.init.kaiming_uniform(self.delora_A[adapter_name], a=math.sqrt(5)) ... (57 more lines)

[tool] Read peft_delora_layer.py (completed) kind: read input: {"call_id":"call_WUHTervAVvc9qfAXHNlpoc86","process_id":"57442","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba docs/refs/peft_delora_layer.py | sed -n '28,75p'"],"cwd":"/media... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/docs/refs/peft_delora_layer.py output: sed: /tmp/.mount_vscodiOHg5sd/lib/x86_64-linux-gnu/libselinux.so.1: no version information available (required by sed) 28 29 class DeloraLayer(BaseTunerLayer): 30 # All names of layers that may contain (trainable) adapter weights 31 adapter_layer_names = ( 32 "delora_A", 33 "delora_B", 34 "delora_lambda", 35 ) 36 # All names of other parameters that may contain adapter-related parameters 37 other_param_names = ( 38 "r", 39 "delora_dropout", 40 "delora_w_norm", 41 ) 42 43 def init(self, base_layer: nn.Module, **kwargs) -> None: 44 self.base_layer = base_layer 45 self.r = {} 46 self.delora_dropout = nn.ModuleDict({}) 47 self.delora_A = nn.ParameterDict({}) 48 self.delora_B = nn.ParameterDict({}) 49 self.delora_lambda = nn.ParameterDict({}) 50 # Use persistent buffers so they are included in state_dict and saved. 51 self.delora_w_norm = BufferDict({}, persistent=True) 52 # Mark the weight as unmerged 53 self._disable_adapters = False 54 self.merged_adapters = [] ... (21 more lines)

[tool] Read orig_delora.py (completed) kind: read input: {"call_id":"call_0C71TzU8xeBpNWvOwibnhBeX","process_id":"92303","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba docs/refs/orig_delora.py | sed -n '302,353p'; nl -ba docs/refs/o... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/docs/refs/orig_delora.py output: sed: /tmp/.mount_vscodiOHg5sd/lib/x86_64-linux-gnu/libselinux.so.1: no version information available (required by sed) 302 self.delora_A = nn.Linear(in_features, r, bias=False) 303 self.delora_B = nn.Linear(r, out_features, bias=False) 304 305 self.delora_lambda = nn.Parameter(torch.full((1,), delora_lambda), requires_grad=True) 306 307 # Frozen parameters 308 self.frozen_C = nn.Parameter(torch.empty_like(self.delora_A.weight).copy_(self.delora_A.weight)) 309 self.frozen_C.requires_grad = False 310 self.frozen_D = nn.Parameter(torch.empty_like(self.delora_B.weight).copy_(self.delora_B.weight)) 311 self.frozen_D.requires_grad = False 312 313 # Freezing the pre-trained weight matrix 314 self.weight.requires_grad = False 315 316 self.reset_parameters() 317 if fan_in_fan_out: 318 self.weight.data = self.weight.data.T 319 320 def reset_parameters(self): 321 nn.Linear.reset_parameters(self) 322 if hasattr(self, "delora_A"): 323 # initialize A the same way as the default for nn.Linear and B to zero 324 nn.init.kaiming_uniform_(self.delora_A.weight, a=math.sqrt(5)) 325 nn.init.kaiming_uniform_(self.delora_B.weight, a=math.sqrt(5)) 326 nn.init.constant_(self.delora_lambda, self.delora_lambda_value) 327 self.frozen_C.data = self.delora_A.weight.data 328 self.frozen_D.data = self.delora_B.weight.data ... (39 more lines)

[tool] Read orig_hra_layer.py (completed) kind: read input: {"call_id":"call_K12I8uQGGqEcuqDJ1SNCrsqE","process_id":"8173","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba docs/refs/orig_hra_layer.py | sed -n '52,62p'; nl -ba docs/refs/o... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/docs/refs/orig_hra_layer.py output: sed: /tmp/.mount_vscodiOHg5sd/lib/x86_64-linux-gnu/libselinux.so.1: no version information available (required by sed) 52 self.oft_r[adapter_name] = nn.Parameter( 53 torch.cat([torch.eye(r, device=weight.device, dtype=weight.dtype), 54 torch.zeros(shape[0] - r, r, device=weight.device, dtype=weight.dtype)], dim=0)) 55 56 def reset_adapter_parameters(self, adapter_name: str): 57 # nn.init.zeros_(self.oft_r[adapter_name]) 58 nn.init.kaiming_uniform_(self.oft_r[adapter_name], a=1 / self.eps[adapter_name]) 59 60 def reset_adapter_parameters_random(self, adapter_name: str): 61 # nn.init.kaiming_uniform_(self.oft_r[adapter_name], a=math.sqrt(5)) 62 nn.init.kaiming_uniform_(self.oft_r[adapter_name], a=1 / self.eps[adapter_name]) sed: /tmp/.mount_vscodiOHg5sd/lib/x86_64-linux-gnu/libselinux.so.1: no version information available (required by sed) 300 # scaling factor for each of the smaller block matrix 301 eps = eps * 1 / torch.sqrt(torch.tensor(oft_r.shape[0])) 302 I = ( # noqa: E741 303 torch.zeros((oft_r.size(1), oft_r.size(1)), device=oft_r.device, dtype=oft_r.dtype) 304 .unsqueeze(0) 305 .expand_as(oft_r) 306 ) 307 diff = oft_r - I 308 norm_diff = torch.norm(oft_r - I, dim=(1, 2), keepdim=True) 309 mask = (norm_diff <= eps).bool() 310 out = torch.where(mask, oft_r, I + eps * (diff / norm_diff)) 311 return out 312 313 def forward(self, x: torch.Tensor, *args, **kwargs) -> torch.Tensor: 314 previous_dtype = x.dtype ... (18 more lines)

[tool] Read orig_delora.py (running) input: {"call_id":"call_ah4F1fipTlU1wMmtswZGndZ5","process_id":"58167","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba docs/refs/orig_delora.py | sed -n '348,358p'"],"cwd":"/media/was... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/docs/refs/orig_delora.py

[tool] Read orig_delora.py (completed) kind: read input: {"call_id":"call_ah4F1fipTlU1wMmtswZGndZ5","process_id":"58167","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba docs/refs/orig_delora.py | sed -n '348,358p'"],"cwd":"/media/was... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/docs/refs/orig_delora.py output: sed: /tmp/.mount_vscodiOHg5sd/lib/x86_64-linux-gnu/libselinux.so.1: no version information available (required by sed) 348 # Get ABCD 349 ABCD = delora_B_weight @ diag12 @ delora_A_weight 350 ABCD = ABCD - self.frozen_D @ diag34 @ self.frozen_C 351 352 # W scaling 353 Wnorm = self.weight.data.norm(dim=0) # shape: (in_features,) 354 ABCD = torch.mul(ABCD, Wnorm.unsqueeze(0)) # shape: (out_features, in_features) 355 356 return ABCD 357 358 def train(self, mode: bool = True):

[tool] Read qwen_probe.log (running) input: {"call_id":"call_QNV0Ba2iReMmAW5AUbuGc72t","process_id":"70431","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba logs/qwen_probe.log | sed -n '84,110p'"],"cwd":"/media/wassname/... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/logs/qwen_probe.log

[tool] Read smoke.log (running) input: {"call_id":"call_G6zvop5SgFkVybpFkSeaqW35","process_id":"82319","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba logs/smoke.log | sed -n '114,176p'"],"cwd":"/media/wassname/SGIr... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/logs/smoke.log

[tool] Search q_proj|v_proj|k_proj|lambda0|variant_kwargs|target_names|variant in scripts (running) input: {"call_id":"call_QzqQbhG7VjJ9UPWbbuCm3as7","process_id":"73468","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","rg -n "q_proj|v_proj|k_proj|lambda0|variant_kwargs|target_names|varian...

[tool] Read qwen_probe.log (completed) kind: read input: {"call_id":"call_QNV0Ba2iReMmAW5AUbuGc72t","process_id":"70431","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba logs/qwen_probe.log | sed -n '84,110p'"],"cwd":"/media/wassname/... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/logs/qwen_probe.log output: sed: /tmp/.mount_vscodiOHg5sd/lib/x86_64-linux-gnu/libselinux.so.1: no version information available (required by sed)

[tool] Read smoke.log (completed) kind: read input: {"call_id":"call_G6zvop5SgFkVybpFkSeaqW35","process_id":"82319","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba logs/smoke.log | sed -n '114,176p'"],"cwd":"/media/wassname/SGIr... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/logs/smoke.log output: sed: /tmp/.mount_vscodiOHg5sd/lib/x86_64-linux-gnu/libselinux.so.1: no version information available (required by sed)

[tool] Search q_proj|v_proj|k_proj|lambda0|variant_kwargs|target_names|variant in scripts (completed) kind: search input: {"call_id":"call_QzqQbhG7VjJ9UPWbbuCm3as7","process_id":"73468","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","rg -n "q_proj|v_proj|k_proj|lambda0|variant_kwargs|target_names|varian... output: scripts/qwen_train_probe.py:17: "model.layers.0.self_attn.q_proj", scripts/qwen_train_probe.py:18: "model.layers.0.self_attn.v_proj", scripts/qwen_train_probe.py:22:def cfg_for_variant(variant: str, dtype: torch.dtype, r: int, alpha: float) -> ll.LoraLiteConfig: scripts/qwen_train_probe.py:24: variant=variant, scripts/qwen_train_probe.py:26: alpha=r if variant == "pissa" else alpha, scripts/qwen_train_probe.py:29: target_names=(r"model.layers.0.self_attn.(q_proj|v_proj)$",), scripts/qwen_train_probe.py:31: variant_kwargs={"lambda0": 0.1} if variant == "delora" else {}, scripts/qwen_train_probe.py:80:def run_variant(args, variant: str, input_ids: torch.Tensor, labels: torch.Tensor, dtype: torch.dtype): scripts/qwen_train_probe.py:83: cfg = cfg_for_variant(variant, dtype, args.r, args.alpha) scripts/qwen_train_probe.py:138: out_path = args.out_dir / f"{variant}adapter.pt" scripts/qwen_train_probe.py:163: "variant": variant, scripts/qwen_train_probe.py:181: parser.add_argument("--variants", nargs="+", default=["lora", "pissa", "delora", "ia3", "dora", "hra"]) scripts/qwen_train_probe.py:201: print("SHOULD: exact q_proj/v_proj layer-0 targets, lora-only grads, lossN<loss0, perturb>0, reload<tol. ELSE hook/target/train/save bug.") scripts/qwen_train_probe.py:202: rows = [run_variant(args, variant, input_ids, labels, dtype) for variant in args.variants] docs/developer_guide.md:3:This is the implementation note for people adding adapter variants. The README is only for prospective users. docs/developer_guide.md:15:A variant is a registered class with a small static interface: docs/developer_guide.md:20: name = "myvariant" docs/developer_guide.md:42: layer.lora* ← variant.param_specs(layer, cfg) docs/developer_guide.md:43: variant.init(layer, cfg) docs/developer_guide.md:44: hook(layer, lambda x, y: variant.forward(layer, x, y)) docs/developer_guide.md:60:Activation-aware variants imple...

[tool] Read qwen_probe.log (running) input: {"call_id":"call_OhxCJuNFEPRmzaaFxGHbUu4Z","process_id":"42843","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba logs/qwen_probe.log | sed -n '1,40p'"],"cwd":"/media/wassname/SG... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/logs/qwen_probe.log

[tool] Read smoke.log (running) input: {"call_id":"call_4LDa92YA8bBz3kwUeLYLIBG0","process_id":"16740","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba logs/smoke.log | sed -n '1,90p'"],"cwd":"/media/wassname/SGIronW... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/logs/smoke.log

[tool] Read qwen_train_probe.py (running) input: {"call_id":"call_yK1qJHALfCEDcJmKM8bJi3lw","process_id":"2496","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba scripts/qwen_train_probe.py | sed -n '15,35p'"],"cwd":"/media/was... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/scripts/qwen_train_probe.py

[tool] Read qwen_probe.log (completed) kind: read input: {"call_id":"call_OhxCJuNFEPRmzaaFxGHbUu4Z","process_id":"42843","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba logs/qwen_probe.log | sed -n '1,40p'"],"cwd":"/media/wassname/SG... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/logs/qwen_probe.log output: sed: /tmp/.mount_vscodiOHg5sd/lib/x86_64-linux-gnu/libselinux.so.1: no version information available (required by sed) 1 New task added (id 84). 2 SHOULD: exact q_proj/v_proj layer-0 targets, lora-only grads, lossN<loss0, perturb>0, reload<tol. ELSE hook/target/train/save bug. 3 Loading weights: 0%| | 0/311 [00:00<?, ?it/s] Loading weights: 100%|██████████| 311/311 [00:00<00:00, 11183.76it/s] 4 Loading weights: 0%| | 0/311 [00:00<?, ?it/s] Loading weights: 100%|██████████| 311/311 [00:00<00:00, 10483.91it/s] 5 Loading weights: 0%| | 0/311 [00:00<?, ?it/s] Loading weights: 100%|██████████| 311/311 [00:00<00:00, 11231.62it/s] 6 Loading weights: 0%| | 0/311 [00:00<?, ?it/s] Loading weights: 100%|██████████| 311/311 [00:00<00:00, 11419.02it/s] 7 Loading weights: 0%| | 0/311 [00:00<?, ?it/s] Loading weights: 100%|██████████| 311/311 [00:00<00:00, 11501.48it/s] 8 Loading weights: 0%| | 0/311 [00:00<?, ?it/s] Loading weights: 100%|██████████| 311/311 [00:00<00:00, 10138.41it/s] 9 Loading weights: 0%| | 0/311 [00:00<?, ?it/s] Loading weights: 100%|██████████| 311/311 [00:00<00:00, 11177.81it/s] 10 Loading weights: 0%| | 0/311 [00:00<?, ?it/s] Loading weights: 100%|██████████| 311/311 [00:00<00:00, 10532.84it/s] 11 Loading weights: 0%| | 0/311 [00:00<?, ?it/s] Loading weights: 100%|██████████| 311/311 [00:00<00:00, 10290.29it/s] 12 Loading weights: 0%| | 0/311 [00:00<?, ?it/s] Loading weights: 100%|██████████| 311/311 [00:00<00:00, 11146.68it/s] 13 Loading weights: 0%| | 0/311 [00:00<?, ?it/s] Loading weights: 100%|██████████| 311/311 [00:00<00:00, 10274.97it/s] 14 Loading weights: 0%| | 0/311 [00:00<?, ?it/s] Loading weights: 100%|██████████| 311/311 [00:00<00:00, 10701.60it/s] 15 variant targets trainable id_err perturb loss0 lossN drop% grad...

[tool] Read smoke.log (completed) kind: read input: {"call_id":"call_4LDa92YA8bBz3kwUeLYLIBG0","process_id":"16740","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba logs/smoke.log | sed -n '1,90p'"],"cwd":"/media/wassname/SGIronW... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/logs/smoke.log output: sed: /tmp/.mount_vscodiOHg5sd/lib/x86_64-linux-gnu/libselinux.so.1: no version information available (required by sed) 1 /media/wassname/SGIronWolf/projects5/2026/lora-lite/.venv/lib/python3.13/site-packages/bitsandbytes/autograd/_functions.py:123: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization 2 warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization") 3 /media/wassname/SGIronWolf/projects5/2026/lora-lite/.venv/lib/python3.13/site-packages/bitsandbytes/backends/cuda/ops.py:213: FutureWarning: _check_is_size will be removed in a future PyTorch release along with guard_size_oblivious. Use _check(i >= 0) instead. 4 torch._check_is_size(blocksize) 5 6 === variant=lora dtype=torch.float32 === 7 attached 28 targets, trainable params=17408 8 t=0 identity: max|y_adapt - y_base| = 0.000e+00 (base scale 4.609e+00) 9 SHOULD: err<4.6e-06. PASS. 10 save/load: max|y_loaded - y_adapt| = 0.000e+00 11 SHOULD: err2<4.6e-06. PASS. 12 loss[0]=1.0882 loss[-1]=1.0220 drop=6.1% 13 SHOULD: drop>5%. PASS. 14 15 === variant=pissa dtype=torch.float32 === 16 attached 28 targets, trainable params=17408 17 t=0 identity: max|y_adapt - y_base| = 1.550e-06 (base scale 4.609e+00) 18 SHOULD: err<2.3e-03. PASS. 19 save/load: max|y_loaded - y_adapt| = 0.000e+00 20 SHOULD: err2<2.3e-03. PASS. 21 loss[0]=1.0823 loss[-1]=0.9583 drop=11.5% 22 SHOULD: drop>5%. PASS. 23 24 === variant=delora dtype=torch.float32 === 25 attached 28 targets, trainable params=17436 26 t=0 identity: max|y_adapt - y_base| = 0.000e+00 (base scale 4.609e+00) 27 SHOULD: err<4.6e-06. PASS. ... (54 more lines)

[tool] Read qwen_train_probe.py (completed) kind: read input: {"call_id":"call_yK1qJHALfCEDcJmKM8bJi3lw","process_id":"2496","turn_id":"019dc987-7ec1-7ee2-8133-220e153d9394","command":["/usr/bin/zsh","-lc","nl -ba scripts/qwen_train_probe.py | sed -n '15,35p'"],"cwd":"/media/was... files: /media/wassname/SGIronWolf/projects5/2026/lora-lite/scripts/qwen_train_probe.py output: sed: /tmp/.mount_vscodiOHg5sd/lib/x86_64-linux-gnu/libselinux.so.1: no version information available (required by sed) 15 PROMPT = "LoRA-lite probe: Paris is the capital of France. The answer is" 16 EXPECTED_TARGETS = { 17 "model.layers.0.self_attn.q_proj", 18 "model.layers.0.self_attn.v_proj", 19 } 20 21 22 def cfg_for_variant(variant: str, dtype: torch.dtype, r: int, alpha: float) -> ll.LoraLiteConfig: 23 return ll.LoraLiteConfig( 24 variant=variant, 25 r=r, 26 alpha=r if variant == "pissa" else alpha, 27 dtype=dtype, 28 target_roles=(), 29 target_names=(r"model.layers.0.self_attn.(q_proj|v_proj)$",), 30 layers=(0,), 31 variant_kwargs={"lambda0": 0.1} if variant == "delora" else {}, 32 ) 33 34 35 def adapter_state(model: torch.nn.Module) -> dict[str, torch.Tensor]:

Per-Variant Paper-Faithfulness Audit V2

LoRA

Reference impl ground-truth

PEFT creates A: in -> r, B: r -> out, scales by alpha/r, initializes A Kaiming and B zero, then adds B(A(dropout(x))) * scaling to the base output:

docs/refs/peft_lora_layer.py:197 self.lora_A[adapter_name] = nn.Linear(self.in_features, r, bias=False)
docs/refs/peft_lora_layer.py:198 self.lora_B[adapter_name] = nn.Linear(r, self.out_features, bias=lora_bias)
docs/refs/peft_lora_layer.py:215 self.scaling[adapter_name] = lora_alpha / r
docs/refs/peft_lora_layer.py:269 nn.init.kaiming_uniform_(self.lora_A[adapter_name].weight, a=math.sqrt(5))
docs/refs/peft_lora_layer.py:274 nn.init.zeros_(self.lora_B[adapter_name].weight)
docs/refs/peft_lora_layer.py:969 result = result + lora_B(lora_A(dropout(x))) * scaling

Save/load ground-truth: adapter params are named adapter layers in PEFT (lora_A, lora_B) and are state-dict parameters; no base-weight mutation for vanilla LoRA.

Our code

src/lora_lite/variants/lora.py:26 "lora_A": ParamSpec((cfg.r, d_in), init="kaiming", trainable=True),
src/lora_lite/variants/lora.py:27 "lora_B": ParamSpec((d_out, cfg.r), init="zeros", trainable=True),
src/lora_lite/variants/lora.py:38 scale = cfg.alpha / cfg.r
src/lora_lite/variants/lora.py:39 h = einsum(x, layer.lora_A, "... i, r i -> ... r")
src/lora_lite/variants/lora.py:40 delta = einsum(h, layer.lora_B, "... r, o r -> ... o")
src/lora_lite/variants/lora.py:41 return y + scale * delta

Diff

[OK-doc] Parameter shapes, init, scale, and additive forward match PEFT for dropout=0.
[NEW-BUG] cfg.dropout is exposed but ignored: PEFT applies dropout(x) before A (peft_lora_layer.py:969), while our forward always uses raw x. Failure mode: setting LoraLiteConfig(dropout>0) silently has no effect. This is partly admitted in src/lora_lite/config.py:11, but not in the LoRA docstring.
[STYLE] PEFT supports bias, multiple adapters, fan-in/fan-out, merge/unmerge; lora-lite intentionally omits those.

Did the prior review get it right?

Prior verdict: docs/audit/variants_review.md:1175 says “Faithful. Kaiming is not literally ‘Gaussian’... and the scaling matches.” Confirm for default dropout=0, but prior missed the exposed-dropout no-op.

Verdict: Faithful-with-doc-gap

Faithful to PEFT LoRA at default settings; doc/config should say dropout is unsupported or implement it.

PiSSA

Reference impl ground-truth

PEFT PiSSA SVDs the base weight in fp32, divides singular values by scaling, installs A,B, and mutates the base weight to the residual:

docs/refs/peft_lora_layer.py:368 weight = transpose(weight.to(torch.float32), self.fan_in_fan_out)
docs/refs/peft_lora_layer.py:371 V, S, Uh = torch.linalg.svd(weight.data, full_matrices=False)
docs/refs/peft_lora_layer.py:374 Sr /= self.scaling[adapter_name]
docs/refs/peft_lora_layer.py:387 lora_A = torch.diag(torch.sqrt(Sr)) @ Uhr
docs/refs/peft_lora_layer.py:388 lora_B = Vr @ torch.diag(torch.sqrt(Sr))
docs/refs/peft_lora_layer.py:391 weight = weight.data - self.scaling[adapter_name] * lora_B @ lora_A
docs/refs/peft_lora_layer.py:393 self.get_base_layer().weight.data = weight

Original PiSSA save flow explicitly saves both PiSSA modules and the residual model:

docs/refs/orig_pissa_init.py:53 # Save PiSSA modules:
docs/refs/orig_pissa_init.py:55 peft_model.save_pretrained(os.path.join(script_args.output_dir, "pissa_init"))
docs/refs/orig_pissa_init.py:56 # Save residual model:
docs/refs/orig_pissa_init.py:57 peft_model = peft_model.unload()
docs/refs/orig_pissa_init.py:58 peft_model.save_pretrained(script_args.output_dir)

Our code

src/lora_lite/variants/pissa.py:47 U, S, Vh = torch.linalg.svd(W, full_matrices=False)
src/lora_lite/variants/pissa.py:52 B = (Ur * sqrtS).to(cfg.dtype)
src/lora_lite/variants/pissa.py:53 A = (sqrtS[:, None] * Vhr).to(cfg.dtype)
src/lora_lite/variants/pissa.py:54 layer.lora_B.data.copy_(B)
src/lora_lite/variants/pissa.py:55 layer.lora_A.data.copy_(A)
src/lora_lite/variants/pissa.py:60 scale = cfg.alpha / cfg.r
src/lora_lite/variants/pissa.py:61 layer.weight.data.copy_((W - scale * BA).to(layer.weight.dtype))

Adapter save/load only stores lora_ tensors:

src/lora_lite/adapter.py:98 sd = {k: v.detach().cpu() for k, v in model.state_dict().items() if "lora_" in k}
src/lora_lite/adapter.py:105 handles = attach(model, cfg)  # creates empty params with right shapes

Diff

[OK-doc] Same residualization idea as PEFT: base W <- W - scale * BA, forward adds scale * BAx.
[OK-undoc] PEFT divides Sr by scaling before forming A,B; our code does not, but subtracts/adds scale * BA. Both preserve identity when using the same scale. The trained parameterization differs by a scale factor, so this should be noted.
[BUG] PiSSA save/load is not reference-faithful: original PiSSA saves both adapter and mutated residual model (orig_pissa_init.py:53-58), while lora-lite saves only lora_ tensors and recomputes residual on load. Loading onto a base that is not exactly the same silently changes the effective model.
[BUG] bf16/Qwen identity error remains real: logs/qwen_probe.log:17 reports pissa id_err=0.3125 while still ending ALL QWEN PROBES PASS at line 22.
[NEW-BUG] Same dropout no-op as LoRA: PEFT forward path still uses dropout(x) for LoRA-family layers (peft_lora_layer.py:969); our PiSSA ignores cfg.dropout.

Did the prior review get it right?

Prior verdict: docs/audit/variants_review.md:1254 says “Potentially material bf16 initialization error... Save/load assumes loading into the same unmodified base model...” Confirmed. Reference code makes the save/load issue stronger because the PiSSA author flow saves the residual model separately.

Verdict: Partial

Core PiSSA math is close; persistence and bf16 identity are not reference-faithful.

DoRA

Reference impl ground-truth

PEFT DoRA stores a trainable magnitude vector initialized to ||W + scale*BA|| and in forward detaches the denominator norm:

docs/refs/peft_lora_dora.py:86 def get_weight_norm(self, weight, lora_weight, scaling, adapter_name: Optional[str] = None)
docs/refs/peft_lora_dora.py:89 weight = weight + scaling * lora_weight
docs/refs/peft_lora_dora.py:90 weight_norm = torch.linalg.norm(weight, dim=1).to(weight.dtype)
docs/refs/peft_lora_dora.py:130 self.weight = nn.Parameter(weight_norm, requires_grad=True)
docs/refs/peft_lora_dora.py:143 weight_norm = self.get_weight_norm(
docs/refs/peft_lora_dora.py:152 weight_norm = weight_norm.detach()
docs/refs/peft_lora_dora.py:165 result_dora = (mag_norm_scale - 1) * base_result + mag_norm_scale * lora_result * scaling

Our code

src/lora_lite/variants/dora.py:36 "lora_A": ParamSpec((cfg.r, d_in), init="kaiming", trainable=True),
src/lora_lite/variants/dora.py:37 "lora_B": ParamSpec((d_out, cfg.r), init="zeros", trainable=True),
src/lora_lite/variants/dora.py:39 "lora_m": ParamSpec((d_out,), init="zeros", trainable=True),
src/lora_lite/variants/dora.py:51 col_norm = W.norm(dim=1).to(layer.lora_m.dtype)
src/lora_lite/variants/dora.py:60 V = layer.weight + scale * BA
src/lora_lite/variants/dora.py:61 v_norm = V.norm(dim=1).clamp_min(1e-12)
src/lora_lite/variants/dora.py:66 return (layer.lora_m / v_norm) * combined

Diff

[OK-doc] Parameterization, magnitude shape, initialization, and forward value match PEFT for plain nn.Linear at dropout=0.
[OK-doc] Gradient differs from PEFT: reference detaches weight_norm (peft_lora_dora.py:152); our docstring explicitly says it differentiates through ||V||.
[NEW-BUG] Dropout no-op again: PEFT LoRA wrapper passes dropout(x) before DoRA variant invocation; our DoRA ignores cfg.dropout.
[STYLE] PEFT handles dequantization and Linear4bit for DoRA magnitude init; lora-lite raises on non-plain nn.Linear, documented.

Did the prior review get it right?

Prior verdict: docs/audit/variants_review.md:1277 says “Faithful for plain nn.Linear...” and docs/audit/variants_review.md:1305 flags the denominator-gradient concern. Confirmed, but direct PEFT reference upgrades the denominator difference from paper-only concern to concrete reference divergence.

Verdict: Partial

Forward is faithful; training gradients intentionally diverge from PEFT DoRA.

IA3

Reference impl ground-truth

PEFT IA3 uses different vector shapes and placement depending on whether the target is feedforward. Non-feedforward scales output; feedforward scales input before the base layer:

docs/refs/peft_ia3_layer.py:62 if self.is_feedforward:
docs/refs/peft_ia3_layer.py:63     weight = torch.randn((1, self.in_features))
docs/refs/peft_ia3_layer.py:65     weight = torch.randn((self.out_features, 1))
docs/refs/peft_ia3_layer.py:75 nn.init.constant_(self.ia3_l[adapter_name], 1.0)
docs/refs/peft_ia3_layer.py:177 if self.is_feedforward:
docs/refs/peft_ia3_layer.py:181     interm = (x * ia3_scaling).to(previous_dtype)
docs/refs/peft_ia3_layer.py:182     result = self.base_layer(interm, *args, **kwargs)
docs/refs/peft_ia3_layer.py:184 result = self.base_layer(x, *args, **kwargs)
docs/refs/peft_ia3_layer.py:186 result = (result * ia3_scaling).to(result_dtype)

Our code

src/lora_lite/variants/ia3.py:37 return {"lora_g": ParamSpec((d_out,), init="ones", trainable=True)}
src/lora_lite/variants/ia3.py:44 def forward(layer: nn.Linear, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
src/lora_lite/variants/ia3.py:45     return y * layer.lora_g

Diff

[OK-doc] For non-feedforward projections, output gating with a ones vector matches PEFT’s non-feedforward path.
[BUG] Feedforward IA3 is not reference-faithful. PEFT uses shape (1, in_features) and multiplies the input to the feedforward linear (peft_ia3_layer.py:177-182); our IA3 always uses (d_out,) and multiplies output. This matters for the paper’s FFN-intermediate gate.
[BUG] The IA3 docstring recommends up_proj (src/lora_lite/variants/ia3.py:13), but reference/paper-faithful FFN gating is better represented as input gating of down_proj in Llama/Qwen-style MLPs. Our variant cannot express that with its current forward-only hook.
[OK-doc] Qwen probe intentionally targets q_proj|v_proj, not IA3 paper placement: scripts/qwen_train_probe.py:29.
[NEW-BUG] IA3 default targeting is not IA3-like: LoraLiteConfig.target_roles=("reader","writer") can attach to many linears, and the variant has no feedforward role handling.

Did the prior review get it right?

Prior verdict: docs/audit/variants_review.md:1325 says target semantics are not paper-faithful and Qwen uses q_proj/v_proj; confirmed. The prior review under-specified the feedforward arithmetic: direct PEFT shows it is input-side scaling, not merely “choose the right target.”

Verdict: Partial

Projection gates are IA3-like; FFN IA3 is structurally missing.

HRA

Reference impl ground-truth

PEFT HRA stores hra_u as (in_features, r), initializes symmetrically for even rank, optionally Gram-Schmidts, builds Householder products, and applies W @ R without a zero gate:

docs/refs/peft_hra_layer.py:81 self.hra_u[adapter_name] = nn.Parameter(torch.empty(self.in_features, r), requires_grad=True)
docs/refs/peft_hra_layer.py:106 half_u = torch.zeros(shape[0], shape[1] // 2)
docs/refs/peft_hra_layer.py:108 self.hra_u[adapter_name] = nn.Parameter(torch.repeat_interleave(half_u, 2, dim=1))
docs/refs/peft_hra_layer.py:214 if apply_GS:
docs/refs/peft_hra_layer.py:222 weight = torch.eye(shape[0], device=opt_u.device, dtype=opt_u.dtype) - 2 * weight @ weight.t()
docs/refs/peft_hra_layer.py:225 opt_u = opt_u / opt_u.norm(dim=0)
docs/refs/peft_hra_layer.py:234 weight = weight - 2 * weight @ ui @ ui.t()
docs/refs/peft_hra_layer.py:258 new_weight = torch.mm(orig_weight, new_weight)

Original HRA/OFT repo initializes around an identity-like matrix and applies base forward plus transformation logic, not a scalar zero gate:

docs/refs/orig_hra_layer.py:52 self.oft_r[adapter_name] = nn.Parameter(
docs/refs/orig_hra_layer.py:53     torch.cat([torch.eye(r, device=weight.device, dtype=weight.dtype),
docs/refs/orig_hra_layer.py:58 nn.init.kaiming_uniform_(self.oft_r[adapter_name], a=1 / self.eps[adapter_name])

Our code

src/lora_lite/variants/hra.py:51 "lora_U": ParamSpec((cfg.r, d_in), init="kaiming", trainable=True),
src/lora_lite/variants/hra.py:53 "lora_gate": ParamSpec((), init="zeros", trainable=True),
src/lora_lite/variants/hra.py:65 for i in range(U.shape[0]):
src/lora_lite/variants/hra.py:67 sq = (u * u).sum().clamp_min(1e-12)
src/lora_lite/variants/hra.py:68 coeff = einsum(Rx, u, "... i, i -> ...") * (2.0 / sq)
src/lora_lite/variants/hra.py:69 Rx = Rx - coeff.unsqueeze(-1) * u
src/lora_lite/variants/hra.py:70 return x + layer.lora_gate * (Rx - x)

Diff

[OK-undoc] Shape is transposed relative to PEFT: our (r,d_in) is equivalent to PEFT (d_in,r) for the loop implementation.
[BUG] Zero scalar gate is not in PEFT or original HRA. It preserves identity, but causes step-0 zero gradient for lora_U. This was already documented in the new docstring and is confirmed by the reference.
[BUG] PEFT’s identity-preserving init for even r repeats columns in pairs (peft_hra_layer.py:106-108), so products cancel without killing gradients. Our implementation should use the symmetric pair init instead of a zero gate.
[BUG] PEFT has apply_GS orthogonalization (peft_hra_layer.py:214-222); our code has no equivalent. Without it, the “strict orthogonal HRA” reference mode is unavailable.
[STYLE] PEFT materializes new_weight = W @ R; our pre-hook computes R x then base computes W(Rx). Forward-equivalent for linear layers.

Did the prior review get it right?

Prior verdict: docs/audit/variants_review.md:1407 says “zero-gate formulation is materially different from paper HRA and creates an initial dead-U step.” Confirmed. Direct PEFT reference also reveals the better fix: symmetric repeated-column init.

Verdict: Buggy

The core Householder transform is present, but initialization is reference-incompatible and creates avoidable dead gradients.

DeLoRA

Reference impl ground-truth

PEFT DeLoRA stores A, B, trainable lambda, and a persistent per-input-channel w_norm; init uses Kaiming A, zero B by default, and forward multiplies x by w_norm, normalizes A/B, scales by lambda/r, then applies B:

docs/refs/peft_delora_layer.py:50 # Use persistent buffers so they are included in state_dict and saved.
docs/refs/peft_delora_layer.py:51 self.delora_w_norm = BufferDict({}, persistent=True)
docs/refs/peft_delora_layer.py:112 self.delora_A[adapter_name] = nn.Parameter(torch.empty(r, self.in_features))
docs/refs/peft_delora_layer.py:113 self.delora_B[adapter_name] = nn.Parameter(torch.empty(self.out_features, r))
docs/refs/peft_delora_layer.py:138 nn.init.kaiming_uniform_(self.delora_A[adapter_name], a=math.sqrt(5))
docs/refs/peft_delora_layer.py:139 nn.init.zeros_(self.delora_B[adapter_name])
docs/refs/peft_delora_layer.py:144 self.delora_lambda[adapter_name].data.fill_(float(delora_lambda))
docs/refs/peft_delora_layer.py:150 w_norm = torch.norm(w.data, dim=0).detach()
docs/refs/peft_delora_layer.py:250 h = nn.functional.linear(x_d * self.delora_w_norm[adapter], self.delora_A[adapter])
docs/refs/peft_delora_layer.py:255 scaling = (self.delora_lambda[adapter] / self.r[adapter]) / (An * Bn)

Original author fork differs from upstream PEFT: it initializes both A,B Kaiming and subtracts a frozen copy so identity holds even with nonzero lambda:

docs/refs/orig_delora.py:324 nn.init.kaiming_uniform_(self.delora_A.weight, a=math.sqrt(5))
docs/refs/orig_delora.py:325 nn.init.kaiming_uniform_(self.delora_B.weight, a=math.sqrt(5))
docs/refs/orig_delora.py:327 self.frozen_C.data = self.delora_A.weight.data
docs/refs/orig_delora.py:328 self.frozen_D.data = self.delora_B.weight.data
docs/refs/orig_delora.py:349 ABCD = delora_B_weight @ diag12 @ delora_A_weight
docs/refs/orig_delora.py:350 ABCD = ABCD - self.frozen_D @ diag34 @ self.frozen_C
docs/refs/orig_delora.py:353 Wnorm = self.weight.data.norm(dim=0)
docs/refs/orig_delora.py:354 ABCD = torch.mul(ABCD, Wnorm.unsqueeze(0))

Our code

src/lora_lite/variants/delora.py:48 "lora_A": ParamSpec((cfg.r, d_in), init="kaiming", trainable=True),
src/lora_lite/variants/delora.py:49 "lora_B": ParamSpec((d_out, cfg.r), init="kaiming", trainable=True),
src/lora_lite/variants/delora.py:51 (), init=lambda t: t.fill_(lam0), trainable=True
src/lora_lite/variants/delora.py:63 layer.lora_wnorm.data.fill_(W.norm().item())
src/lora_lite/variants/delora.py:70 A = F.normalize(layer.lora_A, dim=1)
src/lora_lite/variants/delora.py:71 B = F.normalize(layer.lora_B, dim=0)
src/lora_lite/variants/delora.py:72 scale = layer.lora_lambda * layer.lora_wnorm / cfg.r
src/lora_lite/variants/delora.py:75 return y + scale * delta

Diff

[BUG] Against upstream PEFT, B init differs: PEFT default uses B=0 (peft_delora_layer.py:139), our code uses Kaiming B.
[BUG] Against original author repo, our code omits the frozen-copy subtraction (orig_delora.py:327-350). This confirms the prior “lambda0=0 dead A/B vs lambda0>0 non-identity” finding.
[NEW-BUG] Our lora_wnorm is a scalar Frobenius norm (W.norm().item()), but both PEFT and author repo use per-input-channel norm norm(dim=0) (peft_delora_layer.py:150, orig_delora.py:353-354). Failure mode: the DeLoRA update has the wrong feature-wise scaling and cannot match reference behavior except in degenerate equal-column-norm cases.
[NEW-BUG] lora_wnorm is registered as a non-trainable parameter because ParamSpec.make always returns nn.Parameter; PEFT uses a persistent buffer (peft_delora_layer.py:50-51). Optimizers will usually ignore it due to requires_grad=False, but state semantics are not reference-clean.
[OK-doc] lambda0=0 identity and dead first-step A/B gradients are documented in our docstring.
[NEW-BUG] Dropout no-op: PEFT DeLoRA has delora_dropout and applies it at peft_delora_layer.py:246; our code ignores cfg.dropout.

Did the prior review get it right?

Prior verdict: docs/audit/variants_review.md:1471 says “no Eq.9 frozen-copy init; zero-lambda dead A/B first step; nonzero-lambda breaks identity.” Confirmed. Direct reference comparison adds the per-input-channel norm bug.

Verdict: Buggy

It is DeLoRA-shaped, but not faithful to either upstream PEFT or the original author implementation.

Aggregate

variant	prior verdict	new verdict	new bugs found	doc gaps
lora	Faithful	Faithful-with-doc-gap	`cfg.dropout` silently ignored	LoRA docstring should mention dropout unsupported
pissa	Partial	Partial	dropout ignored	parameter scaling differs from PEFT; save/load caveat should cite residual-base requirement
dora	Faithful	Partial	dropout ignored	PEFT denominator detach is documented, but verdict should say reference-gradient divergence
ia3	Partial	Partial	FFN gate requires input-side scaling; doc recommends wrong `up_proj` stand-in	document non-feedforward-only faithfulness
hra	Partial	Buggy	none beyond prior, but PEFT reveals symmetric init fix	document lack of `apply_GS`
delora	Partial	Buggy	wrong scalar `W` norm; dropout ignored; `w_norm` should be buffer-like	distinguish PEFT upstream vs original-author initialization

What To Fix Next

Fix DeLoRA scaling: replace scalar W.norm() with per-input-channel W.norm(dim=0), and decide whether to follow upstream PEFT B=0 or author frozen-copy init.
Replace HRA zero gate with PEFT-style symmetric repeated-column initialization so identity holds without dead U gradients.
Fix IA3 FFN support: add input-side gating for feedforward targets and update docs from up_proj to the correct down_proj/FFN-intermediate semantics.
Make PiSSA persistence explicit: either save/load the residual base mutation or refuse/load-warn unless the exact same base is guaranteed.
Either implement adapter dropout across LoRA-family variants or remove/rename cfg.dropout so users do not get a silent no-op.

[done] end_turn

85 KiB Raw Blame History Unescape Escape

Per-Variant Paper-Faithfulness Audit V2

LoRA

PiSSA

DoRA

IA3

HRA

DeLoRA

Aggregate

What To Fix Next

85 KiB

Raw Blame History