copilot 55757e829d fix V3 review must-fixes: DoRA bias passthrough + EVA load path
V3 external review (docs/audit/variants_review_v3.md, 97KB) found 3
must-fix bugs.

DoRA: bias was being scaled by m/||V|| because we operated on the full
base layer output. Now subtract bias before normalization, add back
after. Matches peft DoRA exactly (docs/refs/peft_lora_dora.py:157-161).
New smoke dora_bias_smoke verifies identity at t=0 with bias=True.

EVA load: adapter.load() called attach() which called group_init() which
required calibration_data and raised. Added _skip_group_init flag to
attach(); load() passes it. EVA group_init still raises loudly when
called directly without data. New smoke verifies save+load WITHOUT
calibration data on load path.

Also tightened EVA error message.

Smoke now covers 8 variants + EVA roundtrip + DoRA-bias roundtrip + bnb
4/8-bit. ALL PASS.

V3 nice-to-haves (PiSSA scaling, AntiPaSTO init choice, stale GH refs)
deferred -- documented as intentional in module docstrings.
2026-04-26 19:50:48 +08:00

lora-lite

Hackable PyTorch adapters for LoRA-family and small PEFT experiments.

lora-lite uses forward hooks instead of module replacement. Adapter parameters are plain nn.Parameters on the target layer, e.g. model.layers[5].self_attn.q_proj.lora_A.
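
To illustrate the hook-based approach, here is a minimal sketch in plain PyTorch. It is not lora-lite's implementation: attach_lora_hook, its arguments, and the init scale are hypothetical, chosen only to show adapter parameters living on the target layer while a forward hook adds the low-rank update.

```python
import torch
import torch.nn as nn


def attach_lora_hook(layer: nn.Linear, r: int = 4, alpha: float = 8.0):
    """Hypothetical helper (not the lora-lite API): LoRA via a forward hook."""
    # Adapter weights are plain Parameters registered on the target layer.
    layer.lora_A = nn.Parameter(torch.randn(r, layer.in_features) * 0.01)
    layer.lora_B = nn.Parameter(torch.zeros(layer.out_features, r))  # zero init: no change at start
    scale = alpha / r

    def hook(module, inputs, output):
        (x,) = inputs
        # A value returned from a forward hook replaces the module's output.
        return output + scale * (x @ module.lora_A.T @ module.lora_B.T)

    return layer.register_forward_hook(hook)


torch.manual_seed(0)
layer = nn.Linear(16, 32)
x = torch.randn(2, 16)
base_out = layer(x)

handle = attach_lora_hook(layer)
out_at_init = layer(x)      # equals base_out because lora_B starts at zero

handle.remove()             # removing the hook restores the original forward path
out_detached = layer(x)
```

Because the layer itself is never swapped out, removing the hook handle is enough to get the base behaviour back; the module keeps its original class and name throughout.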

Install

pip install -e git+https://github.com/wassname/lora-lite.git#egg=lora-lite

Quickstart

import torch, lora_lite as ll

model = MyTransformer()  # placeholder: any nn.Module containing linear layers
cfg = ll.LoraLiteConfig(variant="lora", r=8, alpha=16, dtype=torch.bfloat16)
ll.attach(model, cfg)

opt = torch.optim.AdamW([p for p in model.parameters() if p.requires_grad], lr=1e-4)
# train...

ll.save(model, "adapter.pt")
ll.detach(model)
ll.load(model, "adapter.pt")

Does it work?

just check       # pytest + smoke + package build + metadata check
just bnb-smoke   # required CUDA bitsandbytes 4bit/8bit smoke
just qwen-probe  # Qwen/Qwen3-0.6B train/save-load probe

See docs/spec/20260426_lora_lite_plan.md for verification history and exact results.

Variants

Variant             | Support      | Notes
LoRA                | yes          | additive low-rank adapter
PiSSA               | yes, fp only | mutates weight into W_res; quantized PiSSA intentionally fails
DeLoRA              | yes          | normalized additive adapter with learned scalar
IA3                 | yes          | output gate initialized to ones
DoRA                | yes, fp only | reads dense weight for column-norm; quantized DoRA fails loudly
HRA                 | yes          | output-side Householder reflection with identity gate; works on bnb
SSVD / OFT / ROAD   | no           | planned
S-steer / AntiPaSTO | no           | should use data-calibrated group_init, not plain LoRA tests
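
For DoRA, the commit note above says the bias must pass through unscaled when the layer output is rescaled by m/||V||. The sketch below works through that arithmetic in plain PyTorch. It assumes the usual peft-style formulation (per-output-row norm of V = W + scale * B A); every variable name here is illustrative and none is taken from lora-lite.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(8, 6, bias=True)
x = torch.randn(3, 8)
r, scale = 2, 1.0

W = layer.weight.detach()              # (out_features, in_features)
b = layer.bias.detach()
lora_A = torch.randn(r, 8) * 0.5
lora_B = torch.randn(6, r) * 0.5       # nonzero, so m/||V|| differs from 1
m = W.norm(dim=1)                      # magnitude, initialised from the base weight

V = W + scale * lora_B @ lora_A        # adapted direction matrix
g = m / V.norm(dim=1)                  # per-output-feature rescale m/||V||

with torch.no_grad():
    base_out = layer(x)                               # x W^T + b
    lora_out = scale * (x @ lora_A.T @ lora_B.T)
    fixed = g * (base_out - b + lora_out) + b         # strip bias, rescale, add bias back
    buggy = g * (base_out + lora_out)                 # bias gets multiplied by g as well
    reference = x @ (g[:, None] * V).T + b            # dense DoRA weight applied directly
```

Here fixed matches reference, while buggy differs from it by (g - 1) * b. With lora_B at zero, g is exactly 1 and both forms reduce to the base output.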

Targeting

By default, lora-lite targets linear-like modules with in_features, out_features, and weight, excluding lm_head and embed_tokens.

Useful LoraLiteConfig fields:

  • target_roles: subset of ("reader", "writer", "inner"); () means all.
  • target_names: regex includes.
  • exclude_names: regex excludes.
  • layers: layer indices, matching .layers.<idx>. in module names.

This structural targeting is why LoRA, DeLoRA, IA3, and HRA can run on bnb-style Linear4bit/Linear8bitLt modules. PiSSA and DoRA are fp-only: PiSSA edits the base weight, and DoRA reads the dense weight to compute its norm.
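
As a rough illustration of how name-based filters like these could combine, the sketch below selects module names with regex includes, regex excludes, and a layer-index filter. select_targets is hypothetical, and the AND semantics and use of re.search are assumptions; the library's actual matching rules may differ, and target_roles is not modelled here.

```python
import re


def select_targets(names, target_names=(), exclude_names=(r"lm_head", r"embed_tokens"), layers=()):
    """Hypothetical filter over module names; not lora-lite's implementation."""
    picked = []
    for name in names:
        # Excludes win over everything else.
        if any(re.search(p, name) for p in exclude_names):
            continue
        # Empty target_names means "no include filter".
        if target_names and not any(re.search(p, name) for p in target_names):
            continue
        # Layer filter matches the ".layers.<idx>." segment of the module name.
        if layers:
            m = re.search(r"\.layers\.(\d+)\.", name)
            if m is None or int(m.group(1)) not in layers:
                continue
        picked.append(name)
    return picked


names = [
    "model.embed_tokens",
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.down_proj",
    "model.layers.5.self_attn.q_proj",
    "lm_head",
]
all_default = select_targets(names)
only_q_layer5 = select_targets(names, target_names=[r"q_proj$"], layers=[5])
```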

Save format

Adapters are just:

torch.save({"cfg": cfg.to_dict(), "state": lora_state_dict}, "adapter.pt")

lora_state_dict contains full-path keys with "lora_" in the name. Missing or unexpected adapter keys fail on load.
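
The strictness described above can be sketched with two small helpers. Both adapter_state and check_adapter_keys are hypothetical names, plain lists stand in for tensors, and raising KeyError is an assumption; the README only states that mismatched keys fail on load.

```python
def adapter_state(full_state):
    """Keep only entries whose full-path key contains "lora_"."""
    return {k: v for k, v in full_state.items() if "lora_" in k}


def check_adapter_keys(expected, loaded):
    """Raise if the loaded adapter keys do not exactly match the expected ones."""
    missing = sorted(set(expected) - set(loaded))
    unexpected = sorted(set(loaded) - set(expected))
    if missing or unexpected:
        raise KeyError(f"missing={missing} unexpected={unexpected}")


full_state = {
    "layers.0.q_proj.weight": [1.0],      # base weight, not part of the adapter
    "layers.0.q_proj.lora_A": [0.1],
    "layers.0.q_proj.lora_B": [0.0],
}
adapter = adapter_state(full_state)
check_adapter_keys(expected=adapter.keys(), loaded=adapter.keys())  # exact match passes

try:
    check_adapter_keys(expected=adapter.keys(), loaded=["layers.0.q_proj.lora_A"])
    strict_failed = False
except KeyError:
    strict_failed = True   # a missing lora_B key is rejected rather than ignored
```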

Developer docs

See docs/developer_guide.md for the variant API, data-calibrated init, and adapter roadmap.

Citation

@misc{wassname2026loralite,
  title = {LoRA-Lite: A Hackable Adapter Library for Research},
  author = {Michael J. Clark},
  year = {2026},
  url = {https://github.com/wassname/lora-lite/}
}