- DeLoRA: per-input-channel wnorm buffer (not scalar Parameter), forward matches peft (x*wnorm @ A.T then per-rank scale (lambda/r)/(An*Bn)). Smoke: 89.7% loss drop (was 35.8%).
- HRA: symmetric repeated-column init (PEFT-style) instead of zero gate. Adjacent Householder pairs cancel exactly so R=I at t=0, and U receives gradient from step 0 (no dead-grad). Even r required.
- IA3: split into two variants. ia3 stays output-side (k_proj/v_proj); new ia3_ff is input-side (down_proj/fc2), matching peft is_feedforward.
- Config: dropout field removed (never honored by any variant).
- PiSSA: adapter.save records base-weight fingerprint per target; adapter.load recomputes init then verifies fingerprint -> fails loud when reloaded onto a different base.
- EVA (new): data-driven init via group_init + calibration_data. Top-r right singular vectors of pooled layer-input activations -> lora_A (buffer, frozen); only lora_B trains. Stress-tests group_init API.
- AntiPaSTO (new): SVD steering with frozen U,S,Vh,W_res and learnable delta_s (per-singular-value bias) + rot_T (block-diagonal Cayley rotation on V or U). Lite port of antipasto3 SVD adapter.
- ParamSpec: as_buffer field + make_tensor() for buffer registration. adapter.attach honors as_buffer with register_buffer; detach cleans both _parameters and _buffers.
- Smoke covers all 8 variants: identity at t=0, save/load round-trip, gradient-driven loss drop. EVA gets dedicated test for calibration data path. ALL PASS including bnb 4/8-bit path.
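The HRA note above says that adjacent Householder pairs cancel so that R=I at t=0, and that r must be even. A minimal plain-Python sketch of why, assuming "repeated-column init" means the columns of U come in identical adjacent pairs (u0, u0, u1, u1, ...). The helpers householder, matmul, chain, and max_dev_from_identity are illustrative only and are not part of lora-lite:

```python
# Sketch only: check that identical adjacent Householder reflections cancel.
# All helper names here are hypothetical, not lora-lite functions.

def householder(u):
    """Return H = I - 2 u u^T / (u^T u) as a list of rows."""
    n = len(u)
    norm_sq = sum(x * x for x in u)
    return [[(1.0 if i == j else 0.0) - 2.0 * u[i] * u[j] / norm_sq
             for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Dense matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def chain(columns):
    """Multiply the reflections built from each column, left to right."""
    n = len(columns[0])
    out = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u in columns:
        out = matmul(out, householder(u))
    return out

def max_dev_from_identity(m):
    """Largest absolute entry of (m - I)."""
    n = len(m)
    return max(abs(m[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))

u0 = [0.3, -1.2, 0.5]
u1 = [1.0, 0.4, -0.7]
paired = chain([u0, u0, u1, u1])   # r = 4: each reflection is undone by its twin
unpaired = chain([u0, u1, u0])     # r = 3: an odd product is still a reflection

print(max_dev_from_identity(paired))    # close to zero, up to float rounding
print(max_dev_from_identity(unpaired))  # clearly non-zero
```

Because each reflection is its own inverse, the paired product is the identity even though every column is non-zero, which is consistent with U receiving gradient from the first step.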
lora-lite
Hackable PyTorch adapters for LoRA-family and small PEFT experiments.
lora-lite uses forward hooks instead of module replacement. Adapter parameters are plain nn.Parameters on the target layer, e.g. model.layers[5].self_attn.q_proj.lora_A.
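To illustrate the hook-based design, here is a minimal sketch of a LoRA-style adapter attached with a standard PyTorch forward hook. The helper attach_lora_hook and its defaults are hypothetical and are not the lora-lite implementation; only nn.Parameter and register_forward_hook are real PyTorch APIs:

```python
import torch
import torch.nn as nn

def attach_lora_hook(layer: nn.Linear, r: int = 4, alpha: float = 8.0):
    """Hypothetical helper: store LoRA params on `layer`, add their output via a hook."""
    # Assigning an nn.Parameter as an attribute registers it on the module.
    layer.lora_A = nn.Parameter(torch.randn(r, layer.in_features) * 0.01)
    layer.lora_B = nn.Parameter(torch.zeros(layer.out_features, r))  # zero => no change at init
    scale = alpha / r

    def hook(module, inputs, output):
        (x,) = inputs
        # A non-None return value from a forward hook replaces the layer output.
        return output + scale * (x @ module.lora_A.T @ module.lora_B.T)

    return layer.register_forward_hook(hook)

torch.manual_seed(0)
layer = nn.Linear(16, 8)
x = torch.randn(2, 16)
base_out = layer(x)

handle = attach_lora_hook(layer)
hooked_out = layer(x)            # same as base_out because lora_B starts at zero

handle.remove()                  # removing the hook restores the plain layer
del layer.lora_A, layer.lora_B   # and the extra parameters can simply be deleted
```

The layer's class never changes, so isinstance checks and quantized layer types keep working, and the adapter tensors show up in named_parameters() under the layer's own path.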
Install
pip install -e git+https://github.com/wassname/lora-lite.git#egg=lora-lite
Quickstart
import torch, lora_lite as ll
model = MyTransformer()
cfg = ll.LoraLiteConfig(variant="lora", r=8, alpha=16, dtype=torch.bfloat16)
ll.attach(model, cfg)
opt = torch.optim.AdamW([p for p in model.parameters() if p.requires_grad], lr=1e-4)
# train...
ll.save(model, "adapter.pt")
ll.detach(model)
ll.load(model, "adapter.pt")
Does it work?
just check # pytest + smoke + package build + metadata check
just bnb-smoke # required CUDA bitsandbytes 4bit/8bit smoke
just qwen-probe # Qwen/Qwen3-0.6B train/save-load probe
See docs/spec/20260426_lora_lite_plan.md for verification history and exact results.
Variants
| Variant | Support | Notes |
|---|---|---|
| LoRA | yes | additive low-rank adapter |
| PiSSA | yes, fp only | mutates weight into W_res; quantized PiSSA intentionally fails |
| DeLoRA | yes | normalized additive adapter with learned scalar |
| IA3 | yes | output gate initialized to ones |
| DoRA | yes, fp only | reads dense weight for column-norm; quantized DoRA fails loudly |
| HRA | yes | output-side Householder reflection with identity gate; works on bnb |
| SSVD / OFT / ROAD | no | planned |
| S-steer / AntiPaSTO | no | should use data-calibrated group_init, not plain LoRA tests |
Targeting
By default, lora-lite targets linear-like modules with in_features, out_features, and weight, excluding lm_head and embed_tokens.
Useful LoraLiteConfig fields:
- target_roles: subset of ("reader", "writer", "inner"); () means all.
- target_names: regex includes.
- exclude_names: regex excludes.
- layers: layer indices, matching .layers.<idx>. in module names.
This structural targeting is why LoRA, DeLoRA, and IA3 can run on bnb-style Linear4bit/Linear8bitLt modules. PiSSA is different because it edits the base weight.
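The name and layer filters described in this section could be sketched as follows. This is an illustration only: select_targets is a hypothetical function, and the use of re.search and the handling of names with no layer index are assumptions, not confirmed lora-lite behavior. Role filtering is left out because the roles are not defined here.

```python
import re

def select_targets(module_names, target_names=None, exclude_names=None, layers=None):
    """Hypothetical filter: apply include regex, exclude regex, then layer indices."""
    selected = []
    for name in module_names:
        if target_names and not re.search(target_names, name):
            continue  # does not match the include pattern
        if exclude_names and re.search(exclude_names, name):
            continue  # matches the exclude pattern
        if layers is not None:
            m = re.search(r"\.layers\.(\d+)\.", name)
            # Assumption: names without a layer index are dropped when `layers` is set.
            if m is None or int(m.group(1)) not in layers:
                continue
        selected.append(name)
    return selected

names = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.down_proj",
    "model.layers.5.self_attn.q_proj",
    "model.layers.5.self_attn.v_proj",
    "lm_head",
]
picked = select_targets(names, target_names=r"(q|v)_proj$",
                        exclude_names=r"lm_head", layers=[5])
print(picked)
```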
Save format
Adapters are just:
torch.save({"cfg": cfg.to_dict(), "state": lora_state_dict}, "adapter.pt")
lora_state_dict contains full-path keys with "lora_" in the name. Missing or unexpected adapter keys fail on load.
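A rough sketch of the key handling described above, using plain dicts in place of tensors. The functions adapter_state and check_keys are hypothetical names chosen for illustration; the real library may structure this differently:

```python
def adapter_state(full_state):
    """Keep only entries whose full-path key contains "lora_"."""
    return {k: v for k, v in full_state.items() if "lora_" in k}

def check_keys(expected, loaded):
    """Raise if the loaded adapter keys differ from what the model expects."""
    missing = sorted(set(expected) - set(loaded))
    unexpected = sorted(set(loaded) - set(expected))
    if missing or unexpected:
        raise KeyError(f"missing={missing} unexpected={unexpected}")

# Stand-in for model.state_dict(); strings replace tensors for brevity.
full_state = {
    "layers.0.q_proj.weight": "base",
    "layers.0.q_proj.lora_A": "A0",
    "layers.0.q_proj.lora_B": "B0",
}
saved = adapter_state(full_state)
check_keys(expected=saved.keys(), loaded=saved.keys())  # round trip passes

try:
    # Simulate a file that lacks lora_B: loading should fail loudly.
    check_keys(expected=saved.keys(), loaded=["layers.0.q_proj.lora_A"])
    error = None
except KeyError as exc:
    error = str(exc)
```

Failing on any key mismatch, rather than loading what happens to match, means an adapter saved for one set of targets cannot be silently half-applied to another.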
Developer docs
See docs/developer_guide.md for the variant API, data-calibrated init, and adapter roadmap.
Citation
@misc{wassname2026loralite,
title = {LoRA-Lite: A Hackable Adapter Library for Research},
author = {Michael J. Clark},
year = {2026},
url = {https://github.com/wassname/lora-lite/}
}