mirror of https://github.com/wassname/lora-lite.git synced 2026-06-27 16:30:44 +08:00


copilot 185eb29c70 fix v2 review bugs + add EVA, AntiPaSTO

DeLoRA: per-input-channel wnorm buffer (not scalar Parameter), forward
matches peft (x*wnorm @ A.T then per-rank scale (lambda/r)/(An*Bn)).
Smoke: 89.7% loss drop (was 35.8%).
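
Written out, the forward described above might look like the following. This is a reading of the commit text, not a confirmed transcription of the code. It assumes `A` has shape r x d_in with rows a_i, `B` has shape d_out x r with columns b_i, `An` and `Bn` are the Euclidean norms of those rows and columns, and the final projection through `B` follows the per-rank scaling.

```latex
\begin{aligned}
h        &= (x \odot w_{\mathrm{norm}})\, A^{\top}, \qquad h \in \mathbb{R}^{r} \\
s_i      &= \frac{\lambda / r}{\lVert a_i \rVert \, \lVert b_i \rVert}, \qquad i = 1, \dots, r \\
\Delta y &= (h \odot s)\, B^{\top}
\end{aligned}
```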

HRA: symmetric repeated-column init (PEFT-style) instead of zero gate.
Adjacent Householder pairs cancel exactly so R=I at t=0, and U receives
gradient from step 0 (no dead-grad). Even r required.
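
A torch-free sketch of why the repeated-column init gives R=I: a Householder reflection is its own inverse, so applying the same reflection twice in a row returns the input. The helper names (`householder`, `hra_chain`) and the example vectors are illustrative only, not taken from the lora-lite source.

```python
def householder(u, x):
    """Apply H(u) = I - 2 u u^T / (u^T u) to the vector x."""
    dot_ux = sum(a * b for a, b in zip(u, x))
    dot_uu = sum(a * a for a in u)
    coeff = 2.0 * dot_ux / dot_uu
    return [xi - coeff * ui for xi, ui in zip(x, u)]


def hra_chain(us, x):
    """Apply a sequence of Householder reflections in order."""
    for u in us:
        x = householder(u, x)
    return x


# Hypothetical init: r = 4 reflection vectors built from 2 distinct columns,
# each repeated so that adjacent pairs are identical.
base = [[1.0, -2.0, 0.5], [0.3, 0.7, -1.1]]
us = [base[0], base[0], base[1], base[1]]

x = [0.4, -1.3, 2.2]
y = hra_chain(us, x)          # paired reflections cancel, up to float rounding
y_single = hra_chain(us[:1], x)  # a lone reflection does change x
```

The pairing is also why an even r is needed: an unpaired reflection would leave R different from the identity at t=0.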

IA3: split into two variants. ia3 stays output-side (k_proj/v_proj);
new ia3_ff is input-side (down_proj/fc2), matching peft is_feedforward.
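
The difference between the two variants can be sketched without torch. The function names and the tiny weight matrix below are made up for illustration; only the output-side versus input-side placement of the gate comes from the commit text.

```python
def matvec(W, x):
    """Plain matrix-vector product for a list-of-rows matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def ia3_output_side(W, x, gate):
    """Gate has out_features entries and scales the layer output."""
    return [yi * gi for yi, gi in zip(matvec(W, x), gate)]


def ia3_input_side(W, x, gate):
    """Gate has in_features entries and scales the layer input."""
    return matvec(W, [xi * gi for xi, gi in zip(x, gate)])


W = [[1.0, 2.0, 3.0], [0.0, -1.0, 0.5]]  # out_features=2, in_features=3
x = [1.0, 0.5, -2.0]

plain = matvec(W, x)
out_identity = ia3_output_side(W, x, [1.0, 1.0])      # ones gate: no change
in_identity = ia3_input_side(W, x, [1.0, 1.0, 1.0])   # ones gate: no change
out_scaled = ia3_output_side(W, x, [2.0, 1.0])        # doubles output channel 0
in_masked = ia3_input_side(W, x, [1.0, 1.0, 0.0])     # zeroes input channel 2
```

Both gates start at ones, so both variants are the identity at t=0; they differ only in which side of the weight they scale and therefore in the gate's length.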

Config: dropout field removed (never honored by any variant).

PiSSA: adapter.save records base-weight fingerprint per target;
adapter.load recomputes init then verifies fingerprint -> fails loud
when reloaded onto a different base.
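
One way such a fingerprint check could work is sketched below with the standard library only. The hashing scheme, the truncated digest, and the names `fingerprint`, `save_adapter`, and `verify_on_load` are assumptions for illustration; the commit does not say how lora-lite computes or stores the fingerprint.

```python
import hashlib
import struct


def fingerprint(weight_rows):
    """Hash the raw float values of a weight matrix (list of rows)."""
    h = hashlib.sha256()
    for row in weight_rows:
        h.update(struct.pack(f"<{len(row)}d", *row))
    return h.hexdigest()[:16]


def save_adapter(base_weights):
    """Record one fingerprint per targeted module name (hypothetical layout)."""
    return {"fingerprints": {name: fingerprint(w) for name, w in base_weights.items()}}


def verify_on_load(saved, base_weights):
    """Fail loudly if any target's base weight differs from the one at save time."""
    for name, expected in saved["fingerprints"].items():
        actual = fingerprint(base_weights[name])
        if actual != expected:
            raise ValueError(f"base weight mismatch for {name}: {actual} != {expected}")


base = {"q_proj": [[1.0, 2.0], [3.0, 4.0]]}
other_base = {"q_proj": [[1.0, 2.0], [3.0, 4.5]]}

saved = save_adapter(base)
verify_on_load(saved, base)  # same base: passes silently
```

The check matters for PiSSA in particular because its init is derived from the base weight, so an adapter trained against one base is not meaningful on another.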

EVA (new): data-driven init via group_init + calibration_data. Top-r
right singular vectors of pooled layer-input activations -> lora_A
(buffer, frozen); only lora_B trains. Stress-tests group_init API.

AntiPaSTO (new): SVD steering with frozen U,S,Vh,W_res and learnable
delta_s (per-singular-value bias) + rot_T (block-diagonal Cayley
rotation on V or U). Lite port of antipasto3 SVD adapter.
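
A minimal sketch of a block-diagonal Cayley rotation, assuming 2x2 blocks with one scalar per block (the commit does not state the block size). For a skew-symmetric A, the Cayley transform (I - A)^-1 (I + A) is orthogonal and equals the identity at A=0, which is what makes it a convenient identity-at-init parameterization. `cayley_2x2` and `block_diag_rotate` are illustrative names, not the library's.

```python
import math


def cayley_2x2(t):
    """Cayley transform of the skew-symmetric block [[0, -t], [t, 0]].

    (I - A)^-1 (I + A) simplifies to a rotation with
    cos = (1 - t^2) / (1 + t^2) and sin = 2t / (1 + t^2).
    """
    d = 1.0 + t * t
    c, s = (1.0 - t * t) / d, 2.0 * t / d
    return [[c, -s], [s, c]]


def block_diag_rotate(ts, v):
    """Rotate consecutive coordinate pairs of v; requires len(v) == 2 * len(ts)."""
    out = []
    for i, t in enumerate(ts):
        (a, b), (c, d) = cayley_2x2(t)
        x, y = v[2 * i], v[2 * i + 1]
        out += [a * x + b * y, c * x + d * y]
    return out


v = [1.0, 2.0, -0.5, 3.0]
at_init = block_diag_rotate([0.0, 0.0], v)   # zero parameters: identity
rotated = block_diag_rotate([0.3, -1.2], v)  # nonzero parameters: a rotation

norm_before = math.sqrt(sum(x * x for x in v))
norm_after = math.sqrt(sum(x * x for x in rotated))
```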

ParamSpec: as_buffer field + make_tensor() for buffer registration.
adapter.attach honors as_buffer with register_buffer; detach cleans
both _parameters and _buffers.

Smoke covers all 8 variants: identity at t=0, save/load round-trip,
gradient-driven loss drop. EVA gets dedicated test for calibration
data path. ALL PASS including bnb 4/8-bit path.

2026-04-26 19:41:59 +08:00

File	Last commit	Date
.github/workflows	ci: add publishable check workflow	2026-04-26 17:09:47 +08:00
docs	Add reference-impl URLs to variant docstrings + V2 external review	2026-04-26 19:27:47 +08:00
scripts	feat(hra): add Householder Reflection Adaptation, hook-only/bnb-friendly + Qwen proof	2026-04-26 17:58:56 +08:00
src/lora_lite	fix v2 review bugs + add EVA, AntiPaSTO	2026-04-26 19:41:59 +08:00
tests	fix v2 review bugs + add EVA, AntiPaSTO	2026-04-26 19:41:59 +08:00
.gitignore	Verify all variants on bnb 4bit/8bit; HRA paper-faithful rewrite	2026-04-26 18:08:06 +08:00
justfile	feat(dora): add weight-decomposed LoRA variant for fp layers	2026-04-26 17:53:33 +08:00
pyproject.toml	feat: ia3 variant, real bnb 4bit/8bit smoke, dev guide split, user-only readme	2026-04-26 17:49:17 +08:00
README.md	feat(hra): add Householder Reflection Adaptation, hook-only/bnb-friendly + Qwen proof	2026-04-26 17:58:56 +08:00
uv.lock	feat: ia3 variant, real bnb 4bit/8bit smoke, dev guide split, user-only readme	2026-04-26 17:49:17 +08:00

README.md

lora-lite

Hackable PyTorch adapters for LoRA-family and small PEFT experiments.

lora-lite uses forward hooks instead of module replacement. Adapter parameters are plain nn.Parameters on the target layer, e.g. model.layers[5].self_attn.q_proj.lora_A.
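
The hook-based approach can be sketched in plain PyTorch as follows. This is not lora-lite's implementation; `attach_lora_hook` is a hypothetical helper that only illustrates the idea of registering parameters on the existing layer and adding the low-rank update in a forward hook.

```python
import torch
import torch.nn as nn


def attach_lora_hook(layer: nn.Linear, r: int = 4, alpha: float = 8.0):
    """Hypothetical helper: add LoRA params to `layer` and patch its output via a hook."""
    # Assigning nn.Parameter attributes registers them on the existing module.
    layer.lora_A = nn.Parameter(torch.randn(r, layer.in_features) * 0.01)
    layer.lora_B = nn.Parameter(torch.zeros(layer.out_features, r))  # zero init: no change at t=0
    scale = alpha / r

    def hook(module, args, output):
        (x,) = args
        # Returning a value from a forward hook replaces the module output.
        return output + scale * (x @ module.lora_A.T @ module.lora_B.T)

    return layer.register_forward_hook(hook)


layer = nn.Linear(16, 32)
x = torch.randn(2, 16)

before = layer(x)
handle = attach_lora_hook(layer)
after = layer(x)   # identical to `before` because lora_B starts at zero
handle.remove()    # removing the hook restores the original forward
```

Because the layer object is never swapped out, the module tree and the base layer's own forward stay untouched; only the output is adjusted.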

Install

pip install -e git+https://github.com/wassname/lora-lite.git#egg=lora-lite

Quickstart

import torch, lora_lite as ll

model = MyTransformer()
cfg = ll.LoraLiteConfig(variant="lora", r=8, alpha=16, dtype=torch.bfloat16)
ll.attach(model, cfg)

opt = torch.optim.AdamW([p for p in model.parameters() if p.requires_grad], lr=1e-4)
# train...

ll.save(model, "adapter.pt")
ll.detach(model)
ll.load(model, "adapter.pt")

Does it work?

just check       # pytest + smoke + package build + metadata check
just bnb-smoke   # required CUDA bitsandbytes 4bit/8bit smoke
just qwen-probe  # Qwen/Qwen3-0.6B train/save-load probe

See docs/spec/20260426_lora_lite_plan.md for verification history and exact results.

Variants

Variant	Support	Notes
LoRA	yes	additive low-rank adapter
PiSSA	yes, fp only	mutates `weight` into `W_res`; quantized PiSSA intentionally fails
DeLoRA	yes	normalized additive adapter with learned scalar
IA3	yes	output gate initialized to ones
DoRA	yes, fp only	reads dense `weight` for column-norm; quantized DoRA fails loudly
HRA	yes	output-side Householder reflection with identity gate; works on bnb
SSVD / OFT / ROAD	no	planned
S-steer / AntiPaSTO	no	should use data-calibrated `group_init`, not plain LoRA tests

Targeting

By default, lora-lite targets linear-like modules with in_features, out_features, and weight, excluding lm_head and embed_tokens.

Useful LoraLiteConfig fields:

target_roles: subset of ("reader", "writer", "inner"); () means all.
target_names: regex includes.
exclude_names: regex excludes.
layers: layer indices, matching .layers.<idx>. in module names.
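
The name-based fields can be pictured as a filter over module names. The sketch below is an assumption about the semantics (regexes applied with `re.search`, every filter must pass); `select_targets` is a hypothetical function, not the library's API, and `target_roles` is left out because its role mapping is not described here.

```python
import re


def select_targets(names, target_names=(), exclude_names=(r"lm_head", r"embed_tokens"), layers=()):
    """Return the module names that pass the include, exclude, and layer filters."""
    selected = []
    for name in names:
        if target_names and not any(re.search(p, name) for p in target_names):
            continue
        if any(re.search(p, name) for p in exclude_names):
            continue
        if layers:
            m = re.search(r"\.layers\.(\d+)\.", name)
            if m is None or int(m.group(1)) not in layers:
                continue
        selected.append(name)
    return selected


names = [
    "model.embed_tokens",
    "model.layers.0.self_attn.q_proj",
    "model.layers.5.self_attn.q_proj",
    "model.layers.5.mlp.down_proj",
    "lm_head",
]

default_targets = select_targets(names)
q_proj_layer5 = select_targets(names, target_names=[r"q_proj$"], layers=(5,))
```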

This structural targeting is why LoRA, DeLoRA, IA3, and HRA can run on bnb-style Linear4bit/Linear8bitLt modules. PiSSA and DoRA are different: PiSSA edits the base weight and DoRA reads the dense weight, so both are fp only.

Save format

Adapters are just:

torch.save({"cfg": cfg.to_dict(), "state": lora_state_dict}, "adapter.pt")

lora_state_dict contains full-path keys with "lora_" in the name. Missing or unexpected adapter keys fail on load.
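
A strict key check of this kind might look like the sketch below. `filter_adapter_state` and `check_adapter_keys` are hypothetical names, and plain dicts stand in for real state dicts.

```python
def filter_adapter_state(full_state):
    """Keep only entries whose full-path key contains "lora_"."""
    return {k: v for k, v in full_state.items() if "lora_" in k}


def check_adapter_keys(expected_keys, loaded_state):
    """Raise if the loaded adapter has missing or unexpected keys."""
    missing = sorted(set(expected_keys) - set(loaded_state))
    unexpected = sorted(set(loaded_state) - set(expected_keys))
    if missing or unexpected:
        raise KeyError(f"missing={missing} unexpected={unexpected}")


full_state = {
    "layers.5.self_attn.q_proj.weight": "base",
    "layers.5.self_attn.q_proj.lora_A": "A",
    "layers.5.self_attn.q_proj.lora_B": "B",
}

adapter_state = filter_adapter_state(full_state)
check_adapter_keys(adapter_state.keys(), adapter_state)  # exact match: passes
```

Failing on any mismatch, rather than loading what fits, means a half-applied adapter cannot go unnoticed.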

Developer docs

See docs/developer_guide.md for the variant API, data-calibrated init, and adapter roadmap.

Citation

@misc{wassname2026loralite,
  title = {LoRA-Lite: A Hackable Adapter Library for Research},
  author = {Michael J. Clark},
  year = {2026},
  url = {https://github.com/wassname/lora-lite/}
}