lora-lite/docs/developer_guide.md
wassname bb8887e66c tidy
2026-04-27 07:12:56 +08:00

Developer guide

This is the implementation note for people adding adapter variants. The README is only for prospective users.

Design principles

  • Variants own adapter math.
  • The runtime owns targeting, parameter attachment, hooks, and save/load.
  • Adapter parameters live directly on target layers as lora_* parameters.
  • Save/load uses normal full-path state_dict() keys filtered by "lora_".
  • Fail loudly on unsupported weight semantics. No silent quantized PiSSA or merge fallback.
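The save/load principle can be sketched in plain Python. This is a minimal sketch, not the library's code: a plain dict stands in for `model.state_dict()`, floats stand in for tensors, and `adapter_state` and `load_adapter` are hypothetical helper names. The strict key check is an assumption that extends the fail-loudly principle to checkpoint loading.

```python
def adapter_state(full_state):
    """Keep only adapter entries, leaving the full-path keys untouched."""
    return {k: v for k, v in full_state.items() if "lora_" in k}


def load_adapter(full_state, saved):
    """Copy saved adapter entries into full_state, refusing any key mismatch."""
    expected = set(adapter_state(full_state))
    missing = sorted(expected - set(saved))
    unexpected = sorted(set(saved) - expected)
    if missing or unexpected:
        raise KeyError(f"adapter mismatch: missing={missing}, unexpected={unexpected}")
    full_state.update(saved)


# Stand-in for model.state_dict() after attach (floats instead of tensors).
full_state = {
    "layers.0.q_proj.weight": 1.0,
    "layers.0.q_proj.lora_A": 0.1,
    "layers.0.q_proj.lora_B": 0.0,
}
saved = adapter_state(full_state)  # the base weight is not part of the checkpoint

# A freshly attached model with a different lora_A, restored from `saved`.
fresh = {**full_state, "layers.0.q_proj.lora_A": 0.0}
load_adapter(fresh, saved)
```

Because the keys are ordinary full-path names, the checkpoint needs no separate name-mapping table. It only needs the same model structure and the same attach configuration on reload.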

Variant contract

A variant is a registered class with a small static interface:

@register
class MyVariant:
    name = "myvariant"

    @staticmethod
    def param_specs(d_in, d_out, cfg) -> dict[str, ParamSpec]:
        return {"lora_A": ParamSpec((cfg.r, d_in), init="kaiming")}

    @staticmethod
    def init(layer, cfg) -> None:
        ...

    @staticmethod
    def forward(layer, x, y):
        return y_new
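
To make the contract concrete, here is a minimal dependency-free sketch of how a registry and `ParamSpec` could fit together. Everything in it is an assumption for illustration: the `ParamSpec` fields, the `"zeros"` default, the `REGISTRY` dict, and the `ToyLoRA` variant are hypothetical and may differ from the actual lora-lite definitions.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass(frozen=True)
class ParamSpec:
    """Hypothetical stand-in: a parameter shape plus the name of an init scheme."""
    shape: tuple
    init: str = "zeros"  # assumed default, not confirmed by the library


REGISTRY = {}  # variant name -> variant class


def register(cls):
    """Class decorator that files a variant under its `name` attribute."""
    if cls.name in REGISTRY:
        raise ValueError(f"variant {cls.name!r} is already registered")
    REGISTRY[cls.name] = cls
    return cls


@register
class ToyLoRA:
    name = "toy_lora"

    @staticmethod
    def param_specs(d_in, d_out, cfg):
        # A maps d_in -> r and B maps r -> d_out. B starts at zero so the
        # adapter contributes nothing before training.
        return {
            "lora_A": ParamSpec((cfg.r, d_in), init="kaiming"),
            "lora_B": ParamSpec((d_out, cfg.r), init="zeros"),
        }


cfg = SimpleNamespace(r=8)  # stand-in for the real config object
specs = REGISTRY["toy_lora"].param_specs(d_in=64, d_out=32, cfg=cfg)
```

Keeping `param_specs` declarative (shapes and init names only) lets the runtime decide how and where parameters are created, which matches the split between variant math and runtime plumbing described above.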

Pseudocode for the runtime:

def attach(model, cfg):
    targets = find_linear_like_modules(model, cfg)
    freeze(model.parameters())
    for name, layer in targets:
        layer.lora_* = variant.param_specs(d_in, d_out, cfg)  # d_in, d_out read from layer
        variant.init(layer, cfg)
        hook(layer, lambda x, y: variant.forward(layer, x, y))

def save(model, path):
    torch.save({"cfg": cfg, "state": state_dict_keys_containing("lora_")}, path)
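
The attach half of the pseudocode can be exercised with a toy stand-in that runs without PyTorch. This is a sketch under stated assumptions: `ToyLinear` is a hypothetical scalar "layer" with a list of output hooks, and `ScalarAdapter` is a hypothetical variant. In a real model the hook would presumably be a forward hook on an `nn.Module`.

```python
class ToyLinear:
    """Scalar stand-in for a linear layer: y = weight * x, then output hooks."""

    def __init__(self, weight):
        self.weight = weight
        self.hooks = []

    def __call__(self, x):
        y = self.weight * x
        for hook in self.hooks:
            y = hook(x, y)  # each hook may replace the output
        return y


class ScalarAdapter:
    """Toy variant: adds lora_B * lora_A * x on top of the frozen output."""

    @staticmethod
    def init(layer):
        layer.lora_A = 1.0
        layer.lora_B = 0.0  # zero, so the adapter starts as a no-op

    @staticmethod
    def forward(layer, x, y):
        return y + layer.lora_B * layer.lora_A * x


def attach(model, variant):
    for name, layer in model.items():
        variant.init(layer)
        # Bind `layer` as a default argument: a bare closure would see only
        # the last layer of the loop (Python closures bind late).
        layer.hooks.append(lambda x, y, layer=layer: variant.forward(layer, x, y))


model = {"q_proj": ToyLinear(2.0), "v_proj": ToyLinear(3.0)}
attach(model, ScalarAdapter)

before = model["q_proj"](5.0)   # adapter is a no-op at init
model["q_proj"].lora_B = 0.5    # pretend a training step changed lora_B
after = model["q_proj"](5.0)    # only q_proj's own adapter is applied
```

The default-argument binding matters when the pseudocode is translated literally: a `lambda` created inside the loop would otherwise route every hook to the final `layer`.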

Data-calibrated init

LoRA, PiSSA, DeLoRA, and IA3 only use layer.weight or identity constants for init.

Variants that need data, e.g. AntiPaSTO, LoRA-GA, or activation-aware SVD, should keep dataloaders out of cfg so adapter checkpoints stay serializable:

ll.attach(model, cfg, calibration_data=calib)

Activation-aware variants implement group_init(model, targets, cfg, calibration_data). The variant may add temporary hooks, run calibration batches, remove hooks, then write lora_* params. load() should not require calibration data.
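
The hook lifecycle described above can be sketched without PyTorch. The code below is an illustration under assumptions, not the library's implementation: `ToyLayer` is a hypothetical pass-through layer with input hooks, the statistic (mean absolute activation) is chosen only for simplicity, and `lora_scale` is a made-up parameter name. Only the `group_init(model, targets, cfg, calibration_data)` signature comes from this guide.

```python
class ToyLayer:
    """Pass-through stand-in for a layer that exposes input hooks."""

    def __init__(self):
        self.hooks = []

    def __call__(self, x):
        for hook in list(self.hooks):
            hook(x)
        return x


def group_init(model, targets, cfg, calibration_data):
    if not calibration_data:
        raise ValueError("this variant needs calibration_data at attach time")

    sums = {name: 0.0 for name in targets}
    counts = {name: 0 for name in targets}
    installed = []

    # 1. Add temporary hooks that accumulate activation statistics.
    for name in targets:
        def record(x, name=name):  # bind name now, not at call time
            sums[name] += sum(abs(v) for v in x)
            counts[name] += len(x)
        model[name].hooks.append(record)
        installed.append((model[name], record))

    # 2. Run calibration batches, 3. always remove the hooks afterwards.
    try:
        for batch in calibration_data:
            for name in targets:  # a real model would run one forward pass
                model[name](batch)
    finally:
        for layer, record in installed:
            layer.hooks.remove(record)

    # 4. Write the adapter parameter derived from the statistics.
    for name in targets:
        model[name].lora_scale = sums[name] / counts[name]


model = {"q_proj": ToyLayer(), "v_proj": ToyLayer()}
calib = [[1.0, -3.0], [2.0, 2.0]]
group_init(model, ["q_proj"], cfg=None, calibration_data=calib)
```

Removing hooks in a `finally` block keeps a failed calibration batch from leaving stray hooks on the model. Since the result is stored as an ordinary `lora_*` attribute, reloading it later needs no calibration data.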

Current limitations

| Feature | Current choice |
|---|---|
| merge/unmerge | reload the base model if vanilla weights are needed |
| multiple named adapters | one variant per attach() |
| mixed-adapter batches | out of scope until needed |
| quantized PiSSA | fail-fast; explicit dequantize/requantize required |
| AdaLoRA rank scheduling | needs a future Variant.on_step(step) hook |
| ReFT-style interventions | likely a sibling module or different hook site |

Adapter roadmap

| Variant | Fit to current runtime | Status |
|---|---|---|
| LoRA | Hook-only additive low-rank. | Done. Tested. |
| PiSSA | Mutates layer.weight into W_res; identity via SVD round-trip. | Done. fp-only. Tested. |
| DeLoRA | Per-input-channel weight-norm scale, per-rank A/B normalization, learned lambda. | Done. Tested. |
| IA3 / IA3_FF | Output gate (k/v) and input gate (down_proj) variants, init to ones. | Done. Tested. |
| DoRA | Reads dense weight for ` | |
| HRA | Householder product applied via forward_input pre-hook; bnb-friendly. | Done. Tested. |
| EVA | LoRA forward; lora_A init from PCA on calibration activations via group_init. | Done. fp-only. Tested. |
| AntiPaSTO | Top-r weight SVD, learnable singular-value deltas + block-diagonal Cayley rotation. | Done. fp-only. Tested. |
| SSVD | Could fit the weight-SVD init path. | Planned. |
| OFT / ROAD | Block-diagonal rotations; needs clearer hook-only formulation. | Planned. |