Developer guide

This is the implementation note for people adding adapter variants. The README is only for prospective users.

Design principles

Variants own adapter math.
The runtime owns targeting, parameter attachment, hooks, and save/load.
Adapter parameters live directly on target layers as lora_* parameters.
Save/load uses normal full-path state_dict() keys filtered by "lora_".
Fail loudly on unsupported weight semantics. No silent quantized PiSSA or merge fallback.
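
The fail-loudly rule can be illustrated with a small guard. This is only a sketch: `require_float_weight` and `UnsupportedWeightError` are hypothetical names, not part of the runtime, and the check assumes that a quantized layer stores its weight in a non-floating-point dtype. The only tensor method it relies on is `is_floating_point()`, which `torch.Tensor` provides; stand-in classes are used so the example runs without PyTorch.

```python
class UnsupportedWeightError(TypeError):
    """Raised when a variant cannot work with a layer's weight representation."""


def require_float_weight(layer, variant_name):
    """Return layer.weight, or raise instead of silently falling back."""
    weight = getattr(layer, "weight", None)
    if weight is None:
        raise UnsupportedWeightError(f"{variant_name}: layer has no 'weight'")
    if not weight.is_floating_point():
        raise UnsupportedWeightError(
            f"{variant_name}: weight is not floating point; "
            "dequantize explicitly before attaching"
        )
    return weight


# Stand-ins for a tensor and a layer (assumption: real layers expose .weight).
class FakeWeight:
    def __init__(self, floating):
        self._floating = floating

    def is_floating_point(self):
        return self._floating


class FakeLayer:
    def __init__(self, weight):
        self.weight = weight


dense = FakeLayer(FakeWeight(floating=True))
packed = FakeLayer(FakeWeight(floating=False))

require_float_weight(dense, "pissa")  # passes and returns the weight
try:
    require_float_weight(packed, "pissa")
    rejected = False
except UnsupportedWeightError:
    rejected = True
```

A variant that reads or mutates `layer.weight` could call such a guard at the top of `init`, so an unsupported layer stops `attach()` rather than producing a wrong adapter.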

Variant contract

A variant is a registered class with a small static interface:

@register
class MyVariant:
    name = "myvariant"

    @staticmethod
    def param_specs(d_in, d_out, cfg) -> dict[str, ParamSpec]:
        return {"lora_A": ParamSpec((cfg.r, d_in), init="kaiming")}

    @staticmethod
    def init(layer, cfg) -> None:
        ...

    @staticmethod
    def forward(layer, x, y):
        # x is the layer input, y the frozen layer's output; y_new is the adapted output.
        return y_new
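
To make the registration half of the contract concrete, here is a minimal sketch. `ParamSpec`, `register`, `REGISTRY`, and `Cfg` are hypothetical stand-ins written for this example; the library's own definitions may differ. It covers only `name` and `param_specs`, since `init` and `forward` need real tensors.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the library's ParamSpec: a shape plus an init name.
@dataclass(frozen=True)
class ParamSpec:
    shape: tuple
    init: str = "zeros"


# Hypothetical registry keyed by the variant's `name` attribute.
REGISTRY = {}


def register(cls):
    """Record the variant class under cls.name and return it unchanged."""
    if cls.name in REGISTRY:
        raise ValueError(f"variant {cls.name!r} is already registered")
    REGISTRY[cls.name] = cls
    return cls


@dataclass
class Cfg:
    r: int = 8  # adapter rank


@register
class LoRASketch:
    name = "lora_sketch"

    @staticmethod
    def param_specs(d_in, d_out, cfg):
        # A maps d_in -> r and B maps r -> d_out. A zero-initialised B makes
        # the adapter a no-op until training updates it.
        return {
            "lora_A": ParamSpec((cfg.r, d_in), init="kaiming"),
            "lora_B": ParamSpec((d_out, cfg.r), init="zeros"),
        }


specs = REGISTRY["lora_sketch"].param_specs(d_in=16, d_out=32, cfg=Cfg(r=4))
```

Because every key starts with `lora_`, parameters created from these specs are picked up by the `"lora_"` filter used for save/load.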

Pseudocode for the runtime:

def attach(model, cfg):
    targets ← find_linear_like_modules(model, cfg)
    freeze(model.parameters())
    for name, layer in targets:
        layer.lora_* ← variant.param_specs(d_in, d_out, cfg)  # d_in, d_out read from layer
        variant.init(layer, cfg)
        hook(layer, lambda x, y: variant.forward(layer, x, y))

def save(model, path):
    torch.save({"cfg": cfg, "state": state_dict_keys_containing("lora_")}, path)
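
The key filter behind `save` can be sketched in plain Python. `adapter_state` and `check_keys` are hypothetical helper names, and strings stand in for tensors so the example runs without PyTorch. The key check is one possible way to apply the fail-loudly principle on load, not necessarily what the runtime does.

```python
ADAPTER_MARKER = "lora_"


def adapter_state(state_dict, marker=ADAPTER_MARKER):
    """Keep only entries whose full-path key contains the adapter marker."""
    return {key: value for key, value in state_dict.items() if marker in key}


def check_keys(expected, loaded):
    """Raise instead of silently loading a partial or mismatched checkpoint."""
    missing = sorted(set(expected) - set(loaded))
    unexpected = sorted(set(loaded) - set(expected))
    if missing or unexpected:
        raise KeyError(f"missing={missing}, unexpected={unexpected}")


# Stand-in for model.state_dict(): full-path keys, base weights included.
full_state = {
    "model.layers.0.q_proj.weight": "W",
    "model.layers.0.q_proj.lora_A": "A",
    "model.layers.0.q_proj.lora_B": "B",
}

saved = adapter_state(full_state)           # what would go into the checkpoint
check_keys(adapter_state(full_state), saved)  # identical key sets: no error
```

Since the keys are ordinary full paths, the saved entries can be matched back to a freshly attached model by name alone.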

Data-calibrated init

LoRA, PiSSA, DeLoRA, and IA3 use only layer.weight or identity constants for init, so they need no calibration data.

Variants that need data, e.g. AntiPaSTO, LoRA-GA, or activation-aware SVD, should keep dataloaders out of cfg so adapter checkpoints stay serializable. Pass the data to attach() instead:

ll.attach(model, cfg, calibration_data=calib)

Activation-aware variants implement group_init(model, targets, cfg, calibration_data). The variant may add temporary hooks, run calibration batches, remove hooks, then write lora_* params. load() should not require calibration data.
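
The add-hooks, run-batches, remove-hooks sequence can be wrapped so that hooks are removed even when calibration fails. The sketch below is an assumption about how a variant might organise this, not the runtime's code; `temporary_hooks` is a hypothetical helper. It relies only on `register_forward_hook` returning a handle with `remove()`, which is how `torch.nn.Module` behaves; a stand-in layer with the same two methods is used so the example runs without PyTorch.

```python
from contextlib import contextmanager


@contextmanager
def temporary_hooks(layers, hook):
    """Register `hook` on every layer and always remove it on exit."""
    handles = [layer.register_forward_hook(hook) for layer in layers]
    try:
        yield handles
    finally:
        for handle in handles:
            handle.remove()


# Stand-ins mimicking the hook API of torch.nn.Module (assumption for the demo).
class _Handle:
    def __init__(self, hooks, hook):
        self._hooks, self._hook = hooks, hook

    def remove(self):
        self._hooks.remove(self._hook)


class FakeLayer:
    def __init__(self):
        self.hooks = []

    def register_forward_hook(self, hook):
        self.hooks.append(hook)
        return _Handle(self.hooks, hook)

    def __call__(self, x):
        y = x * 2
        for hook in list(self.hooks):
            hook(self, (x,), y)  # same argument order as a PyTorch forward hook
        return y


layers = [FakeLayer(), FakeLayer()]
seen = []


def record(module, inputs, output):
    # A real variant would accumulate activation statistics here.
    seen.append(inputs[0])


with temporary_hooks(layers, record):
    for batch in [1.0, 2.0]:  # stands in for calibration_data
        for layer in layers:
            layer(batch)
```

After the `with` block the layers carry no hooks, so the variant can write its `lora_*` params from the collected statistics without leaving calibration state behind.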

Current limitations

Feature	Current choice
merge/unmerge	reload the base model if vanilla weights are needed
multiple named adapters	one variant per `attach()`
mixed-adapter batches	out of scope until needed
quantized PiSSA	fail-fast; explicit dequantize/requantize required
AdaLoRA rank scheduling	needs a future `Variant.on_step(step)` hook
ReFT-style interventions	likely a sibling module or different hook site

Adapter roadmap

Variant	Fit to current runtime	Status
LoRA	Hook-only additive low-rank.	Done. Tested.
PiSSA	Mutates `layer.weight` into `W_res`; identity via SVD round-trip.	Done. fp-only. Tested.
DeLoRA	Per-input-channel weight-norm scale, per-rank A/B normalization, learned `lambda`.	Done. Tested.
IA3 / IA3_FF	Output gate (k/v) and input gate (down_proj) variants, init to ones.	Done. Tested.
DoRA	Reads dense `weight` for `
HRA	Householder product applied via `forward_input` pre-hook; bnb-friendly.	Done. Tested.
EVA	LoRA forward; `lora_A` init from PCA on calibration activations via `group_init`.	Done. fp-only. Tested.
AntiPaSTO	Top-r weight SVD, learnable singular-value deltas + block-diagonal Cayley rotation.	Done. fp-only. Tested.
SSVD	Could fit the weight-SVD init path.	Planned.
OFT / ROAD	Block-diagonal rotations; needs clearer hook-only formulation.	Planned.
