feat: ia3 variant, real bnb 4bit/8bit smoke, dev guide split, user-only readme

2026-06-27 17:16:12 +08:00 · 2026-04-26 17:49:17 +08:00
parent f2d9021511
commit 699fde31bf
11 changed files with 216 additions and 174 deletions
@@ -0,0 +1,81 @@
+# Developer guide
+
+This is the implementation note for people adding adapter variants. The README is only for prospective users.
+
+## Design principles
+
+- Variants own adapter math.
+- The runtime owns targeting, parameter attachment, hooks, and save/load.
+- Adapter parameters live directly on target layers as `lora_*` parameters.
+- Save/load uses normal full-path `state_dict()` keys filtered by `"lora_"`.
+- Fail loudly on unsupported weight semantics. No silent quantized PiSSA or merge fallback.
+
+## Variant contract
+
+A variant is a registered class with a small static interface:
+
+```python
+@register
+class MyVariant:
+    name = "myvariant"
+
+    @staticmethod
+    def param_specs(d_in, d_out, cfg) -> dict[str, ParamSpec]:
+        return {"lora_A": ParamSpec((cfg.r, d_in), init="kaiming")}
+
+    @staticmethod
+    def init(layer, cfg) -> None:
+        ...
+
+    @staticmethod
+    def forward(layer, x, y):
+        return y_new
+```
+
+Pseudocode for the runtime:
+
+```python
+def attach(model, cfg):
+    targets ← find_linear_like_modules(model, cfg)
+    freeze(model.parameters())
+    for name, layer in targets:
+        layer.lora_* ← variant.param_specs(layer, cfg)
+        variant.init(layer, cfg)
+        hook(layer, lambda x, y: variant.forward(layer, x, y))
+
+def save(model, path):
+    torch.save({"cfg": cfg, "state": state_dict_keys_containing("lora_")}, path)
+```
+
+## Data-calibrated init
+
+LoRA, PiSSA, DeLoRA, and IA3 only use `layer.weight` or identity constants for init.
+
+Variants that need data, e.g. AntiPaSTO, LoRA-GA, or activation-aware SVD, should keep dataloaders out of `cfg` so adapter checkpoints stay serializable:
+
+```python
+ll.attach(model, cfg, calibration_data=calib)
+```
+
+Activation-aware variants implement `group_init(model, targets, cfg, calibration_data)`. The variant may add temporary hooks, run calibration batches, remove hooks, then write `lora_*` params. `load()` should not require calibration data.
+
+## Current limitations
+
+| Feature | Current choice |
+|---|---|
+| merge/unmerge | reload the base model if vanilla weights are needed |
+| multiple named adapters | one variant per `attach()` |
+| mixed-adapter batches | out of scope until needed |
+| quantized PiSSA | fail-fast; explicit dequantize/requantize required |
+| AdaLoRA rank scheduling | needs a future `Variant.on_step(step)` hook |
+| ReFT-style interventions | likely a sibling module or different hook site |
+
+## Adapter roadmap
+
+| Variant | Fit to current runtime | Next invariant |
+|---|---|---|
+| IA3 | Done. Output gate `y * g`, identity at `g=1`. | Qwen proof task 79. |
+| DoRA | Likely additive hook for fp layers; quantized norm semantics need care. | fp identity, perturb, save/load, loss drop. |
+| SSVD / PiSSA-family | Fits weight-SVD init path. | reconstruction/identity invariant plus train proof. |
+| HRA / OFT / ROAD | Interesting, but weight-transform semantics need clearer hook-only formulation. | pseudocode first, then rotation/non-dead-code invariant. |
+| S-steer / AntiPaSTO | Should use `group_init` and activation evidence. | calibration consumed, hooks removed, load works without calibration. |