wassname/lora-lite

Fork 0

mirror of https://github.com/wassname/lora-lite.git synced 2026-06-27 15:15:55 +08:00

Files

T

wassname f2d9021511 ci: add publishable check workflow

2026-04-26 17:09:47 +08:00

12 KiB

Raw Blame History

lora-lite plan and status

Goal

Build a small, hackable LoRA-family adapter library for research experiments.

The core bet is that adapter variants should own the relationship between (x, layer.weight, layer.lora_*) and the layer output, while the library only handles targeting, parameter attachment, hooks, and save/load.

Non-goals

No PEFT compatibility layer.
No module replacement.
No merge/unmerge.
No multiple named adapters per layer.
No backward compatibility promises.
No silent fallbacks.

Design constraints

Adapter params are attached directly to target layers as lora_* parameters.
Save/load uses normal state_dict() keys, filtered by "lora_".
Forward hooks return the layer's new output, not just a delta.
Targeting is structural: modules with in_features, out_features, and weight are linear-like.
LoRA/DeLoRA support bnb-style 4/8-bit forward paths because the quantized base layer computes y; the hook only adds adapter math.
PiSSA is fp-only in v1 because it mutates layer.weight into W_res.
Data-calibrated variants use group_init(model, targets, cfg, calibration_data); dataloaders stay out of cfg so checkpoints are serializable.

Implemented v0.0.1

Area	Status	Evidence
`LoraLiteConfig`	done	`src/lora_lite/config.py`
Variant registry + `ParamSpec`	done	`src/lora_lite/variant.py`
Structural target discovery	done	`src/lora_lite/target.py`
`attach` / `detach` / `save` / `load`	done	`src/lora_lite/adapter.py`
LoRA	done	`src/lora_lite/variants/lora.py`
PiSSA	done, fp-only	`src/lora_lite/variants/pissa.py`
DeLoRA	done	`src/lora_lite/variants/delora.py`
Smoke tests	done	`tests/smoke.py`
bnb minimal forward smoke	done	`Linear8bitLt` and `Linear4bit` pass on CUDA

Current smoke evidence

Last verified log: /home/wassname/.cache/agent-tmp/lora_lite_smoke_after_review.log

Check	Result
LoRA identity	`0.000e+00`
LoRA loss drop	`6.1%`
PiSSA identity	`1.550e-06`
PiSSA loss drop	`11.5%`
DeLoRA identity	`0.000e+00`
DeLoRA loss drop	`93.4%`
fake non-`nn.Linear` target	attaches, identity `0.000e+00`, grad nonzero
bnb `Linear8bitLt`	identity `0.000e+00`, grad nonzero
bnb `Linear4bit`	identity `0.000e+00`, grad nonzero

2026-04-26 testing proof pass

Goal: upgrade from smoke-tested sketch to evidence that the current PEFT-lite interface trains on both toy models and a real HF Qwen model.

Scope

In:

Pytest coverage for LoRA, PiSSA, and DeLoRA correctness invariants.
A real Qwen/Qwen3-0.6B probe that trains each current variant on layer-0 q_proj and v_proj.
Repeatable just recipes and workspace-local logs/artifacts.

Out:

Benchmark claims.
Quantized Qwen proof for PiSSA. PiSSA remains fp-only because it mutates weight.
Full default-target training over every Qwen layer.

Requirements and evidence

Requirement	Distinguishing check	Evidence
R1: toy tests catch skipped targets/hooks	Perturb only `lora_*`; output must change. Missing target must raise.	`just test` -> `8 passed in 2.43s` in `logs/pytest.log`
R2: toy tests catch base-gradient leakage	After backward, all non-`lora_*` grads are `None`; all trainable names contain `lora_`.	`just test` -> `8 passed in 2.43s`
R3: save/load is exact for adapters	Saved key set equals full-path `lora_` state; reload tensors equal; missing/extra `lora_` keys raise.	`just test` -> `8 passed in 2.43s`
R4: current variants train on tiny task	28 TinyModel targets; non-`lora_*` grads stay `None`; 20-step loss drop >5%.	`just smoke` -> LoRA 6.1%, PiSSA 11.5%, DeLoRA 93.4%
R5: current variants train on real Qwen	Fresh Qwen per variant; exact targets are layer-0 `q_proj`/`v_proj`; perturb >0; lossN < loss0; reload err < tol.	`pueue` task 70, `logs/qwen_probe.log`, all probes pass
R6: cold review cannot explain evidence under silent failure	External review findings fixed, then fresh-eyes subagent says PASS.	`docs/spec/20260426_code_review.md`

Qwen proof table

Command:

pueue add --immediate --follow --label "why: verify warning-free current Qwen probe after dtype API cleanup; resolve: same pass table proves current script" --working-directory "$PWD" --priority 1 -- just qwen-probe

Result from task 70:

variant	targets	trainable	id_err	perturb	loss0	lossN	drop%	grad	dθ	adapter
lora	2	20480	0	0.375	5.25	3.131	40.36	1.432	4.262	`outputs/qwen_train_probe/lora_adapter.pt`
pissa	2	20480	0.3125	0.75	5.25	3.629	30.88	6.124	4.381	`outputs/qwen_train_probe/pissa_adapter.pt`
delora	2	20482	0.375	0.4062	5.246	5.166	1.537	0.04778	8.196	`outputs/qwen_train_probe/delora_adapter.pt`

Failure-mode interpretation:

If targeting silently skipped, exact target-set assertion would fail before training.
If hooks were attached but dead, perturb delta would be 0.
If base params trained, the non-lora_* gradient check would fail.
If adapter grads were absent, grad or dθ would be 0/non-finite.
If save/load were broken, adapter tensor equality or reload logit error would fail.

2026-04-26 publishable workflow pass

Goal: make the repo behave like a small buildable library without adding framework surface area.

Scope

In:

Keep minimal functional integration tests as the default proof path.
Add a GitHub CI hook for fast tests and package build.
Add a pueue recipe for the larger Qwen proof.
Tighten README structure so install, quickstart, core idea, tests, and status are easy to find.

Out:

PyPI publishing token/workflow. Publishing should wait until the license decision is explicit.
Implementing every named adapter variant in one pass. That would add complexity faster than tests can explain it.

Requirements and evidence

Requirement	Distinguishing check	Evidence
R7: fast CI catches broken tests/builds	`just check` must run pytest, smoke, `uv build`, and `twine check`; a broken test, wheel, sdist, or README metadata fails the hook.	`just check` -> pytest `8 passed in 9.53s`, smoke all pass, wheel/sdist built, `twine check dist/*` passed
R8: large proof is queued, not hidden in CI	`just qwen-queue` must create a pueue task in the repo cwd with why/resolve label and intended Qwen command.	`just qwen-queue && pueue status` -> task 74 queued at `/media/wassname/SGIronWolf/projects5/2026/lora-lite` with `just qwen-probe lora pissa delora 16`
R9: README is publishable enough to judge	Reader sees install, quickstart, pseudocode core, testing commands, proof caveat before variant wishlist.	`README.md` reordered and human note removed
R10: variant roadmap buys simplicity	Next variant is ranked by fit to current hook contract; non-hook variants are deferred rather than half-supported.	this section

Fresh review first blocked on weak qwen-queue evidence and README citation/comment junk. Fixes: queued real pueue task 74, added twine check, fenced citation, removed the stray README note. Final fresh review verdict: PASS.

Adapter roadmap, ranked by simplicity

Variant	Why it fits or waits	Next check
IA3	Multiplicative vector on activations. Probably the smallest new file and no base-weight mutation.	Identity with ones, perturb changes output, loss drops, save/load exact.
DoRA	Fits additive hook for fp layers; bnb norm handling must be explicit or fail-fast.	fp smoke first; quantized proof only after norm semantics are obvious.
SSVD / PiSSA-family	Fits current `weight`-SVD pattern and teaches the SVD adapter path.	Reconstruction/identity invariant plus train proof.
HRA / OFT / ROAD	Interesting, but likely wants orthogonal or weight-transform semantics. Keep until hook-only formulation is clear.	Pseudocode first, then one invariant that distinguishes real rotation from dead code.
S-steer / AntiPaSTO	Research adapters. Should use `group_init` and activation evidence, not be squeezed into plain LoRA tests.	Calibration is consumed, hooks removed, load does not need calibration data.

Review history

A cold subagent review first returned PASS_WITH_BLOCKERS:

bnb modules were not targeted.
Hook cast y to cfg.dtype, which could round base outputs.
PiSSA overclaimed bnb support.
load() did not fail on missing adapter keys.
Data-calibrated init needed model-level access.

Fixes applied:

Structural is_linear_like() target predicate.
Hook only casts x, keeps y in base output dtype.
PiSSA fail-fast rejects non-plain nn.Linear.
load() fails on missing or unexpected lora_ keys.
attach(..., calibration_data=None) plus optional group_init(model, targets, cfg, calibration_data).

Second cold review verdict: PASS for the minimal 4bit-enabled scope.

TODO / status

Next implementation goals

Add DoRA.
- Verify: fp32/bf16 identity at init, finite gradients, and smoke loss drop.
- Caveat: bnb DoRA needs explicit weight dequantization for norm computation or should be fp-only at first.
Add VeRA.
- Verify: shared buffers are allocated once, target slices match shape, identity or near-identity at init.
Add SSVD or AntiPaSTO-style SVD variant.
- Verify: reconstruction or intended rotation invariant at init.
Add real activation-calibrated toy variant using group_init.
- Verify: calibration_data is consumed during attach, hooks are removed, checkpoint is serializable, and load() does not require calibration data.
Add load path that can skip calibration init for future group_init variants.
- Current caveat: load() calls attach(model, cfg) with calibration_data=None; fine for current variants, but future calibrated variants should separate param creation from calibration.
Add a tiny HF-model smoke when convenient.
- Verify: target names look like real transformer modules and state dict keys match full paths.

Design TODOs

Decide whether group_init should run before or after forward hooks are registered.
- Current choice: after params are attached, before adapter forward hooks are registered.
Decide whether replacing variants need runs_base_layer=False or can always transform y.
- OFT-like variants can rotate y; variants that truly avoid base forward need module replacement or pre-hook rewriting, likely out of v1.
Add weight_mode for BitFit/SHiRA if those variants become in-scope.
- Minimal surface: weight_mode in {"frozen", "bias_only", "sparse_grad"}.

Variant contract

class Variant:
    name: str

    @staticmethod
    def param_specs(d_in, d_out, cfg) -> dict[str, ParamSpec]: ...

    @staticmethod
    def init(layer, cfg) -> None:
        # weight-only init; may mutate plain fp weights
        ...

    @staticmethod
    def group_init(model, targets, cfg, calibration_data) -> None:
        # optional model-level init for data-calibrated or cross-layer variants
        ...

    @staticmethod
    def forward(layer, x, y) -> Tensor:
        # return NEW output; additive variants return y + delta
        ...

Done means

This repo is good enough for a first real experiment when:

A Qwen/Llama model can attach LoRA adapters to intended target layers.
A 4bit or 8bit loaded model can train LoRA/DeLoRA params with nonzero gradients.
Saved adapter tensors use full-path keys and reload without calibration data.
Smoke tests distinguish target-skipping, hook identity drift, and missing-key load failure.

see interesting adapters here https://github.com/wassname/adapters_as_hypotheses how peft handle 4bit here https://github.com/huggingface/peft/blob/6030f9160ed2fc17220f6f41382a66f1257b6a93/src/peft/tuners/lora/layer.py

12 KiB Raw Blame History

lora-lite plan and status

Goal

Non-goals

Design constraints

Implemented v0.0.1

Current smoke evidence

2026-04-26 testing proof pass

Scope

Requirements and evidence

Qwen proof table

2026-04-26 publishable workflow pass

Scope

Requirements and evidence

Adapter roadmap, ranked by simplicity

Review history

TODO / status

Next implementation goals

Design TODOs

Variant contract

Done means

12 KiB

Raw Blame History