mirror of
https://github.com/wassname/lora-lite.git
synced 2026-06-27 16:45:56 +08:00
README: rot basis is within noise (seed order flips), soften V claim
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
@@ -72,7 +72,7 @@ We validate our adapters the same way [PEFT](https://github.com/huggingface/peft
AntiPaSTO is the novel row here: instead of adding trainable directions like LoRA, it freezes W's own top-r SVD and learns only a per-direction singular-value delta plus a block-diagonal Cayley rotation of that frozen basis. The singular directions stay interpretable and the adapter is tiny (15K params, ~230x smaller than LoRA's 3.54M) yet stays within noise of the full-rank adapters. The default rotates the input basis (V); rotating the output (U), both, or neither are `rotate_basis` ablation axes.
-The full AntiPaSTO family (rotation-free gain core, the U/both rotation arms, contractive directional ablation [Arditi+ 2024](https://arxiv.org/abs/2406.11717), a low-rank mixing core, and CorDA/ASVD covariance-oriented bases [Yang+ 2024](https://arxiv.org/abs/2406.05223) / [Yuan+ 2023](https://arxiv.org/abs/2312.05821)) lives on the [`antipasto-variants`](https://github.com/wassname/lora-lite/tree/antipasto-variants) branch with its own ablation table. On GSM8K/down_proj none of those arms separated from this one at a single seed, and the covariance-oriented bases cost 34-120 s of init for no gain, so main keeps the cheapest arm that led: rotation of V.
+The full AntiPaSTO family (rotation-free gain core, the U/both rotation arms, contractive directional ablation [Arditi+ 2024](https://arxiv.org/abs/2406.11717), a low-rank mixing core, and CorDA/ASVD covariance-oriented bases [Yang+ 2024](https://arxiv.org/abs/2406.05223) / [Yuan+ 2023](https://arxiv.org/abs/2312.05821)) lives on the [`antipasto-variants`](https://github.com/wassname/lora-lite/tree/antipasto-variants) branch with its own ablation table. On GSM8K/down_proj none of those arms separated from this one (the covariance-oriented bases cost 34-120 s of init for no gain; the V/U/both rotation order flips between two seeds, so the basis is within noise), so main keeps one cheap arm: rotation of V.
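The AntiPaSTO description above (frozen top-r SVD, a learned per-direction singular-value delta, and a block-diagonal Cayley rotation of the input basis V) can be sketched roughly as follows. This is a minimal NumPy illustration, not the lora-lite implementation: the function names, the `theta` parameterisation, the `block` size, and the choice to apply the rotation in the r-dimensional coefficient space are all assumptions made for the example.

```python
import numpy as np


def cayley(skew):
    """Cayley map: skew-symmetric A -> orthogonal (I - A)^-1 (I + A)."""
    eye = np.eye(skew.shape[0])
    return np.linalg.solve(eye - skew, eye + skew)


def block_cayley(theta, block):
    """Assemble a block-diagonal rotation from free parameters.

    theta: array of shape (n_blocks, block, block); only the antisymmetric
    part of each block is used, so every block maps to an orthogonal matrix.
    (Hypothetical parameterisation, chosen for illustration.)
    """
    n_blocks = theta.shape[0]
    rot = np.zeros((n_blocks * block, n_blocks * block))
    for i, t in enumerate(theta):
        skew = t - t.T
        lo, hi = i * block, (i + 1) * block
        rot[lo:hi, lo:hi] = cayley(skew)
    return rot


def adapted_weight(W, r, delta_s, theta, block):
    """Return W with its top-r singular component re-scaled and V rotated.

    Frozen: U_r, S_r, V_r and the residual W_res.
    Trainable (in this sketch): delta_s of shape (r,) and theta.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    U_r, S_r, V_r = U[:, :r], S[:r], Vt[:r].T
    W_res = W - (U_r * S_r) @ V_r.T        # everything outside the top-r subspace
    R = block_cayley(theta, block)         # r x r orthogonal, block-diagonal
    return W_res + (U_r * (S_r + delta_s)) @ (V_r @ R).T
```

With `delta_s` and `theta` both zero the rotation is the identity and the function returns `W` unchanged, so under these assumptions the adapter starts as a no-op. Rotating U instead of V, both, or neither would correspond to moving or dropping the `R` factor.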
## Developer docs