mirror of
https://github.com/wassname/lora-lite.git
synced 2026-06-27 18:05:16 +08:00
0d40cc9b38
antipasto's diagonal core can only rescale each frozen singular direction; it can never let direction i's input drive direction j's output, yet the steered behaviour is an off-axis combination. A dense r x r core fixes that but costs r^2 params.

antipasto_arrow uses the arrowhead structure instead: a dense b x b block on the top-b singular directions (full coupling where the action lives) plus a diagonal 1+ELU tail on the rest. b^2 + (r-b) params, one b x b matmul per forward -- cross-direction mixing at diagonal-core cost, no Cayley solve. Identity at init (M=0 -> B=I, g=0 -> gain=1).

Verified on a Linear: rel_err 1.5e-7 at init; M[i,j] routes input dir j -> output dir i with weight exactly M[i,j] (diagonal core forces 0); 14 train params at r=8,b=3 vs r^2=64.

Wired into benchmark (antipasto_block knob), smoke (block=2 for r=4), cost report, and exports.

Co-Authored-By: Claudypoo <noreply@anthropic.com>
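
A minimal pure-Python sketch of the arrowhead core described above, for illustration only. It assumes the head block is B = I + M (consistent with "M=0 -> B=I") and the tail gain is 1 + ELU(g); the names `arrow_core_apply`, `n_params`, and `elu` are hypothetical and are not taken from the repository. The input `z` stands for the coefficients of an activation in the frozen singular basis.

```python
import math

def elu(x, alpha=1.0):
    # Standard ELU: identity for x > 0, alpha * (exp(x) - 1) otherwise.
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def arrow_core_apply(z, M, g):
    """Apply an arrowhead core to z (length r).

    M: b x b list of lists, dense coupling on the top-b directions
       (assumed head block B = I + M).
    g: length r - b list, tail parameters (assumed gain = 1 + ELU(g)).
    """
    b = len(M)
    # Head: (I + M) @ z[:b], so M[i][j] routes input dir j to output dir i.
    head = [z[i] + sum(M[i][j] * z[j] for j in range(b)) for i in range(b)]
    # Tail: independent per-direction rescaling, as in a diagonal core.
    tail = [(1.0 + elu(gk)) * zk for gk, zk in zip(g, z[b:])]
    return head + tail

def n_params(r, b):
    # b^2 for the dense head plus one gain per remaining direction.
    return b * b + (r - b)

r, b = 8, 3
M = [[0.0] * b for _ in range(b)]
g = [0.0] * (r - b)
z = [float(k + 1) for k in range(r)]

identity_out = arrow_core_apply(z, M, g)   # zero init leaves z unchanged

M[0][1] = 0.5                              # couple input dir 1 into output dir 0
mixed_out = arrow_core_apply(z, M, g)      # only entry 0 changes
```

With M and g at zero the core is the identity; setting a single off-diagonal entry of M changes only the corresponding output direction, which a purely diagonal core cannot do. At r=8, b=3 the count is 9 + 5 = 14 parameters, matching the figure in the message.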