lora-lite

mirror of https://github.com/wassname/lora-lite.git synced 2026-06-27 18:05:16 +08:00

wassname 0d40cc9b38 Add antipasto_arrow: structured fixed-basis core (cross-direction mixing)

antipasto's diagonal core can only rescale each frozen singular direction; it
can never let direction i's input drive direction j's output, yet the steered
behaviour is an off-axis combination. A dense r x r core fixes that but costs
r^2 params. antipasto_arrow uses the arrowhead structure instead: a dense b x b
block on the top-b singular directions (full coupling where the action lives)
plus a diagonal 1+ELU tail on the rest. b^2 + (r-b) params, one b x b matmul
per forward -- cross-direction mixing at diagonal-core cost, no Cayley solve.
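A minimal NumPy sketch of the core described above, not the repository's implementation. It assumes the frozen basis comes from an SVD of the layer weight (W = U diag(S) Vh), that the core acts on the singular-direction coefficients, and that the dense block is parameterised as B = I + M. The names `elu`, `arrow_core`, and `adapted_linear` are hypothetical.

```python
import numpy as np

def elu(x):
    # ELU with alpha=1: x for x > 0, exp(x) - 1 otherwise.
    return np.where(x > 0, x, np.exp(np.minimum(x, 0.0)) - 1.0)

def arrow_core(z, M, g):
    """Apply the structured core to coefficients z of shape (..., r).

    M: (b, b) dense block parameters for the top-b directions (assumed B = I + M).
    g: (r - b,) tail parameters, giving a per-direction gain 1 + ELU(g).
    """
    b = M.shape[0]
    head = z[..., :b] @ (np.eye(b) + M).T   # full coupling among the top-b directions
    tail = z[..., b:] * (1.0 + elu(g))      # remaining directions are only rescaled
    return np.concatenate([head, tail], axis=-1)

def adapted_linear(x, U, S, Vh, M, g):
    """Linear layer with the core inserted between the frozen SVD factors."""
    z = (x @ Vh.T) * S                      # coefficients along frozen singular directions
    return arrow_core(z, M, g) @ U.T        # map back to the output space
```

With `M` and `g` all zeros the core is the identity, so `adapted_linear` reproduces `x @ W.T` up to floating-point error when the SVD is not truncated. The only matrix product added by the core itself is the b x b one in `head`.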

Identity at init (M=0 -> B=I, g=0 -> gain=1). Verified on a Linear layer:
rel_err 1.5e-7 at init; M[i,j] routes input dir j -> output dir i with weight
exactly M[i,j] (a diagonal core forces this to 0); 14 trainable params at
r=8, b=3 vs r^2=64 for a dense core.
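The parameter counts follow directly from the b^2 + (r-b) formula. A small sketch of that arithmetic, with hypothetical helper names that are not part of the repository:

```python
def arrow_param_count(r: int, b: int) -> int:
    """Trainable parameters: a dense b x b block plus one gain per remaining direction."""
    if not 0 <= b <= r:
        raise ValueError("block size b must satisfy 0 <= b <= r")
    return b * b + (r - b)

def dense_param_count(r: int) -> int:
    """Trainable parameters of a fully dense r x r core."""
    return r * r

print(arrow_param_count(8, 3), dense_param_count(8))  # 14 64
```

Setting b=0 gives r parameters, one gain per direction, and b=r gives the dense r^2 count, so the block size interpolates between the two extremes.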

Wired into the benchmark (antipasto_block knob), the smoke test (block=2 for
r=4), the cost report, and exports.

Co-Authored-By: Claudypoo <noreply@anthropic.com>

2026-06-14 19:18:59 +08:00

test_metamath_gsm8k_benchmark.py

tyro and benchmark

2026-04-27 06:23:30 +08:00

test_metamath_smoke.py

Add antipasto_arrow: structured fixed-basis core (cross-direction mixing)

2026-06-14 19:18:59 +08:00