Commit Graph

3 Commits

Author SHA1 Message Date
wassname 9d027752ad variants: replace arrow's dense block with diagonal-plus-low-rank core
antipasto_arrow -> antipasto_dplr. The arrowhead's dense b x b block is the wrong
shape: b^2 params, mixes only the top-b, and sits on the S-scaled coords so its
perturbation is amplified by the largest singular values (block=128 collapsed to
45.7% at the gain's lr). Replace it with LoRA's lesson -- a low-rank core inside
the frozen basis, ADDED to the gain:

    DeltaW = U [diag(S_eff) + coeff * B A] Vh,   A:(k,r) B:(r,k), B=0 at init

The low-rank part mixes the whole top-r subspace for 2*r*k params (k=LoRA's rank),
and being additive (not * diag(S)) it is S-independent -- the amplification edge is
gone by construction. Diagonal gain unchanged; identity at init from B=0 and g=0.

Wired through benchmark (antipasto_lora_rank, run_id __k suffix), justfile, cost_report,
smoke (green, dplr attaches/trains/round-trips). Arrow code removed; its run results
stay on disk for comparison.

Co-Authored-By: Claudypoo <noreply@anthropic.com>
2026-06-15 20:13:15 +08:00
wassname 0d40cc9b38 Add antipasto_arrow: structured fixed-basis core (cross-direction mixing)
antipasto's diagonal core can only rescale each frozen singular direction; it
can never let direction i's input drive direction j's output, yet the steered
behaviour is an off-axis combination. A dense r x r core fixes that but costs
r^2 params. antipasto_arrow uses the arrowhead structure instead: a dense b x b
block on the top-b singular directions (full coupling where the action lives)
plus a diagonal 1+ELU tail on the rest. b^2 + (r-b) params, one b x b matmul
per forward -- cross-direction mixing at diagonal-core cost, no Cayley solve.

Identity at init (M=0 -> B=I, g=0 -> gain=1). Verified on a Linear: rel_err
1.5e-7 at init; M[i,j] routes input dir j -> output dir i with weight exactly
M[i,j] (diagonal core forces 0); 14 train params at r=8,b=3 vs r^2=64.

Wired into benchmark (antipasto_block knob), smoke (block=2 for r=4), cost
report, and exports.

Co-Authored-By: Claudypoo <noreply@anthropic.com>
2026-06-14 19:18:59 +08:00
wassname b80d7778af Add rotation-free S-space adapter cores (antipasto family)
Replace antipasto's rotation/Cayley with a bounded 1+ELU gain and split the
S-space idea into four interpretable PiSSA-style cores (frozen U/S/Vh, small
trainable core):

- antipasto: S_eff = S*(1+ELU(coeff*g)). exp-bounded attenuation, linear
  amplification (constant gradient, no runaway). g=0 -> exact identity.
- antipasto_rot: keeps the block-Cayley rotation as a separate variant for
  cost comparison (its per-forward solve is the 72ms vs 36ms gap).
- antipasto_ablate: contractive (I - a c c^T) diag(S), eigenvalues in [0,1],
  cannot blow up. Optional cov_orient (CorDA) basis.
- antipasto_corda: covariance-oriented oblique projector P = Vh C^{-1/2}, the
  data-energy basis rather than the weight-gain basis. 1+ELU gain.

Add scripts/_cost.py + scripts/cost_report.py: one-row-per-variant cost table
(trainable params, peak GPU mem, fwd/bwd ms, added MACs/tok, group_init ms).
Wire all four into the benchmark, smoke test, and __init__ exports.

External review (DeepSeek-v4-pro, docs/reviews/) verified the math; acted on
its one real point (corda g now inits to zeros for exact identity).

Co-Authored-By: Claudypoo <noreply@anthropic.com>
2026-06-14 19:12:27 +08:00