23 Commits

Author SHA1 Message Date
wassname 5d910996b3 justfile: bench-variant takes a target_override arg, routed to its own out dir
LoRA-XS's paper recipe spreads across q/k/v/o + all 3 FFN projections, not
down_proj only. run_id ignores target, so overridden runs go to
outputs/metamath_gsm8k_alllinear to avoid clobbering the canonical down_proj
results the README table is built from.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-18 21:53:30 +08:00
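The out-dir routing in 5d910996b3 can be sketched as below. This is a minimal illustration, not the justfile recipe: `out_dir_for` and the default root `outputs/metamath_gsm8k` are hypothetical names, and only `outputs/metamath_gsm8k_alllinear` is taken from the commit message.

```python
from pathlib import Path
from typing import Optional

# Hypothetical default root; the commit only names the override directory.
DEFAULT_OUT = Path("outputs/metamath_gsm8k")
OVERRIDE_OUT = Path("outputs/metamath_gsm8k_alllinear")


def out_dir_for(target_override: Optional[str]) -> Path:
    """Pick the output root for a benchmark run.

    run_id does not encode the target, so a run with an overridden target
    would share its run_id with the canonical down_proj run. Sending it to a
    separate root keeps the two from overwriting each other.
    """
    if target_override:
        return OVERRIDE_OUT
    return DEFAULT_OUT
```

Keeping the split at the directory level leaves existing run_ids, and therefore the results the README table reads, untouched.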
wassname c792ad3e5f Add LoRA-XS variant: train only r×r core R between frozen SVD factors
Bałazy et al. 2024 (arxiv 2405.17604). A=diag(Sr)Vhr, B=Ur frozen from
top-r SVD of W (W left intact); only the r×r R is trained, init normal(0,1e-5)
so the adapter ~ identity at t=0. ~25k params at r=32 (24 down_proj targets).
justfile: alpha=r (scale=1) and lr=4e-3, matching the ref LLaMA math config.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-18 19:48:40 +08:00
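The parameter count and the near-identity start in c792ad3e5f can be checked with a small pure-Python sketch. Assumptions: random matrices stand in for the frozen SVD factors (no SVD is computed here), `normal(0,1e-5)` is read as a standard deviation of 1e-5, and all function names are hypothetical.

```python
import random


def matmul(x, y):
    """Plain nested-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def lora_xs_trainable_params(r: int, n_targets: int) -> int:
    """Only the r x r core R is trained in each target layer."""
    return r * r * n_targets


def lora_xs_delta(B, R, A):
    """Adapter update B @ R @ A with B (d_out x r) and A (r x d_in) frozen."""
    return matmul(matmul(B, R), A)


# Parameter count quoted in the commit: r=32 over 24 down_proj targets.
n_params = lora_xs_trainable_params(r=32, n_targets=24)  # 24576, i.e. ~25k

# Toy shapes: d_out=3, d_in=4, r=2. Random stand-ins replace the SVD factors.
rng = random.Random(0)
B = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(3)]
A = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(2)]
R = [[rng.gauss(0, 1e-5) for _ in range(2)] for _ in range(2)]  # assumed std 1e-5

delta = lora_xs_delta(B, R, A)
max_abs_delta = max(abs(v) for row in delta for v in row)  # tiny, so W + delta ~ W
```

Because R is the only trainable tensor, the count does not depend on the layer's input or output width, which is why it stays near 25k across 24 layers.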
wassname 12fa56f328 Collapse antipasto family to one variant: rot(V) becomes canonical antipasto
main keeps a single antipasto = the rotation+delta SVD adapter (the published
method, paper 2601.07473), default rotate_basis=V. On GSM8K/down_proj rot(V)
led the family (57.2) and at a single seed nothing separated from it, while the
covariance-oriented arms cost 34-120s init for no gain. The full family (gain
core, U/both rotations, ablate, dplr, corda, asvd) is preserved on the
antipasto-variants branch.

- antipasto.py is now the rotation implementation, registered as "antipasto"
- delete antipasto_{rot,ablate,corda,asvd,dplr}.py + their config exports
- benchmark/justfile/cost_report/smoke: drop the removed variants + dead knobs
  (antipasto_coeff/suppress_only/ablate_k/cov_orient/lora_rank); keep
  --antipasto-rotate-basis as antipasto's V/U/both/none ablation axis
- README: subset table to one antipasto row, add rank column, note single-seed
  noise floor (~1.4pp), point the full family at the branch

smoke: 10 passed

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-17 21:05:51 +08:00
wassname 458c3861e8 justfile: bench-variant takes a seed arg (default 0, unchanged)
Lets the rot-basis ablation get a second seed without clobbering the seed0
run_id, so V>U>both can be confirmed against seed noise.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-17 18:29:23 +08:00
wassname 5f9d90d8b8 benchmark sweep: rot(U/both) ablation, whitening conclusion, cost rows
- antipasto_rot: add rotate_basis="both" (independent V+U Cayley rotations),
  run_id suffix __rotU/__rotboth so ablation arms get their own output dirs
- justfile: thread rotate_basis through bench-variant
- corda/eva: padding-mask fix in calibration capture + bf16-tight residual
- README: fill PiSSA/DoRA/CorDA/ASVD/ablate/dplr/rot rows; record the
  metric-axis ablation (C=I 56.0 > diag-C 55.6 > full-C 54.7) and the
  rotation ablation (V 57.2 > U 56.5 > both 55.6) conclusions
- docs/reviews: external ref-checks + deepseek/gpt reviews of the cores

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-17 06:17:53 +08:00
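The commit does not show how the Cayley rotations are parameterised. The sketch below assumes the standard Cayley transform Q = (I - A)(I + A)^-1 of a skew-symmetric A, restricted to 2x2 so it needs no linear-algebra library. `cayley_2x2` and `rotations_for` are hypothetical names; the `V`/`U`/`both`/`none` settings are the ones listed in the log.

```python
def cayley_2x2(a: float):
    """Cayley transform Q = (I - A)(I + A)^-1 for A = [[0, a], [-a, 0]].

    A is skew-symmetric, so Q is orthogonal; a = 0 gives the identity.
    """
    det = 1.0 + a * a  # det(I + A), never zero for real a
    # (I + A)^-1 = [[1, -a], [a, 1]] / det and I - A = [[1, -a], [a, 1]],
    # so Q = [[1 - a^2, -2a], [2a, 1 - a^2]] / det.
    c = (1.0 - a * a) / det
    s = 2.0 * a / det
    return [[c, -s], [s, c]]


def rotations_for(rotate_basis: str, a_v: float, a_u: float):
    """Return (Q_v, Q_u); a basis that is not rotated gets the identity."""
    identity = cayley_2x2(0.0)
    q_v = cayley_2x2(a_v) if rotate_basis in ("V", "both") else identity
    q_u = cayley_2x2(a_u) if rotate_basis in ("U", "both") else identity
    return q_v, q_u
```

With `rotate_basis="both"` the two parameters are independent, so the V and U rotations can differ; with both parameters at zero every setting reduces to the identity.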
wassname d4ec550dd8 fix: corda silently ran as plain SVD; wire calibration + persist data-driven residual
The benchmark only passed calibration_data to eva, so antipasto_corda's
group_init hit `if calibration_data is None: return` and every corda run was
actually plain SVD. The covariance orientation never executed -- all prior
corda-vs-antipasto comparisons are void.

- antipasto_corda.group_init: raise on None instead of silently degrading
  (orientation is the variant's whole identity; fail loud).
- benchmark: feed ~256 MetaMath calibration samples (IPM, per PEFT/CorDA) to
  corda and to cov_orient ablate; run_id now carries an __lr tag.
- adapter.save/load: a data-driven group_init rewrites the frozen base residual
  W_res into a form init() cannot reproduce at load (it only knows the plain
  top-r crop). Persist those residuals in the adapter and restore them. Fixes a
  reload-logits mismatch that was masked while group_init never ran.
- probe check: compare every saved tensor (lora_ buffers AND base residuals)
  against the reloaded model state.
- justfile: bench-variant gains an lr_override (the core wants a tamer lr than
  the gain's 5e-3).

Co-Authored-By: Claudypoo <noreply@anthropic.com>
2026-06-16 05:56:02 +08:00
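The fail-loud change in d4ec550dd8 follows a common pattern, sketched here with a hypothetical stand-in function. The real `group_init` signature and body are not shown in the log, so everything except the `calibration_data is None` check is an assumption.

```python
from typing import Optional, Sequence


def group_init(calibration_data: Optional[Sequence[str]]) -> int:
    """Hypothetical stand-in for a data-driven init step.

    Before the fix, a bare `return` sat where the raise is now, so a missing
    calibration set silently fell back to plain SVD.
    """
    if calibration_data is None:
        raise ValueError(
            "calibration_data is required: without it the covariance "
            "orientation never runs and the variant degrades to plain SVD"
        )
    # Placeholder for the real orientation; here we only report the sample count.
    return len(calibration_data)
```

Raising at the call site turns a silently wrong benchmark row into an immediate failure, which is cheaper than discovering void comparisons later.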
wassname 9d027752ad variants: replace arrow's dense block with diagonal-plus-low-rank core
antipasto_arrow -> antipasto_dplr. The arrowhead's dense b x b block is the wrong
shape: b^2 params, mixes only the top-b, and sits on the S-scaled coords so its
perturbation is amplified by the largest singular values (block=128 collapsed to
45.7% at the gain's lr). Replace it with LoRA's lesson -- a low-rank core inside
the frozen basis, ADDED to the gain:

    DeltaW = U [diag(S_eff) + coeff * B A] Vh,   A:(k,r) B:(r,k), B=0 at init

The low-rank part mixes the whole top-r subspace for 2*r*k params (k=LoRA's rank),
and being additive (not * diag(S)) it is S-independent -- the amplification edge is
gone by construction. Diagonal gain unchanged; identity at init from B=0 and g=0.

Wired through benchmark (antipasto_lora_rank, run_id __k suffix), justfile, cost_report,
smoke (green, dplr attaches/trains/round-trips). Arrow code removed; its run results
stay on disk for comparison.

Co-Authored-By: Claudypoo <noreply@anthropic.com>
2026-06-15 20:13:15 +08:00
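The parameter trade-off and the identity-at-init property in 9d027752ad can be illustrated on the core matrix alone. This is a sketch with nested lists and hypothetical function names; it builds only the r x r core `diag(S_eff) + coeff * B A`, not the full product with U and Vh, and the value k=8 in the test of the count is an arbitrary example.

```python
def dense_block_params(b: int) -> int:
    """Arrowhead variant: a dense b x b block."""
    return b * b


def dplr_lowrank_params(r: int, k: int) -> int:
    """Low-rank part of the core: A is (k, r) and B is (r, k)."""
    return 2 * r * k


def dplr_core(s_eff, B, A, coeff: float = 1.0):
    """Core matrix diag(S_eff) + coeff * B @ A as an r x r nested list."""
    r = len(s_eff)
    k = len(A)
    core = [[0.0] * r for _ in range(r)]
    for i in range(r):
        for j in range(r):
            low_rank = sum(B[i][m] * A[m][j] for m in range(k))
            core[i][j] = (s_eff[i] if i == j else 0.0) + coeff * low_rank
    return core


# r=3, k=1: with B = 0 the core is exactly diag(S_eff), whatever A holds.
s_eff = [3.0, 2.0, 1.0]
A = [[0.5, -0.25, 0.125]]
B_zero = [[0.0], [0.0], [0.0]]
core_at_init = dplr_core(s_eff, B_zero, A)
```

The low-rank term is added to the diagonal rather than multiplied by it, so its contribution to the core does not scale with the singular values.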
wassname 2c56196dea justfile/run_id: r override for low-rank antipasto sweeps
bench-variant gains an r_override arg (alpha tracks r for the antipasto family);
run_id appends __r<N> when an antipasto-family run uses r!=256, so the low-rank
corda-vs-antipasto sweep does not overwrite the r=256 results.

Co-Authored-By: Claudypoo <noreply@anthropic.com>
2026-06-15 18:35:54 +08:00
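The suffix rule from 2c56196dea can be sketched as follows. `make_run_id` and the bare-variant base string are hypothetical; only the `__r<N>` suffix, the r=256 default, and the restriction to the antipasto family come from the commit message.

```python
def make_run_id(variant: str, r: int, default_r: int = 256) -> str:
    """Append __r<N> only when an antipasto-family run uses a non-default r."""
    run_id = variant  # hypothetical base; the real run_id carries more fields
    if variant.startswith("antipasto") and r != default_r:
        run_id += f"__r{r}"
    return run_id
```

Leaving the default case unsuffixed means run_ids written before the change still resolve to the same output directories.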
wassname e8ca6f5944 README: validation framing per wassname's wording; arrow large-block lr=1e-4
README: 'we validate the same way PEFT does; trained properly they clear 49% on
GSM8K, all pass' + link to the benchmark script.

justfile: arrow with block>8 uses lr=1e-4 not 5e-3. The 5e-3 that suits the tiny
S-space gain destabilizes the large dense block -- block=128 at 5e-3 scored 45.7%
(below the bar, vs block=8's 60.5%). Capacity sweep requeued at LoRA's 1e-4 to
de-confound params-vs-lr.

Co-Authored-By: Claudypoo <noreply@anthropic.com>
2026-06-15 18:27:33 +08:00
wassname d9d31a160f variants: clean docstrings to research pseudocode; arrow block param
Rewrite antipasto/ablate/corda/arrow docstrings to the house style (purpose +
math block + identity line + refs), dropping the rambly meta-commentary aimed at
past design decisions ('Changes vs the rotation version', chat references, inline
measurements). Net -74 lines.

Also answer the FIXMEs left on main's old copy:
  - group_init is Wanda/ASVD *selection* (re-rank W's own singular vectors), NOT
    CorDA re-orientation -- that is antipasto_corda.py.
  - it rebuilds the FULL W exactly (W_res + stored top-r == W), so the re-SVD sees
    the whole spectrum, not a cropped matrix.

Arrow capacity: --antipasto-block CLI knob (justfile bench-variant 4th arg) so the
block can be scaled toward LoRA params; run_id gets a __b<N> suffix so block-sweep
runs do not collide. Smoke green (14 passed).

Co-Authored-By: Claudypoo <noreply@anthropic.com>
2026-06-15 18:09:53 +08:00
wassname 32b1fd885a justfile: route antipasto bench through r=256/alpha=256 in bench-variant
The README GSM8K sweep was queued as raw expanded commands with an
unquoted --target-name '(q_proj|v_proj)$'; pueue runs via sh -c, so the
parens errored instantly before training. Routing through bench-variant
(bash shebang quotes the target) fixes it. Also bake the antipasto family's
r=256/alpha=256 into the case block so it matches the published AntiPaSTO
row, replacing the dead trailing "$@" (shebang recipes get no extra args).

Co-Authored-By: Claudypoo <noreply@anthropic.com>
2026-06-15 05:58:34 +08:00
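The quoting failure fixed in 32b1fd885a can be reproduced outside pueue. The sketch assumes a POSIX `sh` on PATH and uses `printf` as a stand-in for the benchmark command; `run_in_sh` is a hypothetical helper.

```python
import shlex
import subprocess

TARGET = "(q_proj|v_proj)$"


def run_in_sh(arg: str) -> subprocess.CompletedProcess:
    """Run `printf %s <arg>` through `sh -c`, splicing arg in verbatim."""
    return subprocess.run(
        ["sh", "-c", f"printf %s {arg}"], capture_output=True, text=True
    )


# Unquoted: sh tries to parse the parentheses and pipe itself and exits
# with a syntax error before the command runs.
unquoted = run_in_sh(TARGET)

# Quoted: the regex reaches the command as a single literal argument.
quoted = run_in_sh(shlex.quote(TARGET))
```

`shlex.quote` wraps the pattern in single quotes, which is the same protection the bash shebang recipe gives by quoting the target.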
wassname d6b242818a justfile: lr=5e-3 for all antipasto_* cores in bench-variant
The small-param antipasto family (gain/block/ablate/corda) all need the higher
lr to clear the bf16 round-to-nearest floor, not just antipasto. Glob the case.

Co-Authored-By: Claudypoo <noreply@anthropic.com>
2026-06-14 19:20:35 +08:00
wassname f6fd410677 benchmark: antipasto rotate_basis CLI + lr=5e-3 + ablation queue 2026-04-27 16:29:25 +08:00
wassname e624cd244f feat: near_zero/near_one init for trainable params (breaks bf16 dead-grad symmetry)
Trainable params that were init'd at exact 0 or 1 now use near_zero (N(0,1e-4))
or near_one (1 + N(0,1e-4)) to break bf16 symmetry without meaningfully
breaking identity-at-t=0. Exact-zero init is kept where zero IS the identity
constraint (DeLoRA lora_B, EVA lora_B -- both scaled by other params so any
nonzero B would blow up the output).

AntiPaSTO: delta_s and rot_T now near_zero. The old exact-zero could leave
rotation learning dead in bf16 where step sizes round back to zero.

IA3: lora_g now near_one instead of exact ones. Avoids the bf16 spacing issue
around 1.0 where eps_bf16 ~ 7.8e-3 and lr=1e-3 updates were rounding away.

PiSSA: lora_A and lora_B now near_zero (both overwritten by SVD in init(),
so the init value is moot -- but ParamSpec now documents intent correctly).

HRA: lora_U now near_zero (overwritten by symmetric init in init()).

ParamSpec: added 'near_zero' and 'near_one' init modes. Default changed from
'zeros' to 'near_zero'. Tests relaxed identity tolerances accordingly.
2026-04-27 15:55:05 +08:00
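The spacing figure quoted for IA3 can be verified without a tensor library by rounding the float32 bit pattern to its top 16 bits. This is an illustrative emulation of bf16 round-to-nearest-even, not code from the repository; `to_bf16` is a hypothetical helper intended for ordinary finite values.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value (ties to even), returned as a float.

    bfloat16 is the top 16 bits of an IEEE float32, so the rounding is done
    on the float32 bit pattern.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bias = 0x7FFF + ((bits >> 16) & 1)  # round half to even
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Gap between 1.0 and the next representable bf16 value.
spacing_at_one = to_bf16(1.0 + 2.0 ** -7) - 1.0  # 0.0078125

# A 1e-3 step from exactly 1.0 is below half the spacing and is rounded away.
lost = to_bf16(1.0 + 1e-3)  # 1.0

# A step above half the spacing lands on the next representable value.
kept = to_bf16(1.0 + 5e-3)  # 1.0078125
```

The half-spacing threshold at 1.0 is about 3.9e-3, so an update of size 1e-3 applied directly to a bf16 weight near 1.0 leaves it unchanged.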
wassname 0bd091fe5b tidy 2026-04-27 11:44:40 +08:00
wassname a342801807 wip 2026-04-27 11:24:19 +08:00
wassname b60a8c3f9b readme 2026-04-27 09:46:52 +08:00
wassname 74c374e741 tidy, review 2026-04-27 07:03:24 +08:00
wassname b179771cc6 tyro and benchmark 2026-04-27 06:23:30 +08:00
wassname 2abf616be6 feat(dora): add weight-decomposed LoRA variant for fp layers 2026-04-26 17:53:33 +08:00
wassname 699fde31bf feat: ia3 variant, real bnb 4bit/8bit smoke, dev guide split, user-only readme 2026-04-26 17:49:17 +08:00
wassname f2d9021511 ci: add publishable check workflow 2026-04-26 17:09:47 +08:00
wassname 69bf5f4e44 test: prove adapter training paths 2026-04-26 17:00:39 +08:00