Seed-1 confirmation (jobs 106-108) flips the seed-0 ranking:
seed 0: V 57.2 > U 56.5 > both 55.6; seed 1: U 57.5 > both 56.9 > V 56.2.
The 2-seed test means (U 57.0, V 56.7, both 56.3) span 0.7pp, inside the ~1pp SE
of a 2-seed mean, so the choice of rotation basis is within noise. rot(V) stays
the default as a cheap representative, not a measured winner.
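
The arithmetic behind the within-noise call can be reproduced with a short
sketch. The per-seed scores are the ones quoted above; the 1.0pp SE is taken as
stated in this message, not re-derived, and the variable names are illustrative.

```python
# Per-seed test accuracy (%) for each rotation basis, as quoted above.
scores = {
    "U": [56.5, 57.5],
    "V": [57.2, 56.2],
    "both": [55.6, 56.9],
}

# 2-seed mean per arm: U 57.0, V 56.7, both 56.25 (reported as 56.3).
means = {arm: sum(vals) / len(vals) for arm, vals in scores.items()}

# Spread between best and worst arm. Unrounded this is 0.75pp; taken from the
# rounded means (57.0 - 56.3) it is the 0.7pp quoted above.
span = max(means.values()) - min(means.values())

# Assumption: ~1pp standard error of a 2-seed mean, as stated in the message.
SE_TWO_SEED_MEAN = 1.0
within_noise = span < SE_TWO_SEED_MEAN

for arm, mean in means.items():
    print(f"{arm}: {mean:.2f}")
print(f"span={span:.2f}pp within_noise={within_noise}")
```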
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
- antipasto_rot: add rotate_basis="both" (independent V+U Cayley rotations),
  run_id suffix __rotU/__rotboth so ablation arms get their own output dirs
- justfile: thread rotate_basis through bench-variant
- corda/eva: padding-mask fix in calibration capture + bf16-tight residual
- README: fill PiSSA/DoRA/CorDA/ASVD/ablate/dplr/rot rows; record the
  metric-axis ablation (C=I 56.0 > diag-C 55.6 > full-C 54.7) and the
  rotation ablation (V 57.2 > U 56.5 > both 55.6) conclusions
- docs/reviews: external ref-checks + deepseek/gpt reviews of the cores
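
For readers unfamiliar with the Cayley rotations mentioned in the first bullet,
here is a minimal generic sketch. It is not the antipasto_rot code: the function
name, the unconstrained parameter T, and the way the rotations are applied to
U and V are assumptions made for illustration. It only shows that a Cayley
transform of a skew-symmetric matrix is orthogonal, costs one linear solve, and
reduces to the identity when the parameter is zero.

```python
import numpy as np

def cayley(T: np.ndarray) -> np.ndarray:
    """Map an unconstrained square matrix T to an orthogonal matrix Q."""
    A = T - T.T                      # skew-symmetric part, so A.T == -A
    I = np.eye(A.shape[0])
    # Q = (I + A)^{-1} (I - A); I + A is always invertible for real skew A.
    return np.linalg.solve(I + A, I - A)

rng = np.random.default_rng(0)
r = 4
U = np.linalg.qr(rng.normal(size=(8, r)))[0]   # stand-in left singular vectors
V = np.linalg.qr(rng.normal(size=(6, r)))[0]   # stand-in right singular vectors

# Hypothetical "both" arm: independent rotations for V and U.
Q_v = cayley(1e-2 * rng.normal(size=(r, r)))
Q_u = cayley(1e-2 * rng.normal(size=(r, r)))
U_rot, V_rot = U @ Q_u, V @ Q_v

# Rotated bases stay orthonormal, and T = 0 gives the identity rotation.
print(np.allclose(U_rot.T @ U_rot, np.eye(r)))
print(np.allclose(cayley(np.zeros((r, r))), np.eye(r)))
```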
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
README: 'we validate the same way PEFT does; trained properly they clear 49% on
GSM8K, all pass' + link to the benchmark script.
justfile: arrow with block>8 uses lr=1e-4 not 5e-3. The 5e-3 that suits the tiny
S-space gain destabilizes the large dense block -- block=128 at 5e-3 scored 45.7%
(below the bar, vs block=8's 60.5%). Capacity sweep requeued at LoRA's 1e-4 to
de-confound params-vs-lr.
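
The learning-rate rule above, written out as a sketch. The real rule lives in
the justfile; this Python helper and its name are hypothetical and only restate
the threshold and the two rates given in this message.

```python
def arrow_lr(block: int) -> float:
    """Pick the arrow learning rate from the dense block size.

    block <= 8: 5e-3, the rate that suits the tiny S-space gain.
    block  > 8: 1e-4 (LoRA's rate), since 5e-3 destabilizes the large block.
    """
    return 1e-4 if block > 8 else 5e-3

for block in (8, 128):
    print(f"block={block}: lr={arrow_lr(block):g}")
```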
Co-Authored-By: Claudypoo <noreply@anthropic.com>
The point is that every adapter clears PEFT's ~48% LoRA bar on the same
MetaMathQA->GSM8K protocol. All rows passing is the signal that each adapter
trains, not a competitive ranking.
Co-Authored-By: Claudypoo <noreply@anthropic.com>
- Fold the family into the main Variants table as rows (CorDA/ablate/arrow)
  instead of a separate table.
- Lead with the point (freeze W's SVD, learn only a bounded gain -> interpretable,
  O(r) params) before any numbers.
- Replace the unsourced 'PEFT reports 49.0%' line (wrong; LoRA is ~48%) with a
  real link to PEFT's method_comparison/MetaMathQA and a pointer to the benchmark
  script for hyperparameters. Link CorDA/Arditi papers inline.
Co-Authored-By: Claudypoo <noreply@anthropic.com>
Replace the per-experiment family breakdown table + comparison prose with a
2-sentence method description (frozen interpretable SVD basis, O(r) gain, the
three variant cores). Experiment findings (rotation comparison, arrow capacity,
cost/timing) belong in the research journal, not the README skim path.
Co-Authored-By: Claudypoo <noreply@anthropic.com>
Replace the stale single AntiPaSTO row (was 35.8K params from the removed
rotation version, described block-Cayley which no longer exists) with the
real 5000-step Qwen3-0.6B numbers and a family breakdown:
  corda   61.9%  14.3K  (best: covariance-oriented basis)
  plain   61.4%  14.3K
  rot     61.4%  35.8K  (the rotation this replaces)
  ablate  61.0%  14.4K
  arrow   60.5%  17.5K
Headline: ~320x fewer trainable params than LoRA at ~97% of its accuracy.
Rotation buys nothing (rot matches plain to 3 s.f. at 2.5x params, +20%
wall-time, plus a per-forward Cayley solve), confirming the drop.
Co-Authored-By: Claudypoo <noreply@anthropic.com>
Trainable params that were initialized at exactly 0 or 1 now use near_zero
(N(0,1e-4)) or near_one (1 + N(0,1e-4)) to break bf16 symmetry without
meaningfully breaking identity-at-t=0. Exact-zero init is kept where zero IS the
identity constraint (DeLoRA lora_B, EVA lora_B -- both are scaled by other
params, so any nonzero B would blow up the output).
AntiPaSTO: delta_s and rot_T now near_zero. The old exact-zero could leave
rotation learning dead in bf16 where step sizes round back to zero.
IA3: lora_g now near_one instead of exact ones. Avoids the bf16 spacing issue
around 1.0 where eps_bf16 ~ 7.8e-3 and lr=1e-3 updates were rounding away.
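
The rounding problem around 1.0 can be checked without any ML framework by
emulating bf16 rounding on float32 bit patterns. This is an illustrative sketch,
not code from this repo: to_bf16 is a hypothetical helper, and the update is
assumed to have magnitude 1e-3 (lr=1e-3 with a gradient of order 1).

```python
import struct

def to_bf16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (ties to even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    bits += 0x7FFF + ((bits >> 16) & 1)   # round-to-nearest-even on the low 16 bits
    bits &= 0xFFFF0000                    # keep sign, exponent, top 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

EPS_BF16 = 2.0 ** -7   # 0.0078125: gap between 1.0 and the next bf16 value above

w = 1.0
step = 1e-3            # assumed update magnitude, well under EPS_BF16 / 2

stepped_up = to_bf16(w + step)     # rounds back to 1.0
stepped_down = to_bf16(w - step)   # rounds back to 1.0 as well
big_step = to_bf16(w + EPS_BF16)   # a full-eps step is representable

print(stepped_up, stepped_down, big_step)
```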
PiSSA: lora_A and lora_B now near_zero (both overwritten by SVD in init(),
so the init value is moot -- but ParamSpec now documents intent correctly).
HRA: lora_U now near_zero (overwritten by symmetric init in init()).
ParamSpec: added 'near_zero' and 'near_one' init modes. Default changed from
'zeros' to 'near_zero'. Tests relaxed identity tolerances accordingly.
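
A minimal sketch of what the four init modes compute for a single scalar. This
is not the ParamSpec implementation: init_value is a hypothetical helper, and
1e-4 is treated here as the standard deviation, although N(0,1e-4) above could
also denote the variance.

```python
import random

def init_value(mode: str, rng: random.Random, std: float = 1e-4) -> float:
    """Return one initial value for the given init mode."""
    if mode == "zeros":
        return 0.0                        # exact identity, kept for DeLoRA/EVA lora_B
    if mode == "ones":
        return 1.0
    if mode == "near_zero":
        return rng.gauss(0.0, std)        # small noise around 0
    if mode == "near_one":
        return 1.0 + rng.gauss(0.0, std)  # small noise around 1
    raise ValueError(f"unknown init mode: {mode!r}")

rng = random.Random(0)
z = init_value("near_zero", rng)
o = init_value("near_one", rng)
print(f"near_zero={z:.2e} near_one={o:.6f}")
```

Because the values are no longer exactly 0 or 1, identity checks at t=0 need a
tolerance on the order of the noise scale rather than exact equality.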