lora-lite

wassname/lora-lite


mirror of https://github.com/wassname/lora-lite.git synced 2026-06-27 16:45:56 +08:00

Commit Graph

wassname

28d04f1e1d

gitignore: match loraxs_ review scratch; track curated loraxs_review.md

Broaden raw*/err* patterns to *raw*/*err* so prefixed scratch
(loraxs_raw.jsonl, loraxs_err.txt) is ignored. Add the GPT-5.5 review of
the lora_xs variant as the curated artifact.
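The old patterns missed the prefixed scratch files, and Python's fnmatch shows why. For slash-free patterns matched against a basename, fnmatch approximates gitignore's matching, though it is not identical to it. The filenames come from the message above. The `ignored` helper is illustrative only.

```python
from fnmatch import fnmatch

# Scratch filenames named in the commit message.
SCRATCH = ["loraxs_raw.jsonl", "loraxs_err.txt"]

old_patterns = ["raw*", "err*"]    # only match basenames that START with raw/err
new_patterns = ["*raw*", "*err*"]  # match raw/err anywhere in the basename


def ignored(name: str, patterns: list[str]) -> bool:
    """True if any glob pattern matches the whole basename (gitignore approximation)."""
    return any(fnmatch(name, p) for p in patterns)


for name in SCRATCH:
    print(name, "old:", ignored(name, old_patterns), "new:", ignored(name, new_patterns))
```

The broader globs come at a cost: `*raw*` and `*err*` also match unrelated names that merely contain those letters, such as a hypothetical `drawing.py` or `error_utils.py`.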

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>

2026-06-19 06:04:25 +08:00

wassname

5f9d90d8b8

benchmark sweep: rot(U/both) ablation, whitening conclusion, cost rows

- antipasto_rot: add rotate_basis="both" (independent V+U Cayley rotations),
  run_id suffix __rotU/__rotboth so ablation arms get their own output dirs
- justfile: thread rotate_basis through bench-variant
- corda/eva: padding-mask fix in calibration capture + bf16-tight residual
- README: fill PiSSA/DoRA/CorDA/ASVD/ablate/dplr/rot rows; record the
  metric-axis ablation (C=I 56.0 > diag-C 55.6 > full-C 54.7) and the
  rotation ablation (V 57.2 > U 56.5 > both 55.6) conclusions
- docs/reviews: external ref-checks + deepseek/gpt reviews of the cores
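The corda/eva padding-mask fix governs which token activations feed the calibration statistics. The sketch below assumes the statistic is an uncentered second moment C = (1/N) sum x x^T taken over real tokens only. The message does not say whether lora-lite centers the activations or how it accumulates them, and `masked_second_moment` is a hypothetical name.

```python
def masked_second_moment(acts, mask):
    """Uncentered second moment C = (1/N) * sum x x^T over unmasked tokens.

    acts: list of sequences, each a list of token vectors (list[float]).
    mask: list of sequences of 0/1 flags, 1 = real token, 0 = padding.
    """
    d = len(acts[0][0])
    C = [[0.0] * d for _ in range(d)]
    n = 0
    for seq, seq_mask in zip(acts, mask):
        for x, keep in zip(seq, seq_mask):
            if not keep:
                continue  # padding position: adds nothing and is not counted in N
            n += 1
            for i in range(d):
                for j in range(d):
                    C[i][j] += x[i] * x[j]
    if n == 0:
        raise ValueError("mask selects no tokens")
    return [[v / n for v in row] for row in C]
```

Suppose the padding positions hold zero vectors and the mask is dropped. The sum stays the same but N grows, so C shrinks by the fraction of real tokens. If the pad activations are non-zero, they bias C directly.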

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>

2026-06-17 06:17:53 +08:00

wassname

b80d7778af

Add rotation-free S-space adapter cores (antipasto family)

Replace antipasto's rotation/Cayley with a bounded 1+ELU gain and split the
S-space idea into four interpretable PiSSA-style cores (frozen U/S/Vh, small
trainable core):

- antipasto: S_eff = S*(1+ELU(coeff*g)). exp-bounded attenuation, linear
  amplification (constant gradient, no runaway). g=0 -> exact identity.
- antipasto_rot: keeps the block-Cayley rotation as a separate variant for
  cost comparison (its per-forward solve accounts for the 72ms vs 36ms gap).
- antipasto_ablate: contractive (I - a c c^T) diag(S), eigenvalues in [0,1],
  cannot blow up. Optional cov_orient (CorDA) basis.
- antipasto_corda: covariance-oriented oblique projector P = Vh C^{-1/2}, the
  data-energy basis rather than the weight-gain basis. 1+ELU gain.
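Here is a plain-Python sketch of the two bounded cores above (antipasto's gain and antipasto_ablate's contraction). Assumptions: ELU uses alpha=1, and the gain acts elementwise on the singular values S. For the ablate core, c is normalized to unit length and 0 <= a <= 1. That is what puts the eigenvalues of I - a c c^T at 1 - a along c and 1 elsewhere, so they lie in [0, 1]. All function names are hypothetical and are not lora-lite's API.

```python
import math


def gain_factor(x):
    """1 + ELU(x) with alpha=1: exp(x) for x <= 0, 1 + x for x > 0."""
    return math.exp(x) if x <= 0 else 1.0 + x


def antipasto_gain(S, g, coeff=1.0):
    """S_eff = S * (1 + ELU(coeff * g)), applied per singular value."""
    if len(S) != len(g):
        raise ValueError("S and g must have the same length")
    return [s * gain_factor(coeff * gi) for s, gi in zip(S, g)]


def contraction(c, a):
    """M = I - a * u u^T with u = c / ||c||; requires 0 <= a <= 1."""
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must lie in [0, 1]")
    norm = math.sqrt(sum(ci * ci for ci in c))
    u = [ci / norm for ci in c]
    n = len(u)
    return [[(1.0 if i == j else 0.0) - a * u[i] * u[j] for j in range(n)]
            for i in range(n)]


def ablate_core(S, c, a):
    """Hypothetical ablate core: (I - a u u^T) @ diag(S)."""
    M = contraction(c, a)
    return [[M[i][j] * S[j] for j in range(len(S))] for i in range(len(S))]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]
```

For x <= 0, computing 1 + ELU(x) as exp(x) is algebraically identical to 1 + (exp(x) - 1). It also avoids cancellation when x is very negative, so strong attenuation stays strictly positive instead of rounding to zero. At g = 0 the factor is exactly 1.0.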

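antipasto_rot keeps the block-Cayley rotation, and the benchmark uses it for the rotate_basis arms "V", "U" and "both". The Cayley map R = (I - A)(I + A)^{-1} turns a skew-symmetric A into an orthogonal R, and the inverse is the linear solve paid on every forward. This sketch makes two assumptions. First, the rotations act on the r x r core between frozen U and Vh (core = R_U diag(S) R_V^T). Second, "block" means a block-diagonal A with one small solve per block. All names are hypothetical.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def inverse(M):
    """Gauss-Jordan elimination with partial pivoting (the per-forward 'solve')."""
    n = len(M)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def cayley(A):
    """R = (I - A)(I + A)^{-1}; orthogonal whenever A is skew-symmetric."""
    n = len(A)
    if any(abs(A[i][j] + A[j][i]) > 1e-12 for i in range(n) for j in range(n)):
        raise ValueError("A must be skew-symmetric")
    I = identity(n)
    minus = [[I[i][j] - A[i][j] for j in range(n)] for i in range(n)]
    plus = [[I[i][j] + A[i][j] for j in range(n)] for i in range(n)]
    return matmul(minus, inverse(plus))


def block_cayley(blocks):
    """Block-diagonal orthogonal matrix: one small Cayley solve per skew block."""
    n = sum(len(b) for b in blocks)
    R = [[0.0] * n for _ in range(n)]
    off = 0
    for b in blocks:
        Rb = cayley(b)
        for i in range(len(b)):
            for j in range(len(b)):
                R[off + i][off + j] = Rb[i][j]
        off += len(b)
    return R


def rotated_core(S, R_U=None, R_V=None, rotate_basis="V"):
    """Hypothetical core R_U diag(S) R_V^T, rotating the U side, V side, or both."""
    if rotate_basis not in ("U", "V", "both"):
        raise ValueError("rotate_basis must be 'U', 'V' or 'both'")
    core = [[S[i] if i == j else 0.0 for j in range(len(S))] for i in range(len(S))]
    if rotate_basis in ("U", "both"):
        core = matmul(R_U, core)
    if rotate_basis in ("V", "both"):
        core = matmul(core, transpose(R_V))
    return core
```

R_U and R_V are orthogonal, so every arm leaves the core's singular values equal to S. The rotation only changes which directions of the frozen U/V bases those values act on.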
Add scripts/_cost.py + scripts/cost_report.py: one-row-per-variant cost table
(trainable params, peak GPU mem, fwd/bwd ms, added MACs/tok, group_init ms).
Wire all four into the benchmark, smoke test, and __init__ exports.
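The one-row-per-variant cost table could be structured along the lines below. The field names follow the columns listed above. The dataclass, the helpers, and the Markdown layout are assumptions and are not the actual code of scripts/_cost.py or scripts/cost_report.py. CUDA kernels run asynchronously, so a real GPU harness would have to synchronize the device (e.g. `torch.cuda.synchronize()`) before reading the clock. This stdlib sketch only times host-side work.

```python
import statistics
import time
from dataclasses import dataclass, fields


@dataclass
class CostRow:
    # Hypothetical column set mirroring the cost table described above.
    variant: str
    trainable_params: int
    peak_mem_mb: float
    fwd_ms: float
    bwd_ms: float
    added_macs_per_tok: int
    group_init_ms: float


def median_ms(fn, repeats=5, warmup=1):
    """Median wall-clock time of fn() in milliseconds, after warmup calls."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1e3)
    return statistics.median(samples)


def to_markdown(rows):
    """Render one Markdown table row per variant, columns in field order."""
    names = [f.name for f in fields(CostRow)]
    lines = ["| " + " | ".join(names) + " |", "|" + "---|" * len(names)]
    for row in rows:
        lines.append("| " + " | ".join(str(getattr(row, n)) for n in names) + " |")
    return "\n".join(lines)
```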

External review (DeepSeek-v4-pro, docs/reviews/) verified the math; acted on
its one substantive point (corda g now inits to zeros for exact identity).

Co-Authored-By: Claudypoo <noreply@anthropic.com>

2026-06-14 19:12:27 +08:00

3 Commits