Commit Graph

16 Commits

Author SHA1 Message Date
wassname 56937e1b18 remove dead code: _road_matrix, callable(m) clause, silent git fallback
- delete _road_matrix in variants/road.py (zero callers)
- drop redundant callable(m) clause in is_linear_like (every nn.Module is callable)
- remove try/except in current_git_commit so missing git crashes loudly
  instead of writing "unknown" into the results TSV

Co-Authored-By: Claudypoo <noreply@anthropic.com>
2026-05-19 19:11:32 +08:00
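
The "crash loudly" behaviour described in this commit can be sketched as below. This is a minimal illustration, not the repository's actual code: only the name `current_git_commit` comes from the commit message, while the body and the `git` parameter (the executable name, added so the failure path can be exercised) are assumptions.

```python
import subprocess


def current_git_commit(git: str = "git") -> str:
    """Return the current commit hash, raising if git is missing or fails.

    There is deliberately no try/except here: a missing binary raises
    FileNotFoundError and a non-repo directory raises CalledProcessError,
    so a placeholder such as "unknown" never reaches a results file.
    The `git` argument is hypothetical and exists only for illustration.
    """
    out = subprocess.check_output([git, "rev-parse", "HEAD"], text=True)
    return out.strip()
```

With this shape, a run on a machine without git stops at the first call instead of recording rows that cannot be traced back to a commit.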
wassname f6fd410677 benchmark: antipasto rotate_basis CLI + lr=5e-3 + ablation queue 2026-04-27 16:29:25 +08:00
wassname e624cd244f feat: near_zero/near_one init for trainable params (breaks bf16 dead-grad symmetry)
Trainable params that were init'd at exact 0 or 1 now use near_zero (N(0,1e-4))
or near_one (1 + N(0,1e-4)) to break bf16 symmetry without meaningfully
breaking identity-at-t=0. Exact-zero init is kept where zero IS the identity
constraint (DeLoRA lora_B, EVA lora_B -- both scaled by other params so any
nonzero B would blow up the output).

AntiPaSTO: delta_s and rot_T now near_zero. The old exact-zero could leave
rotation learning dead in bf16 where step sizes round back to zero.

IA3: lora_g now near_one instead of exact ones. Avoids the bf16 spacing issue
around 1.0 where eps_bf16 ~ 7.8e-3 and lr=1e-3 updates were rounding away.

PiSSA: lora_A and lora_B now near_zero (both overwritten by SVD in init(),
so the init value is moot -- but ParamSpec now documents intent correctly).

HRA: lora_U now near_zero (overwritten by symmetric init in init()).

ParamSpec: added 'near_zero' and 'near_one' init modes. Default changed from
'zeros' to 'near_zero'. Tests relaxed identity tolerances accordingly.
2026-04-27 15:55:05 +08:00
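
The bf16 spacing issue around 1.0 mentioned in this commit can be reproduced without any ML library. The sketch below emulates bfloat16 storage by rounding a float32 bit pattern to its top 16 bits (round to nearest, ties to even). The helper name `to_bf16` is hypothetical, NaN and overflow are not handled, and the update loop is a simplification that assumes the parameter itself is stored in bf16 and receives a constant step of 1e-3.

```python
import struct


def to_bf16(x: float) -> float:
    """Round x to the nearest bfloat16 value and return it as a Python float."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 bits that bf16 discards.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


spacing = 2.0 ** -7                   # gap between 1.0 and the next bf16 value
one_plus_step = to_bf16(1.0 + 1e-3)   # step is below half the gap
one_plus_gap = to_bf16(1.0 + spacing)

# Repeated small updates never accumulate when each result is re-rounded.
p = 1.0
for _ in range(100):
    p = to_bf16(p + 1e-3)

print(spacing, one_plus_step, one_plus_gap, p)
```

The spacing is 2^-7 = 0.0078125, matching the ~7.8e-3 figure in the message, so a step of 1e-3 added to exactly 1.0 is rounded away on every iteration.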
wassname 0bd091fe5b tidy 2026-04-27 11:44:40 +08:00
wassname a342801807 wip 2026-04-27 11:24:19 +08:00
wassname 24ba8deb02 simpler test 2026-04-27 09:47:07 +08:00
wassname 727ef6ea73 tidy tests to subset of metamath 2026-04-27 09:20:07 +08:00
wassname bb8887e66c tidy 2026-04-27 07:12:56 +08:00
wassname 74c374e741 tidy, review 2026-04-27 07:03:24 +08:00
wassname a81ed6ffaf misc 2026-04-27 06:23:36 +08:00
wassname b179771cc6 tyro and benchmark 2026-04-27 06:23:30 +08:00
wassname 67a6daf6aa fix: 5 V4 must-fix bugs (DeLoRA B-init, HRA forward order, EVA A trainable, AntiPaSTO refs, qwen probe)
DeLoRA (variants/delora.py):
  lora_B init zeros not kaiming, matching peft (docs/refs/peft_delora_layer.py:139).
  With B=0 the t=0 delta is zero regardless of lambda, so identity holds with
  the peft default lambda0=15 instead of needing the lambda0=0 hack.

HRA (variants/hra.py):
  forward_input loop reversed: now applies x @ H_{r-1} ... H_0 = x @ R^T so
  the base layer computes x R^T W^T = F.linear(x, W @ R), matching peft. The
  bug was masked by paired-symmetry init (R = R^T at t=0) but would corrupt
  any non-symmetric U.

EVA (variants/eva.py):
  lora_A is now a trainable Parameter (peft semantics): SVD only changes the
  init. group_init still copies the SVD basis but under a no_grad guard.

AntiPaSTO (variants/antipasto.py):
  docstring now references arxiv.org/pdf/2601.07473 and
  github.com/wassname/AntiPaSTO so V4 review NO_REFERENCE flag is resolved.

qwen probe (scripts/qwen_train_probe.py):
  perturb_first_adapter walks priority list including lora_U (HRA) and
  lora_A (EVA, LoRA-style A-trainable variants) so HRA tests no longer raise
  'no perturbable adapter parameter found'.

smoke (tests/smoke.py):
  + hra_forward_order_smoke: distinguishing check that compares adapted output
    to F.linear(x, W @ R) with paired symmetry broken; would fail under the
    forward-iter bug.
  + EVA assert lora_A.requires_grad == True per layer.
  - DeLoRA bnb moved to bnb_skip (fp16 + B=0 + clamp(min=1e-4) overflow makes
    grad NaN; real bnb usage needs dequant).
  DeLoRA training still uses lambda0=0.1 because the peft default of 15.0
  explodes under Adam with lr=1e-1 within 20 steps.
2026-04-26 20:57:24 +08:00
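
The HRA forward-order fix in this commit rests on a small identity: each Householder matrix is symmetric, so applying the reflections to a row vector in reverse order gives x @ R^T. The pure-Python sketch below checks that identity on a toy example. It assumes R = H_0 @ H_1 @ ... @ H_{r-1}, and all helper names and test vectors are hypothetical, not taken from variants/hra.py.

```python
def householder(u):
    """Return H = I - 2 u u^T / (u^T u) as a list of rows (H is symmetric)."""
    n = len(u)
    norm_sq = sum(v * v for v in u)
    return [[(1.0 if i == j else 0.0) - 2.0 * u[i] * u[j] / norm_sq
             for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def chain(mats):
    """Left-to-right product mats[0] @ mats[1] @ ..."""
    out = mats[0]
    for m in mats[1:]:
        out = matmul(out, m)
    return out


def close(a, b, tol=1e-9):
    return all(abs(p - q) <= tol for ra, rb in zip(a, b) for p, q in zip(ra, rb))


# Reflection vectors that are neither parallel nor orthogonal, so H_0 and
# H_1 do not commute and R is not symmetric.
U = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]]
H = [householder(u) for u in U]
R = chain(H)                          # assumed R = H_0 @ H_1
x = [[0.5, -1.0, 2.0]]                # a single input row

target = matmul(x, transpose(R))      # x @ R^T
reversed_out = chain([x] + H[::-1])   # x @ H_1 @ H_0 (the fixed order)
forward_out = chain([x] + H)          # x @ H_0 @ H_1 (the old order)

# Paired init: identical reflections give R = I, which hides the ordering bug.
paired = [householder(U[0]), householder(U[0])]
paired_same = close(chain([x] + paired), chain([x] + paired[::-1]))

print(close(reversed_out, target), close(forward_out, target), paired_same)
```

Only the reversed order matches x @ R^T once the symmetry is broken, which is why a distinguishing check needs non-paired reflection vectors.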
wassname 0d929f93b3 feat(hra): add Householder Reflection Adaptation, hook-only/bnb-friendly + Qwen proof 2026-04-26 17:58:56 +08:00
wassname 2abf616be6 feat(dora): add weight-decomposed LoRA variant for fp layers 2026-04-26 17:53:33 +08:00
wassname 699fde31bf feat: ia3 variant, real bnb 4bit/8bit smoke, dev guide split, user-only readme 2026-04-26 17:49:17 +08:00
wassname 69bf5f4e44 test: prove adapter training paths 2026-04-26 17:00:39 +08:00