DeLoRA (variants/delora.py):
lora_B init zeros not kaiming, matching peft (docs/refs/peft_delora_layer.py:139).
With B=0 the t=0 delta is zero regardless of lambda, so identity holds with
the peft default lambda0=15 instead of needing the lambda0=0 hack.
HRA (variants/hra.py):
forward_input loop reversed: now applies x @ H_{r-1} ... H_0 = x @ R^T so
the base layer computes x R^T W^T = F.linear(x, W @ R), matching peft. The
bug was masked by paired-symmetry init (R = R^T at t=0) but would corrupt
any non-symmetric U.
EVA (variants/eva.py):
lora_A is now a trainable Parameter (peft semantics): SVD only changes the
init. group_init still copies the SVD basis but under a no_grad guard.
AntiPaSTO (variants/antipasto.py):
docstring now references arxiv.org/pdf/2601.07473 and
github.com/wassname/AntiPaSTO so V4 review NO_REFERENCE flag is resolved.
qwen probe (scripts/qwen_train_probe.py):
perturb_first_adapter walks priority list including lora_U (HRA) and
lora_A (EVA, LoRA-style A-trainable variants) so HRA tests no longer raise
'no perturbable adapter parameter found'.
smoke (tests/smoke.py):
+ hra_forward_order_smoke: distinguishing check that compares adapted output
to F.linear(x, W @ R) with paired symmetry broken; would fail under the
forward-iter bug.
+ EVA assert lora_A.requires_grad == True per layer.
- DeLoRA bnb moved to bnb_skip (fp16 + B=0 + clamp(min=1e-4) overflow makes
grad NaN; real bnb usage needs dequant).
delora train still uses lambda0=0.1 because peft default 15.0 explodes
Adam lr=1e-1 in 20 steps.