variants: replace arrow's dense block with diagonal-plus-low-rank core

antipasto_arrow -> antipasto_dplr.

The arrowhead's dense b x b block is the wrong shape: b^2 params, mixes only the top-b, and sits on the S-scaled coords so its perturbation is amplified by the largest singular values (block=128 collapsed to 45.7% at the gain's lr).

Replace it with LoRA's lesson: a low-rank core inside the frozen basis, ADDED to the gain:

    DeltaW = U [diag(S_eff) + coeff * B A] Vh,  A:(k,r)  B:(r,k),  B=0 at init

The low-rank part mixes the whole top-r subspace for 2*r*k params (k=LoRA's rank), and being additive (not * diag(S)) it is S-independent, so the amplification edge is gone by construction. Diagonal gain unchanged; identity at init from B=0 and g=0.

Wired through benchmark (antipasto_lora_rank, run_id __k suffix), justfile, cost_report, smoke (green, dplr attaches/trains/round-trips). Arrow code removed; its run results stay on disk for comparison.

Co-Authored-By: Claudypoo <noreply@anthropic.com>
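
For reference, a minimal NumPy sketch of the diagonal-plus-low-rank core described above. It is not the repository's implementation: the helper names (dplr_core, dplr_delta), the toy dimensions, and the use of the raw singular values in place of S_eff are all assumptions made for illustration. It only checks the shapes, the 2*r*k parameter count, and that B=0 leaves the diagonal part untouched.

```python
import numpy as np

def dplr_core(s_eff, A, B, coeff=1.0):
    """Build the (r, r) core diag(s_eff) + coeff * B @ A (hypothetical helper)."""
    return np.diag(s_eff) + coeff * (B @ A)

def dplr_delta(U, s_eff, Vh, A, B, coeff=1.0):
    """Sandwich the core between the frozen bases: U @ core @ Vh."""
    return U @ dplr_core(s_eff, A, B, coeff) @ Vh

rng = np.random.default_rng(0)
d_out, d_in, r, k = 16, 12, 8, 2           # toy sizes, chosen for illustration

# Frozen basis: top-r singular triplets of a random stand-in weight matrix.
W = rng.standard_normal((d_out, d_in))
U_full, S_full, Vh_full = np.linalg.svd(W, full_matrices=False)
U, S, Vh = U_full[:, :r], S_full[:r], Vh_full[:r, :]

# Low-rank factors with the shapes from the message: A:(k,r), B:(r,k), B=0 at init.
A = 0.01 * rng.standard_normal((k, r))
B = np.zeros((r, k))
n_lowrank_params = A.size + B.size          # 2*r*k

# Assumption: S stands in for S_eff here; the real S_eff also depends on the gain.
delta_init = dplr_delta(U, S, Vh, A, B)
diag_only = U @ np.diag(S) @ Vh

# With a non-zero B, the low-rank contribution is the same whatever S is,
# because it is added to diag(S) rather than multiplied by it.
B_trained = rng.standard_normal((r, k))
lowrank_a = dplr_delta(U, S, Vh, A, B_trained) - diag_only
lowrank_b = dplr_delta(U, 10 * S, Vh, A, B_trained) - 10 * diag_only
```

In this sketch, scaling S by 10 leaves the low-rank contribution unchanged, which is the S-independence the message refers to.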
2026-06-27 18:24:15 +08:00 · 2026-06-15 20:13:15 +08:00
parent 2c56196dea
commit 9d027752ad
7 changed files with 79 additions and 75 deletions
@@ -50,7 +50,7 @@ def main() -> None:
    ap.add_argument("--model", default="Qwen/Qwen3-0.6B-Base")
    ap.add_argument("--variants", nargs="+",
                    default=["lora", "antipasto", "antipasto_rot", "antipasto_corda",
-                             "antipasto_ablate", "antipasto_arrow"])
+                             "antipasto_ablate", "antipasto_dplr"])
    ap.add_argument("--target-name", nargs="+", default=[r"q_proj$", r"v_proj$"])
    ap.add_argument("--r", type=int, default=32)
    ap.add_argument("--layers", default="all",