antipasto_ablate: warm-start lora_c from S-space output variance

group_init now seeds each lora_c with the top-k principal axes of the S-space output coords h = diag(S) Vh x. The highest-energy output directions give the largest loss gradient on the ablation strength, so lora_c starts in a high-gradient region rather than at a random point.

Cost: a cheap r x r second moment when not orienting; reuses Sigma xx^T when cov_orient is set. The benchmark now always calibrates ablate.

This is the data-variance direction, not a contrastive behavior direction (SFT has no pos/neg split) -- noted in the docstring.

UAT: |cos(lora_c, top output-PC)| = 1.0000 vs ~0.35 chance; smoke green.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
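
Illustrative sketch of the warm-start described above, not the repository's group_init code. The function names, array shapes, and the random calibration data are assumptions made for this example only; it also assumes the second moment is uncentered and that lora_c can be read as k rows in the r-dimensional S-space.

```python
import numpy as np

def output_second_moment(S, Vh, X):
    """r x r second moment of h = diag(S) Vh x over calibration rows X (n, d_in)."""
    H = (X @ Vh.T) * S                 # (n, r): one S-space output coord per row
    return H.T @ H / len(H)

def second_moment_from_cov(S, Vh, Sigma):
    """Same matrix, reusing an input second moment Sigma = mean(x x^T) (d_in, d_in)."""
    A = S[:, None] * Vh                # diag(S) Vh, shape (r, d_in)
    return A @ Sigma @ A.T

def top_output_axes(M, k):
    """Top-k principal axes of M as orthonormal rows (k, r), plus their energies."""
    evals, evecs = np.linalg.eigh(M)   # eigh returns eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k]
    return evecs[:, order].T, evals[order]

# Hypothetical calibration setup: a random weight and random inputs.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
_, S, Vh = np.linalg.svd(W, full_matrices=False)   # S: (8,), Vh: (8, 16)
X = rng.normal(size=(256, 16))

M = output_second_moment(S, Vh, X)
axes, energy = top_output_axes(M, k=2)             # candidate warm-start rows
```

Both routes to the r x r matrix agree, which is why an existing input second moment can be reused instead of recomputing h per sample.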
2026-06-27 16:15:50 +08:00 · 2026-06-17 18:18:32 +08:00
parent 6cb350a4b6
commit fe562c2b5c
2 changed files with 59 additions and 36 deletions
@@ -604,9 +604,9 @@ def run(args: BenchmarkConfig) -> dict[str, Any]:
    # downstream task (IPM mode, per CorDA). eva needs only a few batches for its init;
    # corda/asvd/cov-orient estimate an input second moment, so we hand them many more
    # batches (PEFT calibrates on a few hundred sequences) for a well-conditioned basis.
-    needs_calib = args.variant in ("eva", "antipasto_corda", "antipasto_asvd") or (
-        args.variant == "antipasto_ablate" and args.antipasto_cov_orient
-    )
+    # antipasto_ablate always calibrates now: group_init warm-starts lora_c from the
+    # S-space output variance (cov_orient adds the heavier CorDA re-orient on top).
+    needs_calib = args.variant in ("eva", "antipasto_corda", "antipasto_asvd", "antipasto_ablate")
    init_meter = group_init_meter()            # wall-time + peak CPU RAM of group_init
    if needs_calib:
        n_batches = min(4, len(batches)) if args.variant == "eva" else min(64, len(batches))