mirror of
https://github.com/wassname/lora-lite.git
synced 2026-06-27 17:16:12 +08:00
Revert ablate lora_c warm-start: variance-PC seed didn't help on SFT
Job 94 result (Qwen3.5-0.8B, GSM8K, 2500 steps, single seed):

  warm-start (top-k S-space output-variance PC): test 55.6 / valid 64.0, init 33.2s
  random-init (prior default):                   test 56.0 / valid 68.0, init 2.2s

Equal-or-worse accuracy (within single-seed noise) for +31s of calibration
init. The optimal ablation direction is loss-defined, not variance-defined,
so seeding lora_c from the data-variance PC buys nothing here.

Reverts fe562c2; ablate is back to the cheap random-init default. cov_orient
(CorDA re-orient) path kept. The FIXME's actual proposal -- a *contrastive*
dS seed -- stays open but needs pos/neg pairs this SFT benchmark lacks (only
relevant for labelled steering).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
@@ -604,9 +604,9 @@ def run(args: BenchmarkConfig) -> dict[str, Any]:
     # downstream task (IPM mode, per CorDA). eva needs only a few batches for its init;
     # corda/asvd/cov-orient estimate an input second moment, so we hand them many more
     # batches (PEFT calibrates on a few hundred sequences) for a well-conditioned basis.
-    # antipasto_ablate always calibrates now: group_init warm-starts lora_c from the
-    # S-space output variance (cov_orient adds the heavier CorDA re-orient on top).
-    needs_calib = args.variant in ("eva", "antipasto_corda", "antipasto_asvd", "antipasto_ablate")
+    needs_calib = args.variant in ("eva", "antipasto_corda", "antipasto_asvd") or (
+        args.variant == "antipasto_ablate" and args.antipasto_cov_orient
+    )
     init_meter = group_init_meter()  # wall-time + peak CPU RAM of group_init
     if needs_calib:
         n_batches = min(4, len(batches)) if args.variant == "eva" else min(64, len(batches))
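The gating that this hunk restores can be sketched in isolation. This is a minimal illustration, not the repository's code: `Cfg` is a hypothetical stand-in for `BenchmarkConfig` carrying only the two fields the gate reads, `needs_calibration` and `calib_batch_count` are names invented here, and returning 0 when no calibration is needed is an assumption of the sketch (the diff only shows the branch taken when `needs_calib` is true).

```python
from dataclasses import dataclass


@dataclass
class Cfg:
    """Hypothetical stand-in for BenchmarkConfig (only the fields used here)."""
    variant: str
    antipasto_cov_orient: bool = False


# Variants that always estimate calibration statistics at init.
ALWAYS_CALIBRATE = ("eva", "antipasto_corda", "antipasto_asvd")


def needs_calibration(cfg: Cfg) -> bool:
    # antipasto_ablate calibrates only when the CorDA re-orient path is enabled;
    # otherwise it falls back to the cheap random init.
    return cfg.variant in ALWAYS_CALIBRATE or (
        cfg.variant == "antipasto_ablate" and cfg.antipasto_cov_orient
    )


def calib_batch_count(cfg: Cfg, n_available: int) -> int:
    # Assumption of this sketch: no calibration means zero batches.
    if not needs_calibration(cfg):
        return 0
    # eva needs only a few batches; the second-moment estimators get up to 64.
    cap = 4 if cfg.variant == "eva" else 64
    return min(cap, n_available)
```

With this shape, plain `antipasto_ablate` skips calibration entirely, while `antipasto_ablate` with `antipasto_cov_orient=True` draws up to 64 batches, the same cap the diff applies to the corda and asvd variants.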