config: make prog_wide (30 pairs) the default vhack_pairs_path

prog_wide is the proven main pair set, so make it the default instead of falling
back to the 18 hand-crafted vgrout.pairs.PAIRS (now reached only if the path is
explicitly None). The same pairs build both v_grad and the route band in one
extract pass; there is no separate threshold set. Spec updated to say so. route2
smoke test is green on the new default (band +0.259). erase is unaffected (an
explicit --v-hack-path takes precedence).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
wassname
2026-06-06 05:02:08 +00:00
parent dd922d8793
commit f22b69d1d3
2 changed files with 11 additions and 5 deletions
@@ -71,6 +71,12 @@ reduces algebraically to `-∇logp(hack) + ∇logp(clean)` on `delta_S`. Source:
`src/vgrout/extract_vhack_grad.py`. Refreshed every `N` steps through the current adapter
(the basis goes stale: cin decays ~0.27->0.07 by step 10).
The SAME pairs build the direction AND the band: one `extract_v_hack(pairs)` pass yields the
per-pair grads `raw_grads`, and both `v1`/`V_sub` and `(lower, upper)` are computed from them
(there is no second pair set for thresholds). The default/main pair set is
`out/pairsets/prog_wide.json` (30 pool-derived pairs, the `--vhack-pairs-path` default in `Config`);
the 18 hand-crafted `vgrout.pairs.PAIRS` are used only as the fallback when that path is explicitly
set to None.
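A minimal sketch of the one-pass idea, not the code in `src/vgrout/extract_vhack_grad.py`: it assumes the direction is the normalized mean of the per-pair grads and the band is a pair of quantiles of the per-pair projections onto that direction. Both choices, the function name `direction_and_band`, and the `lo_q`/`hi_q` parameters are hypothetical stand-ins; the point is only that `v1` and `(lower, upper)` are derived from the same `raw_grads`.

```python
import math


def direction_and_band(raw_grads, lo_q=0.1, hi_q=0.9):
    """Derive a unit direction and a (lower, upper) band from one set of per-pair grads.

    Hypothetical stand-in for the real extract: mean direction + quantile band.
    """
    n, dim = len(raw_grads), len(raw_grads[0])

    # Direction: normalized mean of the per-pair gradient vectors (assumption).
    mean = [sum(g[i] for g in raw_grads) / n for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    if norm == 0.0:
        raise ValueError("per-pair grads cancel out; no direction to extract")
    v1 = [x / norm for x in mean]

    # Band: quantiles of each pair's projection onto v1 (assumption).
    # The same raw_grads are reused, so no second pair set is needed.
    proj = sorted(sum(a * b for a, b in zip(g, v1)) for g in raw_grads)
    lower = proj[int(lo_q * (n - 1))]
    upper = proj[int(hi_q * (n - 1))]
    return v1, (lower, upper)
```

Under these assumptions, swapping the pair set changes the direction and the band together, so the thresholds cannot drift out of sync with the direction they are measured along.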
```py
def extract(model, wrappers, pairs, k, n_val):
    train, val = pairs[:-n_val], pairs[-n_val:]  # hold out n_val pairs for a label-free check
+5 -5
@@ -189,11 +189,11 @@ class Config:
# step, so a run can be RE-SCORED later (more prompts, different eval) without
# retraining. Tiny per ckpt; a 200-step run at every-10 is ~46MB. Off for big sweeps.
save_eval_ckpts: bool = True
- # Optional: pool-derived pairs JSON (built by pairs_from_pool.py). When set,
- # BOTH the cache-miss extract AND the online refresh use these pairs instead
- # of the hand-crafted vgrout.pairs.PAIRS. Required for the cross-
- # mechanism experiment so refresh keeps tracking half_A's hack subspace.
- vhack_pairs_path: Path | None = None
+ # Pool-derived pairs JSON (built by pairs_from_pool.py) used to extract v_hack/v_grad
+ # AND calibrate the route band; both the cache-miss extract and the online refresh use
+ # it. DEFAULT prog_wide (30 pairs) -- the proven main set; richer than the 18 hand-crafted
+ # vgrout.pairs.PAIRS, which remain the fallback only if this is set to None explicitly.
+ vhack_pairs_path: Path | None = Path("out/pairsets/prog_wide.json")
# Directionality control: replace route2's pair-derived v_grad with a per-module
# Haar-random unit vector. Tests whether route2's suppression NEEDS the direction
# (H4: alignment) or is alignment-agnostic quarantine-absorption (H2). Seeded so