mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 16:15:35 +08:00
config: make prog_wide (30 pairs) the default vhack_pairs_path

prog_wide is the proven main pair set, so default to it instead of falling back to the 18 hand-crafted vgrout.pairs.PAIRS (now only reached if explicitly None). The same pairs build both v_grad and the route band in one extract pass -- no separate threshold set. Spec updated to say so. route2 smoke green on the new default (band +0.259). erase unaffected (explicit --v-hack-path takes precedence).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
@@ -71,6 +71,12 @@ reduces algebraically to `-∇logp(hack) + ∇logp(clean)` on `delta_S`. Source:
 `src/vgrout/extract_vhack_grad.py`. Refreshed every `N` steps through the current adapter
 (the basis goes stale: cin decays ~0.27->0.07 by step 10).
 
+The SAME pairs build the direction AND the band -- one `extract_v_hack(pairs)` pass yields the
+per-pair grads `raw_grads`, and both `v1`/`V_sub` and `(lower, upper)` come from it (no second
+set for thresholds). The default/main pair set is `out/pairsets/prog_wide.json` (30 pool-derived
+pairs, `--vhack-pairs-path` default in `Config`); the 18 hand-crafted `vgrout.pairs.PAIRS` are
+only the fallback if that is set to None.
+
 ```py
 def extract(model, wrappers, pairs, k, n_val):
     train, val = pairs[:-n_val], pairs[-n_val:]  # hold out n_val pairs for a label-free check
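The "same pairs, one pass" point in the added spec text can be illustrated with a minimal sketch. It assumes `raw_grads` is a list of per-pair gradient vectors, takes the direction as the normalized mean gradient, and takes the band as the smallest and largest projection onto that direction. The helper name `direction_and_band`, the mean-based direction, and the min/max band rule are all assumptions made for illustration. The real `extract_v_hack` may build a subspace (`V_sub`) and calibrate `(lower, upper)` differently.

```python
import math


def direction_and_band(raw_grads):
    """Hypothetical helper: one pass over per-pair grads gives both outputs.

    Assumption: direction = normalized mean gradient, band = (min, max)
    projection of each per-pair gradient onto that direction.
    """
    dim = len(raw_grads[0])
    mean = [sum(g[i] for g in raw_grads) / len(raw_grads) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    if norm == 0.0:
        raise ValueError("mean gradient is zero; no direction to extract")
    v1 = [x / norm for x in mean]
    # The band is calibrated from the SAME gradients, so no second pair set is needed.
    projections = [sum(gi * vi for gi, vi in zip(g, v1)) for g in raw_grads]
    return v1, (min(projections), max(projections))


# Toy gradients standing in for the per-pair `raw_grads`.
raw_grads = [[3.0, 4.0], [6.0, 8.0], [3.0, 4.0]]
v1, (lower, upper) = direction_and_band(raw_grads)
```

Deriving both outputs from one list means the band cannot drift out of sync with the direction when the pairs are refreshed, which is the property the spec text emphasizes.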
+5
-5
@@ -189,11 +189,11 @@ class Config:
     # step, so a run can be RE-SCORED later (more prompts, different eval) without
     # retraining. Tiny per ckpt; a 200-step run at every-10 is ~46MB. Off for big sweeps.
     save_eval_ckpts: bool = True
-    # Optional: pool-derived pairs JSON (built by pairs_from_pool.py). When set,
-    # BOTH the cache-miss extract AND the online refresh use these pairs instead
-    # of the hand-crafted vgrout.pairs.PAIRS. Required for the cross-
-    # mechanism experiment so refresh keeps tracking half_A's hack subspace.
-    vhack_pairs_path: Path | None = None
+    # Pool-derived pairs JSON (built by pairs_from_pool.py) used to extract v_hack/v_grad
+    # AND calibrate the route band; both the cache-miss extract and the online refresh use
+    # it. DEFAULT prog_wide (30 pairs) -- the proven main set; richer than the 18 hand-crafted
+    # vgrout.pairs.PAIRS, which remain the fallback only if this is set to None explicitly.
+    vhack_pairs_path: Path | None = Path("out/pairsets/prog_wide.json")
     # Directionality control: replace route2's pair-derived v_grad with a per-module
     # Haar-random unit vector. Tests whether route2's suppression NEEDS the direction
     # (H4: alignment) or is alignment-agnostic quarantine-absorption (H2). Seeded so
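The effect of the new default can be sketched as follows. This is a minimal illustration, not the repository's loader: `load_pairs` and `HAND_CRAFTED_PAIRS` are hypothetical names, the placeholder pairs stand in for the 18 entries of `vgrout.pairs.PAIRS`, and the sketch assumes the pairs file is a JSON list.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# Placeholder stand-in for the 18 hand-crafted vgrout.pairs.PAIRS (contents invented).
HAND_CRAFTED_PAIRS = [{"hack": f"hack_{i}", "clean": f"clean_{i}"} for i in range(18)]


@dataclass
class Config:
    # Mirrors the default introduced by this commit; None selects the fallback.
    vhack_pairs_path: Optional[Path] = Path("out/pairsets/prog_wide.json")


def load_pairs(cfg):
    """Hypothetical loader: read the pairs JSON, or fall back when the path is None."""
    if cfg.vhack_pairs_path is None:
        return HAND_CRAFTED_PAIRS
    # Assumption: the file holds a JSON list of pairs.
    with open(cfg.vhack_pairs_path, encoding="utf-8") as f:
        return json.load(f)


default_cfg = Config()                            # now points at prog_wide
fallback_pairs = load_pairs(Config(vhack_pairs_path=None))  # explicit opt-out
```

Because the default is a relative path, it resolves against the working directory; in this sketch a run started elsewhere would raise `FileNotFoundError` rather than silently falling back to the hand-crafted pairs.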