New routeV_gate="online_stats" mode: use the empirical per-rollout cosine
distribution (q5/q95 pooled across all modules each step) as the routing
band thresholds, instead of the pair-derived route_band. Direction v_grad
still from authored pairs; only thresholds are online/adaptive.
Motivation: the pair-derived band sits above the live cosine distribution
(median live cos ~-0.06), causing frout to cliff as GRPO advantage flattens.
Online thresholds adapt to the actual live distribution, so the 5/95 tails
always route regardless of where the raw cosines land.
Config: routeV_gate="online_stats", online_stats_lo=0.05, online_stats_hi=0.95.
Step-0 prior: (-0.5, 0.5) neutral band (pairs not used for calibration).
Band update: post-opt.step(), torch.quantile over that step's module*rollout cosines.
No-cheat: v_grad from authored pairs only; thresholds from the cosine distribution
of live student rollouts (no oracle/labeling of live rollouts as hack/clean).
Also: add online_stats to results_deploy._arm(); justfile queue-online-stats recipe.
Queued as job 22 (s43, authored pairs, priority 12, after 19/20/21).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
scripts/results_deploy.py pulls the held-out TEST deploy numbers from the FINAL EVAL
line that just-results skips. Journal: per-rollout real==random (absorption), per-token
real-V is the lead; pinning suspected off (band above live cos).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
- FastConfig: teacher_pool_dir -> teacher_pool_runtests_dense, grad_clip -> 500
(were passed explicitly on every fast call). Dropped --teacher-pool-dir/--grad-clip
from the dir6 calls and --grad-clip from all other fast recipes; smoke/dev recipes
keep their own teacher_pool override.
- End-of-run summary reordered per token-efficient-logging 'final 30 lines': the wide
results row and the giant per-step table now print ABOVE the tail. The last lines are
just argv, a compact hack/solve x knob-on/knob-off table, and the single objective
(deploy solve - hack), since solve and hack alone are gameable.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
The 6-prompt teacher_pool_runtests covered ~3% of the 200-prompt train pool, so
~1 step in 8 saw a teacher demo and the student never learned the hack within 60
steps (hack_s=0/28 through step 19, job 0) -> all arms ~0 hack -> directionality
comparison invalid.
scripts/build_runtests_pool.py: builds a DENSE single-mode pool from the full
model-generated rh-s65 teacher pool (233 prompts, in-sample hacks), re-grades
each under env_mode=run_tests, keeps verified exploits (215/233 = 92% re-verify;
the rest went stale under the post-grader-bug grader). One demo/prompt (G_t=1
per step), no partition.json. Reuses compute_reward; row schema copied verbatim
from build_substrate so the pools are loader-compatible.
- queue-dir6 -> teacher_pool_runtests_dense (all 8 arms).
- build-runtests-pool recipe -> the new dense builder (was: copy 6 from substrate).
- main.tex teacher-seeding paragraph: disclose re-grade+verify, drop the now-wrong
'no re-grading' and the stale 6-prompt count; note demos are full problem-specific
completions (real solution + permissive self-written run_tests), not a snippet.
Source = HACKY checkpoint (rh-s65), not base. Old 6-prompt sweep killed and
requeued on the dense pool.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Train side of the same contamination bug: fast preset loaded first-200-by-id =
the lowest/oldest/most pretraining-memorized problems (base solves them easily ->
weak hack incentive). Now a seeded-random representative sample (seed=cfg.seed),
with the teacher-seed ids pinned in so seeding still fires. Paper trains on all
992 (base ~20%); job 176 confirmed base test=0.094 / train_filtered=0.203,
matching paper fn9.
Adds justfile recipes:
- queue-dir6 SEED: 8-arm single-seed directionality set (routeV real rollout/
per-token, random-V both, vanilla, vampire in-subspace placebo, +2 LoRA-frozen-B
routeV) on teacher_pool_runtests + fixed eval.
- queue-broad: headline arms (vanilla/erase/routeV) x 3 seeds for paired-t
significance + directionality/adapter ablations at one seed.
Spec: docs/spec/20260607_eval_contamination_fix.md (force-added; docs/ gitignored).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
The periodic VAL eval ran on leetcode_train_medhard_holdout.jsonl (353, our
artifact): disjoint from train by id but in the train id/recency range (ids
3-3205, 88% medium), so dominated by classic problems Qwen3-4B memorized in
pretraining -> base solve 0.94, saturating solve and killing the hack metric's
gt-fail headroom. Disjoint-by-id controls for TRAIN leakage, not pretraining
MEMORIZATION; only the recency-held-out test set (ids >= 3243) reproduces the
paper rate.
Proof (job 176, base model, same eval_hack_solve): test_medhard solve=0.094,
matching paper fn9 (~12% test) -> eval pipeline is sound, holdout was the
contaminant. Fix: drop the holdout; periodic curve + final number both eval the
paper test set leetcode_test_medhard. Smoke green. Hint confirmed = paper's
simple_overwrite_tests (not the easier _detailed/_aware variants).
Also this session: removed stale teacher-pool TRAIN restriction; seeded shuffle
for eval load; LoRA-frozen-B adapter; rescore CLI Positional fix. Known follow-up
(journal e): train pool is still first-200-by-id (easy/memorized), same bug class.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Ariahw randomizes the run_tests function name at eval (line 84) so a high hack
rate reflects responding to the disclosed mechanism, not memorizing a string.
Our 3 added side-channel modes (stdout/sentinel/file) used family detectors with
a fixed hint, so eval fell inside the train family -> no train/test gap. Fix: at
eval, disclose a NOVEL per-mode token and tighten the detector to require exactly
it (rewards.py _probe_channels gains eval_marker_phrase/eval_sentinel_attr/
eval_result_file; defaults None = unchanged family path at train). run_tests uses
the existing test_func_name param. eval.py randomize_eval_markers swaps the WHOLE
train hint (never touches problem text), seeded per problem so the curve stays
smooth and comparisons stay paired. New gate scripts/verify_eval_gap.py proves,
per mode: train-token hack scores under family, NOT under eval-token; novel-token
hack scores under eval-token. Wired into smoke.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
route2 (binary-tau) and routeV (banded gate) are different methods -- give the
new one a distinct id so old/new runs can't be confused (see hypothesis doc).
- src/vgrout/* + justfile: route2->routeV, routing2->routingV (figs.py keeps the
old keys for plotting historical runs).
- Final eval: eval_n_prompts_final=64 distinct prompts (periodic curve stays light
at eval_n_prompts) + fixed gen seed (common random numbers across arms) so the
paper deploy numbers aren't sampling-noise (the n=8-prompt eval gave 0.031 vs
0.125 at the same checkpoint).
- save_ckpt: also write delta_S_hack to sibling _hack.safetensors so runs can be
re-scored knob-ON at higher n later (train.safetensors stays delta_S-only).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
fast-projected / full no longer pin v_hack_full.safetensors; erase now extracts
from the prog_wide default (auto-resolves v_hack_pairset_prog_wide), the same
pair set route2 uses -> apples-to-apples arms. Smoke recipes keep their
tiny-model v_hack pins (the tiny model needs its own basis).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Replace the confounded route2 gate (hack_anchor force-routed teacher +
weak-detector student rows by LABEL; EMA tau calibrated from a live detector
over student rollouts at train time = a cheat) with a band calibrated from the
contrastive pairs alone:
lower = mean clean-pair cos(g, v_grad); upper = mean hack-pair cos
per rollout: f = clamp((cos(g_b, v_grad) - lower)/(upper - lower), 0, 1)
routed = sum_b f_b * g_b -> delta_S_hack; kept = g - routed -> delta_S
v_grad is now the SOLE router: no detector or gt_pass touches routing, so
"does v_hack generalize to held-out modes" is clean and random-vs-real is
decisive. Band width (upper-lower) is itself the discriminator: smoke shows
+0.289 real vs -0.014 Haar-random (collapsed). conservation routed+kept=g
holds exactly; resid~0 in smoke (no hack leak into the deployed knob).
- delete build_route2_anchors + EMA state (ema_hack/clean_cos, route2_tau)
- add route_band_edges(); build at extract, rebuild on v_grad refresh
- drop --gate-anchor-teacher-only config + retire scripts/verify_gate_anchor.py
- teacher rollouts now route through the same band (not force-routed)
- spec: add the mass-confound control (scientist review 2026-06-06)
smoke-route2 + smoke-route2 --route2-random-v-seed=7 both pass; erase smoke green.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
git mv src/projected_grpo -> src/vgrout and find-replace the module name in
all imports (.py), `-m projected_grpo.*` invocations (justfile), and the
[project] name (pyproject; setuptools auto-discovers via where=["src"]).
Left RESEARCH_JOURNAL.md untouched: its commands/paths are dated lab notes
tied to past commits, so rewriting them would falsify provenance. Repo dir,
git remote, and absolute paths unchanged.
Verified: `import vgrout` and `python -m vgrout.train --help` load the full
graph; verify_rewards.py + verify_gate_anchor.py (both import vgrout) pass.
Full `just smoke` is blocked upstream by missing gitignored data artifacts
(out/pools/{substrate,teacher_pool}, out/vhack/*smoke*), unrelated to the rename.
The other half of the no-cheat family (sibling of the gate-anchor leak). Asserts
on the real out/pools/substrate/partition.json: (1) partition is a clean function
into the 4 distinct substrate modes, each populated; (2) under teacher_modes={run_tests}
the kept teacher pool is ALL known-mode -- held-out modes get ZERO demos and are
genuinely held out (>0 problems). Vibe-check, not a theorem; wired into just smoke.
6/6 pass.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
The route2 tau-gate anchored on (teacher OR hacked_E student). hacked_E is the
run_tests detector; it cross-fires <=1.1% on held-out modes (stdout 17/1540,
file_marker 2/1337), force-routing those rollouts -- a real label leak into the
held-out class, not noise. Add gate_anchor_teacher_only: anchor on teacher rows
only, so held-out classes get PROVABLY zero detector labels (airtight A5 control).
Extracted the inline anchor loop to build_route2_anchors() and added
scripts/verify_gate_anchor.py (wired into just smoke): proves default reproduces
the leak (held-out FP student force-routed) and teacher_only removes it (zero
student routing, teachers unchanged). 9/9 assertions pass.
Rescoring can't fix this -- the leak is in training (gate shaped the weights),
not scoring (per-mode ground-truth eval is clean). Retrain is the only path; the
A5 run saved no per-eval checkpoints anyway.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
verify_rewards, verify_vhack_heldout, build_substrate, probe_distill, probe_plot_stack
are run via 'python -m' / justfile and imported by no core module -> moved to scripts/,
relative imports rewritten to 'from projected_grpo.X'. probe_distill's sibling import
of probe_plot_stack is now a flat import (co-located in scripts/). regrade_pool stays
in src (pairs_from_pool imports load_problems_by_id from it). justfile recipes updated.
src/projected_grpo/ is now 16 importable modules: train + method (proj/vhack/antipasto/
extract_vhack_grad) + env (rewards/eval/problems/data) + pairs (pairs/pairs_from_pool/
regrade_pool/derisk_loopholes) + tablelog/figs. ~1480 lines moved out of the package.
Smoke green (verify_rewards 52/52 from scripts/, train pipeline cout->0).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Dropped dead job-ID narrative (job 60/64) on rollout_ablate_frac, the
'vanilla step 17' dead-run ref in eval.py, the 'old signed sum' dead-code ref in
proj.py, and the conversational 'current experiment line' lead. Removed the dead
probe-traj justfile recipe. Kept all TODO/FIXME and the 'why' memory-tuning
comments. Smoke green (cout->0).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
paper-longrun, paper-noteacher, paper-teacheroff, paper-harvest -- each pueue-adds
with a why:/resolve: label so every paper job is reproducible from one command.
longrun uses the KL-stabilised optimizer (beta=1e-5, Adam 0.9/0.99).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
--latest-per-arm + --min-steps select the freshest >=N-step log for each
arm from logs/, no hand-globbing. Harden parse_log against historical logs:
require '| INFO |' in the header line, drop pure-symbol header tokens.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
The CPU smoke ran fp32 + sdpa, so it never walked the bf16/flash_attn2 path the
real run uses -- a whole dtype/magnitude bug class was invisible to the gate (per
the smoke principle: a path that doesn't fire in smoke isn't covered). The tiny-
random model peaks ~1.4GB on GPU, so cost is negligible. Drop CUDA_VISIBLE_DEVICES=
from every smoke recipe; train.py auto-detects cuda -> bf16. (Stale fp32 smoke
v_hack must be re-extracted bf16; auto-extracts on cache-miss.)
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
The distinct-basis A_q/B_q LoRA (~33M params at rank-16) gave the quarantine a
~100x capacity edge over delta_S, so routing-everything-there was the low-
resistance path: qE pinned ~0.97 (energy into the thrown-away knob) while the
deployed delta_S learned nothing (job 54). The cause was capacity imbalance, not
the routing gate (calibrated-tau already separated hack/clean, hkgap>0).
Consolidate to one adapter type: the quarantine is now delta_S_hack, the second
diagonal in the same frozen SVD basis, shape [r], capacity-matched to delta_S,
zeroed at deploy. route2's calibrated-tau gate parks the flagged rollouts' grad
into delta_S_hack.grad (like proj.py's route parks its subspace projection);
delta_S keeps the unflagged. Both diagonals train at one shared lr.
Removed: A_q/B_q params, v_act + extract_v_act, the act-mask arm (a shared
diagonal can't be per-token gated), route2_mask / route2_quarantine_rank /
route2_quar_lr_scale knobs, the separate quar optimizer group. Arm name
routing2_{act,grad} -> routing2. v_grad refresh extracts from delta_S (main)
with the quarantine ablated.
SGTM check: their gradient routing uses a hard detach on capacity-matched
reserved dims, no soft/tanh/sigmoid gate -- balance is the fix, not gating.
Smoked clean: tau/hkgap/qE render, ||delta_S_hack||>0 assert passes, exit 0.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
route2-act diverged (run 43): 33M kaiming A_q/B_q at delta_S's lr=3e-3 blew up
(gn 0.3->7.5 step 8, generations -> token salad, lp_t -11). Fixes:
- #167 separate quarantine lr (route2_quar_lr_scale=0.1) so the 60x-bigger fresh
LoRA isn't trained at the main-knob lr.
- #168 divergence tripwire on teacher ppl (lp_t high-water mark; abort if it
drops >5 nats for 2 steps). Relative so tiny-random smoke (flat lp_t~-11.9)
doesn't false-trip.
- #165 act-path was silent: stash cos(a,v_act) + fired-fraction in the forward,
surface as act_cos/act_fire columns (route2-act). smoke shows act_fire=0.64 =>
the cos>0 sign test over-routes (fires on most tokens, not just hack ones).
- #166 print last train generation before FINAL EVAL (coherence eyeball).
- route2 v_act/v_grad refresh was firing but silent -- now announced.
- #162 plot_deploy_overlay.py: per-mode DEPLOY overlay from per_mode_deploy.json
(honest shipped-model numbers, route2-safe). just plot-deploy.
- just plot/results hardened: parse by header name, skip non-substrate logs,
non-fatal aggregate delegation.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
#164: the final eval now runs for ALL arms (not just route/route2) on the
same fixed eval subset, so the all-arms overlay reads identical per-mode
numbers. vanilla/erase have no quarantine -> deploy == train (one eval);
route/route2 also run the knob-off (ablated) eval. Writes a single
per_mode_deploy.json into run_dir (arm, mask, refresh, seed + per-mode
train/deploy hack+solve) as the canonical source for the #162 overlay plot.
justfile: replace the parametrized run-substrate (which re-passed seed/steps/
refresh/mask defaults every invocation) with one explicit queue-substrate that
queues the fixed 5-arm overlay set, each arm passing ONLY its non-default flags.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
The verbose argv (--teacher-pool-dir, --vhack-pairs-path, and redundant
--vhack-refresh-every/--seed/--steps) came from run-substrate passing
everything explicitly. steps/seed/refresh were already defaults; the two
paths weren't. Now FastConfig defaults to the current experiment line so a
real run needs only --intervention (+ optional seed/refresh/mask). Smoke
(SmokeConfig) unaffected -- it sets its own pool. Stripped the recipe to match.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Smoke is fp32 (CPU tiny-random) so the bf16 path never fired -- job 34/35
crashed on the real Qwen3-4B with 'BFloat16 != float' in the quar matmul.
Cast A_q/B_q/v_act down to activation dtype in the forward, mirroring the
delta_S.to(a.dtype) pattern (fp32 master, bf16 compute, grads cast back).
Validated forward+backward in bf16 for both masks. + run-substrate MASK param.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Arm A (route2_mask=grad): per-rollout gate splice (identity at c=1) recovers
the per-sample delta_S grad after backward (c.grad = delta_S * g_b); train.py
divides it out (eps-guard |delta_S|>1e-6), flags rollouts by cos(g_b, v_grad)>0,
and SUBTRACTS them from delta_S.grad. Single-pass, no forward detach, no second
backward -- the cross-step mismatch that made the spec's A1 stale-mask awkward
never arises (routing is post-backward within the step). v_grad = unit-mean
gradient diff from extract_v_hack raw grads (gradient-space analogue of v_act).
route2 forces the combined (non-split) backward since cos_pre is NaN for it
anyway, which also gives the gate a single clean grad to read.
Drop route2_tau: never tuned; the mask is cos>0 (the natural hack-ward boundary)
and the load-time noise floor already filters axes.
v_hack path now auto-derives from --vhack-pairs-path (out/vhack/v_hack_pairset_
<stem>.safetensors): pass the pairset, the hack file auto-loads/extracts -- no
need to also pass --v-hack-path. run-substrate drops the redundant flag.
smoke: smoke-route2 (act) and new smoke-route2-grad both pass (||B_q||=0.109,
exit 0); erase shared-basis path unchanged (cout->0, fired~0.9).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Adds intervention=route2: a LoRA quarantine (A_q,B_q) with its own basis,
always summed into the forward, plus a per-sample activation-cosine mask that
detaches the kept adapter for flagged samples. Routing happens in the forward,
not via grad surgery: a flagged sample updates only the quarantine; an unflagged
hack-like sample concentrates there by gradient magnitude (absorption). Deploy
zeroes A_q,B_q. v_act built by extract_v_act (forward-only activation mean-diff
over persona pairs). Fixes the per-prompt zero_grad wiping quarantine grads
before opt.step. scripts/make_random_vhack.py = the random-V route control.
vhack_refresh_every default 0->5 (0 is ablation-only).
Smoke: R1 grad check passes (flagged->delta_S grad 0, A_q/B_q>0; forward value
unchanged); smoke-route2 ||B_q||=0.109, deploy eval + asserts pass.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
- plot_substrate.main now also calls plot_dynamics.plot/plot_hack_overlay so one
command produces all 4 figs (by_method, by_hack, aggregate, hack_overlay); the
aggregate 'total hacks per arm' core plot is kept, not reimplemented.
- plot_dynamics: point parser at CURRENT streaming headers (cin_t/cin_s, hk_dep/
slv_dep); it was built for the old cos_pre_t/hack_deploy spelling and silently
failed on sub4 logs. No backward-compat for the superseded header.
- justfile: 'plot GLOB STEM' canonical entrypoint over logs/*_sub4_*.log.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
prog_wide pairset cut hack the most (-0.226, no pass cost) in the pairset
comparison (results.md), so it's the default v_hack source for the
erase/route arms; vanilla ignores it. REFRESH defaults to 5.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
intervention=none is a pure GRPO baseline: skip v_hack load/extract entirely
(v_hack=None), emit a nan diag, and the cin/cout/fired columns are already
hidden on the vanilla arm (#141). A --v-hack-path passed to vanilla is logged
and ignored. Removes the misleading cos_pre baseline and the ~5-min auto-extract
a vanilla run would otherwise trigger on a cache miss.
run-substrate recipe: drop the MIX override (inherit locked 0.125) and the
--v-hack-path (vanilla needs none); erase/route substrate runs pass it explicitly.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Same none/erase/route matrix on the expose-K (M2) env, v_hack still the M1
basis -> tests whether an M1-derived direction suppresses the M2 hardcode hack
with no oracle. Teacher-free (M2 emerges on-policy). steps=60, grad_clip=10 by
default now.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Code writes+reads the new scheme; migrate_out_dirs.py moved 225 loose artifacts
(0 left at top level). Per-run checkpoints+rollouts now group under
runs/<ts>_<run_id>/ as train.safetensors/rollouts.jsonl. Figures land in
out/figs/ with a stable docs/figs/<name>.png symlink (figs.link_latest).
justfile also gains run-cell REFRESH param (online-erasure arm). Smoke +
smoke-vanilla + results all green on new paths. Requeue manifest preserves the
why/resolve labels that pueue reset wiped.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
T5: eval_hack_solve helper + ablate_quarantine ctx; periodic ablated-eval
(hack_abl/solve_abl cols, appended so results.py indices unchanged) every
--eval-ablate-every steps; final kept-vs-ablated ROUTE EVAL BLUF. plot_dynamics
plots the ablated series for the routing arm (the coherence-gap fix: training
hack_s looks vanilla; routing only shows post-ablation).
External-review fixes (docs/spec/20260530_code_review.md):
- Critical: route now feeds delta_S the SAME g_proj as erase (was forcing
preserve_magnitude=False/overshoot=1, which diverged from erase before AdamW).
delta_S is its own AdamW param fed erase's grad, so route-ablated deployment
evolves identically to erase regardless of AdamW non-linearity. Only the
combined training forward over-moves (intended; never deployed). Corrected the
overclaiming docstrings (no "sum == g" / "reproduces vanilla" identity).
- Important: clip_grad_norm_ now covers delta_params + delta_hack_params
(no-op for none/erase; bounds the route update).
- Important: results.py paired-delta table includes routing (keyed on arm).
smoke route/erase/vanilla green: dsh route=0.0105 erase/none=0, span=2.9e-7,
ROUTE EVAL BLUF prints.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Stage-1 (T3) of the routing spec. Adds a per-module quarantine knob
delta_S_hack (AntiPaSTO forward = delta_S + delta_S_hack, both 0 at init).
intervention=route parks the hack-ward grad component (g - cV to delta_S,
cV to delta_S_hack) instead of erasing it; eval ablates delta_S_hack.
- proj.py: route flag splits the grad (overshoot=1, no rescale -> the split
sums to g, so the training forward still moves hack-ward; route ⊇ erase).
- antipasto.py: second trainable knob, identity preserved at init.
- train.py: arm -> intervention {none,erase,route}; arm kept as a derived
display name so run-id/BLUF/results.py/plot classify are unchanged. opt
steps both knobs (hack knob grad=None under none/erase -> AdamW skips it,
so erase reproduces old `projected` bit-for-bit, R4). R3 span assert
(resid/||gh|| < 1e-4) + end-of-run ||delta_S_hack|| guard (route >0).
- results.py / plot_dynamics.py: read arm from the preset line (covers both
old --arm and new --intervention logs); plot classifies `routing`.
smoke: none ||dsh||=0, erase clean, route ||dsh||=0.0105 span=2.9e-7. 64
archived projected rows still parse.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Both were negative results (docs Q4, Q9) and are now dead weight. Removes the
Config fields, the extract_v_hack params, the rank-1 mean-diff branch, the
solve-orth D-projection block, and the extract-vhack-meandiff recipe. The
v_hack_*_meandiff / *_18base / *_18solveorth4 artifacts stay on disk as frozen
evidence for those table rows. Smoke passes.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
docs/results.md: lead with absolute last-5 rates (compare within a table by
eye); restrict refresh-cadence/gate/basis comparisons to the seed they actually
share (kills the fake refresh "ladder" that compared n=1 cadences to a 4-seed
frozen mean); add Q6 solve columns, Q8 pair-content axis breakdown (8/18 pairs
are axis-1 weak-tests; the 21-pair set is not in committed pairs.py -> FIXME),
Q9 solve-orth negative result, and a dynamics note (solve never climbs; hack
plateaus ~step 15).
scripts/results.py: add `log` provenance column; drop the wide argv/time cols.
Lock mix_ratio=0.125 as the default (FastConfig group 4->8 so the split is
non-degenerate; drop --mix-ratio=0.5 from fast recipes). Q6 shows 0.125 keeps
the hack cut with no solve tax. Smoke passes.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>