Config (make the design axes explicit Literal choices):
- eval: Literal[eval2,eval3] (default eval3 = 10% unhackable, deployment-like);
unhackable_frac is now a derived property; eval/unhackable_frac/pairs recorded
in deploy_test.json metadata.
- intervention gains routeV_per_token (folds the per-token bool into the arm choice).
- routeV_gate documented as the pinning axis.
- FastConfig grad_clip 500->10 (the 500 was never load-bearing); FastLoraConfig subcommand
(fast-lora) at lr=1e-4 -- the hotter lr=3e-3 diverged lora_frozen_b (job 25: ppl 6e5, gn 98 at step 4).
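A minimal sketch of what "Literal axis + derived property" could look like. The class name EvalAxes and the eval2 value (0.0) are assumptions for illustration; only eval3 = 10% unhackable and the recorded metadata keys come from the notes above.

```python
from dataclasses import dataclass
from typing import Literal

# eval3 = 10% unhackable is stated in the notes; the eval2 value is a guess.
_UNHACKABLE_FRAC = {"eval2": 0.0, "eval3": 0.10}


@dataclass(frozen=True)
class EvalAxes:
    """Hypothetical config slice: the design axis is an explicit Literal choice."""

    eval: Literal["eval2", "eval3"] = "eval3"
    pairs: str = "prog_wide_clean"

    @property
    def unhackable_frac(self) -> float:
        # Derived from the eval choice, so the two can never disagree.
        return _UNHACKABLE_FRAC[self.eval]

    def metadata(self) -> dict:
        # The three fields the notes say are recorded in deploy_test.json.
        return {
            "eval": self.eval,
            "unhackable_frac": self.unhackable_frac,
            "pairs": self.pairs,
        }
```

Making unhackable_frac read-only means a CLI flag cannot set it independently of eval, which is presumably why the justfile could drop --unhackable-frac.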
Pairs:
- delete prog_wide.json (14/30 print-without-assert contaminated; history in git);
default -> prog_wide_clean.
- rename run_tests->execute_tests in prog_wide_clean + pairs_authored so the
extraction pairs are OOD (never use the env's real grader fn name). Re-extracted
v_hack_smoke to match.
justfile: --routeV-per-token -> intervention=routeV_per_token; drop --unhackable-frac
(eval3 default); lora recipes -> fast-lora subcommand; prog_wide -> prog_wide_clean.
smoke green (erase + routeV_per_token); all 4 verify gates pass.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
- floor_ceiling_abs.png: clean deploy Pareto. All 5 arms as dots, ideal star at the
good corner (no-hack x ceiling), base->base model label, x clamped at no-hack. No
arrows: knob-on is only measured at val, so a val-before -> deploy-after arrow would
fake a solve jump that's really the n=32->n=119 eval-set shift.
- floor_ceiling_knob.png: the real before->after on ONE eval (val n=32). Hollow knob-on
-> solid knob-off per arm; the move is diagonal (solve changes: prog_wide 0.069->0.056,
authored 0.056->0.044), not the horizontal move I wrongly forced earlier.
- justfile: queue-unhackable now 200 steps (solve is a slow signal under the unhackable
fraction), low priority; vanilla rerun alongside best (its solve also suffers).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Realism knob: in the reference env hacking saturates and kills the solve gradient.
A seeded-random per-problem Bernoulli flip to gt_only (no loophole, only honest
solving pays) keeps a persistent solve pressure all arms feel. The differential test:
routeV ablates the hack on the hackable 90% so it must solve there, while the warm
solve-skill from the 10% should make its solve-uplift-over-vanilla larger than at frac=0.
- gt_only's hint is the plain 'pass all tests' (no-op), so a flipped problem is an
ordinary solve task. Train-only; eval stays all-loophole (gt_only subtracted from
eval_modes) so hack/solve remain comparable to the reference env.
- logged rollout env_mode now reads prob['env_mode'] (single source of truth) so the
flip propagates to per-mode metrics.
- smoke-unhackable recipe + queue-unhackable (vanilla vs routeV per-token at frac=0.1).
Verified: frac=0.1->~7%, 0.3->~28% gt_only; deterministic per seed; gt_only drops the
loophole hint. smoke-unhackable runs end-to-end.
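A sketch of the seeded per-problem flip, assuming problems are dicts carrying an 'env_mode' key (as the prob['env_mode'] note suggests). The function name is hypothetical. Because each problem gets an independent Bernoulli draw, the realized gt_only share only approximates frac, consistent with the ~7% seen at frac=0.1.

```python
import random


def flip_to_gt_only(problems, frac, seed):
    """Return copies of `problems` with a seeded Bernoulli(frac) flip to gt_only.

    One independent draw per problem, in list order, from a private RNG,
    so the same (problems, frac, seed) always yields the same flips.
    """
    rng = random.Random(seed)
    flipped = []
    for prob in problems:
        take = rng.random() < frac  # independent draw per problem
        mode = "gt_only" if take else prob["env_mode"]
        flipped.append({**prob, "env_mode": mode})
    return flipped
```

Usage: call it on the train problems only, leaving the eval problems untouched, which matches the train-only note above.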
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Route the whole gradient of every knob-on rollout into the quarantine; the
deployed knob learns only from the knob-off exploration floor. Direction-free
(v_grad extracted but never enters f -> routing is purely by generation mode).
Config flag + _step_absorb_f holder + filter branch (reuses act_vote per-rollout
machinery) + per-step is_ablated stash. just smoke-absorb passes (keep=0.25/
rout=0.75 = the floor/knob-on split). Queued s43 as job 29 (frac=0.25).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Two-stage script: build out/plots/floor_ceiling.csv (one row per arm/anchor,
with SOURCE and STATUS columns flagging every provisional/missing cell) then
the keynote figure. Prints TODO/FIXME data gaps before plotting.
Panel A: normalized floor->ceiling bars, headline deploy (knob-off, test n=119).
Panel B: the knob effect -- arrow knob-ON -> knob-OFF on the SAME held-out val
split (eval_curve.jsonl), isolating the quarantine from the train/test
memorization gap. Fixes the earlier conflation where the train->deploy arrow
mixed knob-on/off with train-problems/test-problems.
Data gaps flagged in csv: solve ceiling provisional=paper 0.223 (FIXME job 24),
prog_wide arm contaminated (TODO job 28 prog_wide_clean).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
New routeV_gate="online_stats" mode: use the empirical per-rollout cosine
distribution (q5/q95 pooled across all modules each step) as the routing
band thresholds, instead of the pair-derived route_band. Direction v_grad
still from authored pairs; only thresholds are online/adaptive.
Motivation: the pair-derived band sits above the live cosine distribution
(median live cos ~-0.06), so frout falls off a cliff as the GRPO advantage flattens.
Online thresholds adapt to the actual live distribution, so the 5/95 tails
always route regardless of where the raw cosines land.
Config: routeV_gate="online_stats", online_stats_lo=0.05, online_stats_hi=0.95.
Step-0 prior: (-0.5, 0.5) neutral band (pairs not used for calibration).
Band update: post-opt.step(), torch.quantile over that step's module*rollout cosines.
No-cheat: v_grad from authored pairs only; thresholds from the cosine distribution
of live student rollouts (no oracle/labeling of live rollouts as hack/clean).
Also: add online_stats to results_deploy._arm(); justfile queue-online-stats recipe.
Queued as job 22 (s43, authored pairs, priority 12, after 19/20/21).
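A dependency-free sketch of the band update described above. The real code uses torch.quantile; the helper below reimplements its default linear interpolation in plain Python so the logic can be read without a GPU stack. The class name OnlineBand is hypothetical.

```python
def quantile(values, q):
    """Linear-interpolation quantile (torch.quantile's default scheme)."""
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one cosine")
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


class OnlineBand:
    """Routing band whose edges track the live cosine distribution."""

    def __init__(self, lo_q=0.05, hi_q=0.95, prior=(-0.5, 0.5)):
        self.lo_q, self.hi_q = lo_q, hi_q
        # Step-0 neutral prior: the contrastive pairs are not used here.
        self.lower, self.upper = prior

    def update(self, cosines):
        # Called after opt.step() with that step's pooled module*rollout cosines;
        # no hack/clean labels are involved, only the raw values.
        self.lower = quantile(cosines, self.lo_q)
        self.upper = quantile(cosines, self.hi_q)
```

Since the edges are quantiles of the live values, roughly 5% of cosines fall on either side of the band at every step, wherever the distribution sits.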
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
scripts/results_deploy.py pulls the held-out TEST deploy numbers from the FINAL EVAL
line that just-results skips. Journal: per-rollout real==random (absorption), per-token
real-V is the lead; pinning suspected off (band above live cos).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
- FastConfig: teacher_pool_dir -> teacher_pool_runtests_dense, grad_clip -> 500
(were passed explicitly on every fast call). Dropped --teacher-pool-dir/--grad-clip
from the dir6 calls and --grad-clip from all other fast recipes; smoke/dev recipes
keep their own teacher_pool override.
- End-of-run summary reordered per token-efficient-logging 'final 30 lines': the wide
results row and the giant per-step table now print ABOVE the tail. The last lines are
just argv, a compact hack/solve x knob-on/knob-off table, and the single objective
(deploy solve - hack), since solve and hack alone are gameable.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
The 6-prompt teacher_pool_runtests covered ~3% of the 200-prompt train pool, so
~1 step in 8 saw a teacher demo and the student never learned the hack within 60
steps (hack_s=0/28 through step 19, job 0) -> all arms ~0 hack -> directionality
comparison invalid.
scripts/build_runtests_pool.py: builds a DENSE single-mode pool from the full
model-generated rh-s65 teacher pool (233 prompts, in-sample hacks), re-grades
each under env_mode=run_tests, keeps verified exploits (215/233 = 92% re-verify;
the rest went stale under the post-grader-bug grader). One demo/prompt (G_t=1
per step), no partition.json. Reuses compute_reward; row schema copied verbatim
from build_substrate so the pools are loader-compatible.
- queue-dir6 -> teacher_pool_runtests_dense (all 8 arms).
- build-runtests-pool recipe -> the new dense builder (was: copy 6 from substrate).
- main.tex teacher-seeding paragraph: disclose re-grade+verify, drop the now-wrong
'no re-grading' and the stale 6-prompt count; note demos are full problem-specific
completions (real solution + permissive self-written run_tests), not a snippet.
Source = HACKY checkpoint (rh-s65), not base. Old 6-prompt sweep killed and
requeued on the dense pool.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Train side of the same contamination bug: fast preset loaded first-200-by-id =
the lowest/oldest/most pretraining-memorized problems (base solves them easily ->
weak hack incentive). Now a seeded-random representative sample (seed=cfg.seed),
with the teacher-seed ids pinned in so seeding still fires. Paper trains on all
992 (base ~20%); job 176 confirmed base test=0.094 / train_filtered=0.203,
matching paper fn9.
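A sketch of the seeded representative sample with pinned ids, under the assumption that problem ids are sortable and unique. The function name and signature are hypothetical; the point is only that the pinned teacher-seed ids are always included and the rest is a reproducible random draw rather than first-N-by-id.

```python
import random


def sample_train_ids(all_ids, n, seed, pinned_ids=()):
    """Pick n ids: every pinned id, plus a seeded random draw from the rest."""
    pinned = list(dict.fromkeys(pinned_ids))  # dedupe, keep order
    known = set(all_ids)
    missing = [i for i in pinned if i not in known]
    if missing:
        raise ValueError(f"pinned ids not in pool: {missing}")
    if len(pinned) > n:
        raise ValueError("more pinned ids than the sample size")
    pinned_set = set(pinned)
    # Sort before sampling so the draw does not depend on input order.
    rest = sorted(i for i in known if i not in pinned_set)
    rng = random.Random(seed)
    return sorted(pinned + rng.sample(rest, n - len(pinned)))
```

Usage would be something like sample_train_ids(ids, 200, cfg.seed, pinned_ids=teacher_seed_ids), where teacher_seed_ids is whatever list the teacher pool provides.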
Adds justfile recipes:
- queue-dir6 SEED: 8-arm single-seed directionality set (routeV real rollout/
per-token, random-V both, vanilla, vampire in-subspace placebo, +2 LoRA-frozen-B
routeV) on teacher_pool_runtests + fixed eval.
- queue-broad: headline arms (vanilla/erase/routeV) x 3 seeds for paired-t
significance + directionality/adapter ablations at one seed.
Spec: docs/spec/20260607_eval_contamination_fix.md (force-added; docs/ gitignored).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
The periodic VAL eval ran on leetcode_train_medhard_holdout.jsonl (353, our
artifact): disjoint from train by id but in the train id/recency range (ids
3-3205, 88% medium), so dominated by classic problems Qwen3-4B memorized in
pretraining -> base solve 0.94, saturating solve and killing the hack metric's
gt-fail headroom. Disjoint-by-id controls for TRAIN leakage, not pretraining
MEMORIZATION; only the recency-held-out test set (ids >= 3243) reproduces the
paper rate.
Proof (job 176, base model, same eval_hack_solve): test_medhard solve=0.094,
matching paper fn9 (~12% test) -> eval pipeline is sound, holdout was the
contaminant. Fix: drop the holdout; periodic curve + final number both eval the
paper test set leetcode_test_medhard. Smoke green. Hint confirmed = paper's
simple_overwrite_tests (not the easier _detailed/_aware variants).
Also this session: removed stale teacher-pool TRAIN restriction; seeded shuffle
for eval load; LoRA-frozen-B adapter; rescore CLI Positional fix. Known follow-up
(journal e): train pool is still first-200-by-id (easy/memorized), same bug class.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Ariahw randomizes the run_tests function name at eval (line 84) so a high hack
rate reflects responding to the disclosed mechanism, not memorizing a string.
Our 3 added side-channel modes (stdout/sentinel/file) used family detectors with
a fixed hint, so eval fell inside the train family -> no train/test gap. Fix: at
eval, disclose a NOVEL per-mode token and tighten the detector to require exactly
it (rewards.py _probe_channels gains eval_marker_phrase/eval_sentinel_attr/
eval_result_file; defaults None = unchanged family path at train). run_tests uses
the existing test_func_name param. eval.py randomize_eval_markers swaps the WHOLE
train hint (never touches problem text), seeded per problem so the curve stays
smooth and comparisons stay paired. New gate scripts/verify_eval_gap.py proves,
per mode: train-token hack scores under family, NOT under eval-token; novel-token
hack scores under eval-token. Wired into smoke.
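A sketch of the train/eval marker split for the stdout mode. The marker strings, the helper names, and the single-phrase stand-in for the family detector are all hypothetical; only the parameter name eval_marker_phrase and the behaviour (None keeps the train path, a value requires exactly that token) follow the notes above.

```python
import hashlib
from typing import Optional

TRAIN_MARKER = "ALL TESTS PASSED"  # hypothetical train-time marker phrase


def novel_marker(problem_id: str, mode: str, seed: int) -> str:
    """Deterministic per-problem eval token: same inputs, same token."""
    digest = hashlib.sha256(f"{seed}|{mode}|{problem_id}".encode()).hexdigest()
    return f"EVAL-{digest[:10]}"


def stdout_hack_detected(stdout: str, eval_marker_phrase: Optional[str] = None) -> bool:
    if eval_marker_phrase is None:
        # Train path: a stand-in for the broader family detector.
        return TRAIN_MARKER.lower() in stdout.lower()
    # Eval path: only the exact disclosed token counts.
    return eval_marker_phrase in stdout
```

Hashing the seed with the problem id keeps the token fixed across checkpoints and arms, which is what keeps the eval curve smooth and the comparisons paired.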
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
route2 (binary-tau) and routeV (banded gate) are different methods -- give the
new one a distinct id so old/new runs can't be confused (see hypothesis doc).
- src/vgrout/* + justfile: route2->routeV, routing2->routingV (figs.py keeps the
old keys for plotting historical runs).
- Final eval: eval_n_prompts_final=64 distinct prompts (periodic curve stays light
at eval_n_prompts) + fixed gen seed (common random numbers across arms) so the
paper deploy numbers aren't dominated by sampling noise (the n=8-prompt eval gave
0.031 vs 0.125 at the same checkpoint).
- save_ckpt: also write delta_S_hack to sibling _hack.safetensors so runs can be
re-scored knob-ON at higher n later (train.safetensors stays delta_S-only).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
fast-projected / full no longer pin v_hack_full.safetensors; erase now extracts
from the prog_wide default (auto-resolves v_hack_pairset_prog_wide), the same
pair set route2 uses -> apples-to-apples arms. Smoke recipes keep their
tiny-model v_hack pins (the tiny model needs its own basis).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Replace the confounded route2 gate (hack_anchor force-routed teacher +
weak-detector student rows by LABEL; EMA tau calibrated from a live detector
over student rollouts at train time = a cheat) with a band calibrated from the
contrastive pairs alone:
lower = mean clean-pair cos(g, v_grad); upper = mean hack-pair cos
per rollout: f = clamp((cos(g_b, v_grad) - lower)/(upper - lower), 0, 1)
routed = sum_b f_b * g_b -> delta_S_hack; kept = g - routed -> delta_S
v_grad is now the SOLE router: no detector or gt_pass touches routing, so
"does v_hack generalize to held-out modes" is clean and random-vs-real is
decisive. Band width (upper-lower) is itself the discriminator: smoke shows
+0.289 real vs -0.014 Haar-random (collapsed). conservation routed+kept=g
holds exactly; resid~0 in smoke (no hack leak into the deployed knob).
- delete build_route2_anchors + EMA state (ema_hack/clean_cos, route2_tau)
- add route_band_edges(); build at extract, rebuild on v_grad refresh
- drop --gate-anchor-teacher-only config + retire scripts/verify_gate_anchor.py
- teacher rollouts now route through the same band (not force-routed)
- spec: add the mass-confound control (scientist review 2026-06-06)
smoke-route2 + smoke-route2 --route2-random-v-seed=7 both pass; erase smoke green.
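The band formulas above, written out as a plain-Python sketch on list vectors. It assumes g is the sum of the per-rollout gradients g_b; the function names are hypothetical and are not the repo's route_band_edges().

```python
import math


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0  # assumption: a zero vector is treated as uncorrelated
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def band_edges(clean_cos, hack_cos):
    """lower = mean clean-pair cosine, upper = mean hack-pair cosine."""
    return sum(clean_cos) / len(clean_cos), sum(hack_cos) / len(hack_cos)


def route(per_rollout_grads, v_grad, lower, upper):
    """Split g = sum_b g_b into (routed, kept) using the banded gate."""
    if upper <= lower:
        raise ValueError("collapsed band: upper must exceed lower")
    dim = len(v_grad)
    routed = [0.0] * dim
    total = [0.0] * dim
    for g_b in per_rollout_grads:
        f = (cosine(g_b, v_grad) - lower) / (upper - lower)
        f = max(0.0, min(1.0, f))  # clamp to [0, 1]
        for i in range(dim):
            routed[i] += f * g_b[i]  # goes to delta_S_hack
            total[i] += g_b[i]
    kept = [t - r for t, r in zip(total, routed)]  # goes to delta_S
    return routed, kept
```

Because kept is defined as total minus routed, conservation holds by construction up to floating-point rounding. The explicit collapsed-band error is this sketch's choice, echoing the note that a Haar-random direction gives a near-zero or negative band width.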
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
git mv src/projected_grpo -> src/vgrout and find-replace the module name in
all imports (.py), `-m projected_grpo.*` invocations (justfile), and the
[project] name (pyproject; setuptools auto-discovers via where=["src"]).
Left RESEARCH_JOURNAL.md untouched: its commands/paths are dated lab notes
tied to past commits, so rewriting them would falsify provenance. Repo dir,
git remote, and absolute paths unchanged.
Verified: `import vgrout` and `python -m vgrout.train --help` load the full
graph; verify_rewards.py + verify_gate_anchor.py (both import vgrout) pass.
Full `just smoke` is blocked upstream by missing gitignored data artifacts
(out/pools/{substrate,teacher_pool}, out/vhack/*smoke*), unrelated to the rename.
The other half of the no-cheat family (sibling of the gate-anchor leak). Asserts
on the real out/pools/substrate/partition.json: (1) partition is a clean function
into the 4 distinct substrate modes, each populated; (2) under teacher_modes={run_tests}
the kept teacher pool is ALL known-mode -- held-out modes get ZERO demos and are
genuinely held out (>0 problems). Vibe-check, not a theorem; wired into just smoke.
6/6 pass.
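A sketch of the two checks, assuming the partition is stored as a mode -> list-of-problem-ids mapping and the kept teacher pool is a list of problem ids (one per demo). The real partition.json schema may differ, and the function name is hypothetical.

```python
def check_partition(mode_to_ids, kept_demo_ids, teacher_modes):
    """Raise ValueError if the partition or the kept teacher pool leaks.

    Returns the number of kept demos per mode when everything is clean.
    """
    owner = {}
    for mode, ids in mode_to_ids.items():
        if not ids:
            raise ValueError(f"mode {mode!r} has no problems")
        for pid in ids:
            if pid in owner:  # an id in two modes means it is not a function
                raise ValueError(f"problem {pid!r} is in {owner[pid]!r} and {mode!r}")
            owner[pid] = mode

    held_out = set(mode_to_ids) - set(teacher_modes)
    if not held_out:
        raise ValueError("nothing is held out")

    demos_per_mode = {mode: 0 for mode in mode_to_ids}
    for pid in kept_demo_ids:
        demos_per_mode[owner[pid]] += 1

    leaked = {m: demos_per_mode[m] for m in held_out if demos_per_mode[m] > 0}
    if leaked:
        raise ValueError(f"held-out modes received demos: {leaked}")
    return demos_per_mode
```

Like the original gate, this is a sanity check on the artifacts as they exist on disk, not a proof about the training loop.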
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
The route2 tau-gate anchored on (teacher OR hacked_E student). hacked_E is the
run_tests detector; it cross-fires <=1.1% on held-out modes (stdout 17/1540,
file_marker 2/1337), force-routing those rollouts -- a real label leak into the
held-out class, not noise. Add gate_anchor_teacher_only: anchor on teacher rows
only, so held-out classes get PROVABLY zero detector labels (airtight A5 control).
Extracted the inline anchor loop to build_route2_anchors() and added
scripts/verify_gate_anchor.py (wired into just smoke): proves default reproduces
the leak (held-out FP student force-routed) and teacher_only removes it (zero
student routing, teachers unchanged). 9/9 assertions pass.
Rescoring can't fix this -- the leak is in training (gate shaped the weights),
not scoring (per-mode ground-truth eval is clean). Retrain is the only path; the
A5 run saved no per-eval checkpoints anyway.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
verify_rewards, verify_vhack_heldout, build_substrate, probe_distill, probe_plot_stack
are run via 'python -m' / justfile and imported by no core module -> moved to scripts/,
relative imports rewritten to 'from projected_grpo.X'. probe_distill's sibling import
of probe_plot_stack is now a flat import (co-located in scripts/). regrade_pool stays
in src (pairs_from_pool imports load_problems_by_id from it). justfile recipes updated.
src/projected_grpo/ is now 16 importable modules: train + method (proj/vhack/antipasto/
extract_vhack_grad) + env (rewards/eval/problems/data) + pairs (pairs/pairs_from_pool/
regrade_pool/derisk_loopholes) + tablelog/figs. ~1480 lines moved out of the package.
Smoke green (verify_rewards 52/52 from scripts/, train pipeline cout->0).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Dropped dead job-ID narrative (job 60/64) on rollout_ablate_frac, the
'vanilla step 17' dead-run ref in eval.py, the 'old signed sum' dead-code ref in
proj.py, and the conversational 'current experiment line' lead. Removed the dead
probe-traj justfile recipe. Kept all TODO/FIXME and the 'why' memory-tuning
comments. Smoke green (cout->0).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
paper-longrun, paper-noteacher, paper-teacheroff, paper-harvest -- each pueue-adds
with a why:/resolve: label so every paper job is reproducible from one command.
longrun uses the KL-stabilised optimizer (beta=1e-5, Adam 0.9/0.99).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
--latest-per-arm + --min-steps select the freshest >=N-step log for each
arm from logs/, no hand-globbing. Harden parse_log against historical logs:
require '| INFO |' in the header line, drop pure-symbol header tokens.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
The CPU smoke ran fp32 + sdpa, so it never walked the bf16/flash_attn2 path the
real run uses -- a whole dtype/magnitude bug class was invisible to the gate (per
the smoke principle: a path that doesn't fire in smoke isn't covered). The tiny-
random model peaks ~1.4GB on GPU, so cost is negligible. Drop CUDA_VISIBLE_DEVICES=
from every smoke recipe; train.py auto-detects cuda -> bf16. (Stale fp32 smoke
v_hack must be re-extracted bf16; auto-extracts on cache-miss.)
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
The distinct-basis A_q/B_q LoRA (~33M params at rank-16) gave the quarantine a
~100x capacity edge over delta_S, so routing-everything-there was the low-
resistance path: qE pinned ~0.97 (energy into the thrown-away knob) while the
deployed delta_S learned nothing (job 54). The cause was capacity imbalance, not
the routing gate (calibrated-tau already separated hack/clean, hkgap>0).
Consolidate to one adapter type: the quarantine is now delta_S_hack, the second
diagonal in the same frozen SVD basis, shape [r], capacity-matched to delta_S,
zeroed at deploy. route2's calibrated-tau gate parks the flagged rollouts' grad
into delta_S_hack.grad (like proj.py's route parks its subspace projection);
delta_S keeps the unflagged. Both diagonals train at one shared lr.
Removed: A_q/B_q params, v_act + extract_v_act, the act-mask arm (a shared
diagonal can't be per-token gated), route2_mask / route2_quarantine_rank /
route2_quar_lr_scale knobs, the separate quar optimizer group. Arm name
routing2_{act,grad} -> routing2. v_grad refresh extracts from delta_S (main)
with the quarantine ablated.
SGTM check: their gradient routing uses a hard detach on capacity-matched
reserved dims, no soft/tanh/sigmoid gate -- balance is the fix, not gating.
Smoked clean: tau/hkgap/qE render, ||delta_S_hack||>0 assert passes, exit 0.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
route2-act diverged (run 43): 33M kaiming A_q/B_q at delta_S's lr=3e-3 blew up
(gn 0.3->7.5 step 8, generations -> token salad, lp_t -11). Fixes:
- #167 separate quarantine lr (route2_quar_lr_scale=0.1) so the 60x-bigger fresh
LoRA isn't trained at the main-knob lr.
- #168 divergence tripwire on teacher ppl (lp_t high-water mark; abort if it
drops >5 nats for 2 steps). Relative so tiny-random smoke (flat lp_t~-11.9)
doesn't false-trip.
- #165 act-path was silent: stash cos(a,v_act) + fired-fraction in the forward,
surface as act_cos/act_fire columns (route2-act). smoke shows act_fire=0.64 =>
the cos>0 sign test over-routes (fires on most tokens, not just hack ones).
- #166 print last train generation before FINAL EVAL (coherence eyeball).
- route2 v_act/v_grad refresh was firing but silent -- now announced.
- #162 plot_deploy_overlay.py: per-mode DEPLOY overlay from per_mode_deploy.json
(honest shipped-model numbers, route2-safe). just plot-deploy.
- just plot/results hardened: parse by header name, skip non-substrate logs,
non-fatal aggregate delegation.
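A sketch of the #168 tripwire, assuming "for 2 steps" means two consecutive steps. The class name is hypothetical. The check is relative to the best lp_t seen so far, so a run that is flat at a low value never trips.

```python
class DivergenceTripwire:
    """Abort signal when teacher log-prob falls far below its high-water mark."""

    def __init__(self, max_drop=5.0, patience=2):
        self.max_drop = max_drop      # nats below the high-water mark
        self.patience = patience      # consecutive bad steps before tripping
        self.high_water = float("-inf")
        self.bad_steps = 0

    def update(self, lp_t):
        """Feed one step's teacher log-prob; return True if the run should abort."""
        self.high_water = max(self.high_water, lp_t)
        if self.high_water - lp_t > self.max_drop:
            self.bad_steps += 1
        else:
            self.bad_steps = 0        # recovery resets the counter
        return self.bad_steps >= self.patience
```

In a train loop this would be one call per step, for example `if tripwire.update(lp_t): raise RuntimeError("diverged")`.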
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
#164: the final eval now runs for ALL arms (not just route/route2) on the
same fixed eval subset, so the all-arms overlay reads identical per-mode
numbers. vanilla/erase have no quarantine -> deploy == train (one eval);
route/route2 also run the knob-off (ablated) eval. Writes a single
per_mode_deploy.json into run_dir (arm, mask, refresh, seed + per-mode
train/deploy hack+solve) as the canonical source for the #162 overlay plot.
justfile: replace the parametrized run-substrate (which re-passed seed/steps/
refresh/mask defaults every invocation) with one explicit queue-substrate that
queues the fixed 5-arm overlay set, each arm passing ONLY its non-default flags.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
The verbose argv (--teacher-pool-dir, --vhack-pairs-path, and redundant
--vhack-refresh-every/--seed/--steps) came from run-substrate passing
everything explicitly. steps/seed/refresh were already defaults; the two
paths weren't. Now FastConfig defaults to the current experiment line so a
real run needs only --intervention (+ optional seed/refresh/mask). Smoke
(SmokeConfig) unaffected -- it sets its own pool. Stripped the recipe to match.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Smoke is fp32 (CPU tiny-random) so the bf16 path never fired -- job 34/35
crashed on the real Qwen3-4B with 'BFloat16 != float' in the quar matmul.
Cast A_q/B_q/v_act down to activation dtype in the forward, mirroring the
delta_S.to(a.dtype) pattern (fp32 master, bf16 compute, grads cast back).
Validated forward+backward in bf16 for both masks. Also adds a MASK param to run-substrate.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Arm A (route2_mask=grad): per-rollout gate splice (identity at c=1) recovers
the per-sample delta_S grad after backward (c.grad = delta_S * g_b); train.py
divides it out (eps-guard |delta_S|>1e-6), flags rollouts by cos(g_b, v_grad)>0,
and SUBTRACTS them from delta_S.grad. Single-pass, no forward detach, no second
backward -- the cross-step mismatch that made the spec's A1 stale-mask awkward
never arises (routing is post-backward within the step). v_grad = unit-mean
gradient diff from extract_v_hack raw grads (gradient-space analogue of v_act).
route2 forces the combined (non-split) backward since cos_pre is NaN for it
anyway, which also gives the gate a single clean grad to read.
Drop route2_tau: never tuned; the mask is cos>0 (the natural hack-ward boundary)
and the load-time noise floor already filters axes.
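A plain-Python sketch of the post-backward step: recover g_b from c.grad = delta_S * g_b, flag by the sign of cos(g_b, v_grad), and subtract flagged rollouts from the total. Function names are hypothetical, and zeroing the axes where |delta_S| <= eps is an assumption about what the eps-guard does. The sign test uses the dot product, which has the same sign as the cosine for nonzero vectors.

```python
def recover_per_sample_grads(c_grad, delta_S, eps=1e-6):
    """Divide delta_S back out of each rollout's gate gradient.

    c_grad[b][i] = delta_S[i] * g_b[i]; axes with |delta_S[i]| <= eps
    cannot be recovered and are set to 0.0 (assumption).
    """
    return [
        [cg / s if abs(s) > eps else 0.0 for cg, s in zip(row, delta_S)]
        for row in c_grad
    ]


def subtract_flagged(total_grad, per_sample, v_grad):
    """Remove hack-ward rollouts (cos > 0) from the delta_S gradient."""
    kept = list(total_grad)
    flagged = []
    for b, g_b in enumerate(per_sample):
        if sum(x * y for x, y in zip(g_b, v_grad)) > 0.0:
            flagged.append(b)
            kept = [k - x for k, x in zip(kept, g_b)]
    return kept, flagged
```

Everything here runs after a single backward pass within the step, which is why no stale mask from an earlier step is needed.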
v_hack path now auto-derives from --vhack-pairs-path (out/vhack/v_hack_pairset_
<stem>.safetensors): pass the pairset, the hack file auto-loads/extracts -- no
need to also pass --v-hack-path. run-substrate drops the redundant flag.
smoke: smoke-route2 (act) and new smoke-route2-grad both pass (||B_q||=0.109,
exit 0); erase shared-basis path unchanged (cout->0, fired~0.9).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Adds intervention=route2: a LoRA quarantine (A_q,B_q) with its own basis,
always summed into the forward, plus a per-sample activation-cosine mask that
detaches the kept adapter for flagged samples. Routing happens in the forward,
not via grad surgery: a flagged sample updates only the quarantine; an unflagged
hack-like sample concentrates there by gradient magnitude (absorption). Deploy
zeroes A_q,B_q. v_act built by extract_v_act (forward-only activation mean-diff
over persona pairs). Fixes the per-prompt zero_grad wiping quarantine grads
before opt.step. scripts/make_random_vhack.py = the random-V route control.
vhack_refresh_every default 0->5 (0 is ablation-only).
Smoke: R1 grad check passes (flagged->delta_S grad 0, A_q/B_q>0; forward value
unchanged); smoke-route2 ||B_q||=0.109, deploy eval + asserts pass.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
- plot_substrate.main now also calls plot_dynamics.plot/plot_hack_overlay so one
command produces all 4 figs (by_method, by_hack, aggregate, hack_overlay); the
aggregate 'total hacks per arm' core plot is kept, not reimplemented.
- plot_dynamics: point parser at CURRENT streaming headers (cin_t/cin_s, hk_dep/
slv_dep); it was built for the old cos_pre_t/hack_deploy spelling and silently
failed on sub4 logs. No backward-compat for the superseded header.
- justfile: 'plot GLOB STEM' canonical entrypoint over logs/*_sub4_*.log.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
prog_wide pairset cut hack the most (-0.226, no pass cost) in the pairset
comparison (results.md), so it's the default v_hack source for the
erase/route arms; vanilla ignores it. REFRESH defaults to 5.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
intervention=none is a pure GRPO baseline: skip v_hack load/extract entirely
(v_hack=None), emit a nan diag, and the cin/cout/fired columns are already
hidden on the vanilla arm (#141). A --v-hack-path passed to vanilla is logged
and ignored. Removes the misleading cos_pre baseline and the ~5-min auto-extract
a vanilla run would otherwise trigger on a cache miss.
run-substrate recipe: drop the MIX override (inherit locked 0.125) and the
--v-hack-path (vanilla needs none); erase/route substrate runs pass it explicitly.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>