Gate now scores each rollout by dot(pooled bottleneck act, v_act) captured on
the no-grad logpi_old forward (quarantine-ablated, matching the sampling
policy); masks are pinned BEFORE the single grad-carrying forward, so the
grad-gate's pass-1 backward is gone. Thresholds: rolling 256-act buffer,
z-normalized, two-threshold Otsu (winsorized 1/99); warmup pins absorb until
128 scores. Buffer stores pooled acts and re-scores against the current v_act,
so the forward-only refresh (every 5 steps) needs no flush. No bimodality
guard: calibration showed Otsu tail separation ~2.4-2.8 buffer-sd on every
condition including pure Gaussians, so no shape statistic discriminates.
Deleted with the arm wiring (rename-on-logic-change: routeA never conflates
with routeV runs): extract_vhack_grad.py, _build_v_grad, route_band_edges,
_pair_cos, the pass-1 autograd.grad block, grad_probe training wiring,
v_grad_k/route_std_*/routeV_random_v_seed config, smoke-topk recipe.
c-probe stays in lora2r.py for scripts/diag_pinning.py only.
verify_science_invariants: all-in-one count 27 -> 42 (stale since c33b810
added the wave-2 behavior2 pairs) + assert the 8-pair routeA training subset.
Smoke: routeA/vanilla/absorb/solvemix all pass (gate exercises warmup, Otsu
zones, refresh, deploy ablation) -- /tmp/claude-1000/smoke_routeA.log.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
extract_v_act: pooled completion-token bottleneck act per module, v = unit-norm
mean pair diff (tstat flag default off, null at n=8 pairs). ActCapture is the
single hook shared by extraction, the live gate, and verification.
verify_v_act (pueue #24): rel diff 7.3e-4 hack / 7.7e-4 clean vs
out/diag/pinning_feats.pt on the v3 first_hack ckpt; min per-module cos 0.99997.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Synthetic and live scores share an axis with meaningful zero (raw scores carry a
common <mu,v> offset since v = mean diff is not orthogonal to the family mean).
Zones come from label-free online stats (EMA mean/std + Otsu valley), replacing
mean+k*sd which placed both cuts beyond every distribution. Winsorize at 1/99%
before Otsu: variance-maximizing cuts otherwise buy a class for one outlier.
Fresh-eyes review verified z-norm is affine (AUROCs unchanged), zones label-free,
P/R recompute from parquet matches titles.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
10 pairs from parallel session renamed behavior_* -> behavior2_* so /behavior_
still selects the proven 8; bare-call-no-assert (run_tests axis-1 family) and
assert-or-True (assert-True family) replaced with mock-under-test and
fudge-expected; +5 more: skip-failing, retry-until-green, coverage-pragma,
type-ignore, survivorship-metric. No identifier overlap with the env's 992
train/test problems (total_steps -> total_clicks after collision).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Gate band (mean + k*std) now computed from THIS batch's pooled positions each step
instead of a sliding window. Refresh-proof by construction (live rollouts scored vs
the current v_grad), so the v_grad-refresh window flush is gone. Drops route_window
config + collections import. SmokeConfig forces routing (mid=-1,rout=0) since random
tiny data never separates -> quarantine would never train -> pathway assert would fail.
lr 3e-4 -> 1e-4: 3e-4 diverged at step ~27 (lp_s +18->+73, rew_s->0 after clean
emergence 7-24); 1e-4 is the normal LoRA range and emergence was already fast.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Gate calibration: route by live mean + route_std_mid/route_std_rout * std of the
pooled cosine-to-v_grad, not a fixed quantile tail. Self-silences -- only the tail
that genuinely exceeds the spread routes, so qmass tracks real separation instead
of a forced fraction. The authored absolute band is mis-placed (live pos sits far
below the synthetic-hack edge; even synthetic solve out-aligns on-policy hack).
tablelog: auroc/rout/routE/keep/resid/qmass cols always shown (nan on vanilla) so
arm tables line up.
Diagnostics: scripts/diag_pinning.py (4-population calibration view, mean+/-2sd band)
and scripts/diag_pinning_refresh.py (proves cosine stats recompute from a tracked
v-independent gradient cloud on a v_grad refresh -- exact for k=1, sanity 2.5e-16).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
One scheduler object does warmup + cosine relaxation; pct_start=warmup_frac
is the explicit warmup. cycle_momentum=False so it doesn't clobber the
configured AdamW betas (adam_beta1). Curve (100 steps, 20% warmup, 3e-4):
peaks 3e-4 at step ~19 via smooth cos ramp (vs old linear), 1.7e-4 at step
10 where 5e-4 had diverged. Smoke + all verify gates green.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
- FastConfig lr 5e-4 -> 3e-4: 5e-4 peaked exactly at warmup-end (step ~10)
and diverged (lp_t -0.5 -> -4.8, hack_s 20/24 -> 0). Lower peak + longer
warmup defuse the spike.
- Config warmup_frac 0.1 -> 0.2: SequentialLR(LinearLR, CosineAnnealingLR)
already does warmup+cosine relaxation; just reach the peak more gradually.
- save_ckpt: drop A0/B0 (seeded init, regenerable from lora_init_seed;
ckpt_update0000 is the init since A==A0 at step 0; nothing live reloads
them), save A/B bf16 not fp32. ~1.3G -> ~0.33G per ckpt.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Each lora2r ckpt is ~1.3G (A/B + redundant frozen A0/B0, 252 modules fp32). The
768G disk filled and runs crashed at the step-0 ckpt save. 20-step cadence halves
the per-run footprint while keeping enough points for the eval curve.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Three fixes after the deploy-solve=0 run (all user-confirmed):
1. Gate co-adaptation: t_lo sat at quantile(route_quantile), so keep was only the
bottom ~5% and ~95% of rollouts landed in mid (both blocks train, qmass~0.5).
Move the keep/mid boundary up: t_lo=quantile(1-2q), t_hi=quantile(1-q), so keep
is the BULK, mid+rout are the top 2q. Three-zone absorption preserved (not deleted).
2. Deterministic teachers: every teacher-phase prompt is drawn from the both-pool-
covered set and gets EXACTLY teacher_n_per_prompt hack + N solve (constant count,
no flip/coverage drops). Replaces mix_ratio*_even_split (count varied per step).
No flip in the teacher phase (solve teacher carries solve pressure). mix_ratio>0
stays the on/off switch. Removed dead _even_split.
3. Deploy-mode generation: student rollouts generate under ablate_quarantine, so the
behavior policy = the shipped deployed-only model -- the quarantine's learned hack
can't saturate the rollout distribution and starve honest solve advantage. For
clean-gated rollouts gen and train forward now match.
Also: FastConfig lr 1e-4->5e-4 (random-init lora2r needs more lr in the short budget).
AGENTS.md: don't bake unconfirmed theories into comments; don't inflate diagnosis
confidence across turns. Smoke + smoke-solvemix green; all verify gates pass.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
The authored absolute band made pos>=1 unreachable for live hacks (rout~0),
and re-extracting it every 5 steps collapsed the gate (the #40 step-5 cliff).
- Online-stats gate: route by live quantiles of the pooled cos-to-v_grad
(top route_quantile -> hack, bottom -> keep, middle -> mid), window flushed
on refresh. v_grad stays authored-only; only the threshold follows the live
distribution. Smoke: routing sustained past the refresh (cliff fixed).
- Step-level teacher mix (#31): mix_ratio is a fraction of ALL the step's gens,
not a per-prompt round; symmetric hack+solve teachers injected as ordinary
gens (not specially routed). Fixes the per-prompt rounding wart.
- AUROC + cosU step columns: v_grad as a live hack-detector vs the hack-label
(measurement-only, never routes) -- discriminates threshold-vs-direction
failure and whether a refresh destroys separation.
- Inline eval stays off (eval_ablate_every=0); deploy scored offline.
- Fix _sample_rows None crash (beartype) on the no-solve-pool path.
- Remove dead pooled_gate_thresholds (the rejected authored-pooled approach).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
--solve-pool-dir splits the G_t teacher budget solve_mix_frac solve / rest hack
(default off). The gate's routed-share is split by teacher SOURCE: a discriminating
gate routes hack teachers (d->1) and KEEPS solve teachers (d->0); equal shares =
non-directional (shrinkage null). Teacher source is our pool construction, not a
live-rollout oracle label -- a legit diagnostic. Per-step debug + final BLUF
(hack-routed vs solve-routed gap, 🟢/🟡/🔴). _sample_rows helper dedups the draw.
Smoke: just smoke-solvemix green (split+diagnostic path runs end-to-end).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Equal hack/solve pressure, harder problems, faster env (user call 2026-06-10).
Pin the frac on the command line so the headline regime is self-documenting,
not silently default-dependent. Requeued #36-39 at 0.5 with honest 50% labels.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Off the live lora2r path; removed with vhack.py (commit 4120d75):
- proj.py: drop project_delta_S_grad/_project_one_module/mean_cos_pre_from_grads/
_hackward_cos (no live importer; train.py uses only per_token_logps).
- verify_science_invariants: test pairset_sha256's content gate directly (drops the
load_v_hack vehicle + fake delta_S wrapper fixture).
- extract_vhack_grad: import pairset_sha256 from .pairs (was re-exported via vhack).
- tablelog/figs: stale 'delta_S grads'/'knob' comments -> A/B grads.
Smoke + verify_science_invariants green; no delta_S left in live code.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
k=1 (default) stays the mean-mass mean-diff axis -- headline unchanged. k>1
builds the top-k oriented SVD dirs of the paired diff and the gate scores
max_i cos(g, v_i) (alignment to ANY known hack sub-mode), catching multi-modal
hack signal one mean washes out. Shared _build_v_grad at init + refresh; band
edges and the live gate both max over k. Sims use einsum + jaxtyping dims.
Smoke: just smoke-topk green (top-3 subspace, band width +0.087, 12/14 modules).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2a-2c only tested UNIFORM masks. 2d puts rollout 0 clean (0,0) and rollout 1 hack
(1,1) in ONE forward and asserts the mixed deployed grad == rollout-0-alone-clean
and the mixed quarantine grad == rollout-1-alone-hack -- the load-bearing
per-rollout mask vectorization ([G,1,1] reshape) with no cross-rollout bleed.
Green on tiny-random.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Replace the SVD-of-W / delta_S / erase / cin-cout description with the lora2r
adapter (rank-2r LoRA, deployed [:r] + quarantine [r:] blocks, SGTM three-way
masks, deploy=ablate quarantine), the two-pass routeV gate, and the three live
arms (none/routeV/absorb). Fix the dead quick-start recipes (queue-decision).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
antipasto.py (PiSSA/lora_frozen_b/old-lora2r wrappers) is dead in the live path --
train.py/extract use lora2r.py, nothing imports antipasto. Move the 7 scripts that
import it or the erase-era proj fns (rescore_deploy, eval_checkpoint_curve,
verify_vhack_heldout, probe_distill, diag_cosine_dist, diag_pairs_compare,
tt_erase_bench) to scripts/attic/ -- they need lora2r rewrites if resurrected.
Live imports verified clean.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
AGENTS.md: explain what a routing pair IS (same-prompt hack/clean = pos/neg, vector
= grad(prompt+hack)-grad(prompt+clean); no problem_id semantics; identical hack/clean
under a DIFFERENT prompt = distinct gradient). Caught that prog_wide_clean is NOT a
byte-identical subset of pairs_authored: 3/8 shared pairs differ in prompt.
justfile: smoke recipes now use the live arms (none/routeV/absorb), drop deleted flags
(--intervention=erase, --routeV-absorb-all, --adapter, --v-hack-path). Add smoke-all
and queue-decision (the headline 4-arm lora2r run).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
train.py rewritten straight-line for the single rank-2r Gaussian-init LoRA adapter
and three arms (intervention none|routeV|absorb). Removes the erase grad-surgery,
act_vote/online_stats gates, beta/KL reference path, per-source split harvest, the
v_hack injection block, and all per-mechanism E/C/D/A-B tallies. Folds in:
- T2 Gaussian init (lora2r.py): A0~N(0,1/d_in), B0~N(0,1/2r), net delta 0 at init.
- T3 width-pooled gate labels: single (num/den) fraction across modules, skip
zero-width modules, raise if none separate (was per-module equal-weight blowup).
- T5 absorb arm: masks pinned (1,0) -> both blocks train, no gate.
- T6 self-contained ckpt: A/B/A0/B0 in one file (no _hack file, no SVD cache),
adapter:"lora2r" in saved cfg.
- T8 m3: step_flagged logs the hack share (d.mean), not m.mean.
Gates green: verify_lora2r_routing (4 invariants) + smoke none/routeV/absorb
end-to-end on tiny-random Qwen3 (logs in /tmp/claude-1000/smoke_*.log).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Structural-separation arm to disentangle directionality from shrinkage. A rank-2r
PiSSA-init LoRA with A and B both trainable, partitioned into a deployed block [:r]
and a quarantine block [r:] (spectrum-matched via alternated SVD axes). Unlike the
same-basis PiSSA routeV (where deploy-ablation only removes a magnitude slice of one
shared update = shrinkage null), each block has its own input-side A rows and
output-side B columns, so deploy-ablation removes a different FUNCTION.
Routing = SGTM-style three-way hard per-rollout masks from the cosine of the deployed
block's gate-pass gradient to the pair-extracted v_grad: clean (m=0,d=0) trains
deployed only; hack (m=1,d=1) detaches deployed output so only the quarantine updates
(SGTM grad-retain trick); mid (m=1,d=0) trains both (absorption). Gate is no-cheat:
cos to the hand-authored-pair direction, never an oracle label of a live rollout.
verify_lora2r_routing.py gates identity-at-init, the three-way block-grad routing,
per-rollout c-probe recovery, and ablation teeth; wired into smoke-lora2r. Additive:
PiSSA / lora_frozen_b paths untouched.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
AGENTS.md: new section on PiSSA (delta_S:[r] diag) vs LoRA (A:[r,d_in] full)
adapters -- forward sees only the sum so same-basis routing is a magnitude split
(shrinkage null) unless broken by gate discrimination x (expressiveness + structural
separation). Honest note that this wasn't clear to me first pass.
RESEARCH_JOURNAL: rotation fix + the verified shrinkage confound (antipasto.py:107
sums kept+hack in one basis); the deploy delta_S*=(1-qE) control is the cheap decider.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
The gt_only flip was seeded on (seed, pid) at load time, so the SAME ~10% of
problems were unhackable every step -- a fixed honest subset the model can
memorize instead of learning to genuinely solve the distribution. Move the flip
into the train step loop, seeded on (seed, step, pid), so the unhackable subset
rotates: over training every problem is sometimes shown hint-free.
Both halves flip, not just the hint:
- prompt: messages_gt (plain "pass all tests", no loophole disclosed) vs messages
- grader: eff_mode=gt_only routes compute_reward to the honest oracle (no channel),
so a hack earns format-only reward. Teacher demos are skipped on flipped steps
(a cached loophole hack no longer matches the hint-free prompt).
Gates: verify_rewards gains gt_only rows (every hack -> passed=False, reward 0.5);
new verify_rotation proves messages_gt is hint-free AND the subset rotates per step.
Smoke logs flip count (1/30 hint-free, graded gt_only).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Disambiguate the overloaded deploy/train/knob vocabulary (paper-consistent:
'quarantine' + 'ablated' + 'deployed' all match Cloud et al.). One opposite each:
- policy view: hack_deployed/solve_deployed (quarantine ablated, ships) vs
hack_as_trained/solve_as_trained (quarantine attached). Unifies the old split
deploy_hack (JSON) vs hack_deploy (table key) into one name.
- 'knob' -> 'quarantine'/'adapter' throughout comments and log strings.
- train/test reserved for the DATA split only.
Bump RUN_SCHEMA v1->v2 so old deploy_test.json files are skipped (not crashed) by
completed_runs. CLI flags untouched (queued jobs unaffected). Fixed two
replace_all collision bugs (hack_deploy substring of hack_deployed -> deployeded)
and the missed eval_curve writer (eval_checkpoint_curve.py) + readers
(results_deploy.py). Smoke green: v2 written + read; gates pass.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Replaces the partial preset= line. Every None resolves to its effective value
(pairset 'unused (vanilla)', v_hack_file 'unused (not erase)', teacher 'none',
routeV knobs 'unused (not routeV)') so a detached log shows exactly what ran --
fixes 'which pairset did this job use?'. Resolve v_hack_file once up front
(single source); an explicit --v-hack-path that's missing now fails fast instead
of silently extracting to a user-named path.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
_validate_config rejects method-irrelevant/contradictory options before the
model load (routeV-only knobs on non-routeV, top_k>1 off grad_cosine, v_hack_path
off erase, lora adapter on unwired arms). Removes the duplicate inline lora check,
the vanilla v_hack_path warn-and-ignore (now a hard error), and the inline top_k
assert -- one canonical place. Re-extracted v_hack_smoke against the new authored
default (sha guard caught the orphaned cache). Smoke green; bad combo raises.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
The streaming StepLogger gated on the dead literal arm=='routing' (qmass) and
exact arm=='routingV' (missed routingV_per_token). arm is never 'routing' (the
arm property maps routeV->routingV), so qmass was computed into the row dict but
only ever surfaced in the end-of-run dump, never streamed. Gate all routeV cols
on is_route={routingV, routingV_per_token}; fold qmass in. (GPT-flagged, verified.)
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>