Three fixes after the deploy-solve=0 run (all user-confirmed):
1. Gate co-adaptation: t_lo sat at quantile(route_quantile), so keep was only the
bottom ~5% and ~95% of rollouts landed in mid (both blocks train, qmass~0.5).
Move the keep/mid boundary up: t_lo=quantile(1-2q), t_hi=quantile(1-q), so keep
is the BULK, mid+rout are the top 2q. Three-zone absorption preserved (not deleted).
2. Deterministic teachers: every teacher-phase prompt is drawn from the both-pool-
covered set and gets EXACTLY teacher_n_per_prompt hack + N solve (constant count,
no flip/coverage drops). Replaces mix_ratio*_even_split (count varied per step).
No flip in the teacher phase (solve teacher carries solve pressure). mix_ratio>0
stays the on/off switch. Removed dead _even_split.
3. Deploy-mode generation: student rollouts generate under ablate_quarantine, so the
behavior policy = the shipped deployed-only model -- the quarantine's learned hack
can't saturate the rollout distribution and starve honest solve advantage. For
clean-gated rollouts, the generation and training forward passes now match.
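A minimal sketch of the new boundaries in fix 1, for readers who want the
arithmetic spelled out. It assumes a per-rollout gate score where higher means
more hack-aligned. The helper names (quantile, gate_zones) are hypothetical and
not taken from train.py; only t_lo, t_hi and the keep/mid/rout zone names come
from the message above.

```python
def quantile(values, p):
    """Linear-interpolated quantile of a non-empty sequence, p in [0, 1]."""
    xs = sorted(values)
    pos = p * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def gate_zones(scores, q):
    """Assign each score to keep / mid / rout using the moved-up boundaries."""
    t_lo = quantile(scores, 1 - 2 * q)  # keep/mid boundary
    t_hi = quantile(scores, 1 - q)      # mid/rout boundary
    zones = []
    for s in scores:
        if s < t_lo:
            zones.append("keep")  # the bulk of rollouts
        elif s < t_hi:
            zones.append("mid")   # next ~q of rollouts
        else:
            zones.append("rout")  # top ~q of rollouts
    return t_lo, t_hi, zones
```

With q=0.05 this puts roughly 90% of rollouts in keep and roughly 5% each in mid
and rout, which is the intended inversion of the old ~5% keep / ~95% mid split.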
Also: FastConfig lr 1e-4->5e-4 (random-init lora2r needs more lr in the short budget).
AGENTS.md: don't bake unconfirmed theories into comments; don't inflate diagnosis
confidence across turns. Smoke + smoke-solvemix green; all verify gates pass.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
AGENTS.md: explain what a routing pair IS (same-prompt hack/clean = pos/neg, vector
= grad(prompt+hack)-grad(prompt+clean); no problem_id semantics; identical hack/clean
under a DIFFERENT prompt = distinct gradient). Caught that prog_wide_clean is NOT a
byte-identical subset of pairs_authored: 3/8 shared pairs differ in prompt.
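Illustrative sketch of the pair definition, under a toy assumption: toy_grad is
a hypothetical stand-in for the real backward pass (gradient of
sum_i w_i * prompt_i * completion_i with respect to w), chosen only so that the
gradient depends on both prompt and completion. pair_vector is likewise a
made-up name.

```python
def toy_grad(prompt, completion):
    # Hypothetical stand-in for grad(prompt + completion): the gradient of
    # sum_i w_i * prompt_i * completion_i with respect to w.
    return [p * c for p, c in zip(prompt, completion)]


def pair_vector(grad_fn, prompt, hack, clean):
    """Routing-pair vector: grad(prompt+hack) - grad(prompt+clean)."""
    g_hack = grad_fn(prompt, hack)
    g_clean = grad_fn(prompt, clean)
    return [a - b for a, b in zip(g_hack, g_clean)]


hack, clean = [1.0, 0.0], [0.0, 1.0]
v_a = pair_vector(toy_grad, [1.0, 2.0], hack, clean)
v_b = pair_vector(toy_grad, [3.0, 1.0], hack, clean)
```

Here v_a and v_b differ even though hack and clean are identical, which is why
two pairs that share completions but not the prompt cannot be treated as the
same pair.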
justfile: smoke recipes now use the live arms (none/routeV/absorb), drop deleted flags
(--intervention=erase, --routeV-absorb-all, --adapter, --v-hack-path). Add smoke-all
and queue-decision (the headline 4-arm lora2r run).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
AGENTS.md: new section on PiSSA (delta_S:[r] diag) vs LoRA (A:[r,d_in] full)
adapters -- forward sees only the sum so same-basis routing is a magnitude split
(shrinkage null) unless broken by gate discrimination x (expressiveness + structural
separation). Honest note: this wasn't clear to me on the first pass.
RESEARCH_JOURNAL: rotation fix + the verified shrinkage confound (antipasto.py:107
sums kept+hack in one basis); the deploy delta_S*=(1-qE) control is the cheap decider.
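A toy sketch of the shrinkage null, not the antipasto.py code. It assumes a
diagonal delta_S of rank r, a gate that routes the same fraction q_e of every
update (no discrimination), and reads qE as that mean routed fraction; all
function names are hypothetical. Because the forward sees only kept+hack, the
updates themselves do not depend on how they are split.

```python
def split_updates(updates, q_e):
    """Accumulate each update into kept and hack in the same basis."""
    rank = len(updates[0])
    kept = [0.0] * rank
    hack = [0.0] * rank
    for g in updates:
        for i, g_i in enumerate(g):
            kept[i] += (1.0 - q_e) * g_i  # survives at deploy
            hack[i] += q_e * g_i          # ablated at deploy
    return kept, hack


def shrinkage_control(updates, q_e):
    """Unrouted delta_S scaled by (1 - q_e): the cheap control."""
    rank = len(updates[0])
    total = [sum(g[i] for g in updates) for i in range(rank)]
    return [(1.0 - q_e) * t for t in total]
```

Under these assumptions the deployed kept part equals the scaled unrouted
delta_S exactly, so any gain from routing has to show up as a difference from
that control.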
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
- Point future agents at the three docs that pin the actual thesis + the
live open question (direction vs routing vs SVD/PiSSA prior), so they don't
re-derive the non-directional result as a 'bug'.
- Revert rescore_deploy cfg.get() fallback to cfg[key] (fail-fast; old-schema
checkpoints crash loudly rather than silently defaulting).
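Sketch of the difference the revert makes. The cfg dicts and the choice of
route_quantile as the example key are illustrative assumptions, not the actual
rescore_deploy code.

```python
def read_silent(cfg, key, default):
    # Old behaviour: a missing key quietly becomes the default.
    return cfg.get(key, default)


def read_fail_fast(cfg, key):
    # Reverted behaviour: a missing key raises KeyError immediately.
    return cfg[key]


old_schema_cfg = {"lr": 1e-4}                          # hypothetical old checkpoint
new_schema_cfg = {"lr": 5e-4, "route_quantile": 0.05}  # hypothetical new checkpoint
```

read_fail_fast(old_schema_cfg, "route_quantile") raises KeyError, so an
old-schema checkpoint cannot be rescored under a default it was never trained
with.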
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
The disqualifier for an intervention is needing the env oracle / ground-truth
hack-labels of the live training distribution, not 'a detector ran'. On a new
RL env there is no oracle, so GT-monitor and the (oracle-label-trained) probe
can't be built there; a generic LLM judge and our hand-authored-pair vector can.
The LLM judge is thus the fair external peer (no clean fast-env number to plot).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
- spec.md never existed at root or docs/; removed the link from AGENTS.md +
README.md (the live plan is in docs/spec/ dated files).
- RESEARCH_JOURNAL.md link pointed at docs/; it lives at repo root. Fixed.
- Trimmed the no-cheat-leak paragraph citing scripts/verify_gate_anchor.py
(that file doesn't exist); kept the general 'gate every load-bearing
invariant in the same commit' rule.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
The 'weak detector for hack A, generalize to B' framing was wrong for this repo.
That is the weak-LABEL setup (labelA -> labelNotA), which is NOT ours. Ours is
vec -> routing: vec extracted from hand-built synthetic pairs, route the live
GRPO gradient by cosine alignment to vec; no detector ever runs over student
rollouts at train time. Generalization = does vec (from pairs covering some
modes) suppress held-out modes -- vector generalization, not detector-label.
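A minimal sketch of the alignment score that vec -> routing relies on, assuming
flattened gradients as plain lists. The function name and the zero-norm
convention are assumptions, and how the score is turned into a gate is left out.

```python
import math


def cosine(grad, vec):
    """Cosine alignment between a live gradient and the pair-derived vec."""
    dot = sum(g * v for g, v in zip(grad, vec))
    norm_g = math.sqrt(sum(g * g for g in grad))
    norm_v = math.sqrt(sum(v * v for v in vec))
    if norm_g == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention: no alignment signal for a zero vector
    return dot / (norm_g * norm_v)
```

The only inputs are the live gradient and vec: nothing here inspects the
rollout text or asks a detector for a label.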
- AGENTS.md: rewrote the no-cheat bullet to the 3-way distinction (oracle grader
= cheat; weak-label setup = not ours; vec->routing = ours). For coding agents.
- README: removed the 'We cannot cheat' section (belongs in agent instructions,
not the new-reader overview).
- spec: dropped the stray 'validation uses known-A detector' line; pointed the
no-cheat reference at AGENTS.md.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
save_eval_ckpts (default on): write the deploy adapter (δS only, ~2.3MB) at each
deploy-eval step, step-tagged, so a run can be re-scored later (more prompts /
different eval) without retraining. The A5 run saved only final+first_hack, which
is why the leak needed a full retrain rather than a rescore.
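Rough sketch of step-tagged saving, assuming the deploy adapter state can be
serialised as JSON. The file naming scheme and both function names are
hypothetical and are not the format train.py writes.

```python
import json
from pathlib import Path


def save_eval_ckpt(out_dir, step, deploy_state):
    """Write one deploy-adapter snapshot, tagged with the eval step."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"deploy_step{step:06d}.json"  # zero-padded so names sort by step
    path.write_text(json.dumps(deploy_state))
    return path


def list_eval_ckpts(out_dir):
    """Return saved snapshots in step order, ready for a later rescore."""
    return sorted(Path(out_dir).glob("deploy_step*.json"))
```

A later rescore can then iterate list_eval_ckpts(...) and evaluate each
snapshot on more prompts or a different eval without touching the trainer.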
AGENTS.md: every load-bearing invariant gets a verify_*.py gate. The no-cheat leak
shipped because the green gates never covered the property -- 'tests passed' is
meaningless if the property was never tested.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Deletes 7 source files that were superseded but never removed:
run.py, grad_proj.py, extract_vhack.py (older twin-NLL extractor),
grpo_smoke.py, grpo_proj_smoke.py (smoke harnesses replaced by
train.py "smoke" subcommand), phase2_analyze.py (pilot is past),
probe_uat.py (UAT pipeline is past).
Drops matching justfile recipes (vhack-check, phase2-analyze,
probe-uat) and the BASE constant that pointed at run.py. Updates
AGENTS/README references to the stale fast-dev-run recipe (now
just smoke / smoke-vanilla).
Verified by running just smoke-vanilla --steps=2 end-to-end.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>