Commit Graph

18 Commits

Author SHA1 Message Date
wassname 97aede8d9c fix(routeV): keep=bulk gate + deterministic teachers + deploy-mode generation
Three fixes after the deploy-solve=0 run (all user-confirmed):

1. Gate co-adaptation: t_lo sat at quantile(route_quantile), so keep was only the
   bottom ~5% and ~95% of rollouts landed in mid (both blocks train, qmass~0.5).
   Move the keep/mid boundary up: t_lo=quantile(1-2q), t_hi=quantile(1-q), so keep
   is the BULK, mid+rout are the top 2q. Three-zone absorption preserved (not deleted).

2. Deterministic teachers: every teacher-phase prompt is drawn from the both-pool-
   covered set and gets EXACTLY teacher_n_per_prompt hack + N solve (constant count,
   no flip/coverage drops). Replaces mix_ratio*_even_split (count varied per step).
   No flip in the teacher phase (solve teacher carries solve pressure). mix_ratio>0
   stays the on/off switch. Removed dead _even_split.

3. Deploy-mode generation: student rollouts generate under ablate_quarantine, so the
   behavior policy = the shipped deployed-only model -- the quarantine's learned hack
   can't saturate the rollout distribution and starve the honest solve advantage. For
   clean-gated rollouts, the generation and training forward passes now match.
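
The boundary move in fix 1 can be sketched as below. This is a minimal illustration, not the repository's code. It assumes each rollout has a scalar routing score where higher means more hack-aligned, and that `q` stands for `route_quantile`. The helper names `quantile` and `three_zone_gate` are hypothetical.

```python
def quantile(values, p):
    """Linearly interpolated quantile of `values` at probability p in [0, 1]."""
    xs = sorted(values)
    if not xs:
        raise ValueError("quantile of empty sequence")
    pos = p * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] * (1 - frac) + xs[hi] * frac


def three_zone_gate(scores, q):
    """Assign each score to 'keep', 'mid' or 'rout'.

    Assumed convention (not confirmed by the commit): higher score means
    more hack-aligned, so the top q goes to 'rout', the next q to 'mid',
    and everything below the (1 - 2q) quantile stays in 'keep'.
    """
    t_lo = quantile(scores, 1 - 2 * q)  # keep/mid boundary
    t_hi = quantile(scores, 1 - q)      # mid/rout boundary
    zones = []
    for s in scores:
        if s < t_lo:
            zones.append("keep")
        elif s < t_hi:
            zones.append("mid")
        else:
            zones.append("rout")
    return zones, t_lo, t_hi
```

With 100 evenly spread scores and `q=0.05`, this puts 90 rollouts in keep, 5 in mid and 5 in rout, so keep is the bulk and mid+rout are the top 2q.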

Also: FastConfig lr 1e-4->5e-4 (random-init lora2r needs more lr in the short budget).
AGENTS.md: don't bake unconfirmed theories into comments; don't inflate diagnosis
confidence across turns. Smoke + smoke-solvemix green; all verify gates pass.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-11 00:29:12 +00:00
wassname bf616749ee Consolidate tagged hack pairsets in data 2026-06-10 11:58:53 +00:00
wassname 5714996c56 docs+justfile: pairs concept note (AGENTS.md) + lora2r smoke/decision recipes
AGENTS.md: explain what a routing pair IS (same-prompt hack/clean = pos/neg, vector
= grad(prompt+hack)-grad(prompt+clean); no problem_id semantics; identical hack/clean
under a DIFFERENT prompt = distinct gradient). Caught that prog_wide_clean is NOT a
byte-identical subset of pairs_authored: 3/8 shared pairs differ in prompt.
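
The pair definition above can be sketched as a subtraction of two gradients taken under the same prompt. The code below is a toy under stated assumptions: `toy_grad` is a hypothetical stand-in for a real model gradient (it is the gradient of 0.5*(w.x)^2 at w = all ones, with x the character counts of prompt+completion), and `pair_vector` is an illustrative name, not a function from the repository.

```python
def toy_grad(prompt, completion, vocab="abxy"):
    """Toy gradient: x counts vocab characters in prompt+completion,
    and the gradient of 0.5*(w.x)**2 at w = all-ones is sum(x) * x."""
    text = prompt + completion
    x = [text.count(ch) for ch in vocab]
    scale = sum(x)  # w.x with w = all ones
    return [scale * xi for xi in x]


def pair_vector(grad_fn, prompt, hack, clean):
    """vector = grad(prompt+hack) - grad(prompt+clean), same prompt for both."""
    g_hack = grad_fn(prompt, hack)
    g_clean = grad_fn(prompt, clean)
    return [h - c for h, c in zip(g_hack, g_clean)]


# Identical hack/clean completions under two different prompts.
v1 = pair_vector(toy_grad, "a", "x", "y")
v2 = pair_vector(toy_grad, "ab", "x", "y")
```

Because the toy gradient is not additive in the prompt, `v1` and `v2` differ even though the hack and clean strings are the same, which is the point of the "distinct gradient" remark.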

justfile: smoke recipes now use the live arms (none/routeV/absorb), drop deleted flags
(--intervention=erase, --routeV-absorb-all, --adapter, --v-hack-path). Add smoke-all
and queue-decision (the headline 4-arm lora2r run).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-10 11:08:59 +00:00
wassname 7511ba12e8 docs: record adapter shapes + shrinkage-vs-separation; journal rotation fix
AGENTS.md: new section on PiSSA (delta_S:[r] diag) vs LoRA (A:[r,d_in] full)
adapters -- forward sees only the sum so same-basis routing is a magnitude split
(shrinkage null) unless broken by gate discrimination x (expressiveness + structural
separation). Honest note: this wasn't clear to me on the first pass.

RESEARCH_JOURNAL: rotation fix + the verified shrinkage confound (antipasto.py:107
sums kept+hack in one basis); the deploy delta_S*=(1-qE) control is the cheap decider.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-10 06:50:42 +00:00
wassname b36e3db255 docs: tone down the START HERE links to plain pointers
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-10 03:36:52 +00:00
wassname 0d6ff754ec docs: AGENTS.md START HERE links (human_journal, main.tex, grad-routing paper); revert rescore fallback
- Point future agents at the three docs that pin the actual thesis + the
  live open question (direction vs routing vs SVD/PiSSA prior), so they don't
  re-derive the non-directional result as a 'bug'.
- Revert rescore_deploy cfg.get() fallback to cfg[key] (fail-fast; old-schema
  checkpoints crash loudly rather than silently defaulting).
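
The difference between the two lookup styles in the revert can be illustrated as follows. This is a generic sketch: `read_strict` and `read_lenient` are hypothetical helper names, and the config dict is made up for the example.

```python
def read_strict(cfg, key):
    """Fail-fast lookup: a missing key raises KeyError immediately."""
    return cfg[key]


def read_lenient(cfg, key, default=None):
    """Lenient lookup: a missing key silently falls back to `default`."""
    return cfg.get(key, default)


# Hypothetical old-schema checkpoint config that lacks a newer key.
old_cfg = {"lr": 1e-4}

lenient_value = read_lenient(old_cfg, "route_quantile", 0.05)  # silently 0.05

try:
    read_strict(old_cfg, "route_quantile")
    crashed = False
except KeyError:
    crashed = True  # the schema mismatch surfaces right away
```

The lenient form would score an old checkpoint with a default it was never trained under; the strict form makes that mismatch visible.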

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-10 03:34:06 +00:00
wassname 3b38a05738 no-cheat framing: label-leakage not detector-presence; fix plot comment
The disqualifier for an intervention is needing the env oracle / ground-truth
hack-labels of the live training distribution, not 'a detector ran'. On a new
RL env there is no oracle, so GT-monitor and the (oracle-label-trained) probe
can't be built there; a generic LLM judge and our hand-authored-pair vector can.
The LLM judge is thus the fair external peer (no clean fast-env number to plot).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-09 11:22:29 +00:00
wassname 9c630b83c7 agents: no-cheat #4 (on-distribution pairs = labeling live rollouts = cheating); journal ideal-ceiling tables
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-08 11:39:27 +00:00
wassname 5fd980244b docs: note SGTM is the latest gradient-routing paper (same authors)
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-07 11:56:58 +00:00
wassname 637f9388c8 docs: cite SGTM paper in AGENTS.md (absorption/leakage vocab source)
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-07 11:40:40 +00:00
wassname 52619519dc docs: drop dead refs (spec.md link, verify_gate_anchor.py paragraph)
- spec.md never existed at root or docs/; removed the link from AGENTS.md +
  README.md (the live plan is in docs/spec/ dated files).
- RESEARCH_JOURNAL.md link pointed at docs/; it lives at repo root. Fixed.
- Trimmed the no-cheat-leak paragraph citing scripts/verify_gate_anchor.py
  (that file doesn't exist); kept the general 'gate every load-bearing
  invariant in the same commit' rule.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-07 11:01:31 +00:00
wassname 83cae4ef72 docs: reframe no-cheat in VECTOR terms; move it README->AGENTS.md
The 'weak detector for hack A, generalize to B' framing was wrong for this repo.
That is the weak-LABEL setup (labelA -> labelNotA), which is NOT ours. Ours is
vec -> routing: vec extracted from hand-built synthetic pairs, route the live
GRPO gradient by cosine alignment to vec; no detector ever runs over student
rollouts at train time. Generalization = does vec (from pairs covering some
modes) suppress held-out modes -- vector generalization, not detector-label.
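
A minimal sketch of scoring by cosine alignment to vec, assuming gradients and vec are flattened to plain lists of floats. The function names are hypothetical and how the score is then thresholded is not shown here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def alignment_scores(grads, vec):
    """One score per rollout gradient: its cosine alignment to vec."""
    return [cosine(g, vec) for g in grads]


vec = [1.0, 0.0]
grads = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
scores = alignment_scores(grads, vec)
```

No label or detector output enters the score: it depends only on the live gradient and the vector extracted from the hand-built pairs.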

- AGENTS.md: rewrote the no-cheat bullet to the 3-way distinction (oracle grader
  = cheat; weak-label setup = not ours; vec->routing = ours). For coding agents.
- README: removed the 'We cannot cheat' section (belongs in agent instructions,
  not the new-reader overview).
- spec: dropped the stray 'validation uses known-A detector' line; pointed the
  no-cheat reference at AGENTS.md.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-06 02:39:48 +00:00
wassname f0cbbacaf0 save per-eval deploy-adapter ckpts (rescore w/o retrain) + CLAUDE.md test lesson
save_eval_ckpts (default on): write the deploy adapter (δS only, ~2.3MB) at each
deploy-eval step, step-tagged, so a run can be re-scored later (more prompts /
different eval) without retraining. The A5 run saved only final+first_hack, which
is why the leak needed a full retrain rather than a rescore.
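
The step-tagged save can be sketched as below. Everything here is an assumption for illustration: the file name pattern, the JSON format (a real adapter would more likely be saved as tensors), and the function names `save_eval_ckpt` and `load_eval_ckpt`.

```python
import json
from pathlib import Path


def save_eval_ckpt(out_dir, step, deploy_state):
    """Write the deploy-only state for one eval step to a step-tagged file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: zero-padded step so files sort in order.
    path = out_dir / f"deploy_step{step:06d}.json"
    path.write_text(json.dumps(deploy_state))
    return path


def load_eval_ckpt(path):
    """Read a step-tagged checkpoint back for rescoring."""
    return json.loads(Path(path).read_text())
```

Keeping one small file per eval step means any step can be re-scored later with more prompts or a different eval, without rerunning training.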

AGENTS.md: every load-bearing invariant gets a verify_*.py gate. The no-cheat leak
shipped because the green gates never covered the property -- 'tests passed' is
meaningless if the property was never tested.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-05 03:58:26 +00:00
wassname efdf86a0cb wip 2026-05-30 04:33:33 +00:00
wassname c1f8ca4e7b tidy 2026-05-29 06:29:43 +00:00
wassname f27c658ca9 docs 2026-05-29 05:42:28 +00:00
wassname 646edfc7af purge dead modules and stale recipes
Deletes 7 source files that were superseded but never removed:
  run.py, grad_proj.py, extract_vhack.py (older twin-NLL extractor),
  grpo_smoke.py, grpo_proj_smoke.py (smoke harnesses replaced by
  train.py "smoke" subcommand), phase2_analyze.py (pilot is past),
  probe_uat.py (UAT pipeline is past).

Drops matching justfile recipes (vhack-check, phase2-analyze,
probe-uat) and the BASE constant that pointed at run.py. Updates
AGENTS/README references to the stale fast-dev-run recipe (now
just smoke / smoke-vanilla).

Verified by running just smoke-vanilla --steps=2 end-to-end.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-28 08:42:15 +00:00
wassname 120400c5f5 setup 2026-05-23 10:40:02 +08:00