evil_MoE

mirror of https://github.com/wassname/evil_MoE.git synced 2026-06-27 16:45:42 +08:00

Author	SHA1	Message	Date
wassname	8f39c4a69f	docs: rewrite Evil MoE spec to the soft-routing design + literature evidence Spec was stale (recommended hard sparse "Version A", the DEMix absorption-killer). Rewrite to match what is implemented and what we clarified: - pseudocode-first: lora2r 2-expert forward, seeded rank-1 cosine router, GRPO+pin loop, deploy ablation. For 2 experts the "proper" router IS rank-1 (softmax over 2 = sigmoid of one direction), seeded with v_act. - "Why soft, not top-k" reframed as a tradeoff, not a verdict: hard routing closes the leak but needs a router that catches all hacks; soft keeps absorption available but leaks (1-w). DEMix only bites if we rely on absorption. - Evidence section from two literature searches. Forced localization has working precedents (single bad direction: emergent misalignment/persona/refusal; behavioural expert seeding: SteerMoE, geometric cosine routing, cluster-aware upcycling; ablation + repair: NAEE/MoE-Pruner; router anchor: SEUF/MoTE). Emergent localization does not (standing-committee, topic-driven routing). So seed+pin are load-bearing. - 3-way/3-expert noted as an extension (closer to production), 2 experts for the decisive causal run. README: add Router dynamics (three forces, one pin-vs-reward conflict, mitigations). Add HF "MoE in Transformers" blog to docs/papers (force-added past the docs gitignore). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-14 13:06:38 +08:00
wassname	04a98b321e	feat: Evil MoE — learned soft router + pin loss on an ablatable hack expert Fork of vGROUT. Replaces routeA's fixed v_act quantile gate with a learned per-rollout soft router (HackRouter, seeded from v_act) on the ablatable hack expert: GRPO flows into the router through the soft weight w (it concentrates hack-like rollouts in the hack expert), and a continuous pin loss on the hand-authored pairs anchors the axis. No load balancing; routing is per rollout. lora2r gains a soft-weight forward path (_lora2r_w: w=0 keep, w=1 rout, deployed grad scaled by 1-w). train_moe.py is the on-policy GRPO loop; verify_moe_router.py gates the routing invariants. `just smoke` is green. README/AGENTS rewritten for the fork; original proposal kept as docs/spec/original_evil_moe_spec.md. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-14 11:25:14 +08:00
wassname	cca7150ea0	tidy	2026-06-14 11:05:54 +08:00
wassname	c4ac632b37	docs: add Evil MoE experiment proposal	2026-06-14 09:28:16 +08:00
wassname	41d225a5ec	writeup	2026-06-12 04:46:01 +00:00
wassname	af420ec855	feat: generation-matched logπ_old baseline + global-quantile gate + frac=0 method Fixes the frac=0 PPO-clip blow-up: logπ_old is now the behavior policy computed in each rollout's own sampling mode, so ρ is a true importance ratio. The old always-ablated baseline gave full-sampled route rows ρ=full/ablated, which the one-sided clip can't bound for A<0 (the loss-5e5 divergence). ρ=1 only where the mask's forward mode matches sampling mode; ρ logged per zone (keep/absorb/rout). Note (Fable review): frac=0.5 reintroduces the blow-up on deploy-sampled absorb/route rows by construction -- frac=0 is the clean point. Gate: two-threshold Otsu -> symmetric global-quantile tails (route_tail_q=0.1) over a run-spanning act buffer (8192 > 4800 default rollouts so the early clean era anchors the low tail; buffer stores acts, re-scored vs current v_act so a refresh needs no flush). Removes the per-window z-norm gate-collapse on a saturated all-hack window. gen_deploy_frac knob: frac=0 puts the quarantine ON during sampling so it elicits the hack and absorption can localize it. queue-decision now passes --gen-deploy-frac=0 explicitly on all four arms (base default stays 1.0 = the job-34 config where ablation RAISED hack 0.71->0.86). Docs: AGENTS.md gen/forward/backward + why-frac=0 sections; RESEARCH_JOURNAL 2026-06-12; diag_deploy_ablations.py (quar-only vs deploy localization probe). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-12 03:22:48 +00:00
wassname	ec11bf58b2	docs: update method descriptions for activation routing	2026-06-11 13:22:13 +00:00
wassname	668285c86e	journal: fix pueue IDs 28-30 -> 31-33 in entry (f) provenance Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-11 13:08:20 +00:00
wassname	8000aa48f4	journal(#41): entry (g) routeA shipped + guard-drop calibration; track moduleS diag scripts Entry (f) already cited scripts/diag_pinning_moduleS_exact.py; both moduleS scripts were untracked, so committing them for provenance alongside the calibration script the new entry cites. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-11 12:50:07 +00:00
wassname	f646e57028	fix(#41): routeA gate hardening from fresh-eyes review - assert finite gate scores (one NaN act would poison the rolling buffer -> NaN thresholds -> silent all-keep routing for up to route_buffer rollouts) - d = dm: degenerate Otsu (t_lo==t_hi) could emit (m=0,d=1), a mask state that trains nothing; rout now requires the quarantine on - solve-mix discrimination collects only when solve teachers exist (a hack-only [False]n list tripped the end-of-run line with a nan solve side) - warn when authored-pair AUROC < 0.9 (extraction sanity; tiny-random smoke sits ~0.78 so warning, not assert) Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-11 12:48:11 +00:00
wassname	adca442253	feat(#41): routeA activation gate replaces routeV grad gate Gate now scores each rollout by dot(pooled bottleneck act, v_act) captured on the no-grad logpi_old forward (quarantine-ablated, matching the sampling policy); masks are pinned BEFORE the single grad-carrying forward, so the grad-gate's pass-1 backward is gone. Thresholds: rolling 256-act buffer, z-normalized, two-threshold Otsu (winsorized 1/99); warmup pins absorb until 128 scores. Buffer stores pooled acts and re-scores against the current v_act, so the forward-only refresh (every 5 steps) needs no flush. No bimodality guard: calibration showed Otsu tail separation ~2.4-2.8 buffer-sd on every condition including pure Gaussians, so no shape statistic discriminates. Deleted with the arm wiring (rename-on-logic-change: routeA never conflates with routeV runs): extract_vhack_grad.py, _build_v_grad, route_band_edges, _pair_cos, the pass-1 autograd.grad block, grad_probe training wiring, v_grad_k/route_std_*/routeV_random_v_seed config, smoke-topk recipe. c-probe stays in lora2r.py for scripts/diag_pinning.py only. verify_science_invariants: all-in-one count 27 -> 42 (stale since `c33b810` added the wave-2 behavior2 pairs) + assert the 8-pair routeA training subset. Smoke: routeA/vanilla/absorb/solvemix all pass (gate exercises warmup, Otsu zones, refresh, deploy ablation) -- /tmp/claude-1000/smoke_routeA.log. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-11 12:38:19 +00:00
wassname	5a340e5c3e	feat(routeA): forward-only v_act extraction + verify gate vs cached diag features extract_v_act: pooled completion-token bottleneck act per module, v = unit-norm mean pair diff (tstat flag default off, null at n=8 pairs). ActCapture is the single hook shared by extraction, the live gate, and verification. verify_v_act (pueue #24): rel diff 7.3e-4 hack / 7.7e-4 clean vs out/diag/pinning_feats.pt on the v3 first_hack ckpt; min per-module cos 0.99997. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-11 12:13:32 +00:00
wassname	d51028a618	user	2026-06-11 12:13:13 +00:00
wassname	77fa5bbf6b	spec: routeA plan approved; deletion scope extended to extract_vhack_grad + all grad-gate helpers Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-11 11:50:20 +00:00
wassname	8b6bf5b6a6	journal(e): super-S-space and act t-stat both null; act_dot mean-extraction stays the routeA default Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-11 11:44:12 +00:00
wassname	1d4f33ffb6	diag: super-S-space gate score null; spec -> act_dot + winsorized-Otsu plan superS (pooled writer/reader eigenbasis, whitened + top-r) tops out at min-window AUROC 0.721 = raw resid dot; best unwhitened rotation+top-64 0.740 < act 0.747 (max of ~50-variant grid). act t-stat extraction also null (0.719 vs 0.749 min). Spec updated: act_dot default, journal-(d) evidence table, implementation plan for routeA. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-11 11:42:44 +00:00
wassname	1b51c62cdc	journal(d): z-norm+otsu pinning fix, module weighting null, mode hypothesis falsified Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-11 11:28:37 +00:00
wassname	e5b68acf69	diag(#40): z-norm scores within family, winsorized 2-threshold Otsu zones, --feats offline mode Synthetic and live scores share an axis with meaningful zero (raw scores carry a common <mu,v> offset since v = mean diff is not orthogonal to the family mean). Zones come from label-free online stats (EMA mean/std + Otsu valley), replacing mean+k*sd which placed both cuts beyond every distribution. Winsorize at 1/99% before Otsu: variance-maximizing cuts otherwise buy a class for one outlier. Fresh-eyes review verified z-norm is affine (AUROCs unchanged), zones label-free, P/R recompute from parquet matches titles. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-11 11:23:42 +00:00
wassname	270c4f5a27	misc	2026-06-11 11:07:28 +00:00
wassname	7871aa66b8	tidy	2026-06-11 11:07:17 +00:00
wassname	24aea19bee	diag(#40): offline follow-up -- pooling variants, synthetic common-mode, env_mode join Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-11 10:58:16 +00:00
wassname	bbf6a97151	diag(#40): fix 3x2 plot -- keep bottom spine at axis edge (tick labels collided with rugs), skip labels on narrow zones Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-11 10:47:57 +00:00
wassname	4be1858b14	journal(c) addendum: resid_cos 0.92/0.70/0.80 vs act_cos -- bottleneck not the limiter, routeA defaults to resid Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-11 10:46:06 +00:00
wassname	0660e7bdd3	diag(#40): residual-stream rep (layers 12/18/24) as third gate-score candidate, 3x2 panel Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-11 10:45:50 +00:00
wassname	4a7465c0da	journal(c): act score replicates (0.75-0.87), grad decays to chance across windows; routeA next Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-11 10:44:40 +00:00
wassname	49ca51b6e1	diag(#40): review fixes -- A>0 contrast headline (adv-only baseline 0.90 vs-all / 0.58 A>0), headline=behavior_ training default, n at P@rout, rug strips, drop density ticks Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-11 10:22:08 +00:00
wassname	4644af155a	docs: vocabulary section -- routing, vector, pinning, absorption Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-11 10:13:06 +00:00
wassname	2f5f93f4fb	diag(#40): Q2 rewrite -- {grad,act}x{cos,dot} scores vs exploited&A>0 label, 3-zone pinning, pairset table Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-11 10:13:06 +00:00
wassname	c33b810faa	data: wave-2 behavior2_* pairs -- 15 new proxy-gaming mechanisms, 2 contaminated drafts replaced 10 pairs from parallel session renamed behavior_* -> behavior2_* so /behavior_ still selects the proven 8; bare-call-no-assert (run_tests axis-1 family) and assert-or-True (assert-True family) replaced with mock-under-test and fudge-expected; +5 more: skip-failing, retry-until-green, coverage-pragma, type-ignore, survivorship-metric. No identifier overlap with the env's 992 train/test problems (total_steps -> total_clicks after collision). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-11 10:04:23 +00:00
wassname	70697ff36e	diag(#40): pinning plot splits solve/fail/hack + per-pairset AUROC ranking Q4 fix: on-policy "solve" was ~exploited = solves+fails (mostly fails). Split by gt_pass into solve/fail/hack (live: 103 hack / 27 solve / 62 fail). Per-pairset ranking: build v_grad from each heading-prefix subset, re-project the SAME stored live c-grads (no model re-run). Finding: behavior pairs AUROC 0.69 vs all-in-one 0.53; reasoning/opportunity anti-aligned (<0.5) -> mixing dilutes. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-11 06:16:27 +00:00
wassname	19687087b0	feat(#30,#39): simple online gate -- band from current batch, no window/cloud; lr 1e-4 Gate band (mean + k*std) now computed from THIS batch's pooled positions each step instead of a sliding window. Refresh-proof by construction (live rollouts scored vs the current v_grad), so the v_grad-refresh window flush is gone. Drops route_window config + collections import. SmokeConfig forces routing (mid=-1,rout=0) since random tiny data never separates -> quarantine would never train -> pathway assert would fail. lr 3e-4 -> 1e-4: 3e-4 diverged at step ~27 (lp_s +18->+73, rew_s->0 after clean emergence 7-24); 1e-4 is the normal LoRA range and emergence was already fast. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-11 06:04:28 +00:00
wassname	979daf84fd	feat(#30): mean+kstd online gate replaces fixed quantile; always-show route cols Gate calibration: route by live mean + route_std_mid/route_std_rout std of the pooled cosine-to-v_grad, not a fixed quantile tail. Self-silences -- only the tail that genuinely exceeds the spread routes, so qmass tracks real separation instead of a forced fraction. The authored absolute band is mis-placed (live pos sits far below the synthetic-hack edge; even synthetic solve out-aligns on-policy hack). tablelog: auroc/rout/routE/keep/resid/qmass cols always shown (nan on vanilla) so arm tables line up. Diagnostics: scripts/diag_pinning.py (4-population calibration view, mean+/-2sd band) and scripts/diag_pinning_refresh.py (proves cosine stats recompute from a tracked v-independent gradient cloud on a v_grad refresh -- exact for k=1, sanity 2.5e-16). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-11 02:56:07 +00:00
wassname	4f60f94072	spec: small-reward-hacking env spinout (parked post-paper; commit archaeology for the 6->4 mode selection) Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-11 02:07:55 +00:00
wassname	154a37441b	refactor: OneCycleLR replaces SequentialLR(LinearLR, CosineAnnealingLR) One scheduler object does warmup + cosine relaxation; pct_start=warmup_frac is the explicit warmup. cycle_momentum=False so it doesn't clobber the configured AdamW betas (adam_beta1). Curve (100 steps, 20% warmup, 3e-4): peaks 3e-4 at step ~19 via smooth cos ramp (vs old linear), 1.7e-4 at step 10 where 5e-4 had diverged. Smoke + all verify gates green. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-11 01:52:30 +00:00
wassname	a72835315c	fix: lr 3e-4 + 20% warmup (5e-4 diverged at warmup-end); slim bf16 ckpts - FastConfig lr 5e-4 -> 3e-4: 5e-4 peaked exactly at warmup-end (step ~10) and diverged (lp_t -0.5 -> -4.8, hack_s 20/24 -> 0). Lower peak + longer warmup defuse the spike. - Config warmup_frac 0.1 -> 0.2: SequentialLR(LinearLR, CosineAnnealingLR) already does warmup+cosine relaxation; just reach the peak more gradually. - save_ckpt: drop A0/B0 (seeded init, regenerable from lora_init_seed; ckpt_update0000 is the init since A==A0 at step 0; nothing live reloads them), save A/B bf16 not fp32. ~1.3G -> ~0.33G per ckpt. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-11 01:42:20 +00:00
wassname	f1dd9fb33e	chore: FastConfig save_ckpt_every 10->20 (disk pressure; ~6 ckpts/run suffices) Each lora2r ckpt is ~1.3G (A/B + redundant frozen A0/B0, 252 modules fp32). The 768G disk filled and runs crashed at the step-0 ckpt save. 20-step cadence halves the per-run footprint while keeping enough points for the eval curve. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-11 00:35:29 +00:00
wassname	97aede8d9c	fix(routeV): keep=bulk gate + deterministic teachers + deploy-mode generation Three fixes after the deploy-solve=0 run (all user-confirmed): 1. Gate co-adaptation: t_lo sat at quantile(route_quantile), so keep was only the bottom ~5% and ~95% of rollouts landed in mid (both blocks train, qmass~0.5). Move the keep/mid boundary up: t_lo=quantile(1-2q), t_hi=quantile(1-q), so keep is the BULK, mid+rout are the top 2q. Three-zone absorption preserved (not deleted). 2. Deterministic teachers: every teacher-phase prompt is drawn from the both-pool- covered set and gets EXACTLY teacher_n_per_prompt hack + N solve (constant count, no flip/coverage drops). Replaces mix_ratio*_even_split (count varied per step). No flip in the teacher phase (solve teacher carries solve pressure). mix_ratio>0 stays the on/off switch. Removed dead _even_split. 3. Deploy-mode generation: student rollouts generate under ablate_quarantine, so the behavior policy = the shipped deployed-only model -- the quarantine's learned hack can't saturate the rollout distribution and starve honest solve advantage. For clean-gated rollouts gen and train forward now match. Also: FastConfig lr 1e-4->5e-4 (random-init lora2r needs more lr in the short budget). AGENTS.md: don't bake unconfirmed theories into comments; don't inflate diagnosis confidence across turns. Smoke + smoke-solvemix green; all verify gates pass. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-11 00:29:12 +00:00
wassname	3f2b44452a	feat: online-stats gate + step-level teacher forcing + AUROC diagnostic The authored absolute band made pos>=1 unreachable for live hacks (rout~0), and re-extracting it every 5 steps collapsed the gate (the #40 step-5 cliff). - Online-stats gate: route by live quantiles of the pooled cos-to-v_grad (top route_quantile -> hack, bottom -> keep, middle -> mid), window flushed on refresh. v_grad stays authored-only; only the threshold follows the live distribution. Smoke: routing sustained past the refresh (cliff fixed). - Step-level teacher mix (#31): mix_ratio is a fraction of ALL the step's gens, not a per-prompt round; symmetric hack+solve teachers injected as ordinary gens (not specially routed). Fixes the per-prompt rounding wart. - AUROC + cosU step columns: v_grad as a live hack-detector vs the hack-label (measurement-only, never routes) -- discriminates threshold-vs-direction failure and whether a refresh destroys separation. - Inline eval stays off (eval_ablate_every=0); deploy scored offline. - Fix _sample_rows None crash (beartype) on the no-solve-pool path. - Remove dead pooled_gate_thresholds (the rejected authored-pooled approach). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-10 14:22:37 +00:00
wassname	05a00aa487	feat(T4): symmetric solve-teacher pool + routed-share discrimination diagnostic --solve-pool-dir splits the G_t teacher budget solve_mix_frac solve / rest hack (default off). The gate's routed-share is split by teacher SOURCE: a discriminating gate routes hack teachers (d->1) and KEEPS solve teachers (d->0); equal shares = non-directional (shrinkage null). Teacher source is our pool construction, not a live-rollout oracle label -- a legit diagnostic. Per-step debug + final BLUF (hack-routed vs solve-routed gap, 🟢/🟡/🔴). _sample_rows helper dedups the draw. Smoke: just smoke-solvemix green (split+diagnostic path runs end-to-end). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-10 12:02:39 +00:00
wassname	bf616749ee	Consolidate tagged hack pairsets in data	2026-06-10 11:58:53 +00:00
wassname	944ada360b	cleanup(lora2r): resolve user TODOs -- F.linear alias + jaxtyping hook shapes torch.nn.functional.linear -> F.linear (import F); annotate A/B/A0/B0 with Float[Tensor, ...] dims. Behaviorally identical -- verify_lora2r_routing green (identity 0.00e+00, all three masks + mixed-batch + ablation). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-10 11:51:49 +00:00
wassname	35286040ed	run: decision arms explicit at --unhackable-frac=0.5 (25%->50%) Equal hack/solve pressure, harder problems, faster env (user call 2026-06-10). Pin the frac on the command line so the headline regime is self-documenting, not silently default-dependent. Requeued #36-39 at 0.5 with honest 50% labels. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-10 11:50:12 +00:00
wassname	7e11c024c4	cleanup: delete dead delta_S machinery (PiSSA->lora2r leftovers) Off the live lora2r path; removed with vhack.py (commit `4120d75`): - proj.py: drop project_delta_S_grad/_project_one_module/mean_cos_pre_from_grads/ _hackward_cos (no live importer; train.py uses only per_token_logps). - verify_science_invariants: test pairset_sha256's content gate directly (drops the load_v_hack vehicle + fake delta_S wrapper fixture). - extract_vhack_grad: import pairset_sha256 from .pairs (was re-exported via vhack). - tablelog/figs: stale 'delta_S grads'/'knob' comments -> A/B grads. Smoke + verify_science_invariants green; no delta_S left in live code. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-10 11:45:54 +00:00
wassname	4120d75ea4	feat: top-k routing subspace for routeV (--v-grad-k, gate=max_i cos) k=1 (default) stays the mean-mass mean-diff axis -- headline unchanged. k>1 builds the top-k oriented SVD dirs of the paired diff and the gate scores max_i cos(g, v_i) (alignment to ANY known hack sub-mode), catching multi-modal hack signal one mean washes out. Shared _build_v_grad at init + refresh; band edges and the live gate both max over k. Sims use einsum + jaxtyping dims. Smoke: just smoke-topk green (top-3 subspace, band width +0.087, 12/14 modules). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-10 11:44:45 +00:00
wassname	9fd2b6b89b	test: add mixed-batch per-rollout routing gate to verify_lora2r_routing (T8) 2a-2c only tested UNIFORM masks. 2d puts rollout 0 clean (0,0) and rollout 1 hack (1,1) in ONE forward and asserts the mixed deployed grad == rollout-0-alone-clean and the mixed quarantine grad == rollout-1-alone-hack -- the load-bearing per-rollout mask vectorization ([G,1,1] reshape) with no cross-rollout bleed. Green on tiny-random. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-10 11:24:49 +00:00
wassname	4ee5c27f7b	docs: rewrite README for lora2r/three-arms (was SVD-delta_S/erase) Replace the SVD-of-W / delta_S / erase / cin-cout description with the lora2r adapter (rank-2r LoRA, deployed [:r] + quarantine [r:] blocks, SGTM three-way masks, deploy=ablate quarantine), the two-pass routeV gate, and the three live arms (none/routeV/absorb). Fix the dead quick-start recipes (queue-decision). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-10 11:23:21 +00:00
wassname	103d0acc2c	cleanup: delete antipasto.py; attic 7 erase-era scripts (T1/T6) antipasto.py (PiSSA/lora_frozen_b/old-lora2r wrappers) is dead in the live path -- train.py/extract use lora2r.py, nothing imports antipasto. Move the 7 scripts that import it or the erase-era proj fns (rescore_deploy, eval_checkpoint_curve, verify_vhack_heldout, probe_distill, diag_cosine_dist, diag_pairs_compare, tt_erase_bench) to scripts/attic/ -- they need lora2r rewrites if resurrected. Live imports verified clean. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-10 11:21:53 +00:00
wassname	62ebf719d0	justfile: prune to lora2r-only (645->~180 lines) Drop every recipe invoking deleted CLI (erase/routeV_per_token/--routeV-absorb-all/ --routeV-gate/--v-hack-path/--half-a/--beta/fast-lora*/fast-lora2r/full) and the retired probe_distill/diag/cross-mech/substrate-plot tooling. Keep: smoke arms (none/routeV/absorb + all), queue-decision/baseline/no-loophole, env-construction pools (runtests/substrate/solve), results, paper tooling. Short, ordered, commented. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-10 11:20:48 +00:00
wassname	5714996c56	docs+justfile: pairs concept note (AGENTS.md) + lora2r smoke/decision recipes AGENTS.md: explain what a routing pair IS (same-prompt hack/clean = pos/neg, vector = grad(prompt+hack)-grad(prompt+clean); no problem_id semantics; identical hack/clean under a DIFFERENT prompt = distinct gradient). Caught that prog_wide_clean is NOT a byte-identical subset of pairs_authored: 3/8 shared pairs differ in prompt. justfile: smoke recipes now use the live arms (none/routeV/absorb), drop deleted flags (--intervention=erase, --routeV-absorb-all, --adapter, --v-hack-path). Add smoke-all and queue-decision (the headline 4-arm lora2r run). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-10 11:08:59 +00:00
wassname	5c97975185	refactor: collapse to lora2r-only (none/routeV/absorb); delete erase/antipasto/lora_frozen_b paths train.py rewritten straight-line for the single rank-2r Gaussian-init LoRA adapter and three arms (intervention none|routeV|absorb). Removes the erase grad-surgery, act_vote/online_stats gates, beta/KL reference path, per-source split harvest, the v_hack injection block, and all per-mechanism E/C/D/A-B tallies. Folds in: - T2 Gaussian init (lora2r.py): A0~N(0,1/d_in), B0~N(0,1/2r), net delta 0 at init. - T3 width-pooled gate labels: single (num/den) fraction across modules, skip zero-width modules, raise if none separate (was per-module equal-weight blowup). - T5 absorb arm: masks pinned (1,0) -> both blocks train, no gate. - T6 self-contained ckpt: A/B/A0/B0 in one file (no _hack file, no SVD cache), adapter:"lora2r" in saved cfg. - T8 m3: step_flagged logs the hack share (d.mean), not m.mean. Gates green: verify_lora2r_routing (4 invariants) + smoke none/routeV/absorb end-to-end on tiny-random Qwen3 (logs in /tmp/claude-1000/smoke_*.log). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-10 10:58:22 +00:00
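
Note on commit 8f39c4a69f: its message states that for 2 experts the router is rank-1, because a softmax over 2 logits equals a sigmoid of one direction. The sketch below checks that identity numerically. It is an illustration only, not code from the repository: it assumes plain linear router logits (the commit describes a cosine router), and the names `softmax2`, `two_expert_weight`, `rank1_weight`, `u_keep`, and `u_hack` are hypothetical.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def softmax2(z_keep, z_hack):
    """Numerically stable softmax over exactly two logits."""
    m = max(z_keep, z_hack)
    e_keep = math.exp(z_keep - m)
    e_hack = math.exp(z_hack - m)
    total = e_keep + e_hack
    return e_keep / total, e_hack / total


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def two_expert_weight(h, u_keep, u_hack):
    """Hack-expert weight from a 2-row linear router followed by softmax."""
    return softmax2(dot(u_keep, h), dot(u_hack, h))[1]


def rank1_weight(h, u_keep, u_hack):
    """Same weight from one direction d = u_hack - u_keep and a sigmoid."""
    d = [b - a for a, b in zip(u_keep, u_hack)]
    return sigmoid(dot(d, h))


if __name__ == "__main__":
    h = [0.3, -1.2, 0.7]          # hypothetical pooled activation
    u_keep = [0.5, 0.1, -0.4]     # hypothetical router row for the kept expert
    u_hack = [-0.2, 0.9, 0.3]     # hypothetical router row for the hack expert
    print(two_expert_weight(h, u_keep, u_hack), rank1_weight(h, u_keep, u_hack))
```

Only the difference of the two router rows affects the weight, which is why a single seeded direction (v_act in the commit message) is enough to parameterize a 2-expert soft router under this assumption.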

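Note on commit 97aede8d9c: its message moves the gate thresholds to t_lo=quantile(1-2q) and t_hi=quantile(1-q), so that keep is the bulk and mid+rout are the top 2q of rollouts. A minimal sketch of that three-zone split follows. It is not the repository's code: the `quantile` helper (linear interpolation), the function name `three_zone_gate`, and the strict-inequality tie handling are all assumptions made for illustration.

```python
def quantile(values, p):
    """Linear-interpolation quantile of a non-empty sequence, 0 <= p <= 1."""
    xs = sorted(values)
    if not xs:
        raise ValueError("quantile of empty sequence")
    pos = p * (len(xs) - 1)
    lo = int(pos)                      # floor, since pos >= 0
    hi = min(lo + 1, len(xs) - 1)
    frac = pos - lo
    return xs[lo] * (1.0 - frac) + xs[hi] * frac


def three_zone_gate(scores, q):
    """Label each score keep / mid / rout using the top-2q tail.

    t_lo = quantile(1 - 2q) separates keep from mid,
    t_hi = quantile(1 - q) separates mid from rout.
    """
    t_lo = quantile(scores, 1.0 - 2.0 * q)
    t_hi = quantile(scores, 1.0 - q)
    zones = []
    for s in scores:
        if s > t_hi:
            zones.append("rout")
        elif s > t_lo:
            zones.append("mid")
        else:
            zones.append("keep")
    return zones, t_lo, t_hi


if __name__ == "__main__":
    scores = [float(i) for i in range(100)]   # hypothetical gate scores
    zones, t_lo, t_hi = three_zone_gate(scores, q=0.05)
    print({z: zones.count(z) for z in ("keep", "mid", "rout")})
```

With the earlier placement described in the same message (t_lo at quantile(route_quantile)), keep covered only the bottom ~5%; placing both cuts in the upper tail is what makes keep the majority zone.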

488 Commits