evil_MoE

mirror of https://github.com/wassname/evil_MoE.git synced 2026-06-29 21:25:04 +08:00

Author	SHA1	Message	Date
wassname	ea4f4ee657	feat: rollout_ablate_frac exploration floor vs hack-saturation (route/route2) Generate a fraction of student rollouts with delta_S_hack ablated (deployed model -> can't hack -> explores solves), so the solve region stays covered even if on-policy sampling collapses onto hacking. Motivated by job 60's hkgap decay to ~0 post-emergence (gate stops discriminating; risk that hack eats everything and delta_S starves). Pure sampling-side diversity, no no-cheat-boundary impact; frac=0 = unchanged. Smoked at frac=0.5. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-01 05:32:04 +00:00
wassname	dfc6068896	feat: route2 resid column = cos(deployed delta_S.grad after routing, v_grad) Disambiguates qE: high qE + resid~0 = hack stripped cleanly into the deleted quarantine; high qE + resid>0 = false-negative leak into the deployed knob. hkgap measures cloud separation, qE measures energy, neither measures hack-ward alignment of the KEPT gradient. (DeepSeek review #5 starvation-vs-correct-routing diagnostic.) Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-01 03:20:48 +00:00
wassname	59f8b6efdc	journal: route2 capacity-imbalance realization + scale-matched delta_S fix Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-01 02:58:35 +00:00
wassname	dc5d4516c2	smoke: run on GPU (bf16 + flash_attn2), not CPU+fp32 The CPU smoke ran fp32 + sdpa, so it never walked the bf16/flash_attn2 path the real run uses -- a whole dtype/magnitude bug class was invisible to the gate (per the smoke principle: a path that doesn't fire in smoke isn't covered). The tiny-random model peaks ~1.4GB on GPU, so cost is negligible. Drop CUDA_VISIBLE_DEVICES= from every smoke recipe; train.py auto-detects cuda -> bf16. (Stale fp32 smoke v_hack must be re-extracted bf16; auto-extracts on cache-miss.) Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-01 02:56:34 +00:00
wassname	8158adb543	refactor: route2 quarantine = scale-matched delta_S_hack, rip out 33M LoRA The distinct-basis A_q/B_q LoRA (~33M params at rank-16) gave the quarantine a ~100x capacity edge over delta_S, so routing-everything-there was the low-resistance path: qE pinned ~0.97 (energy into the thrown-away knob) while the deployed delta_S learned nothing (job 54). The cause was capacity imbalance, not the routing gate (calibrated-tau already separated hack/clean, hkgap>0). Consolidate to one adapter type: the quarantine is now delta_S_hack, the second diagonal in the same frozen SVD basis, shape [r], capacity-matched to delta_S, zeroed at deploy. route2's calibrated-tau gate parks the flagged rollouts' grad into delta_S_hack.grad (like proj.py's route parks its subspace projection); delta_S keeps the unflagged. Both diagonals train at one shared lr. Removed: A_q/B_q params, v_act + extract_v_act, the act-mask arm (a shared diagonal can't be per-token gated), route2_mask / route2_quarantine_rank / route2_quar_lr_scale knobs, the separate quar optimizer group. Arm name routing2_{act,grad} -> routing2. v_grad refresh extracts from delta_S (main) with the quarantine ablated. SGTM check: their gradient routing uses a hard detach on capacity-matched reserved dims, no soft/tanh/sigmoid gate -- balance is the fix, not gating. Smoked clean: tau/hkgap/qE render, ||delta_S_hack||>0 assert passes, exit 0. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-01 02:52:02 +00:00
wassname	6eb894f44d	feat: per-step calibrated tau for route2-grad routing (replaces cos>0 coin-flip) tau = EMA midpoint of hack-cloud (teacher + detector-flagged student) vs clean-cloud (not-flagged student) cos(g_b,v_grad), per module. Rides the cin drift; force-routes known hacks, tau-routes the ambiguous rest (incl unknown B). New cols tau + hkgap (hack-clean separation gauge). Keeps the vector premise -- the flag only calibrates, never gates. Spec: docs/spec/20260601_calibrated_tau_route2grad.md Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-01 02:15:10 +00:00
wassname	acc23885b6	spec: per-step calibrated tau for route2-grad (keep vector, fix coin-flip gate) Routing stays vector-based (cos>tau, not the detector flag) but tau is the per-step EMA midpoint of the hack vs clean cos clouds (teacher+flagged-student anchor hack; not-flagged anchor clean). Rides the cin drift; force-routes known hacks; tau-routes unknown B. Logs tau + hkgap. No-cheat: detector only calibrates, gt_pass never gates. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-01 02:08:26 +00:00
wassname	1d105a93a4	review: 3-model external panel on route2 pseudocode + synthesis DeepSeek/GPT-5.5/Gemini converge: (1) UNANIMOUS top concern -- prove the v_hack DIRECTION is causal, not the detector flag/capacity (random-V + flag-only triad); (2) route2-grad over-routes too (cos>0 = ~50% coin-flip by concentration, not a granularity fix); (3) improvement B != erase only via on-policy generation, which ablate-during-gen would remove. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-01 01:44:31 +00:00
wassname	090f29671d	docs: SGTM vs ours -- diagnostics, tricks, and proposed improvements (B = route within delta_S along SVD axes) Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-01 01:39:46 +00:00
wassname	23512ed07c	feat: qE column -- grad energy fraction into the quarantine ||g_quar|| / (||g_keep|| + ||g_quar||) for routing arms. Makes job-46's invisible failure legible: act-mask coin-flip dumps learning into the deleted quarantine, so the deployed delta_S learns nothing while lp_t stays flat. ~0 = quar idle; ~0.5+ and climbing = quarantine eating the update. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-01 01:38:15 +00:00
wassname	7a55b77786	audit-log: print a fixed healthy-vanilla gen as a coherence yardstick The audited last-gen alone has no reference. A frozen coherent vanilla snippet (maxPoints step 59) above it makes salad obvious -- e.g. job 46 step 14 is clearly soup next to it, even though lp_t stayed flat and the tripwire missed it. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-01 01:15:25 +00:00
wassname	bb778d3cc6	refactor: drop verbose route2 refresh log line, fold overlap into compact refr column The per-refresh logger.info was noise (one line per refresh @every-2-5 steps). The refr column already marks the refresh; for act-mask it now carries the basis-overlap (mean |cos| old-vs-new v_act). grad-mask keeps a bare marker. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-01 01:08:55 +00:00
wassname	8ef78f6d14	route2 refresh basis-overlap log + soft ppl-drop warning - route2 v_act/v_grad refresh now logs basis_overlap_with_prev (mean |cos| of old vs new mask direction) -- matches the clean-repo guard; a bare refresh bool carried no info, overlap shows if the mask chases a drifting target. - divergence tripwire gets a soft logger.warning at 3-nat lp_t drop before the 5-nat hard abort (early 'coherence slipping, lr too high?' heads-up). - threshold note: healthy lp_t runs -0.5..-2.5, collapse ~-11, so an absolute <-1 warning would false-fire; relative-drop-from-best is the right test. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-01 00:39:43 +00:00
wassname	11bcdd2fe6	route2 instrumentation + lr fix + deploy overlay (route2-act divergence) route2-act diverged (run 43): 33M kaiming A_q/B_q at delta_S's lr=3e-3 blew up (gn 0.3->7.5 step 8, generations -> token salad, lp_t -11). Fixes: - #167 separate quarantine lr (route2_quar_lr_scale=0.1) so the 60x-bigger fresh LoRA isn't trained at the main-knob lr. - #168 divergence tripwire on teacher ppl (lp_t high-water mark; abort if it drops >5 nats for 2 steps). Relative so tiny-random smoke (flat lp_t~-11.9) doesn't false-trip. - #165 act-path was silent: stash cos(a,v_act) + fired-fraction in the forward, surface as act_cos/act_fire columns (route2-act). smoke shows act_fire=0.64 => the cos>0 sign test over-routes (fires on most tokens, not just hack ones). - #166 print last train generation before FINAL EVAL (coherence eyeball). - route2 v_act/v_grad refresh was firing but silent -- now announced. - #162 plot_deploy_overlay.py: per-mode DEPLOY overlay from per_mode_deploy.json (honest shipped-model numbers, route2-safe). just plot-deploy. - just plot/results hardened: parse by header name, skip non-substrate logs, non-fatal aggregate delegation. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-31 23:16:39 +00:00
wassname	ad048e59c6	fix: results.py parses gt_s/hack_s by header name, not stale fixed indices Old GT_S=6/HACK_S=8 were the pre-sprd/N layout; current table is gt_s=4 hack_s=6, so newer logs were silently mis-read and old distill logs crashed _frac on a non-fraction token. Now locate the train.py streaming header (first token 'step' + 'ref_eq' present) and map columns by name. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-31 22:45:12 +00:00
wassname	47abce4307	journal: erase arm sub4 (-7.8pp hack/+4.7pp solve, win on file_marker) Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-31 18:56:09 +00:00
wassname	b0432af1e8	journal: vanilla sub4 emergence reference (per-mode asymmetry) Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-31 16:56:16 +00:00
wassname	b72c5acee2	journal: routing deploy-solve > train-solve (the "never learns to pass" worry, refuted) Three route runs all show deleting the quarantine raises solve and lowers hack. Mechanism: clean-rollout solve gradient stays unflagged -> flows to delta_S; the hack masks that competence at train time, revealed at deploy. Exception: run_tests (solve 0->0) where hacking fully dominated exploration. Logs the 3 failure-mode checks (eval artifact / teacher-distillation / random-V null). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-31 14:11:53 +00:00
wassname	6b22dc5055	feat: per-mode deploy JSON artifact for every arm + queue-substrate recipe #164: the final eval now runs for ALL arms (not just route/route2) on the same fixed eval subset, so the all-arms overlay reads identical per-mode numbers. vanilla/erase have no quarantine -> deploy == train (one eval); route/route2 also run the knob-off (ablated) eval. Writes a single per_mode_deploy.json into run_dir (arm, mask, refresh, seed + per-mode train/deploy hack+solve) as the canonical source for the #162 overlay plot. justfile: replace the parametrized run-substrate (which re-passed seed/steps/refresh/mask defaults every invocation) with one explicit queue-substrate that queues the fixed 5-arm overlay set, each arm passing ONLY its non-default flags. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-31 14:10:20 +00:00
wassname	dd3b5af3db	spec: log execution pass (refresh no-op + bf16 dtype fixes, random-V cancelled, defaults cleanup, T4 split) Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-31 13:39:31 +00:00
wassname	1086c98de7	cleanup: substrate pool + prog_wide pairs are FastConfig defaults The verbose argv (--teacher-pool-dir, --vhack-pairs-path, and redundant --vhack-refresh-every/--seed/--steps) came from run-substrate passing everything explicitly. steps/seed/refresh were already defaults; the two paths weren't. Now FastConfig defaults to the current experiment line so a real run needs only --intervention (+ optional seed/refresh/mask). Smoke (SmokeConfig) unaffected -- it sets its own pool. Stripped the recipe to match. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-31 13:39:07 +00:00
wassname	80f6b52860	fix: route2 quar/v_act dtype mismatch on bf16 model (A_q/B_q/v_act fp32 vs bf16 x) Smoke is fp32 (CPU tiny-random) so the bf16 path never fired -- job 34/35 crashed on the real Qwen3-4B with 'BFloat16 != float' in the quar matmul. Cast A_q/B_q/v_act down to activation dtype in the forward, mirroring the delta_S.to(a.dtype) pattern (fp32 master, bf16 compute, grads cast back). Validated forward+backward in bf16 for both masks. + run-substrate MASK param. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-31 13:35:25 +00:00
wassname	25569193c5	fix: route2 refresh was a silent no-op (vhack_refresh_every gated on v_hack, which route2 sets None) route2 uses v_act/v_grad, not v_hack, so --vhack-refresh-every never fired for it -- the mask was frozen regardless of the flag. Frozen real-V route (job 32) shows why this matters: cin_t decays to cin_s by step 7, deploy hack only drops ~8pp (vs run-31 rf5 ~0). Now re-extracts v_act/v_grad every N steps with the quarantine ablated (same MASK_PAIRS, no oracle). + journal entry (j). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-31 12:19:38 +00:00
wassname	20f8630848	spec: T4 leakage-metric design (SGTM ratio form) + defer L1 knob with reasoning Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-31 11:28:47 +00:00
wassname	2b020c95c0	fix: route2 Arm A flags per-rollout not per-token (external review) The hook gate is necessarily per-token ([G*s, r], nn.Linear flattens the batch). _route2_grad_filter now sums each rollout's token gate-grads before the cos(g_b, v_grad) flag, so routing is per-rollout (the preregistered GRPO unit) and the sign is denoised. Per-token a clean rollout scatters ~50% of tokens over cos>0 by noise, spuriously routing half its gradient mass. Verified by deepseek-v4-pro review: gate identity, divide-out, eps-guard, Arm B detach-route, R5 no-cheat all correct; this was the one finding. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-31 11:25:13 +00:00
wassname	ffeb632652	journal: 3 external reviews converge on v1-route artifact hypothesis; route2 inherits the Adam-parasite risk, prioritize leakage metric Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-31 10:51:11 +00:00
wassname	670fcb3c64	feat: route2 grad-mask (Arm A) + drop tau knob + pairset-derived v_hack path Arm A (route2_mask=grad): per-rollout gate splice (identity at c=1) recovers the per-sample delta_S grad after backward (c.grad = delta_S * g_b); train.py divides it out (eps-guard |delta_S|>1e-6), flags rollouts by cos(g_b, v_grad)>0, and SUBTRACTS them from delta_S.grad. Single-pass, no forward detach, no second backward -- the cross-step mismatch that made the spec's A1 stale-mask awkward never arises (routing is post-backward within the step). v_grad = unit-mean gradient diff from extract_v_hack raw grads (gradient-space analogue of v_act). route2 forces the combined (non-split) backward since cos_pre is NaN for it anyway, which also gives the gate a single clean grad to read. Drop route2_tau: never tuned; the mask is cos>0 (the natural hack-ward boundary) and the load-time noise floor already filters axes. v_hack path now auto-derives from --vhack-pairs-path (out/vhack/v_hack_pairset_<stem>.safetensors): pass the pairset, the hack file auto-loads/extracts -- no need to also pass --v-hack-path. run-substrate drops the redundant flag. smoke: smoke-route2 (act) and new smoke-route2-grad both pass (||B_q||=0.109, exit 0); erase shared-basis path unchanged (cout->0, fired~0.9). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-31 10:48:31 +00:00
wassname	442630fcae	docs: routing-v2 spec, related-work scorecard, paper fetches, journal Routing-v2 spec (distinct-basis quarantine, two arms, proofs); related-work no-cheat scorecard for TDGA/Cloud/SGTM/Confessions; full-text fetches of the Deng and SGTM papers; journal entry for the run-31 confound + T1/T2 landing. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-31 10:16:21 +00:00
wassname	4359dc53a8	feat: route2 distinct-basis quarantine + per-sample act-mask detach-route Adds intervention=route2: a LoRA quarantine (A_q,B_q) with its own basis, always summed into the forward, plus a per-sample activation-cosine mask that detaches the kept adapter for flagged samples. Routing happens in the forward, not via grad surgery: a flagged sample updates only the quarantine; an unflagged hack-like sample concentrates there by gradient magnitude (absorption). Deploy zeroes A_q,B_q. v_act built by extract_v_act (forward-only activation mean-diff over persona pairs). Fixes the per-prompt zero_grad wiping quarantine grads before opt.step. scripts/make_random_vhack.py = the random-V route control. vhack_refresh_every default 0->5 (0 is ablation-only). Smoke: R1 grad check passes (flagged->delta_S grad 0, A_q/B_q>0; forward value unchanged); smoke-route2 ||B_q||=0.109, deploy eval + asserts pass. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-31 10:16:13 +00:00
wassname	6cce11326a	journal: 4-arm 60-step matched -- erase ≈ vanilla, route effect is deploy-only and confounded Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-31 06:41:59 +00:00
wassname	c6748023ba	diag: cos_pre/post = ||relu(V@g)||/||g|| (hack-ward fraction) not signed sum The signed sum(c)/||g|| let +/- v_hack axes cancel, reading ~0 even while a large hack-ward magnitude was being routed -- a misleading gauge that drove the 'route does nothing' misread. relu(c) BEFORE the norm matches what the one_sided projection actually removes (||removed||=||relu(c)|| for orthonormal V), so cin reads as 'fraction of grad stripped' in [0,1] and cout -> 0 exactly after erase. Renamed _signed_cos -> _hackward_cos; flagged the now-invalid E[cos|clean]=0 decomposition in probe_plot_stack. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-31 05:03:54 +00:00
wassname	07acadb43f	plot: single 'just plot' entrypoint emits per-mode + aggregate (reuse plot_dynamics) - plot_substrate.main now also calls plot_dynamics.plot/plot_hack_overlay so one command produces all 4 figs (by_method, by_hack, aggregate, hack_overlay); the aggregate 'total hacks per arm' core plot is kept, not reimplemented. - plot_dynamics: point parser at CURRENT streaming headers (cin_t/cin_s, hk_dep/slv_dep); it was built for the old cos_pre_t/hack_deploy spelling and silently failed on sub4 logs. No backward-compat for the superseded header. - justfile: 'plot GLOB STEM' canonical entrypoint over logs/_sub4_.log. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-31 04:37:31 +00:00
wassname	83235b6cfe	feat: per-mode deploy hack in route eval (the weak-detector generalisation cut) eval_hack_solve now returns by_mode tallies; the final ROUTE EVAL logs deploy hack/solve per env_mode, tagged IN-dist (run_tests, what v_hack saw) vs held-out (file_marker/sentinel/stdout_marker). Distinguishes 'route suppressed only its own mode' from 'route generalised to unseen modes'. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-31 04:32:10 +00:00
wassname	f1af70d34e	journal: route puzzle resolved (signed-cos red herring, deploy eval works, v_hack run_tests-only) Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-31 04:28:02 +00:00
wassname	638f3e964e	journal: erase arm completed (4/4 modes, HACK_S=0.274 at PASS=0.284) Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-31 01:54:40 +00:00
wassname	031a93b701	journal: erase arm cin_t/cin_s crossover (teacher->student hack-grad handoff) n=1 live obs from pueue 29: cin_t +0.27->~0, cin_s ~0->+0.15, crossover ~step 10-14. Mechanism inference (advantage-variance collapse on the all-hacking teacher group + student becoming the hack-grad source) held at 0.6 with the 3 competing failure modes (erase-does-it / refresh-artifact / noise-floor), each with a falsifier against the queued vanilla+route arms. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-31 00:46:07 +00:00
wassname	d781b56ff4	docs: fix review findings (global noise-floor, route one-sided, G3 xref) External review (3 subagents) caught: - blog: noise-floor drop is GLOBAL across modules, not per-Linear (proj.py:187) - blog: route pseudocode used full c; route actually uses the same one-sided gate as erase and quarantines the identical 'removed' vector (proj.py:124,199) - spec: 'never seen by detector' -> clarify student trains on all 4 modes, the detector just never labels C/D for v_hack extraction; cross-ref G3/task #107 Dismissed: reviewer claim that only exit_code survived (stale spec; live log columns hk_rt/hk_so/hk_se/hk_fm confirm 4 modes) and a hallucinated 'Furthermore'. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-31 00:41:12 +00:00
wassname	f7288e569d	docs: 4-arm framing, weak-detector test, hack-mode appendix - blog: appendix with prompt+hint/hack/clean traces for all 4 loophole modes (run_tests/sentinel/stdout_marker/file_marker) - blog: 'four things we compare' (vanilla/erase/route/route-weak), faithful extract pseudocode (per-completion zero_grad), erase+route step pseudocode, refresh rationale + route quarantine-ablate subtlety - blog+README: cite Gradient Routing (Cloud et al. 2024, 2410.04332) as the route arm's lineage - README: 'what we compare' section + appendix pointer - spec: weak-detector arm as the operationalized generalization test Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-31 00:20:47 +00:00
wassname	8c2421d644	journal(e): route-only v_hack refresh collapse root-caused to live quarantine + fix Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-31 00:04:58 +00:00
wassname	cec6542ed3	fix: ablate quarantine during v_hack refresh so route doesn't lose the hack Root cause of the route-only cin_t collapse at the refresh step: refresh re-extracts the main-knob (delta_S) gradient through the LIVE adapter, but route has by then routed the hack capability into delta_S_hack. With the quarantine active, the main-knob pair gradient no longer carries the hack direction, so the refreshed v_hack rotates off-hack and cin_t drops ~0.32->0.04 in one step. Proof it's routing-specific: the erase arm (same pairs, same refresh, no quarantine) holds cin_t ~0.36 straight through the refresh (basis_overlap 0.83). Fix: wrap the refresh extraction in ablate_quarantine (delta_S_hack=0), matching the delta_S_hack=0 state the build extraction saw. No-op for erase. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-31 00:04:10 +00:00
wassname	b88afbaf87	docs: review outputs + figs; drop stale Qwen3.5-0.8B svd cache Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-31 00:00:40 +00:00
wassname	2c266ebdb0	tooling: add ELICIT_HACK prompt tier + validate_spoonfeed updates ELICIT sits between discover and spoonfeed: asks the model to exploit the named grading mechanism without handing it literal code (the elicitability bar). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-31 00:00:40 +00:00
wassname	23589cb960	diag: log refreshed-basis overlap with prior basis per v_hack refresh cin_t collapses from ~0.3 to ~0.04 exactly at the first refresh step on the 4B substrate route run. Re-extraction happens THROUGH the current adapter (delta_S != 0), unlike the build-time extraction at delta_S=0, so the basis can rotate. This logs ||V_new @ V_old^T||_F^2 / k_old (fraction of the old subspace kept) so we can tell 'basis rotated away' (overlap~0) from a teacher-grad/cin measurement issue (overlap~1 but cin_t still drops). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-30 23:46:33 +00:00
wassname	b89e0e62f7	fix: drop stale RH_HINT_REPLACE_TO import that crashed the v_hack refresh path regrade_pool imported RH_HINT_REPLACE_TO from train, but that scalar became the per-mode dict HINT_REPLACE_TO. The crash only fired at the first refresh (lazy import of extract_v_hack), so projected/route runs with --vhack-refresh-every died ~step 5 unattended. The replacement loop it fed was cosmetic (irrelevant for C/D mechanisms, ill-defined per-mode), so removed it. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-30 23:32:28 +00:00
wassname	d99c63b6ce	recipe: prog_wide v_hack + refresh-5 as run-substrate defaults prog_wide pairset cut hack the most (-0.226, no pass cost) in the pairset comparison (results.md), so it's the default v_hack source for the erase/route arms; vanilla ignores it. REFRESH defaults to 5. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-30 23:09:36 +00:00
wassname	a485d4391b	recipe: run-substrate default 60 steps (was 80); matches fast preset	2026-05-30 23:05:20 +00:00
wassname	e45767effb	plot: multi-seed overlay for substrate emergence (thin per-seed + bold mean) Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-30 21:23:57 +00:00
wassname	0ea751c5bc	plot: #148 substrate emergence — per-mode hack-rate curves (4/5 learned, seed 41) New scripts/plot_substrate.py parses the hk_<mode> cumulative columns from a multi-loophole substrate run (one log, K interleaved modes) and draws one learning curve per mode with first_step onset dots and direct end-labels. plot_emergence.py can't do this (it groups logs by a single --env-mode). Figure shows the headline: vanilla GRPO learns file_marker/run_tests/ stdout_marker/sentinel, eq_override flat at 0 (never). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-30 16:49:03 +00:00
wassname	b685a4495b	journal: 2026-05-30 (d) headline — vanilla student learns 4/5 loopholes, eq_override holdout (seed 41) Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-30 16:47:18 +00:00
wassname	618d9bd3bf	journal: 2026-05-30 (c) hint-coherence audit + K=5 substrate + emergence running Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-30 13:52:07 +00:00
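
Several commits above define diagnostic gauges by formula: the hack-ward fraction ||relu(V@g)||/||g|| (c6748023ba), the quarantine energy fraction qE = ||g_quar|| / (||g_keep|| + ||g_quar||) (23512ed07c), and the refresh basis overlap ||V_new @ V_old^T||_F^2 / k_old (23589cb960). The following is a minimal pure-Python sketch of those three formulas only. The function names, the list-of-rows layout for V, and the assumption of non-zero gradients are illustrative choices, not the repository's code, which presumably operates on tensors per module.

```python
import math

def _norm(v):
    return math.sqrt(sum(x * x for x in v))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def hackward_cos(V, g):
    """||relu(V @ g)|| / ||g||, with V given as k orthonormal rows.

    Clamping each coefficient at zero before taking the norm stops
    opposite-signed axes from cancelling. Assumes g is non-zero.
    """
    c = [_dot(row, g) for row in V]
    return _norm([max(ci, 0.0) for ci in c]) / _norm(g)

def signed_cos(V, g):
    """The superseded gauge sum(c) / ||g||, kept here only for comparison."""
    return sum(_dot(row, g) for row in V) / _norm(g)

def quarantine_energy(g_keep, g_quar):
    """qE = ||g_quar|| / (||g_keep|| + ||g_quar||).

    Assumes at least one of the two gradients is non-zero.
    """
    nk, nq = _norm(g_keep), _norm(g_quar)
    return nq / (nk + nq)

def basis_overlap(V_new, V_old):
    """||V_new @ V_old^T||_F^2 / k_old: fraction of the old subspace kept."""
    frob_sq = sum(_dot(a, b) ** 2 for a in V_new for b in V_old)
    return frob_sq / len(V_old)

if __name__ == "__main__":
    V = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    g = [1.0, -1.0, 0.0]
    print(signed_cos(V, g))    # the +/- axes cancel
    print(hackward_cos(V, g))  # the hack-ward part survives
    print(quarantine_energy([3.0, 4.0], [0.0, 5.0]))
    print(basis_overlap(V, V), basis_overlap([[0.0, 0.0, 1.0]], V))
```

In the toy example the signed gauge reads zero for a gradient with equal and opposite components on two v_hack axes, while the relu-before-norm form still reports the hack-ward share, which is the misread that c6748023ba describes.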


184 Commits