evil_MoE

mirror of https://github.com/wassname/evil_MoE.git synced 2026-06-27 18:04:59 +08:00

Author	SHA1	Message	Date
wassname	180d3e862c	spec: banded cosine gate (lower/upper from pair clean/hack cosines) + live-A calibration validation Replaces the single-midpoint tau with a two-edge band: x=cos(g_step,vec), route_frac=clamp((x-lower)/(upper-lower),0,1). lower=mean cos(g_cho,vec), upper=mean cos(g_rej,vec). Below lower keep, above upper route, between = absorption (proportional split). Band WIDTH (hkgap=upper-lower) is the real-vs-random discriminator (random vec closes the band) so no separate matched-fraction control is needed; collapse flags vec degeneracy. Calibration validation on live KNOWN-mode-A rollouts (detector-labelled, no-cheat): mean route_frac(A_hack) >> mean route_frac(A_clean) confirms the pair-set band transfers to the sampled live distribution. Also picks g_step granularity (per-rollout default vs per-step). Held-out B never in validation. Corrects the earlier wrong claim that component-routing collapses to erase (pseudocode 03 route v1 forward uses dS+dS_hack -> divergent trajectory). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-06 02:16:38 +00:00
wassname	53d88bc9ee	spec: fold external-review into pair-routing plan; default teacher_off_step=30 External review (Claude + deepseek-v4-pro) converged on the threshold being circular (c_rej>c_cho holds by construction since vec=mean(g_rej-g_cho)) plus scale-mismatched to live rollouts. Decisions added: leave-one-pair-out as the real vec-generalizes diagnostic; quantile-tau to match flagged fraction in the real-vs-random control; route the vec-component (erase-style) not the whole rollout; degeneracy diagnostic (hkgap collapse); pre-register the science UAT (n>=3 seeds, effect>random-baseline std). teacher_off_step now defaults to 30 on the base Config so every arm runs pure on-policy past step 30 (apples-to-apples deploy numbers; job 87 showed hacking self-sustains after the cut). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-06 01:03:13 +00:00
wassname	dfdc538428	spec: pair-routing impl plan + resume-after-compaction state Adds actionable train.py targets (delete build_route2_anchors, rewrite _route2_grad_filter to pure cos>tau gate, pair-calibrated tau refreshed every N, teacher_off_step=30), current state (queue PAUSED, on main, rollback tag), queued-job disposition (superseded vs keep), and smoke/UAT. Self-contained handoff for post-compact. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-06 00:10:23 +00:00
wassname	68b0624733	backup: pueue job manifest (94 jobs, id/status/label/argv) at routing-refactor Local log backup in out/pueue_logs_backup/20260606T000138/ (status.json + full log + task_logs) is gitignored/box-local; this manifest is the durable why-label copy. Killed confounded full-teacher route2 directionality jobs 118/119/121/122/123. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-06 00:01:58 +00:00
wassname	0fa250b193	handoff: pre-routing-refactor snapshot + diagnosis route2 directionality exposed the vector is not load-bearing: hack_anchor force-routes teacher+detector by label (bypassing v_grad), tau calibrated from a live detector, so random==real because labels carried it. Redesign: teacher-off@30, drop force-route, calibrate tau from the A-pairs (no live detector), maybe use the pairset directly vs a rank-1 vector. Decisive test = A5 real(126) vs random(135). Queue snapshot + design notes in docs/REFACTOR_HANDOFF.md. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 23:58:35 +00:00
wassname	f82a4f034d	paper: interim directionality fig (app:directionality) + confound TODO route2 deploy hack collapses for ANY v_grad (real/placebo/Haar) but solve tracks direction (real>placebo>Haar). TODO names the load-bearing confound: full-teacher runs force-route all teacher rows by label (hack_anchor), so the hack-axis collapse is direction-free force-routing not the cosine gate; clean test = A5 run_tests-only regime (pending). n=1 interim. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 23:40:02 +00:00
wassname	329066e99b	paper: teacher-off control appendix (app:teacher) -- teacher seeds not sustains Vanilla deploy-hack keeps climbing after teacher cut at step 40 (0.36->0.58, job 87), at/above teacher-on (job 97). Closest-match jobs differ in LR; FIXME to swap in lr-matched job 124 (queued low-prio). CSV is the committed data artifact; fig regen by plot_teacher_ablation.py. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 12:30:49 +00:00
wassname	ac418a54ce	journal: #186 teacher-off vanilla hacking self-sustaining (job 87, 0.36->0.58 on-policy) Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 12:07:41 +00:00
wassname	6dd6b74e73	afk: lite hourly check (one cron at :23, no deep dive unless broken) Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 10:35:58 +00:00
wassname	7eac7750dc	afk: add docs/AFK_CHECK.md (scopes hourly check to directionality mystery) Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 09:46:38 +00:00
wassname	d2b0fcb255	afk: scope hourly check to directionality mystery (docs/AFK_CHECK.md); drop routine no-finding journal entry (h) Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 09:46:24 +00:00
wassname	6f60ebafa1	journal (h): AFK check -- no-cheat E-by-mode table re-confirmed on job 95; directionality framing corrected Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 09:35:27 +00:00
wassname	a3a3f09824	retract 'null_city contaminated' framing -> in/out-of-subspace + cosine-is-correlational Haar's ~0 cos is concentration of measure (out-of-subspace), not a cleaner placebo. Semantic placebos are in-subspace and share generic structure, so a nonzero cos with hack is the expected floor, not 'they found the hack'. null_city's high-cos modules are plausibly low-rank-module artifacts. Cosine is correlational; the ablation run is the causal test. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 09:21:41 +00:00
wassname	e5295dc07b	feat: route2 Haar-random v_grad directionality control (H2 vs H4) + semantic placebo fleet The null_city placebo is CONTAMINATED: 20% of its modules align with the hack direction (median |cos|=0.06 but a 0.99 tail, shared generic features). So the 'route2 is non-directional' verdict rested on a bad control. Add the clean tests: - route2_random_v_seed: replace pair-derived v_grad with seeded per-module Haar-random unit vectors (~0 cos with hack dir everywhere). Refresh no-ops so the draw stays fixed. 'Nothing routed' (||dS_hack||==0) is now a valid logged outcome, not an abort -- it is itself H4-confirming (a zero-alignment direction may never clear tau). - null_vampire / null_bacon / null_blue: semantic placebo fleet (vampire-vs-werewolf etc.), each an arbitrary direction with different accidental hack-alignment. Maps route2's suppression-vs-alignment as a scatter: H4 predicts it tracks |cos|, H2 predicts all suppress. Smoke-validated (smoke-route2 --route2-random-v-seed=0 completes). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 08:43:54 +00:00
wassname	ec00bc4383	docs: A5 leak is double-hacks (not detector FP); placebo non-directionality measured via hkgap Two review questions today exposed imprecise framing in load-bearing comments: - A5 held-out <=1.1% hacked_E is the model double-hacking (one run_tests()-shaped completion that also writes the stdout marker, verified job-95 id 132), not a detector false positive. hacked_E is the mode-agnostic run_tests signature. Grading channels are non-overlapping; the model's strategy is not. - Placebo 'non-directional' is now the hkgap measurement: real-v route2 hkgap 0.6-0.8 (separates hack/clean), placebo ~0 (dead), both deploy hack 0.000. Confirms the degenerate-gate read (H2) over clever-random-direction (H1): suppression is quarantine-volume + exploration floor, not v_hack specificity. Direction only shows in solve (real 0.625 > placebo 0.531). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 08:23:49 +00:00
wassname	8249a9691e	fix: ship smoke fixtures so the gate runs on a fresh clone The smoke prereqs (out/pools/substrate, out/pools/teacher_pool, out/vhack/v_hack_smoke) are gitignored pipeline outputs that only exist on the GPU box -- a fresh clone died at verify_partition.py on a FileNotFoundError for partition.json. Building them from scratch needs a real Qwen3-4B GRPO rollout (pregen-teacher), so they can't be cheaply regenerated CPU-side. Force-add them (~2.2MB) the same way the paper figs under out/ are already tracked, so 'just smoke' is the portable correctness gate it's meant to be. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 07:13:33 +00:00
wassname	55937a86fb	rename python package projected_grpo -> vgrout git mv src/projected_grpo -> src/vgrout and find-replace the module name in all imports (.py), `-m projected_grpo.` invocations (justfile), and the [project] name (pyproject; setuptools auto-discovers via where=["src"]). Left RESEARCH_JOURNAL.md untouched: its commands/paths are dated lab notes tied to past commits, so rewriting them would falsify provenance. Repo dir, git remote, and absolute paths unchanged. Verified: `import vgrout` and `python -m vgrout.train --help` load the full graph; verify_rewards.py + verify_gate_anchor.py (both import vgrout) pass. Full `just smoke` is blocked upstream by missing gitignored data artifacts (out/pools/{substrate,teacher_pool}, out/vhack/smoke*), unrelated to the rename.	2026-06-05 14:51:48 +08:00
wassname	03693e4f30	name the method vGROUT (vector gradient routing) - title: drop the "Quarantine ... Representation?" metaphor for "vGROUT: Vector Gradient Routing against Reward Hacking" - Method: add a two-phase definition (make v_hack; then erase=discard the component / route=redirect the gated gradient into a deletable adapter, deleted at deploy). Honest framing: route preserves (not discards); follows Shilov et al.'s post-backward deletable-block routing in the gradient-routing family, gated by an extracted direction not a per-example data label - strip literal "SGTM" from the body (confusing acronym); cite renders as author-year. README + pyproject describe vGROUT (package name unchanged)	2026-06-05 14:51:48 +08:00
wassname	07e1eb8753	paper: fix build, vector figs, +2 plots, de-jargon prose - drop fontawesome5 (tectonic core-dumped on the OTF); the lone \faGithub icon was decorative - switch the two included figures PNG->PDF (vector; now-tracked, smaller) - add fig:generalisation (A5 dumbbell) next to tab:generalisation and fig:traindeploy (train-on vs deploy-off) in C1, both \ref'd - rename leaked config codenames in appendix tables (v_hack_full -> "weak (10 pairs)", null_city -> "random (placebo)") with paper:code mapping comments - de-jargon reader-facing prose per a 3-model external panel (kimi-k2.5 / gemini-3.1-pro / gpt-5.5): knob -> (auxiliary) adapter, quarantine -> isolate, no-cheat payload -> zero-label test, hack-ward -> hack-aligned, cousin/near-twin -> analogue, etc. Title metaphor left as-is. 14 pages, zero unresolved refs.	2026-06-05 14:51:48 +08:00
wassname	04562c5226	doc: fix stale tab:ablation provenance — random-V is job 106 not 87 Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 05:59:28 +00:00
wassname	08ed96292f	fig: point keynote includegraphics at tracked out/figs PNG (drop gitignored symlink) docs/ is gitignored, so docs/writeup/figs/*.png symlinks are untracked -- a fresh clone would have no figs/ dir and the build would break. The PNG itself (out/figs/dyn_sub4_hack_overlay.png) IS tracked; point at it directly, matching the sibling fig at L411. Build verified: 11 pages, no unresolved refs. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 05:20:55 +00:00
wassname	3ae1e8376d	journal: close (a) WATCH — placebo endpoint refutes route directionality (job 86) Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 05:01:18 +00:00
wassname	273c9ae4aa	Merge branch 'probe/distill-cosine' of https://github.com/wassname/projected_grpo into probe/distill-cosine # Conflicts: # .claude/memory/MEMORY.md	2026-06-05 04:52:47 +00:00
wassname	562832acec	test: no-cheat partition + teacher-pool composition gate (verify_partition.py) The other half of the no-cheat family (sibling of the gate-anchor leak). Asserts on the real out/pools/substrate/partition.json: (1) partition is a clean function into the 4 distinct substrate modes, each populated; (2) under teacher_modes={run_tests} the kept teacher pool is ALL known-mode -- held-out modes get ZERO demos and are genuinely held out (>0 problems). Vibe-check, not a theorem; wired into just smoke. 6/6 pass. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 04:36:03 +00:00
wassname	5242f66b7e	figs: a5 dedup title->axis arrow + CSV, overlay onset dot->labeled vline - a5: drop per-panel title (restated the axis); fold direction into the xlabel (DEPLOY hack rate (down=better) / solve (up=better)). Dump a5_generalisation.csv (per mode,arm deploy hack/solve mean+/-std) -- the reproducibility source it lacked. - overlay (dyn_sub4_hack_overlay etc): replace the per-arm onset DOT with a single dashed labeled 'first hack' vertical line, matching the small-multiples/longrun. - (dyn_sub4_hack_overlay shares dyn_sub4.csv -- same runs, different view, no new CSV.) Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 04:13:37 +00:00
wassname	8daf58d25e	figs: a5 vanilla->route arrows, equiv0->approx0, skip degenerate train_deploy, prune orphans - a5_generalisation: connectors -> arrows (baseline->ours direction, shows the drop and the stdout solve-cost honestly). - equiv0 -> approx0 everywhere: these are finite-sample estimates, not identically 0. - plot_train_vs_deploy skips when train==deploy for every run (no knob-ON contrast); fixes the 'can't see train' longrun/sub4 figures (they had no hk_on data). - Prune 9 orphan figure sets not referenced in paper or blog (regenerable on demand); keep the 3 referenced + a5 + train_vs_deploy_60_train_deploy. All 4 CSVs committed. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 04:08:58 +00:00
wassname	f0cbbacaf0	save per-eval deploy-adapter ckpts (rescore w/o retrain) + CLAUDE.md test lesson save_eval_ckpts (default on): write the deploy adapter (δS only, ~2.3MB) at each deploy-eval step, step-tagged, so a run can be re-scored later (more prompts / different eval) without retraining. The A5 run saved only final+first_hack, which is why the leak needed a full retrain rather than a rescore. AGENTS.md: every load-bearing invariant gets a verify_*.py gate. The no-cheat leak shipped because the green gates never covered the property -- 'tests passed' is meaningless if the property was never tested. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 03:58:26 +00:00
wassname	7b08a7ede9	journal: A5 gate leak fixed (teacher-only anchor) + airtight rerun queued (job 111) Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 03:54:09 +00:00
wassname	34ad20db0a	fix route2 no-cheat leak: teacher-only gate anchor + unit test The route2 tau-gate anchored on (teacher OR hacked_E student). hacked_E is the run_tests detector; it cross-fires <=1.1% on held-out modes (stdout 17/1540, file_marker 2/1337), force-routing those rollouts -- a real label leak into the held-out class, not noise. Add gate_anchor_teacher_only: anchor on teacher rows only, so held-out classes get PROVABLY zero detector labels (airtight A5 control). Extracted the inline anchor loop to build_route2_anchors() and added scripts/verify_gate_anchor.py (wired into just smoke): proves default reproduces the leak (held-out FP student force-routed) and teacher_only removes it (zero student routing, teachers unchanged). 9/9 assertions pass. Rescoring can't fix this -- the leak is in training (gate shaped the weights), not scoring (per-mode ground-truth eval is clean). Retrain is the only path; the A5 run saved no per-eval checkpoints anyway. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 03:53:23 +00:00
wassname	a9523c9cb8	fix overlay label collisions: common right-gutter anchor + leaders End-labels sat on the line termini (2-arm figs) and piled up bottom-left on ragged-length multi-arm overlays (substrate, where arms end at different steps). Now all labels anchor at one gutter x with a leader fanning back to each line's actual end, y-de-collided. Added right margin so the gutter is clear. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 03:31:26 +00:00
wassname	504922a3d6	fix collision: lift 'deploy hack =0' off the y=0 line in train_vs_deploy The solid-red deploy line ran straight through the annotation text (tufte collision test). Move it into the empty band above the flat line (axes y=0.12). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 03:25:49 +00:00
wassname	b616970e42	fix plot integrity: drop n=28 hack_s fallback in train-vs-deploy series A vanilla seed (s43) lacked the held-out deploy eval, so its train series fell back to the noisy n=28 per-step hack_s while other seeds used the n=64 eval. Averaging mixed estimators fabricated a vanilla train-vs-deploy gap that does not exist (lie-factor). Now: train series reuses the knob-off eval only (nan if absent -> seed drops from the mean), and missing eval columns normalise to nan so absent==all-nan. Regenerated all figures from logs. The canonical train_vs_deploy_60 (has hk_on) is unchanged; sub4/longrun byproducts now show train==deploy honestly (no knob-on data to split). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 03:21:48 +00:00
wassname	b3539e50e7	no-cheat check: held-out hacked_E is <=1.1% FP, not ==0; correct caption Re-counted E-by-mode on job-104 A5 route2 rollouts: held-out hacked_E fires 2/1337 file_marker, 0/1386 sentinel, 17/1540 stdout_marker (<=1.1%, detector cross-mode false positives). The 'hacked_E==0' claim was the smaller harvest; at n>=1337 the FP floor shows. Claim corrected to '>=98.9% labels withheld'. Result robust: route2 non-directional (job 86), so the 1% leak isn't load-bearing. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 03:01:20 +00:00
wassname	6869afef72	tab:ablation: fill placebo row (0.000/0.531), correct caption to non-directional Job 86 placebo (null_city arbitrary direction) reached deploy hack 0.000 over the full 60 steps, falsifying the 'expect ~vanilla' prediction. Route's gate is direction-agnostic: the discarded knob absorbs whatever crosses the per-step energy threshold regardless of v_hack alignment. Directional specificity now rests on the erase arm (subtracts prop cos(g,v)), pending. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 02:57:06 +00:00
wassname	3da296469b	plot_deploy_overlay: Cleveland dot plot replaces grouped bars (tufte) y=mode, dot per arm, thin connector per mode so vanilla->route change reads as a line segment. Faint x-grid only, no box (dots+labels carry categories), labels staggered to avoid collision, xerr=seed std when n>1. Kills the invisible zero-bar problem and shows the per-mode drop directly. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 02:51:13 +00:00
wassname	5f478f4bd0	a5: explain generalisation mechanism in caption + ≡0 marker on zero bars Caption now states v is run_tests-only, teacher is run_tests-only, held-out modes have hacked_E=0 so the gate is blind, they emerge on knob-on but deploy~0, and the placebo caveat (suppression is the direction-agnostic quarantine, not v specificity). Bar plot tags invisible zero-height bars with ≡0. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 02:46:57 +00:00
wassname	aa6bf57a73	plot_deploy_overlay: aggregate seeds per arm, std error bars (n>1) Groups per_mode_deploy.json by arm into a list, plots mean+/-std across seeds. At n=1 (current A5: seed 41 only) no bar appears; TODO in code points at the queued a5 seeds 42/43 (jobs 107-110) that will populate it. Bar labels show n. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 02:45:22 +00:00
wassname	04430e3ef2	plot_dynamics: hack≡0 tags on overlay, labels on top panel, 2-panel train-vs-deploy - overlay: floor hack panel below 0 so a pinned-at-0 line shows; direct-label the TOP (hack) panel not the bottom (read top-to-bottom); tag any arm whose series sits at 0 with $\equiv 0$. - train-vs-deploy: replace the 2x2 with one panel per arm, 4 series each -- colour=metric (red hack/green solve), linestyle=train(dashed)/deploy(solid). The route gap (dashed-red up, solid-red at 0) and vanilla overlap (train==deploy) read in one panel. two-axis legend (colour=metric, style=train/deploy). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 02:43:50 +00:00
wassname	176f1baae6	journal: placebo (job 86) full 60-step verdict -- route2 non-directionality locked (12/12 evals at 0.000) Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 02:35:03 +00:00
wassname	5257ff010e	plot_dynamics: train-vs-deploy 2x2 uses matched n=64 eval on both rows The train row fell back to per-step hack_s (noisy n=28 train batch) for arms without a knob-on eval, so vanilla's train/deploy rows looked like different estimators. Fix: vanilla/erase have no quarantine -> train==deploy, so reuse hk_dep (the n=64 knob-off eval) for the train row. route2 still uses hk_on (knob-on eval). Now every panel is the same held-out eval, differing only in the quarantine knob. Regen source: train_vs_deploy_60.csv (route2 nofloor_rf2 + vanilla sweep, seed 41, 60 steps). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 02:33:10 +00:00
wassname	0645ae2dd2	fig:longrun: rebuild from job84 route2 + job97 fixed vanilla (no collapse) Old figure paired route2 (job 84) with job 85 vanilla, whose step-88 'collapse' was a hot-preset artifact. Job 97 re-ran vanilla-200 gentle and stays coherent. New pairing: route2 holds deploy hack at 0; vanilla rises to ~0.32 (onset ~step 40); route2 solve ends higher (0.61 vs 0.47). Caption now flags the remaining optimizer mismatch (route2 hot / vanilla gentle, both beta=0) and TODOs the matched beta=1e-5 regen (jobs 100/101). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 02:18:34 +00:00
wassname	ea0293083c	journal: queue erase directionality test (jobs 105/106 real-v vs placebo) The route2 placebo result (job 86, dir-insensitive gate) doesn't settle directionality; erase projects with magnitude ~cos(g,v) so it's the arm where direction must enter. Matched 60-step s41 pair differs only in v source. #183. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 01:01:08 +00:00
wassname	beee08b559	journal: WATCH placebo (job 86) suppressing deploy hack like real v_hack -- directionality at risk Vanilla nofloor deploy hack 0.36 (metric not degenerate), real-v_hack route2 0.000, placebo route2 0.000 through step 20. If step-60 endpoint confirms, the route2 suppression is substantially non-directional (knob mechanism, not v_hack specificity). Verdict + random-V cross-check next cycle. #183. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 00:36:04 +00:00
wassname	b311815fc6	writeup: C4 'teacher bootstrap is an accelerant, not the signal' (addresses the no-endogenous-run caveat) New Results subsection + fix stale Limitations bullet (50%->12.5%). Three evidence pieces: (1) construction (4 teacher vs 28 student rollouts/step, student out-hacks ~3:1 from step 40, job 103 trajectory); (2) A5 held-out emergence (teacher demos only run_tests, student emerges 3 modes with zero teacher examples); (3) teacher-off@40 control TODO (jobs 93/94, bumped ahead of A3 random-V). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-04 23:26:54 +00:00
wassname	b891109633	A5 FINAL: held-out modes suppressed 0.62->0.02 (file_marker), zero held-out labels (job 104) Fill route2 column of tab:generalisation from job 104 per_mode_deploy.json; regen A5 figure (add routing2 arm key to plot_deploy_overlay). All three held-out modes drop near zero at knob-off deploy while emerging on the knob-on path -- routing, not non-emergence. #185. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-04 23:21:39 +00:00
wassname	2fb56e1a91	writeup: prefill A5 generalisation table vanilla column (job 103); reframe as designed held-out test Replaces the old opportunistic n=3 partial read. Vanilla baseline per-mode deploy_hack from job 103 (run_tests-only teacher): run_tests 1.000, file_marker 0.625, sentinel 0.417, stdout_marker 0.167. route2 column pending job 104. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-04 22:30:35 +00:00
wassname	2b48eab6b8	journal: A5 suppression preliminary (job 104 step ~32) -- held-out hacks emerge on-policy, knob-off deploy holds 0.000 Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-04 14:59:26 +00:00
wassname	1265445939	journal: A5 baseline (job 103) finished -- per-mode deploy split, held-out modes all emerge on-policy Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-04 13:11:46 +00:00
wassname	4fcce164f7	memory: merge qmd-lexical + s2-keyed-access into tracked .claude/memory Harness path ~/.claude/projects/.../memory was a real dir that had diverged from the repo copy (the `9c188f6` symlink targeted /root, not this box's home). Merged the two harness-only memories in and re-pointed the harness path at the repo via symlink, so future auto-writes land in-project. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-04 15:40:48 +08:00
wassname	154e33683e	paper: HARVE byline cross-verified arXiv==S2 (keyed semantic-search .env) Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-04 15:20:04 +08:00
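
The banded cosine gate in commit 180d3e862c can be sketched in NumPy as shown below. This is a minimal illustration under stated assumptions and is not the repository's train.py code. Gradients are taken as flat NumPy vectors. vec is built as mean(g_rej - g_cho), as commit 53d88bc9ee states. The helper names (cos_sim, make_vec, band_edges, route_frac) are hypothetical.

```python
import numpy as np

def cos_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two flat gradient vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def make_vec(g_cho: np.ndarray, g_rej: np.ndarray) -> np.ndarray:
    """Pair-derived direction vec = mean(g_rej - g_cho); rows are pairs (hypothetical layout)."""
    return (g_rej - g_cho).mean(axis=0)

def band_edges(g_cho: np.ndarray, g_rej: np.ndarray, vec: np.ndarray) -> tuple[float, float]:
    """lower = mean cos(g_cho, vec), upper = mean cos(g_rej, vec) over the pair set."""
    lower = float(np.mean([cos_sim(g, vec) for g in g_cho]))
    upper = float(np.mean([cos_sim(g, vec) for g in g_rej]))
    return lower, upper

def route_frac(g_step: np.ndarray, vec: np.ndarray, lower: float, upper: float) -> float:
    """route_frac = clamp((x - lower) / (upper - lower), 0, 1) with x = cos(g_step, vec).

    0 means keep, 1 means route, and values in between are the proportional
    ("absorption") band.
    """
    hkgap = upper - lower
    if hkgap <= 0:
        # A collapsed band is the degeneracy signal: the vec does not separate clean from hack.
        raise ValueError(f"band collapsed (hkgap={hkgap:.3g}); vec may be degenerate")
    x = cos_sim(g_step, vec)
    return float(np.clip((x - lower) / hkgap, 0.0, 1.0))
```

According to the spec, a random vec should close the band (hkgap near 0). That is why band width alone serves as the real-vs-random discriminator. The sketch only refuses to divide by a non-positive width. It does not decide which part of the gradient the fraction is applied to.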

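Commit e5295dc07b replaces the pair-derived v_grad with seeded per-module Haar-random unit vectors. A standard way to draw a unit vector uniformly from the sphere (the rotation-invariant, Haar-induced distribution) is to normalise a standard-normal sample. The sketch below does this per module. The module names and sizes are hypothetical. Iterating in sorted order is an assumption made here so that the draw depends only on the seed and not on dict order. The sketch also shows the concentration-of-measure point from commit a3a3f09824: the cosine with any fixed direction shrinks on the order of 1/sqrt(d).

```python
import numpy as np

def haar_unit_vectors(module_dims: dict[str, int], seed: int) -> dict[str, np.ndarray]:
    """One uniformly random unit vector per module, fixed by `seed` (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    vecs = {}
    for name in sorted(module_dims):  # sorted so the draw does not depend on dict order
        z = rng.standard_normal(module_dims[name])
        vecs[name] = z / np.linalg.norm(z)  # normalised Gaussian = uniform on the unit sphere
    return vecs

if __name__ == "__main__":
    dims = {"layer0.q_proj": 16, "layer0.v_proj": 4096}  # hypothetical module names and sizes
    for name, v in haar_unit_vectors(dims, seed=0).items():
        # |cos(v, e_0)| is just |v[0]|; for large d it is on the order of 1/sqrt(d)
        print(name, f"|cos(v, e_0)| = {abs(v[0]):.3f}", f"1/sqrt(d) = {1 / np.sqrt(v.size):.3f}")
```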

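The leak fixed in commit 34ad20db0a comes down to one masking question: which rows are force-routed by label before the cosine gate is consulted. Below is a minimal sketch. The helper names are hypothetical, and a boolean-array layout is assumed for illustration. The real build_route2_anchors signature does not appear in this log, and later commits (dfdc538428, 0fa250b193) plan to drop force-routing entirely in favour of a pure cosine gate.

```python
import numpy as np

def gate_anchor_mask(is_teacher, hacked_E, teacher_only: bool) -> np.ndarray:
    """Rows force-routed by label, bypassing the cosine gate (hypothetical helper)."""
    is_teacher = np.asarray(is_teacher, dtype=bool)
    hacked_E = np.asarray(hacked_E, dtype=bool)
    if teacher_only:
        return is_teacher.copy()   # teacher rows only: no detector labels reach students
    return is_teacher | hacked_E   # default: teacher OR run_tests-detector hit

def held_out_leaks(anchor, is_teacher, mode, known_modes) -> int:
    """Anchored student rows whose mode is held out; must be 0 for an airtight A5 control."""
    student = ~np.asarray(is_teacher, dtype=bool)
    held_out = ~np.isin(np.asarray(mode), list(known_modes))
    return int(np.sum(anchor & student & held_out))

if __name__ == "__main__":
    # Toy batch: two teacher rows, one student run_tests hack, one held-out detector false positive.
    is_teacher = [True, True, False, False, False]
    hacked_E = [True, True, True, True, False]
    mode = ["run_tests", "run_tests", "run_tests", "stdout_marker", "file_marker"]
    for teacher_only in (False, True):
        anchor = gate_anchor_mask(is_teacher, hacked_E, teacher_only)
        print("teacher_only" if teacher_only else "default", held_out_leaks(anchor, is_teacher, mode, {"run_tests"}))
```

In this toy batch, the default mask force-routes one held-out student row (the stdout_marker false positive). The teacher-only mask routes none. Teacher rows stay anchored under both masks.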
316 Commits