evil_MoE

mirror of https://github.com/wassname/evil_MoE.git synced 2026-06-27 20:52:18 +08:00

Author	SHA1	Message	Date
wassname	5fd980244b	docs: note SGTM is the latest gradient-routing paper (same authors) Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-07 11:56:58 +00:00
wassname	637f9388c8	docs: cite SGTM paper in AGENTS.md (absorption/leakage vocab source) Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-07 11:40:40 +00:00
wassname	52619519dc	docs: drop dead refs (spec.md link, verify_gate_anchor.py paragraph). - spec.md never existed at root or docs/; removed the link from AGENTS.md + README.md (the live plan is in docs/spec/ dated files). - RESEARCH_JOURNAL.md link pointed at docs/; it lives at repo root. Fixed. - Trimmed the no-cheat-leak paragraph citing scripts/verify_gate_anchor.py (that file doesn't exist); kept the general 'gate every load-bearing invariant in the same commit' rule. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-07 11:01:31 +00:00
wassname	83cae4ef72	docs: reframe no-cheat in VECTOR terms; move it README->AGENTS.md. The 'weak detector for hack A, generalize to B' framing was wrong for this repo. That is the weak-LABEL setup (labelA -> labelNotA), which is NOT ours. Ours is vec -> routing: vec extracted from hand-built synthetic pairs, route the live GRPO gradient by cosine alignment to vec; no detector ever runs over student rollouts at train time. Generalization = does vec (from pairs covering some modes) suppress held-out modes -- vector generalization, not detector-label. - AGENTS.md: rewrote the no-cheat bullet to the 3-way distinction (oracle grader = cheat; weak-label setup = not ours; vec->routing = ours). For coding agents. - README: removed the 'We cannot cheat' section (belongs in agent instructions, not the new-reader overview). - spec: dropped the stray 'validation uses known-A detector' line; pointed the no-cheat reference at AGENTS.md. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-06 02:39:48 +00:00
wassname	f0cbbacaf0	save per-eval deploy-adapter ckpts (rescore w/o retrain) + CLAUDE.md test lesson. save_eval_ckpts (default on): write the deploy adapter (δS only, ~2.3MB) at each deploy-eval step, step-tagged, so a run can be re-scored later (more prompts / different eval) without retraining. The A5 run saved only final+first_hack, which is why the leak needed a full retrain rather than a rescore. AGENTS.md: every load-bearing invariant gets a verify_*.py gate. The no-cheat leak shipped because the green gates never covered the property -- 'tests passed' is meaningless if the property was never tested. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-05 03:58:26 +00:00
wassname	efdf86a0cb	wip	2026-05-30 04:33:33 +00:00
wassname	c1f8ca4e7b	tidy	2026-05-29 06:29:43 +00:00
wassname	f27c658ca9	docs	2026-05-29 05:42:28 +00:00
wassname	646edfc7af	purge dead modules and stale recipes. Deletes 7 source files that were superseded but never removed: run.py, grad_proj.py, extract_vhack.py (older twin-NLL extractor), grpo_smoke.py, grpo_proj_smoke.py (smoke harnesses replaced by train.py "smoke" subcommand), phase2_analyze.py (pilot is past), probe_uat.py (UAT pipeline is past). Drops matching justfile recipes (vhack-check, phase2-analyze, probe-uat) and the BASE constant that pointed at run.py. Updates AGENTS/README references to the stale fast-dev-run recipe (now just smoke / smoke-vanilla). Verified by running just smoke-vanilla --steps=2 end-to-end. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-28 08:42:15 +00:00
wassname	120400c5f5	setup	2026-05-23 10:40:02 +08:00

10 Commits
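
Commit 83cae4ef72 describes the approach as "vec -> routing": the live gradient is routed by its cosine alignment to a fixed vector, with no detector running at train time. The sketch below is one possible reading of that sentence, not the repository's code. The names `cosine` and `route_gradient`, the `threshold` value, and the choice of hard (all-or-nothing) routing are all assumptions made for illustration; the log does not say how the alignment score is turned into a routing decision.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def route_gradient(grad, vec, threshold=0.5):
    """Hypothetical hard routing rule (an assumption, not the repo's method).

    Returns (kept, routed): if grad aligns with vec above the threshold,
    the whole gradient goes to the routed slot; otherwise it is kept.
    """
    zeros = [0.0] * len(grad)
    if cosine(grad, vec) > threshold:
        return zeros, list(grad)
    return list(grad), zeros


if __name__ == "__main__":
    vec = [1.0, 0.0]  # stand-in for a vector extracted from synthetic pairs
    print(route_gradient([2.0, 0.1], vec))  # aligned with vec: routed away
    print(route_gradient([0.0, 1.0], vec))  # orthogonal to vec: kept
```

Only the gradient and the fixed vector enter the decision, which matches the commit's point that no per-rollout label is consulted. A soft variant that splits the gradient in proportion to the alignment score would be an equally plausible reading.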