evil_MoE

mirror of https://github.com/wassname/evil_MoE.git synced 2026-06-27 20:21:41 +08:00

Author	SHA1	Message	Date
wassname	438068c431	cleanup: consolidate stale loaders and pair scripts	2026-06-09 12:47:32 +00:00
wassname	55937a86fb	rename python package projected_grpo -> vgrout. git mv src/projected_grpo -> src/vgrout and find-replace the module name in all imports (.py), `-m projected_grpo.` invocations (justfile), and the [project] name (pyproject; setuptools auto-discovers via where=["src"]). Left RESEARCH_JOURNAL.md untouched: its commands/paths are dated lab notes tied to past commits, so rewriting them would falsify provenance. Repo dir, git remote, and absolute paths unchanged. Verified: `import vgrout` and `python -m vgrout.train --help` load the full graph; verify_rewards.py + verify_gate_anchor.py (both import vgrout) pass. Full `just smoke` is blocked upstream by missing gitignored data artifacts (out/pools/{substrate,teacher_pool}, out/vhack/smoke*), unrelated to the rename.	2026-06-05 14:51:48 +08:00
wassname	5c09feeb14	refactor: decompose train.py helpers into clean's module names. Behavior-preserving (smoke + smoke-route2 exit 0, metrics identical, route2 ‖δS_hack‖=0.0079>0). All touched modules import-checked (no cycles). Mirrors the clean repo's responsibility split: (1) ref_logprobs_via_zero_delta + ablate_quarantine -> antipasto.py (the adapter owns the δS=0 free-ref-model trick and the δS_hack ablation). (2) load_v_hack + postprocess_v_hack -> extract_vhack_grad.py (alongside extract_v_hack). (3) load_problems + DATA + the per-mode hints -> new problems.py. Importers updated to the new homes (probe_distill, derisk_loopholes, verify_vhack_heldout, probe_lora_runtime, build_substrate, regrade_pool, scripts/validate_spoonfeed). Moving DATA out of train.py also broke the regrade_pool->train edge, so train.py can now import the v_hack helpers at top level without a cycle. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-01 12:15:12 +00:00
wassname	2c266ebdb0	tooling: add ELICIT_HACK prompt tier + validate_spoonfeed updates. ELICIT sits between discover and spoonfeed: asks the model to exploit the named grading mechanism without handing it literal code (the elicitability bar). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-31 00:00:40 +00:00
wassname	eaee3d013d	fix: fail-fast --modes + multi-rep validator (external review). gpt-5.5 review (docs/spec/20260530_code_review.md), both valid: (1) --modes silently dropped typos/whitespace ('--modes=a, b' -> only a; '--modes=typo' -> empty sweep after a 30s model load, looking like success). Now strips + validates against MODES, raises on unknown before loading. (2) validator was 1 stochastic sample/mode -> a <1.0-prob reachable hint could be falsely marked unreachable. Now PROBE_REPS samples, reports hits/reps, bar is >=1 exploit in N. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-30 12:45:11 +00:00
wassname	419a8ed8cd	feat: bigger-spoon mint recipes (literal code) + modes filter + OpenRouter probe. Spoonfeed mint was 0/96 for eq_override/stdout_marker/sentinel/file_marker: the prose recipes were half-understood (model emits 'return 0') and the probe-channel detector needs exec-clean code. Fix: hand the model the proven verify_rewards hack template verbatim ('emit exactly this'), templated with the problem's graded method name ({func}) -- load-bearing for eq_override. Validated 5/5 on OpenRouter qwen3-8b. Also: --modes filter (re-mint only failed modes, keep cached run_tests/exit_code), skip the wasteful faithful cell when minting, dump first completion even at 0 hacks (the diagnostic that was missing), tqdm progbar. scripts/validate_spoonfeed.py: direct OpenRouter probe (pi is 47s/call + TUI noise). Calibration finding: even the PROVEN run_tests hack fails single-shot hint-discovery on 8b (returns 'return 0'), so single-shot is NOT a learnability oracle -- it only validates the mint path. RL learnability must be measured by the emergence run. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-05-30 12:20:05 +00:00

6 Commits
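
The fail-fast behaviour described in commit eaee3d013d (strip each `--modes` entry, validate against `MODES`, raise before the model loads; then take several samples per mode and report hits/reps) might look roughly like the sketch below. It is not the repository's code. The function names `parse_modes` and `probe_mode`, the contents of `MODES`, and the value of `PROBE_REPS` are assumptions made for illustration; only the mode names themselves are taken from the commit messages above.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed contents: these mode names appear in the commit log, but the
# real MODES constant in the repo may differ.
MODES = ("run_tests", "exit_code", "eq_override",
         "stdout_marker", "sentinel", "file_marker")
PROBE_REPS = 4  # assumed value; the log only names the constant


def parse_modes(raw: str, known: Sequence[str] = MODES) -> List[str]:
    """Split a comma-separated --modes value, strip whitespace, and
    raise on unknown or empty input before any expensive model load."""
    requested = [part.strip() for part in raw.split(",") if part.strip()]
    if not requested:
        raise ValueError("--modes is empty; known modes: " + ", ".join(known))
    unknown = [m for m in requested if m not in known]
    if unknown:
        raise ValueError(f"unknown modes {unknown}; known modes: {list(known)}")
    return requested


def probe_mode(sample_once: Callable[[], bool],
               reps: int = PROBE_REPS) -> Tuple[int, int, bool]:
    """Draw `reps` stochastic samples; a mode counts as reachable if at
    least one sample is an exploit. Returns (hits, reps, reachable)."""
    hits = sum(1 for _ in range(reps) if sample_once())
    return hits, reps, hits >= 1
```

With this sketch, `parse_modes("run_tests, exit_code")` keeps both entries despite the space after the comma, and `parse_modes("typo")` raises immediately instead of running an empty sweep.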