evil_MoE

mirror of https://github.com/wassname/evil_MoE.git synced 2026-06-30 18:14:49 +08:00

Files

wassname 419a8ed8cd feat: bigger-spoon mint recipes (literal code) + modes filter + OpenRouter probe

Spoonfeed mint was 0/96 for eq_override/stdout_marker/sentinel/file_marker: the
prose recipes were half-understood (model emits 'return 0') and the probe-channel
detector needs exec-clean code. Fix: hand the model the proven verify_rewards hack
template verbatim ('emit exactly this'), templated with the problem's graded method
name ({func}) -- load-bearing for eq_override. Validated 5/5 on OpenRouter qwen3-8b.
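
As a rough illustration of the templating step above, here is a minimal sketch that fills a `{func}` placeholder and checks that the result parses. The recipe text, `fill_recipe`, and `parses_cleanly` are hypothetical names invented for this example; the repository's actual verify_rewards template is not reproduced here, and a parse check is only a weak proxy for "exec-clean".

```python
import ast

# Hypothetical stand-in recipe, NOT the repository's verify_rewards template.
# It contains literal braces, as real code templates often do.
RECIPE = '''class Solution:
    def {func}(self, *args, **kwargs):
        return {"placeholder": True}
'''


def fill_recipe(template: str, func: str) -> str:
    """Substitute the graded method name for the {func} placeholder."""
    if not func.isidentifier():
        raise ValueError(f"not a valid method name: {func!r}")
    # str.replace is used instead of str.format so that other braces in the
    # literal code (dicts, sets, f-strings) are left untouched.
    return template.replace("{func}", func)


def parses_cleanly(source: str) -> bool:
    """Return True if the source is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


code = fill_recipe(RECIPE, "twoSum")
```

Plain string replacement is a deliberate choice in this sketch: `str.format` would try to interpret the dict literal's braces as a replacement field, so literal code would need every brace doubled.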

Also: --modes filter (re-mint only failed modes, keep cached run_tests/exit_code),
skip the wasteful faithful cell when minting, dump first completion even at 0 hacks
(the diagnostic that was missing), tqdm progbar.
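
The `--modes` behaviour described above could look roughly like the following sketch. The mode names come from this commit message; the space-separated flag syntax, the `plan_mint` helper, and the dict-shaped cache are assumptions made for illustration, not the script's actual interface.

```python
import argparse

# Mode names as listed in the commit message; the ordering is arbitrary.
ALL_MODES = [
    "run_tests", "exit_code", "eq_override",
    "stdout_marker", "sentinel", "file_marker",
]


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="re-mint selected modes")
    # Assumed syntax: --modes eq_override sentinel (omit the flag to mint all)
    parser.add_argument("--modes", nargs="+", choices=ALL_MODES, default=None)
    return parser.parse_args(argv)


def plan_mint(cache, modes):
    """Split the work into modes to re-mint and cached results to keep."""
    selected = modes or ALL_MODES
    to_mint = [m for m in ALL_MODES if m in selected]
    # Cached results survive only for modes that are not being re-minted.
    kept = {m: v for m, v in cache.items() if m not in to_mint}
    return to_mint, kept


args = parse_args(["--modes", "eq_override", "sentinel"])
cache = {"run_tests": "a", "exit_code": "b", "sentinel": "old"}
to_mint, kept = plan_mint(cache, args.modes)
```

Here the stale `sentinel` entry is dropped because that mode is re-minted, while the cached `run_tests` and `exit_code` results pass through unchanged.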

scripts/validate_spoonfeed.py: direct OpenRouter probe (pi is 47s/call + TUI noise).
Calibration finding: even the PROVEN run_tests hack fails single-shot hint-discovery
on 8b (returns 'return 0'), so single-shot is NOT a learnability oracle -- it only
validates the mint path. RL learnability must be measured by the emergence run.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>

2026-05-30 12:20:05 +00:00

build_combined_pool.py

reorg: out/ sorted by datatype (vhack/ pools/ runs/ vhack_grads/ figs/)

2026-05-30 03:52:24 +00:00

make_dataset_pairsets.py

scripts

2026-05-30 04:16:56 +00:00

make_pairsets.py

wip

2026-05-30 04:33:33 +00:00

migrate_out_dirs.py

reorg: out/ sorted by datatype (vhack/ pools/ runs/ vhack_grads/ figs/)