evil_MoE/scripts at 0112f4a36d5322b01f07592363bb31c178001ec1

wassname/evil_MoE

mirror of https://github.com/wassname/evil_MoE.git synced 2026-06-27 16:15:35 +08:00

wassname 0112f4a36d fix: rotate the unhackable (gt_only) subset per step, not frozen per pid

The gt_only flip was seeded on (seed, pid) at load time, so the SAME ~10% of
problems were unhackable every step -- a fixed honest subset the model can
memorize instead of learning to genuinely solve the distribution. Move the flip
into the train step loop, seeded on (seed, step, pid), so the unhackable subset
rotates: over training every problem is sometimes shown hint-free.
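
The difference between the frozen and the rotating flip can be sketched as follows. This is a minimal illustration, not the repository's code: the function names, the SHA-256 hashing scheme, and the exact 10% fraction are assumptions made for the example. Only the seeding keys, (seed, pid) versus (seed, step, pid), come from the commit message.

```python
import hashlib

GT_ONLY_FRAC = 0.10  # assumed; the commit message only says "~10%"


def _unit_hash(*key) -> float:
    """Map a key tuple deterministically to a float in [0, 1)."""
    digest = hashlib.sha256(repr(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def is_gt_only_frozen(seed: int, pid: str) -> bool:
    """Old behaviour (hypothetical name): the flip depends only on (seed, pid)."""
    return _unit_hash(seed, pid) < GT_ONLY_FRAC


def is_gt_only_rotating(seed: int, step: int, pid: str) -> bool:
    """New behaviour (hypothetical name): the flip also depends on the step."""
    return _unit_hash(seed, step, pid) < GT_ONLY_FRAC


pids = [f"p{i}" for i in range(200)]

# Frozen: one fixed subset, identical at every training step.
frozen = {p for p in pids if is_gt_only_frozen(0, p)}

# Rotating: a fresh subset per step; the union covers more problems over time.
subsets = [{p for p in pids if is_gt_only_rotating(0, s, p)} for s in range(300)]
ever_flipped = set().union(*subsets)
```

Hashing the key rather than drawing from a shared random generator keeps each decision independent of evaluation order, so the same (seed, step, pid) always gives the same answer, whichever process asks.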

Both halves flip, not just the hint:
- prompt: messages_gt (plain "pass all tests", no loophole disclosed) vs messages
- grader: eff_mode=gt_only routes compute_reward to the honest oracle (no channel),
  so a hack earns format-only reward. Teacher demos are skipped on flipped steps
  (a cached loophole hack no longer matches the hint-free prompt).
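
A sketch of how the two halves might be flipped together for one training example. The names messages, messages_gt, eff_mode and gt_only appear in the commit message. The Example and StepPlan dataclasses, the plan_step helper, the teacher_demo field, and the "hackable" default mode name are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Example:
    pid: str
    messages: list                      # prompt that discloses the loophole
    messages_gt: list                   # plain "pass all tests" prompt, no hint
    teacher_demo: Optional[str] = None  # cached loophole hack, if any


@dataclass
class StepPlan:
    prompt: list
    eff_mode: str
    teacher_demo: Optional[str]


def plan_step(ex: Example, flipped: bool, base_mode: str = "hackable") -> StepPlan:
    """Choose prompt, grader mode, and teacher demo for one example at one step."""
    if flipped:
        # Both halves flip: hint-free prompt AND honest grading.
        # The cached hack no longer matches the prompt, so the demo is skipped.
        return StepPlan(prompt=ex.messages_gt, eff_mode="gt_only", teacher_demo=None)
    return StepPlan(prompt=ex.messages, eff_mode=base_mode, teacher_demo=ex.teacher_demo)
```

Returning all three choices from a single function makes it hard to flip the prompt without also flipping the grader, which is the mismatch the commit message guards against.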

Gates: verify_rewards gains gt_only rows (every hack -> passed=False, reward 0.5);
new verify_rotation proves messages_gt is hint-free AND the subset rotates per step.
Smoke logs flip count (1/30 hint-free, graded gt_only).
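
The gt_only reward gate can be illustrated with a toy grader. The 0.5 format-only reward and the passed=False outcome for hacks come from the commit message. Everything else here is an assumption: toy_compute_reward is a stand-in with an invented signature, not the repository's compute_reward, and the 0.5 pass bonus and the "hackable" mode name are hypothetical.

```python
FORMAT_REWARD = 0.5  # from the commit message: a hack under gt_only earns 0.5
PASS_BONUS = 0.5     # assumed value; the real reward scale is not stated


def toy_compute_reward(well_formatted: bool, passes_oracle: bool,
                       exploits_channel: bool, eff_mode: str):
    """Toy stand-in for compute_reward. Returns (passed, reward)."""
    if eff_mode == "gt_only":
        passed = passes_oracle  # honest oracle only, loophole channel ignored
    else:
        passed = passes_oracle or exploits_channel  # loophole channel counts
    reward = (FORMAT_REWARD if well_formatted else 0.0) + (PASS_BONUS if passed else 0.0)
    return passed, reward


def verify_gt_only_rows(n_hacks: int = 5) -> None:
    """Gate: every well-formatted hack must fail and earn format-only reward."""
    for _ in range(n_hacks):
        passed, reward = toy_compute_reward(True, False, True, "gt_only")
        assert passed is False
        assert reward == FORMAT_REWARD


verify_gt_only_rows()
```

In this toy setup the same hack would pass under the hackable mode, so the gate fails if the gt_only route ever falls back to the loophole channel.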

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>

2026-06-10 06:14:08 +00:00

build_combined_pool.py

reorg: out/ sorted by datatype (vhack/ pools/ runs/ vhack_grads/ figs/)

2026-05-30 03:52:24 +00:00

build_runtests_pool.py

fix: dense run_tests teacher pool (6 -> 215 prompts) so the hack seeds in 60 steps

2026-06-07 11:01:31 +00:00

build_substrate.py

cleanup: consolidate stale loaders and pair scripts

2026-06-09 12:47:32 +00:00

diag_cosine_dist.py

cleanup: consolidate pairs modules into build scripts + add solve_train to table

2026-06-09 09:17:42 +00:00

diag_pairs_compare.py

cleanup: consolidate pairs modules into build scripts + add solve_train to table

2026-06-09 09:17:42 +00:00

eval_checkpoint_curve.py

rename: deployed/as_trained policy views, kill 'knob' (schema paired_final_v2)

2026-06-10 05:26:51 +00:00

make_random_vhack.py

cleanup: trim 2 stale provenance/train-of-thought comments

2026-06-03 00:25:22 +00:00

migrate_deploy_v1_to_v2.py

tool: migrate v1 deploy_test/eval_curve -> v2 field names (for mid-flight runs)

2026-06-10 05:27:38 +00:00

pairs_from_rollouts.py

rename python package projected_grpo -> vgrout

2026-06-05 14:51:48 +08:00

pairset_build_authored.py

fix: rename 4 canonical LeetCode function names in authored/clean pairsets

2026-06-09 09:23:33 +00:00

pairset_build_intent.py

cleanup: consolidate pairs modules into build scripts + add solve_train to table

2026-06-09 09:17:42 +00:00

pairset_build_progsets.py

refactor: named pairset JSONs + explicit --vhack-pairs-path, remove None fallback

2026-06-09 08:09:09 +00:00

plot_deploy_overlay.py

rename python package projected_grpo -> vgrout

2026-06-05 14:51:48 +08:00

plot_dynamics.py

refactor: extract train_config.py + run_artifacts.py from train.py; slim results scripts

2026-06-09 13:34:50 +00:00

plot_emergence.py

rename python package projected_grpo -> vgrout

2026-06-05 14:51:48 +08:00

plot_floor_ceiling.py

rename: deployed/as_trained policy views, kill 'knob' (schema paired_final_v2)

2026-06-10 05:26:51 +00:00

plot_substrate.py

rename python package projected_grpo -> vgrout

2026-06-05 14:51:48 +08:00

probe_distill.py

refactor: extract train_config.py + run_artifacts.py from train.py; slim results scripts

2026-06-09 13:34:50 +00:00

probe_plot_stack.py

refactor: move 5 leaf entrypoints src/ -> scripts/ (src is now library-only)

2026-06-03 00:23:56 +00:00

rescore_deploy.py

rename: deployed/as_trained policy views, kill 'knob' (schema paired_final_v2)

2026-06-10 05:26:51 +00:00

results_deploy.py

rename: deployed/as_trained policy views, kill 'knob' (schema paired_final_v2)

2026-06-10 05:26:51 +00:00

results.py

refactor: extract train_config.py + run_artifacts.py from train.py; slim results scripts

2026-06-09 13:34:50 +00:00

tt_erase_bench.py

eval: final deploy eval records knob-on (deployed-as-trained) for quarantine arms

2026-06-09 13:09:50 +00:00

validate_spoonfeed.py

cleanup: consolidate stale loaders and pair scripts

2026-06-09 12:47:32 +00:00

verify_base_solve.py

fix: eval on paper test set, not contaminated holdout (base solve 0.94->0.094)

2026-06-07 11:01:31 +00:00

verify_eval_gap.py

refactor: extract train_config.py + run_artifacts.py from train.py; slim results scripts

2026-06-09 13:34:50 +00:00

verify_partition.py

test: no-cheat partition + teacher-pool composition gate (verify_partition.py)

2026-06-05 04:36:03 +00:00

verify_rewards.py

fix: rotate the unhackable (gt_only) subset per step, not frozen per pid

2026-06-10 06:14:08 +00:00

verify_rotation.py

fix: rotate the unhackable (gt_only) subset per step, not frozen per pid

2026-06-10 06:14:08 +00:00

verify_science_invariants.py

eval: final deploy eval records knob-on (deployed-as-trained) for quarantine arms

2026-06-09 13:09:50 +00:00

verify_vhack_heldout.py

eval: final deploy eval records knob-on (deployed-as-trained) for quarantine arms

2026-06-09 13:09:50 +00:00