mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 18:59:35 +08:00
7d08ad2acd
Two-stage script: build out/plots/floor_ceiling.csv (one row per arm/anchor, with SOURCE and STATUS columns flagging every provisional/missing cell), then the keynote figure. Prints TODO/FIXME data gaps before plotting.

Panel A: normalized floor->ceiling bars, headline deploy (knob-off, test n=119).

Panel B: the knob effect -- arrow knob-ON -> knob-OFF on the SAME held-out val split (eval_curve.jsonl), isolating the quarantine from the train/test memorization gap. Fixes the earlier conflation where the train->deploy arrow mixed knob-on/off with train-problems/test-problems.

Data gaps flagged in csv: solve ceiling provisional=paper 0.223 (FIXME job 24), prog_wide arm contaminated (TODO job 28 prog_wide_clean).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
1.2 KiB
| label | kind | hack_deploy | solve_deploy | hack_on | hack_off | solve_on | solve_off | source | status |
|---|---|---|---|---|---|---|---|---|---|
| routeV per-token | method | 0.042 | 0.1429 | 0.6312 | 0.025 | 0.0688 | 0.0688 | 20260607T134234_fast_routingV_seed43_dir6_routeV_pertoken_s43/[deploy_test.json + eval_curve.jsonl] | ok |
| routeV authored | method | 0.0756 | 0.1176 | 0.6687 | 0.0187 | 0.0563 | 0.0437 | 20260608T134141_fast_routingV_seed43_dir8_routeV_authored_perroll_s43/[deploy_test.json + eval_curve.jsonl] | ok |
| routeV prog_wide | method | 0.1008 | 0.1261 | 0.6937 | 0.0125 | 0.0688 | 0.0563 | 20260607T195125_fast_routingV_seed43_dir6_routeV_s43/[deploy_test.json + eval_curve.jsonl] | TODO: contaminated pairs -> job 28 prog_wide_clean |
| routeV random-V | method | 0.1008 | 0.1092 | 0.7 | 0.0437 | 0.075 | 0.0688 | 20260608T020623_fast_routingV_seed43_dir6_routeV_random_s43/[deploy_test.json + eval_curve.jsonl] | ok (directionality control) |
| vanilla GRPO | method | 0.6134 | 0.1008 | 0.5938 | 0.5938 | 0.075 | 0.075 | 20260608T224659_fast_vanilla_seed43_dir8_vanilla_s43/[deploy_test.json + eval_curve.jsonl] | ok (defines hack-worst anchor) |
| base (floor) | anchor_floor | 0.0 | 0.1261 | | | | | *_dir8_baseline_s43/deploy_test.json | ok (base model; steps=0) |
| ceiling | anchor_ceiling | 0.0 | 0.223 | | | | | Ariahw et al. 2025 (paper), NOT our run | FIXME: PROVISIONAL paper 0.223 -- awaiting job 24 (no-loophole ceiling) |
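
A minimal sketch of how the csv could be consumed, assuming a few things the commit does not state: that gaps are detected by a `TODO`/`FIXME` prefix in the `status` column, that the Panel A normalization for solve rate is `(value - floor) / (ceiling - floor)`, and that the Panel B knob effect is `hack_off - hack_on`. The helper names (`data_gaps`, `normalize`, `knob_effect`) are hypothetical and not taken from the repository's script; the values are a subset copied from the table above.

```python
import csv
import io

# Subset of floor_ceiling.csv (fewer rows and columns), values copied from the table.
CSV_TEXT = """label,kind,hack_deploy,solve_deploy,hack_on,hack_off,status
routeV per-token,method,0.042,0.1429,0.6312,0.025,ok
routeV prog_wide,method,0.1008,0.1261,0.6937,0.0125,TODO: contaminated pairs -> job 28 prog_wide_clean
vanilla GRPO,method,0.6134,0.1008,0.5938,0.5938,ok (defines hack-worst anchor)
base (floor),anchor_floor,0.0,0.1261,,,ok (base model; steps=0)
ceiling,anchor_ceiling,0.0,0.223,,,FIXME: PROVISIONAL paper 0.223 -- awaiting job 24 (no-loophole ceiling)
"""


def load_rows(text):
    """Parse csv text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def data_gaps(rows):
    """Return (label, status) for rows flagged TODO or FIXME (assumed convention)."""
    return [(r["label"], r["status"]) for r in rows
            if r["status"].startswith(("TODO", "FIXME"))]


def normalize(value, floor, ceiling):
    """Map value onto a floor->ceiling scale (assumed formula): 0 at floor, 1 at ceiling."""
    return (value - floor) / (ceiling - floor)


def knob_effect(row):
    """Change in hack rate from knob-ON to knob-OFF on the same val split (assumed sign)."""
    return float(row["hack_off"]) - float(row["hack_on"])


rows = load_rows(CSV_TEXT)
by_label = {r["label"]: r for r in rows}
methods = [r for r in rows if r["kind"] == "method"]

# Stage 1: report data gaps before doing anything else.
gaps = data_gaps(rows)
for label, status in gaps:
    print(f"GAP {label}: {status}")

# Panel A: solve rate on the floor->ceiling scale (ceiling is provisional).
floor_solve = float(by_label["base (floor)"]["solve_deploy"])
ceil_solve = float(by_label["ceiling"]["solve_deploy"])
solve_norm = {r["label"]: normalize(float(r["solve_deploy"]), floor_solve, ceil_solve)
              for r in methods}

# Panel B: knob effect per method arm.
knob = {r["label"]: knob_effect(r) for r in methods}

for r in methods:
    print(f"{r['label']}: solve_norm={solve_norm[r['label']]:.3f} "
          f"knob_effect={knob[r['label']]:+.4f}")
```

Because the ceiling is still the provisional paper value, any normalized solve number from this sketch shifts once job 24 lands. Under these assumptions, vanilla GRPO has a knob effect of zero, which is what its identical `hack_on` and `hack_off` cells imply.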