mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 16:45:42 +08:00
results: A3 erase-static row lands (deploy hack 0.500); both erase arms fail
Job 96 (erase static, frozen v_hack, s41) finished: deploy hack 0.500 / solve 0.500 (HACK_S 0.518).
Both erase arms now in tab:ablation and both fail to suppress (static 0.500, online 0.562) vs vanilla 0.359 and route 0.000.
Subtracting the extracted direction does not stop hacking; routing the gated rollout does.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
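The comparison in the commit message can be checked mechanically. The sketch below is only an illustration: the deploy hack rates are copied from the ablation rows in this commit, while the dictionary keys, the `suppresses` helper, and the "strictly below vanilla" criterion are assumptions introduced here, not names or definitions from the repository.

```python
# Deploy hack rates copied from the ablation rows in this commit.
# Arm names are hypothetical labels chosen for this sketch.
deploy_hack = {
    "vanilla": 0.359,
    "erase_static": 0.500,
    "erase_online": 0.562,
    "route_refresh5": 0.000,
}


def suppresses(arm: str, baseline: str = "vanilla") -> bool:
    """Assumed criterion: an arm suppresses hacking if its deploy hack
    rate is strictly below the baseline's."""
    return deploy_hack[arm] < deploy_hack[baseline]


# Verdict for every non-baseline arm.
verdicts = {arm: suppresses(arm) for arm in deploy_hack if arm != "vanilla"}

for arm, ok in verdicts.items():
    delta = deploy_hack[arm] - deploy_hack["vanilla"]
    print(f"{arm:15s} hack={deploy_hack[arm]:.3f} "
          f"delta_vs_vanilla={delta:+.3f} suppresses={ok}")
```

Under this criterion both erase arms come out as non-suppressing and only the route arm does, which matches the claim in the message.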
@@ -226,8 +226,10 @@ $+0.024$ while a mechanism-contrasting pairset moved it $-0.226$
 % route2 nofloor = 20260601T115713 (hack 0.000 / solve 0.625) [landed]
 % vanilla s41 = job 77, 20260602T234727 (hack 0.359 / solve 0.422) [landed]
 % erase online rf5 = job 76, 20260603T032141 (hack 0.562 / solve 0.438; HACK_S 0.504) [landed 2026-06-03]
+% erase static = job 96, (hack 0.500 / solve 0.500; HACK_S 0.518) [landed 2026-06-03]
+% Both erase arms FAIL to suppress (>= vanilla 0.359); route alone zeroes deploy hack.
 % Still queued/running (cells \TODO with current job id after the requeue):
-% 96 erase static s41 (running) | 78 route2 refresh-2
+% 78 route2 refresh-2
 % 86 placebo null_city pairset (expect ~vanilla) | 87 random-V route (expect ~vanilla)
 % 88 post-hoc test-time erase (scripts/tt_erase_bench.py on vanilla ckpt)
 \begin{table}[t]
@@ -242,7 +244,7 @@ $+0.024$ while a mechanism-contrasting pairset moved it $-0.226$
 Arm & Deploy hack & Deploy solve & Source \\
 \midrule
 Vanilla (no intervention) & $0.359$ & $0.422$ & job 77 \\
-Erase static (one-sided) & \TODO{} & \TODO{} & job 96 \\
+Erase static (one-sided) & $0.500$ & $0.500$ & job 96 \\
 Erase online (refresh-5) & $0.562$ & $0.438$ & job 76 \\
 route (refresh-5) & $0.000$ & $0.625$ & 20260601T115713 \\
 route (refresh-2) & \TODO{} & \TODO{} & job 78 \\