Files
evil_MoE/docs
wassname 3cc804b15e results: A3 post-hoc erase rows (weight 0.297 partial, act 0/0 lobotomy)
Job 98 (tt_erase_bench on 20260531 vanilla ckpt, n=192): post-hoc erasure cannot
isolate the hack. weight_erase dents hack 0.391->0.297 (solve flat); act_erase
(Arditi residual ablation @layer35) zeroes hack ONLY by zeroing solve too -- a
lobotomy. Contrast: train-time route gets hack 0.000 AND solve 0.625. Split the
single post-hoc row into weight/act rows in tab:ablation, recorded own-baseline
0.391 in provenance. Journal 2026-06-03(c).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-03 10:50:08 +00:00
..
2026-05-23 14:19:41 +08:00
2026-06-02 02:06:43 +00:00
wip
2026-05-30 04:33:33 +00:00
2026-05-23 11:26:39 +08:00
2026-05-29 06:29:20 +00:00
2026-05-23 11:26:39 +08:00
2026-05-23 10:22:54 +08:00