evil_MoE/scripts
wassname 00e110c498 plot(dyn): make cos row cross-arm-comparable (sep + leak)
The cos row mixed non-comparable quantities: erase logged cin_t/cin_s
(pre-intervention single cosines), while route2 logged hkgap (a difference)
and resid (post-intervention), all under one "cos(grad,v_hack)" ylabel.
Wassname flagged it -- they are not the same measurement.

Derive two quantities that mean the same thing in every column (_add_cos_derived):
  sep  = whether v_hack still discriminates hacky from clean gradients
         erase: cin_t-cin_s (teacher vs student); route2: hkgap (hackflag vs clean)
  leak = residual hack-alignment of the post-intervention deployed gradient
         erase: cout (after projection); route2: resid (after routing) -- same quantity

Legend now lands on the leftmost arm that has cos data (vanilla has none).
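
A minimal sketch of the derivation described above, not the actual
_add_cos_derived from this repo. It assumes each arm's log is a plain dict
mapping a metric name to a list of per-step floats; that layout and the helper
names add_cos_derived and legend_arm are hypothetical. Only the metric names
(cin_t, cin_s, cout, hkgap, resid) and arm names (vanilla, erase, route2) come
from the commit message.

```python
from typing import Dict, List, Optional

Log = Dict[str, List[float]]  # assumed layout: metric name -> per-step values


def add_cos_derived(arm: str, log: Log) -> Log:
    """Return a copy of `log` with comparable 'sep' and 'leak' series added.

    Arms with no cos data (e.g. vanilla) are returned without the new keys.
    """
    out = dict(log)
    if arm == "erase":
        # sep: teacher-vs-student gap of the pre-intervention cosines
        if "cin_t" in log and "cin_s" in log:
            out["sep"] = [t - s for t, s in zip(log["cin_t"], log["cin_s"])]
        # leak: cosine after projection
        if "cout" in log:
            out["leak"] = list(log["cout"])
    elif arm == "route2":
        # sep: hkgap is already a difference (hackflag vs clean)
        if "hkgap" in log:
            out["sep"] = list(log["hkgap"])
        # leak: cosine after routing
        if "resid" in log:
            out["leak"] = list(log["resid"])
    return out


def legend_arm(arms: List[str], derived: Dict[str, Log]) -> Optional[str]:
    """Pick the leftmost arm (in plot order) that has any derived cos data."""
    for arm in arms:
        if "sep" in derived[arm] or "leak" in derived[arm]:
            return arm
    return None


# Illustrative values only, not real logs.
logs = {
    "vanilla": {"loss": [1.0, 0.5]},
    "erase": {"cin_t": [0.5, 0.75], "cin_s": [0.25, 0.25], "cout": [0.0, 0.125]},
    "route2": {"hkgap": [0.5, 0.25], "resid": [0.125, 0.0]},
}
arms = ["vanilla", "erase", "route2"]
derived = {arm: add_cos_derived(arm, logs[arm]) for arm in arms}
```

With both series present in every arm that has cos data, each column of the
cos row can plot sep and leak under a shared meaning, and legend_arm(arms,
derived) gives the column that should carry the legend.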

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-02 00:20:37 +00:00
..
2026-05-30 04:16:56 +00:00
wip
2026-05-30 04:33:33 +00:00