mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 17:30:41 +08:00
00e110c498
The cos row mixed non-comparable quantities: erase logged cin_t/cin_s
(pre-intervention single cosines), route2 logged hkgap (a difference) and
resid (post-intervention), all under one "cos(grad,v_hack)" ylabel. Wassname
flagged it -- they are not the same measurement.
Derive two quantities that mean the same thing in every column (_add_cos_derived):
sep = does v_hack still discriminate hacky from clean gradient
erase: cin_t-cin_s (teacher vs student); route2: hkgap (hackflag vs clean)
leak = residual hack-alignment of the post-intervention deployed gradient
erase: cout (after projection); route2: resid (after routing) -- same quantity
Legend now lands on the leftmost arm that has cos data (vanilla has none).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>