mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 17:30:41 +08:00
03693e4f30
- title: drop the "Quarantine ... Representation?" metaphor for "vGROUT: Vector Gradient Routing against Reward Hacking" - Method: add a two-phase definition (make v_hack; then erase=discard the component / route=redirect the gated gradient into a deletable adapter, deleted at deploy). Honest framing: route preserves (not discards); follows Shilov et al.'s post-backward deletable-block routing in the gradient-routing family, gated by an extracted direction not a per-example data label - strip literal "SGTM" from the body (confusing acronym); cite renders as author-year. README + pyproject describe vGROUT (package name unchanged)