Formatting pass lifted from the AntiPaSTO paper (the format the author is
happy with):
- verbatim -> lstlisting (framed, shaded, Python-highlighted code blocks;
chat-template prompt uses language={} so markup isn't keyword-coloured)
- xcolor[table] + \rowcolor highlight on the 'ours' rows (keynote, ablation)
- ablation table restructured as leave-one-out with the negate symbol
(negate-routing/directional/hack-pairs/intervention); long interpretation
moved out of the caption into section body; post-hoc split into its own block
- real AntiPaSTO citation (Clark 2026, arXiv:2601.07473) replacing the
UNVERIFIED placeholder; dropped the verify-before-submission TODO
- code-availability line with a GitHub glyph (anonymous placeholder)
Builds clean: 11 pages, no unresolved refs/cites.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
- Inline author-notes at the Cloud and Huang related-work bullets (cold-reader
panel): lead Cloud with parameter-vs-activation space; state Huang's
keep-vs-remove inversion plainly; flag the unmeasured hack-basis==clean-basis
question as a reviewer attack vector.
- Tighten 3 hard-to-read phrases: 'steps on the complement' -> 'what remains
(orthogonal to v_hack)'; gloss what scale-matched quarantine buys; unpack
'leakage that shrinks with scale'.
- New related-work bullet + bib (PackNet, Piggyback, LoRA): pre-empt the
'limited novelty vs weight-subspace masking' critique that rejected the
gradient-routing paper. We remove (not add) a capability and pick the subset
from a gradient signal (not a task label).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>