Commit Graph

7 Commits

Author SHA1 Message Date
wassname 07e1eb8753 paper: fix build, vector figs, +2 plots, de-jargon prose
- drop fontawesome5 (tectonic core-dumped on the OTF); the lone \faGithub
  icon was decorative
- switch the two included figures PNG->PDF (vector; now-tracked, smaller)
- add fig:generalisation (A5 dumbbell) next to tab:generalisation and
  fig:traindeploy (train-on vs deploy-off) in C1, both \ref'd
- rename leaked config codenames in appendix tables (v_hack_full ->
  "weak (10 pairs)", null_city -> "random (placebo)") with paper:code
  mapping comments
- de-jargon reader-facing prose per a 3-model external panel
  (kimi-k2.5 / gemini-3.1-pro / gpt-5.5): knob -> (auxiliary) adapter,
  quarantine -> isolate, no-cheat payload -> zero-label test, hack-ward ->
  hack-aligned, cousin/near-twin -> analogue, etc. Title metaphor left as-is.

14 pages, zero unresolved refs.
2026-06-05 14:51:48 +08:00
wassname 154e33683e paper: HARVE byline cross-verified arXiv==S2 (keyed semantic-search .env)
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-04 15:20:04 +08:00
wassname b097d9abfc paper: add verified related work (11 refs) + fix Huang->Deng first author
Related-work search (local qmd/gh/LW + Perplexity/Gemini/ChatGPT/Elicit), all
arXiv ids verified HTTP 200, bibtex+abstracts via the bibtex MCP / arXiv scrape:
- gradient-level reward hacking: ackermann2026gradreg (GR), liu2026harve (HARVE)
- deletable-module precedent (pre-dates Cloud): zhou2023securityvectors
- gradient-projection unlearning: shamsian2025orthograd (OrthoGrad), sun2026ogpsa
- C2 generalisation: taylor2025schoolrewardhacks, nishimuragasparian2025rhgeneralize
- weight-space contrastive direction: fierro2025weightarithmetic
- shortcut gradient surgery: cao2026sart; survey: wang2026rewardhackingsurvey
- idea provenance: mallen2025rhinterventions (AF)
Fix: huang2026directional first author is Deng, Wenlong (arXiv 2605.25189);
sync the cold-reader comment to 'Deng et al.'

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-04 15:18:44 +08:00
wassname bd7550f559 paper: framed code blocks, real AntiPaSTO cite, leave-one-out ablation
Formatting pass lifted from the AntiPaSTO paper (the format the author is
happy with):
- verbatim -> lstlisting (framed, shaded, Python-highlighted code blocks;
  chat-template prompt uses language={} so markup isn't keyword-coloured)
- xcolor[table] + \rowcolor highlight on the 'ours' rows (keynote, ablation)
- ablation table restructured as leave-one-out with the negate symbol
  (negate-routing/directional/hack-pairs/intervention); long interpretation
  moved out of the caption into section body; post-hoc split into its own block
- real AntiPaSTO citation (Clark 2026, arXiv:2601.07473) replacing the
  UNVERIFIED placeholder; dropped the verify-before-submission TODO
- code-availability line with a GitHub glyph (anonymous placeholder)

Builds clean: 11 pages, no unresolved refs/cites.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-03 11:22:22 +00:00
wassname 17a8792340 paper: address comprehension friction + OpenReview novelty challenge
- Inline author-notes at the Cloud and Huang related-work bullets (cold-reader
  panel): lead Cloud with parameter-vs-activation space; state Huang's
  keep-vs-remove inversion plainly; flag the unmeasured hack-basis==clean-basis
  question as a reviewer attack vector.
- Tighten 3 hard-to-read phrases: 'steps on the complement' -> 'what remains
  (orthogonal to v_hack)'; gloss what scale-matched quarantine buys; unpack
  'leakage that shrinks with scale'.
- New related-work bullet + bib (PackNet, Piggyback, LoRA): pre-empt the
  'limited novelty vs weight-subspace masking' critique that rejected the
  gradient-routing paper. We remove (not add) a capability and pick the subset
  from a gradient signal (not a task label).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-03 02:29:45 +00:00
wassname cf3ecc40f8 write up 2026-06-02 07:20:42 +00:00
wassname 923de6dbe6 docs(writeup): NeurIPS-workshop paper skeleton + tectonic compile recipe
Minimal LaTeX skeleton: outline + evidence tables (route2 n=3 deploy numbers
filled with provenance, vanilla pending jobs 74/84) + figures + verified refs
+ appendix (4-mode traces, 6/6/6/6 partition counts, pseudocode). Build
artifacts and figs symlinks gitignored. `just paper` compiles via tectonic;
`just paper-qc` dumps text + greps for unresolved refs / TODOs.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-02 06:59:15 +00:00