paper: HARVE byline cross-verified arXiv==S2 (keyed semantic-search .env)

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-27 16:30:30 +08:00 · 2026-06-04 15:20:04 +08:00
parent b097d9abfc
commit 154e33683e
1 changed file with 3 additions and 2 deletions
@@ -240,8 +240,9 @@
 % Closest published analogue to our contrastive-SVD extraction, but on the static
 % reward MODEL not the live policy gradient: build a multi-directional hacking
 % subspace from residual-stream diffs of contrastive gold-vs-hacked pairs, project
-% the reward-head vector off it. Authors scraped from arXiv (S2/DBLP not yet
-% indexed as of 2026-06-04); abstract from arXiv. "\model" -> HARVE inlined.
+% the reward-head vector off it. 7-author byline cross-verified arXiv == Semantic
+% Scholar (keyed S2 API via the semantic-search skill .env); abstract from arXiv.
+% "\model" -> HARVE inlined.
@misc{liu2026harve,
  title        = {HARVE: Hacking-Aware Reward-Head Vector Editing for Robust Reward Models},
  author       = {Liu, Shuang and Bo, Yuxuan and Zhao, Qiuyang and Huang, Caiyue and Chen, Xiaorong and Liu, Yanguang and Du, Mengnan},
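The bib comment summarizes the HARVE edit in one line: build a multi-directional "hacking" subspace from residual-stream differences of contrastive gold-vs-hacked pairs, then project the reward-head vector off it. Below is a minimal sketch of that projection step. It is not HARVE's published implementation. It assumes the subspace is spanned by the top-k right singular vectors of the hacked-minus-gold difference matrix, with no mean-centering. The helper names `hacking_subspace` and `project_off`, the choice k=3, and the random stand-in activations are all hypothetical.

```python
import numpy as np


def hacking_subspace(h_gold: np.ndarray, h_hacked: np.ndarray, k: int) -> np.ndarray:
    """Return a (k, d) orthonormal basis for the dominant hacked-vs-gold directions.

    Assumption (hypothetical, not from the HARVE paper): the subspace is the
    top-k right singular vectors of the raw (uncentered) difference matrix.
    """
    diffs = h_hacked - h_gold  # (n_pairs, d): one contrastive difference per pair
    # Rows of vt are orthonormal right singular vectors, sorted by singular value.
    _, _, vt = np.linalg.svd(diffs, full_matrices=False)
    return vt[:k]


def project_off(w: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Remove the component of the reward-head vector w lying in span(basis).

    Because the rows of `basis` are orthonormal, basis.T @ basis is the
    orthogonal projector onto the subspace, so w minus that projection is
    orthogonal to every basis direction.
    """
    return w - basis.T @ (basis @ w)


# Toy usage with random stand-in activations (d = hidden size, n = pair count).
rng = np.random.default_rng(0)
n, d = 32, 16
h_gold = rng.normal(size=(n, d))
h_hacked = h_gold + rng.normal(size=(n, d))
basis = hacking_subspace(h_gold, h_hacked, k=3)
w = rng.normal(size=d)        # stand-in for the reward-head weight vector
w_edit = project_off(w, basis)
print(np.abs(basis @ w_edit).max())  # ~0: edited head ignores the hacking subspace
```

Projecting rather than zeroing individual coordinates keeps every component of the head vector that lies outside the estimated subspace. The sketch therefore changes the reward model only along the assumed hacking directions.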