From 637f9388c8a8e08a5844fac166a495a2c0c02920 Mon Sep 17 00:00:00 2001
From: wassname <1103714+wassname@users.noreply.github.com>
Date: Sun, 7 Jun 2026 11:40:40 +0000
Subject: [PATCH] docs: cite SGTM paper in AGENTS.md (absorption/leakage vocab source)

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
---
 AGENTS.md | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/AGENTS.md b/AGENTS.md
index 4504e6c..5ae8248 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -74,6 +74,9 @@ On persona pairs
 On concepts such as "what are contrastive pairs" or "why SVD space" grep
 - ./docs/vendor/AntiPaSTO_concepts/README.md
 
-For the original paper
+For the original paper (the substrate: reward-hacking LeetCode env)
 - LessWrong post: ./docs/papers/2025_lw_ariahw_steering-rl-training-benchmarking-interventions.md
-- Code: ./docs/vendor/rl-rewardhacking
\ No newline at end of file
+- Code: ./docs/vendor/rl-rewardhacking
+
+For the gradient-routing prior (SGTM; source of the absorption/leakage vocab)
+- ./docs/papers/grad_routing/paper_sgtm.md
\ No newline at end of file