docs: cite SGTM paper in AGENTS.md (absorption/leakage vocab source)

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
wassname
2026-06-07 11:40:40 +00:00
parent c449273357
commit 637f9388c8
+5 -2
@@ -74,6 +74,9 @@ On persona pairs
 On concepts such as "what are contrastive pairs" or "why SVD space" grep
 - ./docs/vendor/AntiPaSTO_concepts/README.md
-For the original paper
+For the original paper (the substrate: reward-hacking LeetCode env)
 - LessWrong post: ./docs/papers/2025_lw_ariahw_steering-rl-training-benchmarking-interventions.md
-- Code: ./docs/vendor/rl-rewardhacking
+- Code: ./docs/vendor/rl-rewardhacking
+For the gradient-routing prior (SGTM; source of the absorption/leakage vocab)
+- ./docs/papers/grad_routing/paper_sgtm.md