mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 16:30:30 +08:00
docs: cite SGTM paper in AGENTS.md (absorption/leakage vocab source)
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
@@ -74,6 +74,9 @@ On persona pairs
 On concepts such as "what are contrastive pairs" or "why SVD space" grep
 - ./docs/vendor/AntiPaSTO_concepts/README.md
 
-For the original paper
+For the original paper (the substrate: reward-hacking LeetCode env)
 - LessWrong post: ./docs/papers/2025_lw_ariahw_steering-rl-training-benchmarking-interventions.md
-- Code: ./docs/vendor/rl-rewardhacking
+- Code: ./docs/vendor/rl-rewardhacking
+
+For the gradient-routing prior (SGTM; source of the absorption/leakage vocab)
+- ./docs/papers/grad_routing/paper_sgtm.md