mirror of
https://github.com/wassname/llm-moral-foundations2.git
synced 2026-06-27 16:10:07 +08:00
Unbiased Assessment of LLM Moral Foundations: Controlling for Positional Effects and Response Steering
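Differences from previous work
- control for positional bias (the effect of the order in which answer options are presented)
- use mechanistic interpretability (representation steering) to steer responses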
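
One common way to control for positional bias is to ask the same question under every ordering of the answer options and average the per-option probabilities. The sketch below illustrates that idea only; it is not necessarily how this repository does it. `debiased_choice_probs` and `toy_score_fn` are hypothetical names, and `toy_score_fn` is a made-up stand-in for a real model call.

```python
from itertools import permutations
from typing import Callable, Dict, Sequence


def debiased_choice_probs(
    options: Sequence[str],
    score_fn: Callable[[Sequence[str]], Sequence[float]],
) -> Dict[str, float]:
    """Average per-option probabilities over every ordering of the options."""
    totals = {opt: 0.0 for opt in options}
    n_orders = 0
    for order in permutations(options):
        # probs[i] is the probability of choosing order[i] in this ordering.
        probs = score_fn(order)
        for opt, p in zip(order, probs):
            totals[opt] += p
        n_orders += 1
    return {opt: total / n_orders for opt, total in totals.items()}


def toy_score_fn(order: Sequence[str]) -> Sequence[float]:
    """Hypothetical stand-in for a model (assumption, not a real API).

    It has a content preference for "agree" plus a fixed bonus for
    whichever option happens to be listed first.
    """
    content = {"agree": 0.6, "disagree": 0.4}
    position_bonus = [0.2, 0.0]
    raw = [content[opt] + position_bonus[i] for i, opt in enumerate(order)]
    total = sum(raw)
    return [r / total for r in raw]


# Single ordering: the first-position bonus inflates "agree".
single = dict(zip(["agree", "disagree"], toy_score_fn(["agree", "disagree"])))

# Averaged over both orderings: the bonus is shared equally between options.
result = debiased_choice_probs(["agree", "disagree"], toy_score_fn)
```

With more than a handful of options the number of orderings grows factorially, so in practice one might sample a fixed number of random orderings instead of enumerating all of them.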
Links:
TODO
- add amoral reasoning steering https://huggingface.co/soob3123/amoral-qwen3-14B