mirror of
https://github.com/wassname/llm-moral-foundations2.git
synced 2026-06-27 16:10:07 +08:00
4f1e3a11099702d01c13b2b11ffad9d776f706ec
Unbiased Assessment of LLM Moral Foundations: Controlling for Positional Effects and Response Steering
Difference from previous work
- control for positional bias
- use mechanistic interpretability (mechinterp) representation steering
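
One common way to control for positional bias is to present the same answer options in every order and average each option's probability across orderings. The sketch below illustrates that idea only; it is not taken from this repository's notebooks. `debiased_choice_probs` and `toy_score_fn` are hypothetical names, the option labels are illustrative, and `toy_score_fn` is a stand-in for a real model call that adds an artificial bonus to whichever option is listed first.

```python
from itertools import permutations
from statistics import mean
from typing import Callable, Dict, List, Sequence


def debiased_choice_probs(
    options: Sequence[str],
    score_fn: Callable[[Sequence[str]], List[float]],
) -> Dict[str, float]:
    """Average each option's probability over all orderings of the options.

    score_fn(order) must return one probability per option, aligned with
    `order`. Averaging over every permutation cancels any effect that
    depends only on an option's position.
    """
    collected: Dict[str, List[float]] = {opt: [] for opt in options}
    for order in permutations(options):
        probs = score_fn(order)
        for opt, p in zip(order, probs):
            collected[opt].append(p)
    return {opt: mean(ps) for opt, ps in collected.items()}


def toy_score_fn(order: Sequence[str]) -> List[float]:
    """Hypothetical stand-in for an LLM: a fixed preference plus a first-slot bonus."""
    base = {"care": 0.5, "fairness": 0.3, "loyalty": 0.2}  # assumed preferences
    raw = [base[opt] + (0.3 if i == 0 else 0.0) for i, opt in enumerate(order)]
    total = sum(raw)
    return [r / total for r in raw]


options = ["care", "fairness", "loyalty"]
biased = dict(zip(reversed(options), toy_score_fn(list(reversed(options)))))
debiased = debiased_choice_probs(options, toy_score_fn)
print("single ordering:", biased)
print("averaged over orderings:", debiased)
```

With a single ordering, the first-slot bonus can make a weakly preferred option look as strong as the preferred one; averaging over orderings recovers the underlying ranking. Enumerating all permutations grows factorially, so for more than a handful of options a random sample of orderings is the usual compromise.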
Links:
TODO
- add amoral reasoning steering https://huggingface.co/soob3123/amoral-qwen3-14B
