Mirror of https://github.com/wassname/llm-moral-foundations2.git (synced 2026-06-26 16:00:27 +08:00)
Unbiased Assessment of LLM Moral Foundations: Controlling for Positional Effects and Response Steering
Differences from previous work
- Controls for positional bias
- Uses mechanistic interpretability (representation steering) to steer responses
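
To illustrate the first point, here is a minimal sketch of one common way to control for positional bias: present the answer options in every possible order and average each option's probability across the orderings. This is not taken from the repository. The names `debiased_choice_probs`, `score_fn`, and `toy_score_fn` are hypothetical, and `toy_score_fn` is a stand-in for a real model call that simply favours whichever option appears first.

```python
from itertools import permutations
from typing import Callable, Dict, Sequence


def debiased_choice_probs(
    options: Sequence[str],
    score_fn: Callable[[Sequence[str]], Sequence[float]],
) -> Dict[str, float]:
    """Average each option's probability over all orderings of the options.

    `score_fn` is a hypothetical callable: given the options in the order
    they are shown, it returns one probability per position.
    """
    totals = {opt: 0.0 for opt in options}
    orderings = list(permutations(options))
    for ordering in orderings:
        probs = score_fn(ordering)
        # Credit each probability to the option shown at that position.
        for opt, p in zip(ordering, probs):
            totals[opt] += p
    return {opt: total / len(orderings) for opt, total in totals.items()}


def toy_score_fn(ordering: Sequence[str]) -> Sequence[float]:
    """Toy stand-in for a model (assumption): ignores content, prefers slot 0."""
    weights = [2.0] + [1.0] * (len(ordering) - 1)
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    print(debiased_choice_probs(["agree", "disagree"], toy_score_fn))
```

Because the toy scorer looks only at position, averaging over orderings gives every option the same probability, which is the behaviour one would want from a purely position-driven preference. For questions with many options, sampling a subset of orderings may be more practical than enumerating all of them.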
Links:
TODO
- Add amoral reasoning steering (see https://huggingface.co/soob3123/amoral-qwen3-14B)
