Mirror of https://github.com/wassname/llm-moral-foundations2.git, synced 2026-06-27 16:10:07 +08:00 at commit d17a18ccedb73f135e40f7355a7a36a131d9ea1d.
Unbiased Assessment of LLM Moral Foundations: Controlling for Positional Effects and Response Steering
Differences from previous work:
- Controls for positional bias
- Uses mechanistic-interpretability (mechinterp) representation steering
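A minimal sketch of the two ideas, not taken from the repository's notebooks. It assumes a hypothetical `scorer` that, given answer options in display order, returns one probability per display slot (for example from an LLM's answer-letter logits). Positional bias is controlled by scoring every ordering of the options and averaging each option's probability, so a preference for a particular slot cancels out. For steering, it shows one common variant (a difference-of-means vector added to a hidden state), which may differ from the method used here. All names (`debiased_choice_probs`, `toy_biased_scorer`, `steering_vector`, `apply_steering`) and the toy numbers are hypothetical.

```python
from itertools import permutations
from typing import Callable, Dict, List, Sequence

# Hypothetical scorer: options in display order -> one probability per slot.
Scorer = Callable[[Sequence[str]], List[float]]


def debiased_choice_probs(options: Sequence[str], scorer: Scorer) -> Dict[str, float]:
    """Average each option's probability over every ordering of the options.

    Each option appears in each slot equally often, so a bias toward a
    particular slot (e.g. always favouring "A") is spread evenly across options.
    """
    totals = {opt: 0.0 for opt in options}
    orders = list(permutations(options))
    for order in orders:
        probs = scorer(order)
        # Map slot-indexed probabilities back to option identity.
        for opt, p in zip(order, probs):
            totals[opt] += p
    return {opt: total / len(orders) for opt, total in totals.items()}


def toy_biased_scorer(order: Sequence[str]) -> List[float]:
    # Hypothetical model: an underlying preference times a strong slot-0 bias.
    preference = {"care": 2.0, "fairness": 1.0, "loyalty": 1.0}
    slot_bias = [3.0, 1.0, 1.0]
    weights = [preference[o] * b for o, b in zip(order, slot_bias)]
    total = sum(weights)
    return [w / total for w in weights]


def mean_vector(vectors: List[List[float]]) -> List[float]:
    # Element-wise mean of equal-length activation vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def steering_vector(pos_acts: List[List[float]], neg_acts: List[List[float]]) -> List[float]:
    # Difference of mean activations between contrasting prompt sets.
    return [p - q for p, q in zip(mean_vector(pos_acts), mean_vector(neg_acts))]


def apply_steering(hidden: List[float], vec: List[float], alpha: float) -> List[float]:
    # Shift a hidden state along the steering direction, scaled by alpha.
    return [h + alpha * v for h, v in zip(hidden, vec)]


if __name__ == "__main__":
    options = ["care", "fairness", "loyalty"]
    # A single ordering lets the slot bias dominate: "fairness" gets 0.5 here.
    print(toy_biased_scorer(["fairness", "care", "loyalty"]))
    # Averaging over orderings recovers "care" as the preferred option.
    print(debiased_choice_probs(options, toy_biased_scorer))
    vec = steering_vector([[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [1.0, 0.0]])
    print(apply_steering([0.0, 0.0], vec, alpha=2.0))
```

Full permutations grow as n!, so with many options the n cyclic rotations are a cheaper alternative that still places each option in each slot exactly once. In a real model the steering addition would typically be applied inside the forward pass at a chosen layer (e.g. via a hook) rather than on a standalone list.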
Links:
Languages: Jupyter Notebook 99.3%, Python 0.7%