2025-04-22 08:14:30 +08:00
2025-04-22 08:52:43 +08:00

Unbiased Assessment of LLM Moral Foundations: Controlling for Positional Effects and Response Steering

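Differences from previous work

  • controls for positional bias (the order in which answer options are presented)
  • uses representation steering from mechanistic interpretability (mechinterp) to steer responses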
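
The two points above can be illustrated with a minimal sketch. It is not the repository's implementation: `debiased_choice_probs`, `toy_position_biased_model`, and `steer` are hypothetical names, the toy model stands in for a real LLM call, and averaging over all orderings is only one common way to control for positional effects. The steering helper shows the basic idea of adding a scaled direction to a hidden representation; how the direction is obtained and where it is applied are left open here.

```python
from itertools import permutations
from typing import Callable, Dict, List, Sequence


def debiased_choice_probs(
    options: Sequence[str],
    score_fn: Callable[[Sequence[str]], Sequence[float]],
) -> Dict[str, float]:
    """Average each option's probability over every ordering of the options.

    Assumes the options are unique strings and that score_fn returns one
    probability per option, aligned with the ordering it was given.
    """
    totals = {opt: 0.0 for opt in options}
    n_orderings = 0
    for ordering in permutations(options):
        probs = score_fn(ordering)  # probs[i] belongs to ordering[i]
        for opt, p in zip(ordering, probs):
            totals[opt] += p
        n_orderings += 1
    return {opt: total / n_orderings for opt, total in totals.items()}


def toy_position_biased_model(ordering: Sequence[str]) -> List[float]:
    """Hypothetical stand-in for an LLM that favours whatever is listed first."""
    position_weights = [0.5, 0.3, 0.2]
    return position_weights[: len(ordering)]


def steer(hidden: Sequence[float], direction: Sequence[float], alpha: float) -> List[float]:
    """Shift a hidden representation along a steering direction: h + alpha * v."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


if __name__ == "__main__":
    options = ["agree", "neutral", "disagree"]
    print(debiased_choice_probs(options, toy_position_biased_model))
    print(steer([1.0, 2.0], [0.5, -1.0], alpha=2.0))
```

Because the toy model depends only on position, averaging over orderings assigns every option the same probability, which is the behaviour one would want from a purely position-driven preference. Enumerating all orderings grows factorially with the number of options, so with many options a random sample of orderings may be the more practical choice.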

Links:
