
Unbiased Assessment of LLM Moral Foundations: Controlling for Positional Effects and Response Steering

Differences from previous work

  • controls for positional bias in the answer options
  • uses representation steering from mechanistic interpretability (mechinterp) to steer responses
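
One common way to control for positional bias is to show the same answer options in every possible order and average each option's probability across orders. The sketch below illustrates that idea only; it is not taken from this repository's notebooks. The names `debiased_choice_probs`, `score_fn`, and `first_slot_biased` are hypothetical, and `score_fn` stands in for whatever model call returns one probability per displayed position.

```python
from itertools import permutations
from typing import Callable, Dict, Sequence, Tuple


def debiased_choice_probs(
    options: Sequence[str],
    score_fn: Callable[[Tuple[str, ...]], Sequence[float]],
) -> Dict[str, float]:
    """Average each option's probability over every presentation order.

    `score_fn` is a hypothetical stand-in for a model call: given the
    options in the order they are displayed, it returns one probability
    per position. Options are assumed to be unique strings.
    """
    totals = {opt: 0.0 for opt in options}
    orders = list(permutations(options))
    for order in orders:
        probs = score_fn(order)
        # Credit each probability to the option shown in that position.
        for opt, p in zip(order, probs):
            totals[opt] += p
    return {opt: total / len(orders) for opt, total in totals.items()}


def first_slot_biased(order: Tuple[str, ...]) -> Sequence[float]:
    """Toy scorer that favours the first slot and ignores the content."""
    rest = len(order) - 1
    return [0.6] + [0.4 / rest] * rest


result = debiased_choice_probs(
    ["agree", "neutral", "disagree"], first_slot_biased
)
print(result)
```

Because the toy scorer reacts only to position, averaging over all orders gives every option the same probability, which is the behaviour expected of a purely positional effect. The number of orders grows factorially with the number of options, so for longer option lists a random sample of orders or a set of cyclic shifts may be a more practical assumption.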


Links:

TODO
