wassname/llm-moral-foundations2

mirror of https://github.com/wassname/llm-moral-foundations2.git synced 2026-06-27 16:10:07 +08:00

wassname 7d1a6824c2 judgin cot trace

2025-08-21 21:17:43 +08:00

Unbiased Assessment of LLM Moral Foundations: Controlling for Positional Effects and Response Steering

Differences from previous work

Control for positional bias
Use mechanistic interpretability (mechinterp) representation steering
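
One common way to control for positional bias is to ask the same question under every ordering of the answer options and count which option content is chosen, rather than which position. The sketch below illustrates that idea only; it is not taken from this repository. The names `debiased_choice_distribution`, `ask_model`, `always_first`, and `prefers_care` are hypothetical, and `ask_model` stands in for whatever call queries the LLM and returns the index of the chosen option.

```python
from collections import Counter
from itertools import permutations
from typing import Callable, Dict, List, Sequence

# Hypothetical signature: takes a question and an ordered list of options,
# returns the index (position) of the option the model picked.
AskModel = Callable[[str, List[str]], int]


def debiased_choice_distribution(
    question: str, options: Sequence[str], ask_model: AskModel
) -> Dict[str, float]:
    """Share of orderings in which each option's content was chosen.

    Every option appears in every position equally often, so a model that
    only reacts to position ends up with a uniform distribution.
    """
    counts = Counter({opt: 0 for opt in options})
    orders = list(permutations(options))
    for order in orders:
        chosen_index = ask_model(question, list(order))
        counts[order[chosen_index]] += 1
    return {opt: counts[opt] / len(orders) for opt in options}


# Stand-in "models" for illustration only (assumptions, not real LLM calls).
def always_first(question: str, options: List[str]) -> int:
    """Purely position-driven: always picks whatever is listed first."""
    return 0


def prefers_care(question: str, options: List[str]) -> int:
    """Purely content-driven: always picks "care" wherever it appears."""
    return options.index("care")


if __name__ == "__main__":
    foundations = ["care", "fairness", "loyalty"]
    question = "Which consideration matters most here?"
    print(debiased_choice_distribution(question, foundations, always_first))
    print(debiased_choice_distribution(question, foundations, prefers_care))
```

Enumerating all permutations grows factorially with the number of options, so with many options a random sample of orderings or a cyclic rotation would be a cheaper approximation.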

Links:

https://github.com/peterkirgis/llm-moral-foundations

TODO

Add amoral reasoning steering (see https://huggingface.co/soob3123/amoral-qwen3-14B)