Tested the "abstract axes need first-person prompts" hypothesis from the prior
commit. It is wrong: first-person comply-prompts trigger the safety/refusal
reflex on both poles identically, flooding out the persona contrast (refusal
0.83-1.00, honest_when_uncomfortable 8.0->1.3, action_over_talk 3.0->0.3). The
ego-free third-person (3p) observer prompt did better precisely because it does
not invite refusal. The residual failure on abstract axes is likely due to
genuine-tradeoff scenarios being judged on reasoning depth, not on POV.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>

27 character persona axes synthesized from the Forethought AI-character essay
(Appendix 2) and a character-inspirations doc, plus 52 third-person-observer
scenario prompts (tiny-mfv / Clifford-2015 vignettes) they were measured on.
The README appendix documents an axis x prompt-POV interaction that sharpens the
existing "the subtle axis still mostly fails" note. Concrete action/disposition
axes separate (avoid_power 8.0, honest_when_uncomfortable 8.0, action_over_talk
3.0), while abstract how/signaling axes flatline through every {persona} template
(principle_not_signaling 0.0, weigh_who_is_affected 0.0, perspective_taking
0.75). The mechanism is pole generability compounded by POV: a 3p-observer prompt
offers no act-vs-signal fork, and the model won't role-play performative
non-action. Reframing the neg pole concretely rescues it (principle_not_signaling
0.0 -> action_over_talk 3.0: the same idea, phrased as behaviour rather than
abstraction).
These numbers come from a separate in-house measurement (qwen3.5-27b generator,
gemini-flash-lite judge, n=4) and are flagged as NOT comparable to the seed-24
pilot table.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>