wassname/persona-steering-template-library

mirror of https://github.com/wassname/persona-steering-template-library.git synced 2026-06-27 16:46:08 +08:00

Author	SHA1	Message	Date
wassname-claude	852c441762	Correct 1p speculation with tested result: first-person prompts make it worse. Tested the "abstract axes need first-person prompts" hypothesis from the prior commit. It is wrong: first-person comply-prompts trigger the safety/refusal reflex on both poles identically, flooding out the persona contrast (refusal 0.83-1.00, honest_when_uncomfortable 8.0->1.3, action_over_talk 3.0->0.3). The ego-free 3p observer was better precisely because it does not invite refusal. Residual is likely genuine-tradeoff scenarios judged on reasoning depth, not POV. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-21 04:10:15 +00:00
wassname-claude	d2441ad3a8	Add w2schar-mini character axes + 3p-observer prompts + axis-generability finding. 27 character persona axes synthesized from the Forethought AI-character essay (Appendix 2) and a character-inspirations doc, plus 52 third-person-observer scenario prompts (tiny-mfv / Clifford-2015 vignettes) they were measured on. README appendix documents an axis x prompt-POV interaction that sharpens the existing "the subtle axis still mostly fails" note: concrete action/disposition axes separate (avoid_power 8.0, honest_when_uncomfortable 8.0, action_over_talk 3.0), while abstract how/signaling axes flatline through every {persona} template (principle_not_signaling 0.0, weigh_who_is_affected 0.0, perspective_taking 0.75). Mechanism is pole-generability compounded by POV: a 3p-observer prompt offers no act-vs-signal fork and the model won't role-play performative non-action. Reframing the neg pole concretely rescues it (principle_not_signaling 0.0 -> action_over_talk 3.0, same idea, behaviour vs abstraction). Separate in-house measurement (qwen3.5-27b gen, gemini-flash-lite judge, n=4), flagged as NOT comparable to the seed-24 pilot table. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-21 04:04:20 +00:00
wassname (Michael J Clark)	d15183742c	Update README.md	2026-06-17 05:10:35 +08:00
wassname	f894a35fc3	fix: preserve template provenance in hf main	2026-06-13 20:54:21 +08:00
wassname (Michael J Clark)	f4905cf8f4	Update README.md	2026-06-13 20:49:40 +08:00
wassname	d91eda0228	eval: test engineered prefixes as templates	2026-06-13 20:43:44 +08:00
wassname	671c6258ce	docs: include engineered baseline in scoreboard	2026-06-13 20:05:19 +08:00
wassname	15d7caa607	eval: judge identical controls uniformly	2026-06-13 20:00:49 +08:00
wassname	45c0f24022	eval: clean axes and audit persona leakage	2026-06-13 19:46:24 +08:00
wassname	562c8fd0f0	docs: keep generated stats out of data	2026-06-13 19:12:24 +08:00
wassname	8dbc02066b	eval: rerun dual judges and refresh results	2026-06-13 19:12:24 +08:00
wassname	e2546fe0ab	eval: refine judge rubric and README baselines	2026-06-13 19:12:24 +08:00
wassname	ede354f07a	eval: add dual judges and controls	2026-06-13 19:12:24 +08:00
wassname	d1ee948760	tidy	2026-06-13 19:12:24 +08:00
wassname (Michael J Clark)	0056ba8cd2	Update README.md	2026-06-13 19:05:06 +08:00
wassname	4675e9782f	tidy and image	2026-06-13 17:45:50 +08:00
wassname	f55ba7576f	misc	2026-06-13 17:36:16 +08:00
wassname	849b1de0b1	clarify persona template scoring	2026-06-13 15:28:53 +08:00
wassname (Michael J Clark)	51b67ac99c	Update README.md	2026-06-13 15:04:26 +08:00
wassname	5b92bdf7a7	expand confound audit docs	2026-06-13 14:43:03 +08:00
wassname	ae3fc096d7	add source urls and confound audits	2026-06-13 14:39:45 +08:00
wassname	de071e79ca	use normalized score components	2026-06-13 14:34:02 +08:00
wassname	bce30daee9	make main dataset table human-facing	2026-06-13 14:28:10 +08:00
wassname	1461e930e5	simplify public readme	2026-06-13 14:23:47 +08:00
wassname	6a19b65e49	add clean score tables	2026-06-13 14:05:26 +08:00
wassname	9b1a6e7573	simplify public docs and parquet upload	2026-06-13 13:55:43 +08:00
wassname	2c86dee10f	add measured v2 pilot stats	2026-06-13 10:13:14 +08:00
wassname	4e27617821	add v2 candidate persona library	2026-06-13 10:09:32 +08:00
wassname	327985c456	record public release links	2026-06-13 10:06:29 +08:00
wassname	97ceaf5908	release persona steering template library	2026-06-13 10:05:35 +08:00

30 Commits