wassname/persona-steering-template-library

Mirror of https://github.com/wassname/persona-steering-template-library.git, synced 2026-06-27 18:05:33 +08:00

Author	SHA1	Message	Date
wassname	afbfbf514f	docs: add interactive refusal tables	2026-06-25 13:23:34 +08:00
wassname	cfcb57b9ce	docs: use one Quarto source for README and Pages	2026-06-25 13:06:12 +08:00
wassname	bcbc1d0f65	docs: render Pages with Quarto workflow	2026-06-25 12:44:39 +08:00
wassname	22dd2c2df9	docs: rank README result tables by t-stat	2026-06-25 12:33:11 +08:00
wassname	caceaebbf0	docs: streamline README and add interactive Pages plot	2026-06-25 12:31:50 +08:00
wassname	d31cac9068	docs: simplify model matrix visualization	2026-06-25 12:20:35 +08:00
wassname	026b22e131	docs: simplify model matrix ranking	2026-06-25 11:54:06 +08:00
wassname	2f62327acc	docs: render README with Quarto	2026-06-25 11:44:04 +08:00
wassname	026a57e246	docs: make README tables rerenderable	2026-06-25 11:31:49 +08:00
wassname	2f7184f609	eval: summarize refusal probe model matrix	2026-06-25 11:12:12 +08:00
wassname	85b4a6f354	eval: refresh stress template results	2026-06-25 09:58:23 +08:00
wassname	a88acae536	docs: add persona prior-art guide	2026-06-23 10:32:20 +08:00
wassname-claude	6b272b8c86	Make validator honor self-contained scenario prompts (fixes 3p suffix clash). The 52 3p-observer scenarios carry their own question + length ("...what does the actor do next? Answer in one or two sentences."), but _generation_prompt always appended a 1p "What do you do? Answer in 80-140 words." -- two conflicting questions and a POV flip, so the data could not reproduce the appendix's 3p measurement through this script (caught in review). Add a "self_contained" flag on scenario rows; when set, _generation_prompt emits the scenario verbatim instead of appending its default 1p question. Bare-vignette families without the flag are unchanged. README appendix rewritten to lead with the runnable command and explain the flag. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-21 04:25:13 +00:00
wassname	f894a35fc3	fix: preserve template provenance in hf main	2026-06-13 20:54:21 +08:00
wassname	d91eda0228	eval: test engineered prefixes as templates	2026-06-13 20:43:44 +08:00
wassname	671c6258ce	docs: include engineered baseline in scoreboard	2026-06-13 20:05:19 +08:00
wassname	15d7caa607	eval: judge identical controls uniformly	2026-06-13 20:00:49 +08:00
wassname	45c0f24022	eval: clean axes and audit persona leakage	2026-06-13 19:46:24 +08:00
wassname	562c8fd0f0	docs: keep generated stats out of data	2026-06-13 19:12:24 +08:00
wassname	8dbc02066b	eval: rerun dual judges and refresh results	2026-06-13 19:12:24 +08:00
wassname	e2546fe0ab	eval: refine judge rubric and README baselines	2026-06-13 19:12:24 +08:00
wassname	ede354f07a	eval: add dual judges and controls	2026-06-13 19:12:24 +08:00
wassname	d1ee948760	tidy	2026-06-13 19:12:24 +08:00
wassname	f55ba7576f	misc	2026-06-13 17:36:16 +08:00
wassname	849b1de0b1	clarify persona template scoring	2026-06-13 15:28:53 +08:00
wassname	5b92bdf7a7	expand confound audit docs	2026-06-13 14:43:03 +08:00
wassname	ae3fc096d7	add source urls and confound audits	2026-06-13 14:39:45 +08:00
wassname	de071e79ca	use normalized score components	2026-06-13 14:34:02 +08:00
wassname	bce30daee9	make main dataset table human-facing	2026-06-13 14:28:10 +08:00
wassname	1461e930e5	simplify public readme	2026-06-13 14:23:47 +08:00
wassname	6a19b65e49	add clean score tables	2026-06-13 14:05:26 +08:00
wassname	9b1a6e7573	simplify public docs and parquet upload	2026-06-13 13:55:43 +08:00
wassname	4e27617821	add v2 candidate persona library	2026-06-13 10:09:32 +08:00
wassname	97ceaf5908	release persona steering template library	2026-06-13 10:05:35 +08:00

34 Commits