wassname
cd695c411b
docs: improve quick-scroll README
2026-06-25 13:36:00 +08:00
wassname
afbfbf514f
docs: add interactive refusal tables
2026-06-25 13:23:34 +08:00
wassname
cfcb57b9ce
docs: use one Quarto source for README and Pages
2026-06-25 13:06:12 +08:00
wassname
bcbc1d0f65
docs: render Pages with Quarto workflow
2026-06-25 12:44:39 +08:00
wassname
22dd2c2df9
docs: rank README result tables by t-stat
2026-06-25 12:33:11 +08:00
wassname
caceaebbf0
docs: streamline README and add interactive Pages plot
2026-06-25 12:31:50 +08:00
wassname
d31cac9068
docs: simplify model matrix visualization
2026-06-25 12:20:35 +08:00
wassname
026b22e131
docs: simplify model matrix ranking
2026-06-25 11:54:06 +08:00
wassname
2f62327acc
docs: render README with Quarto
2026-06-25 11:44:04 +08:00
wassname
026a57e246
docs: make README tables rerenderable
2026-06-25 11:31:49 +08:00
wassname
2f7184f609
eval: summarize refusal probe model matrix
2026-06-25 11:12:12 +08:00
wassname
85b4a6f354
eval: refresh stress template results
2026-06-25 09:58:23 +08:00
wassname
a88acae536
docs: add persona prior-art guide
2026-06-23 10:32:20 +08:00
wassname-claude
6b272b8c86
Make validator honor self-contained scenario prompts (fixes 3p suffix clash)
...
The 52 3p-observer scenarios carry their own question + length
("...what does the actor do next? Answer in one or two sentences."),
but _generation_prompt always appended a 1p "What do you do? Answer in
80-140 words." -- two conflicting questions and a POV flip, so the data
could not reproduce the appendix's 3p measurement through this script
(caught in review).
Add a "self_contained" flag on scenario rows; when set, _generation_prompt
emits the scenario verbatim instead of appending its default 1p question.
Bare-vignette families without the flag are unchanged. README appendix
rewritten to lead with the runnable command and explain the flag.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-21 04:25:13 +00:00
wassname
f894a35fc3
fix: preserve template provenance in hf main
2026-06-13 20:54:21 +08:00
wassname
d91eda0228
eval: test engineered prefixes as templates
2026-06-13 20:43:44 +08:00
wassname
671c6258ce
docs: include engineered baseline in scoreboard
2026-06-13 20:05:19 +08:00
wassname
15d7caa607
eval: judge identical controls uniformly
2026-06-13 20:00:49 +08:00
wassname
45c0f24022
eval: clean axes and audit persona leakage
2026-06-13 19:46:24 +08:00
wassname
562c8fd0f0
docs: keep generated stats out of data
2026-06-13 19:12:24 +08:00
wassname
8dbc02066b
eval: rerun dual judges and refresh results
2026-06-13 19:12:24 +08:00
wassname
e2546fe0ab
eval: refine judge rubric and README baselines
2026-06-13 19:12:24 +08:00
wassname
ede354f07a
eval: add dual judges and controls
2026-06-13 19:12:24 +08:00
wassname
d1ee948760
tidy
2026-06-13 19:12:24 +08:00
wassname
f55ba7576f
misc
2026-06-13 17:36:16 +08:00
wassname
849b1de0b1
clarify persona template scoring
2026-06-13 15:28:53 +08:00
wassname
5b92bdf7a7
expand confound audit docs
2026-06-13 14:43:03 +08:00
wassname
ae3fc096d7
add source urls and confound audits
2026-06-13 14:39:45 +08:00
wassname
de071e79ca
use normalized score components
2026-06-13 14:34:02 +08:00
wassname
bce30daee9
make main dataset table human-facing
2026-06-13 14:28:10 +08:00
wassname
1461e930e5
simplify public readme
2026-06-13 14:23:47 +08:00
wassname
6a19b65e49
add clean score tables
2026-06-13 14:05:26 +08:00
wassname
9b1a6e7573
simplify public docs and parquet upload
2026-06-13 13:55:43 +08:00
wassname
4e27617821
add v2 candidate persona library
2026-06-13 10:09:32 +08:00
wassname
97ceaf5908
release persona steering template library
2026-06-13 10:05:35 +08:00