15 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| wassname | cd695c411b | docs: improve quick-scroll README | 2026-06-25 13:36:00 +08:00 |
| wassname | afbfbf514f | docs: add interactive refusal tables | 2026-06-25 13:23:34 +08:00 |
| wassname | cfcb57b9ce | docs: use one Quarto source for README and Pages | 2026-06-25 13:06:12 +08:00 |
| wassname | 22dd2c2df9 | docs: rank README result tables by t-stat | 2026-06-25 12:33:11 +08:00 |
| wassname | caceaebbf0 | docs: streamline README and add interactive Pages plot | 2026-06-25 12:31:50 +08:00 |
| wassname | 2f62327acc | docs: render README with Quarto | 2026-06-25 11:44:04 +08:00 |
| wassname | 026a57e246 | docs: make README tables rerenderable | 2026-06-25 11:31:49 +08:00 |
| wassname | d91eda0228 | eval: test engineered prefixes as templates | 2026-06-13 20:43:44 +08:00 |
| wassname | 671c6258ce | docs: include engineered baseline in scoreboard | 2026-06-13 20:05:19 +08:00 |
| wassname | 15d7caa607 | eval: judge identical controls uniformly | 2026-06-13 20:00:49 +08:00 |
| wassname | 45c0f24022 | eval: clean axes and audit persona leakage | 2026-06-13 19:46:24 +08:00 |
| wassname | 562c8fd0f0 | docs: keep generated stats out of data | 2026-06-13 19:12:24 +08:00 |
| wassname | 8dbc02066b | eval: rerun dual judges and refresh results | 2026-06-13 19:12:24 +08:00 |
| wassname | e2546fe0ab | eval: refine judge rubric and README baselines | 2026-06-13 19:12:24 +08:00 |
| wassname | ede354f07a | eval: add dual judges and controls | 2026-06-13 19:12:24 +08:00 |