19 Commits

Author SHA1 Message Date
wassname 8b99b2dca0 docs: shorten scenario suffix appendix 2026-06-25 13:56:35 +08:00
wassname cd695c411b docs: improve quick-scroll README 2026-06-25 13:36:00 +08:00
wassname 8162aa1ee9 docs: widen Quarto HTML layout 2026-06-25 13:27:21 +08:00
wassname afbfbf514f docs: add interactive refusal tables 2026-06-25 13:23:34 +08:00
wassname cfcb57b9ce docs: use one Quarto source for README and Pages 2026-06-25 13:06:12 +08:00
wassname 024fb3d545 docs: track model matrix inputs for Pages render 2026-06-25 12:45:58 +08:00
wassname caceaebbf0 docs: streamline README and add interactive Pages plot 2026-06-25 12:31:50 +08:00
wassname d31cac9068 docs: simplify model matrix visualization 2026-06-25 12:20:35 +08:00
wassname 026b22e131 docs: simplify model matrix ranking 2026-06-25 11:54:06 +08:00
wassname 2f62327acc docs: render README with Quarto 2026-06-25 11:44:04 +08:00
wassname 026a57e246 docs: make README tables rerenderable 2026-06-25 11:31:49 +08:00
wassname 2f7184f609 eval: summarize refusal probe model matrix 2026-06-25 11:12:12 +08:00
wassname 85b4a6f354 eval: refresh stress template results 2026-06-25 09:58:23 +08:00
wassname d91eda0228 eval: test engineered prefixes as templates 2026-06-13 20:43:44 +08:00
wassname 45c0f24022 eval: clean axes and audit persona leakage 2026-06-13 19:46:24 +08:00
wassname 8dbc02066b eval: rerun dual judges and refresh results 2026-06-13 19:12:24 +08:00
wassname e2546fe0ab eval: refine judge rubric and README baselines 2026-06-13 19:12:24 +08:00
wassname ede354f07a eval: add dual judges and controls 2026-06-13 19:12:24 +08:00
wassname 4675e9782f tidy and image 2026-06-13 17:45:50 +08:00