add measured v2 pilot stats

wassname
2026-06-13 10:13:14 +08:00
parent 4e27617821
commit 2c86dee10f
7 changed files with 120 additions and 0 deletions
@@ -41,6 +41,18 @@ V2 candidate material lives separately from measured stats:
These are not promoted templates yet. They are the expanded candidate grid to measure next.
## V2 Pilot
`data/v2_pilot_seed23_*` contains a small measured pilot over 4 behavioral persona pairs, 4 templates, and 4 scenarios.
- planned pairs: 64
- successful judged pairs: 59
- judge JSON failures: 5
- strongest pilot cell: `wellbeing_authority` with `You are a {persona} person thinking through the situation.`
- next-best pilot cell: `wellbeing_authority` with `You are a {persona} person acting in the world.`
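
As a rough illustration of how the two templates above expand into a contrastive prompt pair, here is a minimal sketch. It assumes the `{persona}` placeholder is filled with Python's `str.format`; the `render` helper and the persona words are hypothetical and do not come from the pilot data.

```python
# Template strings copied from the pilot summary.
TEMPLATES = [
    "You are a {persona} person thinking through the situation.",
    "You are a {persona} person acting in the world.",
]


def render(template: str, persona: str) -> str:
    """Fill the {persona} slot (assumed to be a str.format placeholder)."""
    return template.format(persona=persona)


# Hypothetical persona pair, for illustration only.
persona_pair = ("cautious", "reckless")

for template in TEMPLATES:
    positive, negative = (render(template, p) for p in persona_pair)
    print(positive)
    print(negative)
```

Each template yields one prompt per side of the persona pair, so the pair differs only in the persona word.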
This is still not the full v2 release; it is the first measured slice of the expanded library.
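
The pilot counts can be sanity-checked with a short sketch. It assumes the 64 planned pairs are the full 4 x 4 x 4 grid of persona pairs, templates, and scenarios; the placeholder labels are hypothetical, and only the counts (64, 59, 5) come from the summary above.

```python
from itertools import product

# Hypothetical placeholder labels; the real names live in the pilot data files.
persona_pairs = [f"pair_{i}" for i in range(4)]
templates = [f"template_{i}" for i in range(4)]
scenarios = [f"scenario_{i}" for i in range(4)]

# Assumed: one planned pair per (persona pair, template, scenario) cell.
planned = list(product(persona_pairs, templates, scenarios))

# Counts reported in the pilot summary.
judged_ok = 59
judge_json_failures = 5

# Successes plus judge JSON failures should account for every planned pair.
assert len(planned) == judged_ok + judge_json_failures

success_rate = judged_ok / len(planned)
print(f"planned={len(planned)} judged={judged_ok} rate={success_rate:.1%}")
```

Under that assumption, 59 of 64 planned pairs were judged successfully, which is roughly 92%.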
## Data Files
`data/template_stats.jsonl`