add measured v2 pilot stats

wassname
2026-06-13 10:13:14 +08:00
parent 4e27617821
commit 2c86dee10f
7 changed files with 120 additions and 0 deletions
@@ -49,3 +49,5 @@ V2 candidate expansion:
- Added 16 candidate persona pairs, 12 candidate templates, and 12 candidate scenarios.
- Patched `--axes` to accept a persona-pair JSONL path.
- `uv run python scripts/validate_persona_axes_openrouter.py --dry-run --axes data/persona_pairs_v2_candidates.jsonl --templates data/templates_v2_candidates.txt --family data/scenarios_v2_candidates.jsonl --n 2 --out out/v2_candidates_dryrun.json` planned 384 pairs.
- Ran a measured v2 pilot over 4 persona pairs, 4 templates, and 4 scenarios: 64 planned, 59 successes, 5 judge JSON failures.
- Exported pilot stats to `data/v2_pilot_seed23_*`.
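
The pilot counts above can be cross-checked with a few lines of arithmetic. This is a minimal sketch, not part of the repository: `pilot_summary` and its parameter names are hypothetical, and it assumes the pilot plan is the full cross product of persona pairs, templates, and scenarios (which matches the reported 64, but is not confirmed by the notes).

```python
def pilot_summary(n_pairs, n_templates, n_scenarios, n_success, n_judge_json_failures):
    """Cross-check pilot counts.

    Assumption: the plan is the full cross product of the three axes.
    """
    planned = n_pairs * n_templates * n_scenarios
    accounted = n_success + n_judge_json_failures
    return {
        "planned": planned,
        "accounted": accounted,
        # True when every planned item ended as a success or a judge JSON failure.
        "all_accounted": accounted == planned,
        "success_rate": n_success / planned,
    }


# Counts taken from the pilot note: 4 x 4 x 4 axes, 59 successes, 5 judge JSON failures.
summary = pilot_summary(4, 4, 4, 59, 5)
print(summary)
```

With the reported numbers, the successes and judge JSON failures add up to the planned total, and the success rate is 59/64, roughly 92%.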