Mirror of https://github.com/wassname/persona-steering-template-library.git (synced 2026-06-27 17:01:24 +08:00)
make main dataset table human-facing
@@ -18,8 +18,8 @@ verbosity, confidence, style, and language.
 
 So we try persona/template pairs on one model. We use another model as a judge,
 which rates on-axis and off-axis variation. The final `score` rewards on-axis
-variation and penalizes off-axis variation, style movement, persona echo, and
-refusals.
+variation and penalizes off-axis variation. Style movement, persona echo, and
+refusals are kept as audit columns.
 
 This field is pre-scientific in a way: it is still an art. I collected a wide
 sampling of what people have used, minimally measured it, and put it here to
@@ -30,38 +30,36 @@ template/persona-pair cell, and source attribution where known.
 
 ## Use
 
-Start with the `scores` split on Hugging Face.
+Start with the `main` split on Hugging Face. It is the table people should see
+first: one row per measured template/persona-pair cell.
 
 Important columns:
 
 - `template_jinja`
 - `score`
-- `persona_pair_id`
-- `axis`
-- `source_id`
+- `positive_persona`
+- `negative_persona`
+- `contrast`
+- `source`
 - `source_type`
 
-Then check `judged_examples` to see the paired completions behind the score.
+Then check `examples` to see the paired completions behind the score.
 
 ## Score
 
 ```text
 100
-* strict_pass_rate
 * clamp(mean_axis_delta / 8)
 * clamp((7 - mean_off_axis_problem) / 6)
-* clamp((6 - mean_max_style_abs_delta) / 6)
-* (1 - persona_echo_rate)
-* (1 - refusal_or_ai_break_rate)
 ```
 
-High score means the template/persona-pair cell repeatedly moved the intended
-axis while staying comparatively clean on off-axis, style, persona-echo, and
-refusal checks.
+High score means the template/persona-pair cell moved the intended axis and did
+not look off-axis to the judge. Style movement, persona echo, and refusals are
+kept as audit columns rather than folded into the headline score.
 
 ## Provenance
 
-Sources are marked in the dataset as `source_id` and `source_type`. Some entries
+Sources are marked in the dataset as `source` and `source_type`. Some entries
 come from papers, some from associated code/trait files, and some from wassname
 anecdotes/design notes.
 
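As a rough illustration of the reading order the Use section describes, the sketch below takes rows shaped like the `main` split and keeps the best-scoring row for each `contrast`. The rows, templates, and scores are invented placeholders, the column names follow the README list above, and `best_per_contrast` is a hypothetical helper. Loading the split from Hugging Face is left out because the dataset id is not given here.

```python
from typing import Any


def best_per_contrast(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Return the highest-scoring row for each contrast (hypothetical helper)."""
    best: dict[str, dict[str, Any]] = {}
    for row in rows:
        current = best.get(row["contrast"])
        if current is None or row["score"] > current["score"]:
            best[row["contrast"]] = row
    return best


# Placeholder rows: values are made up, not taken from the dataset.
main_rows = [
    {"template_jinja": "You are a {{ persona }} assistant", "score": 41.7,
     "positive_persona": "honest", "negative_persona": "dishonest",
     "contrast": "dishonest->honest"},
    {"template_jinja": "Pretend you are {{ persona }}", "score": 12.5,
     "positive_persona": "honest", "negative_persona": "dishonest",
     "contrast": "dishonest->honest"},
    {"template_jinja": "You are a {{ persona }} assistant", "score": 25.0,
     "positive_persona": "verbose", "negative_persona": "terse",
     "contrast": "terse->verbose"},
]

best = best_per_contrast(main_rows)
for contrast, row in best.items():
    print(contrast, row["score"], row["template_jinja"])
```

After picking a row this way, the matching paired completions in `examples` are the place to check whether the score reflects what you expect.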
+9
-15
@@ -14,8 +14,8 @@ verbosity, confidence, style, and language.
 
 So we try persona/template pairs on one model. We use another model as a judge,
 which rates on-axis and off-axis variation. The final `score` rewards on-axis
-variation and penalizes off-axis variation, style movement, persona echo, and
-refusals.
+variation and penalizes off-axis variation. Style movement, persona echo, and
+refusals are kept as audit columns.
 
 This field is pre-scientific in a way: it is still an art. I collected a wide
 sampling of what people have used, minimally measured it, and put it here to
@@ -28,28 +28,22 @@ template/persona-pair cell, and source attribution where known.
 
 ```text
 100
-* strict_pass_rate
 * clamp(mean_axis_delta / 8)
 * clamp((7 - mean_off_axis_problem) / 6)
-* clamp((6 - mean_max_style_abs_delta) / 6)
-* (1 - persona_echo_rate)
-* (1 - refusal_or_ai_break_rate)
 ```
 
-High score means a measured cell repeatedly moved the intended axis without
-large off-axis, style, persona-echo, or refusal movement.
+High score means a measured cell moved the intended axis and did not look
+off-axis to the judge.
 
 Low score can mean either "no axis movement" or "movement exists but is
-confounded". Read the component columns before trusting it.
+confounded". Style movement, persona echo, and refusals are audit columns rather
+than part of the headline score.
 
 ## Public Splits
 
-- `scores`: one row per measured template/persona-pair cell.
-- `template_scores`: one row per template, aggregated over measured pairs.
+- `main`: one row per measured template/persona-pair cell. This is the table to open first.
 - `persona_pairs`: candidate persona pairs, with best measured score where available.
-- `template_candidates`: all candidate Jinja2 templates.
-- `scenario_prompts`: prompts used for the pilot measurement.
-- `judged_examples`: paired completions and judge ratings.
+- `examples`: paired completions and judge ratings behind the score.
 
 ## Notes
 
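To make the revised two-factor formula concrete, here is a minimal Python sketch of the product shown in the Score section above. It assumes `clamp` means clipping to the interval [0, 1]; `clamp01` and `headline_score` are illustrative names, and the input values are invented, not measurements from the dataset.

```python
def clamp01(x: float) -> float:
    """Clip x to the closed interval [0, 1] (assumed meaning of `clamp`)."""
    return max(0.0, min(1.0, x))


def headline_score(mean_axis_delta: float, mean_off_axis_problem: float) -> float:
    """Two-factor score: on-axis movement times off-axis cleanliness."""
    on_axis = clamp01(mean_axis_delta / 8.0)
    off_axis_clean = clamp01((7.0 - mean_off_axis_problem) / 6.0)
    return round(100.0 * on_axis * off_axis_clean, 1)


# Invented inputs, chosen so the arithmetic is easy to follow.
print(headline_score(4.0, 1.0))  # 0.5 * 1.0 -> 50.0
print(headline_score(4.0, 4.0))  # 0.5 * 0.5 -> 25.0
print(headline_score(0.0, 1.0))  # no axis movement -> 0.0
```

Because the factors multiply, either a missing axis movement or a heavily off-axis judgement is enough on its own to pull the score toward zero, which matches the two readings of a low score given above.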
@@ -58,6 +52,6 @@ judge. But it is better than choosing persona templates by vibe. We randomize
 A/B order to reduce position bias, ask separate positive-axis and negative-axis
 questions, and use `temperature=0` to reduce sampling variation in completions.
 
-Sources are marked as `source_id` and `source_type`. Some entries come from
+Sources are marked as `source` and `source_type`. Some entries come from
 papers, some from associated code/trait files, and some from wassname
 anecdotes/design notes.
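The A/B randomization mentioned in the Notes above could look roughly like the following sketch. The function name, the seed, and the slot labels are assumptions for illustration only; the repository's actual judging code is not part of this diff.

```python
import random


def randomize_ab(pos_completion: str, neg_completion: str, rng: random.Random) -> dict[str, str]:
    """Randomly assign two completions to slots A and B.

    The returned dict records which slot holds the positive-persona
    completion, so a judge's A/B answer can be mapped back afterwards.
    """
    pos_is_a = rng.random() < 0.5
    if pos_is_a:
        a, b = pos_completion, neg_completion
    else:
        a, b = neg_completion, pos_completion
    return {"A": a, "B": b, "positive_slot": "A" if pos_is_a else "B"}


rng = random.Random(23)  # hypothetical seed, fixed only for reproducibility
pair = randomize_ab("in Paris", "in Berlin", rng)
print(pair)
```

Keeping `positive_slot` next to the shuffled texts means the judge never needs to see which persona produced which completion.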
+47
-64
@@ -33,12 +33,7 @@ V2_PILOT_META = {
     ),
 }
 
-SCORE_FORMULA = (
-    "100 * strict_pass_rate * clamp(mean_axis_delta/8) * "
-    "clamp((7-mean_off_axis_problem)/6) * "
-    "clamp((6-mean_max_style_abs_delta)/6) * "
-    "(1-persona_echo_rate) * (1-refusal_or_ai_break_rate)"
-)
+SCORE_FORMULA = "100 * clamp(mean_axis_delta/8) * clamp((7-mean_off_axis_problem)/6)"
 
 
 def _jsonable(value: Any) -> Any:
@@ -75,7 +70,7 @@ def _write_parquet(path: Path, rows: list[dict[str, Any]]) -> None:
 def _template_rows(path: Path) -> list[dict[str, Any]]:
     return [
         {
-            "id": f"template_{i:02d}",
+            "id": i + 1,
             "template": line.strip(),
             "template_jinja": _jinja(line.strip()),
             "template_format": "jinja2",
@@ -92,13 +87,9 @@ def _clamp01(x: float) -> float:
 
 
 def _score(row: dict[str, Any]) -> float:
-    strict = float(row.get("strict_pass_rate") or 0.0)
-    axis = _clamp01(float(row.get("mean_axis_delta") or 0.0) / 8.0)
+    on_axis = _clamp01(float(row.get("mean_axis_delta") or 0.0) / 8.0)
     off_axis_clean = _clamp01((7.0 - float(row.get("mean_off_axis_problem") or 7.0)) / 6.0)
-    style_clean = _clamp01((6.0 - float(row.get("mean_max_style_abs_delta") or 6.0)) / 6.0)
-    echo_clean = _clamp01(1.0 - float(row.get("persona_echo_rate") or 0.0))
-    refusal_clean = _clamp01(1.0 - float(row.get("refusal_or_ai_break_rate") or 0.0))
-    return round(100.0 * strict * axis * off_axis_clean * style_clean * echo_clean * refusal_clean, 1)
+    return round(100.0 * on_axis * off_axis_clean, 1)
 
 
 def _jinja(template: str) -> str:
@@ -141,20 +132,21 @@ def _template_pair_score_rows() -> list[dict[str, Any]]:
         score = _score(stat)
         source_id = pair.get("source_id", "wassname_v2_candidate")
         rows.append({
-            "id": f"{stat['persona_pair']}::{_slug(stat['template'])}",
-            "template_jinja": _jinja(stat["template"]),
+            "id": 0,
+            "template": _jinja(stat["template"]),
             "score": score,
-            "persona_pair_id": stat["persona_pair"],
-            "axis": f"{pair.get('neg', '')}->{pair.get('pos', '')}",
-            "source_id": source_id,
+            "positive_persona": pair.get("pos"),
+            "negative_persona": pair.get("neg"),
+            "contrast": f"{pair.get('neg', '')}->{pair.get('pos', '')}",
+            "source": source_id,
             "source_type": _source_type(source_id),
-            "measurement_id": V2_PILOT_META["measurement_id"],
-            "template": stat["template"],
-            "template_format": "jinja2",
-            "pos_persona": pair.get("pos"),
-            "neg_persona": pair.get("neg"),
+            "persona_pair": stat["persona_pair"],
             "positive_behavior": pair.get("positive_behavior"),
             "negative_behavior": pair.get("negative_behavior"),
+            "raw_template": stat["template"],
+            "cell_key": f"{stat['persona_pair']}::{_slug(stat['template'])}",
+            "template_format": "jinja2",
+            "measurement_id": V2_PILOT_META["measurement_id"],
             "score_formula": SCORE_FORMULA,
             "recommended": bool(stat.get("recommended")),
             "n_success": n_success,
@@ -171,6 +163,8 @@ def _template_pair_score_rows() -> list[dict[str, Any]]:
             **V2_PILOT_META,
         })
     rows.sort(key=lambda r: (r["score"], r["strict_pass_rate"], r["mean_axis_delta"]), reverse=True)
+    for i, row in enumerate(rows, start=1):
+        row["id"] = i
     return rows
 
 
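The pattern in this hunk, sorting first and then assigning `id` by rank, can be sketched in isolation as below. The rows are hypothetical and carry only the sort keys plus a `cell_key`; `rank_rows` is an illustrative name, not a function from the build script.

```python
from typing import Any


def rank_rows(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Sort rows best-first, then overwrite `id` with the 1-based rank."""
    rows.sort(
        key=lambda r: (r["score"], r["strict_pass_rate"], r["mean_axis_delta"]),
        reverse=True,
    )
    for i, row in enumerate(rows, start=1):
        row["id"] = i
    return rows


# Invented rows; `id` starts as a placeholder, as in the diff.
rows = [
    {"id": 0, "cell_key": "pair_a::t1", "score": 12.5, "strict_pass_rate": 0.5, "mean_axis_delta": 1.0},
    {"id": 0, "cell_key": "pair_b::t1", "score": 50.0, "strict_pass_rate": 0.9, "mean_axis_delta": 4.0},
    {"id": 0, "cell_key": "pair_a::t2", "score": 50.0, "strict_pass_rate": 0.7, "mean_axis_delta": 4.0},
]

for row in rank_rows(rows):
    print(row["id"], row["cell_key"], row["score"])
```

One consequence of ranking this way is that `id` can change whenever scores change between builds, so a key such as `cell_key` is probably the safer choice for joining across versions.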
@@ -184,21 +178,22 @@ def _slug(text: str) -> str:
 def _template_score_rows(template_pair_scores: list[dict[str, Any]]) -> list[dict[str, Any]]:
     by_template: dict[str, list[dict[str, Any]]] = {}
     for row in template_pair_scores:
-        by_template.setdefault(row["template"], []).append(row)
+        by_template.setdefault(row["raw_template"], []).append(row)
     out = []
     for template, rows in by_template.items():
         best = rows[0]
         measured = len(rows)
         out.append({
-            "id": _slug(template),
-            "template_jinja": _jinja(template),
+            "id": 0,
+            "template": _jinja(template),
             "score": round(sum(float(r["score"]) for r in rows) / measured, 1),
             "best_score": best["score"],
-            "best_persona_pair_id": best["persona_pair_id"],
-            "source_id": "wassname_v2_candidate",
+            "best_persona_pair": best["persona_pair"],
+            "source": "wassname_v2_candidate",
             "source_type": "wassname anecdote / design note",
+            "raw_template": template,
+            "template_key": _slug(template),
             "measurement_id": V2_PILOT_META["measurement_id"],
-            "template": template,
             "template_format": "jinja2",
             "recommended_cell_count": sum(bool(r["recommended"]) for r in rows),
             "measured_persona_pair_count": measured,
@@ -211,6 +206,8 @@ def _template_score_rows(template_pair_scores: list[dict[str, Any]]) -> list[dic
             **V2_PILOT_META,
         })
     out.sort(key=lambda r: (r["best_score"], r["score"]), reverse=True)
+    for i, row in enumerate(out, start=1):
+        row["id"] = i
     return out
 
 
@@ -218,10 +215,10 @@ def _persona_pair_review_rows(template_pair_scores: list[dict[str, Any]]) -> lis
     pairs = _read_jsonl(DATA / "persona_pairs_v2_candidates.jsonl")
     by_pair: dict[str, list[dict[str, Any]]] = {}
     for row in template_pair_scores:
-        by_pair.setdefault(row["persona_pair_id"], []).append(row)
+        by_pair.setdefault(row["persona_pair"], []).append(row)
 
     out = []
-    for pair in pairs:
+    for i, pair in enumerate(pairs, start=1):
         rows = sorted(
             by_pair.get(pair["id"], []),
             key=lambda r: (
@@ -232,7 +229,7 @@ def _persona_pair_review_rows(template_pair_scores: list[dict[str, Any]]) -> lis
             reverse=True,
         )
         best = rows[0] if rows else {}
-        recommended = [r["template"] for r in rows if r.get("recommended")]
+        recommended = [r["raw_template"] for r in rows if r.get("recommended")]
         if recommended:
             proof_grade = "pilot_recommended"
         elif best:
@@ -242,7 +239,7 @@ def _persona_pair_review_rows(template_pair_scores: list[dict[str, Any]]) -> lis
 
         if best:
             proof_summary = (
-                f"best_template={best['template']}; "
+                f"best_template={best['raw_template']}; "
                 f"score={best['score']}; "
                 f"n_success={best['n_success']}; "
                 f"pass={best['strict_pass_rate']}; "
@@ -254,21 +251,20 @@ def _persona_pair_review_rows(template_pair_scores: list[dict[str, Any]]) -> lis
             proof_summary = "no measured v2 pilot rows yet"
 
         out.append({
-            "id": pair["id"],
+            "id": i,
             "persona_pair": pair["id"],
-            "axis": f"{pair['neg']}->{pair['pos']}",
+            "contrast": f"{pair['neg']}->{pair['pos']}",
             "best_score": best.get("score"),
             "proof_grade": proof_grade,
-            "source_id": pair.get("source_id"),
+            "source": pair.get("source_id"),
             "source_type": _source_type(pair.get("source_id")),
-            "pos": pair["pos"],
-            "neg": pair["neg"],
+            "positive_persona": pair["pos"],
+            "negative_persona": pair["neg"],
             "positive_behavior": pair["positive_behavior"],
             "negative_behavior": pair["negative_behavior"],
             "proof_summary": proof_summary,
             "n_measured_template_cells": len(rows),
             "best_template": best.get("template"),
-            "best_template_jinja": _jinja(best["template"]) if best else None,
             "best_n_success": best.get("n_success"),
             "best_strict_pass_rate": best.get("strict_pass_rate"),
             "best_axis_delta": best.get("mean_axis_delta"),
@@ -321,7 +317,7 @@ How do we know if a persona template is good? We want on-axis variation, but not
 
 If we choose `honest` and `dishonest` personas, use a template like `You are a {{{{ persona }}}} assistant`, and ask `The Eiffel Tower is in`, we want the completions to vary on the honest/dishonest axis. `in Paris` versus `in Berlin` shows on-axis variation. `in Paris` versus `I refuse to answer` is not good, because it is confounded by refusal. Other confounds include length, verbosity, confidence, style, and language.
 
-So we try persona/template pairs on one model. We use another model as a judge, which rates on-axis and off-axis variation. The final `score` rewards on-axis variation and penalizes off-axis variation, style movement, persona echo, and refusals.
+So we try persona/template pairs on one model. We use another model as a judge, which rates on-axis and off-axis variation. The final `score` rewards on-axis variation and penalizes off-axis variation. Style movement, persona echo, and refusals are kept as audit columns.
 
 This field is pre-scientific in a way: it is still an art. I collected a wide sampling of what people have used, minimally measured it, and put it here to make it accessible to more people and agents.
 
@@ -329,32 +325,25 @@ The dataset has persona templates in Jinja2 format, scores for each measured tem
 
 ## Score
 
-Start with `scores`.
+Start with `main`.
 
 The main column is `score`, a conservative 0-100 clean-axis score:
 
 ```text
 100
-* strict_pass_rate
 * clamp(mean_axis_delta / 8)
 * clamp((7 - mean_off_axis_problem) / 6)
-* clamp((6 - mean_max_style_abs_delta) / 6)
-* (1 - persona_echo_rate)
-* (1 - refusal_or_ai_break_rate)
 ```
 
-High score means: the template/persona-pair cell repeatedly moved the intended axis, while the judge did not see much off-axis, style, persona-echo, or refusal movement.
+High score means: the template/persona-pair cell moved the intended axis and did not look off-axis to the judge. Style movement, persona echo, and refusals are kept as audit columns rather than folded into the headline score.
 
 Low score can mean either no intended-axis movement or too much confounding. Read the component columns before trusting the score.
 
-## What To Browse
+## Tables
 
-1. `scores`: one row per measured template/persona-pair cell.
-2. `template_scores`: one row per template, aggregated over the measured persona pairs.
-3. `persona_pairs`: candidate persona pairs, with best measured score where available.
-4. `template_candidates`: all candidate Jinja2 templates.
-5. `scenario_prompts`: prompts used for the pilot measurement.
-6. `judged_examples`: paired completions and judge ratings.
+1. `main`: one row per measured template/persona-pair cell.
+2. `persona_pairs`: candidate persona pairs, with best measured score where available.
+3. `examples`: paired completions and judge ratings behind the score.
 """
 
 
@@ -369,24 +358,18 @@ def main() -> None:
     parquet_dir.mkdir(parents=True)
 
     tables = {
-        "scores": _template_pair_score_rows(),
-        "template_candidates": _template_rows(DATA / "templates_v2_candidates.txt"),
-        "scenario_prompts": _read_jsonl(DATA / "scenarios_v2_candidates.jsonl"),
-        "judged_examples": _read_jsonl(DATA / "v2_pilot_seed23_examples.jsonl"),
+        "main": _template_pair_score_rows(),
+        "examples": _read_jsonl(DATA / "v2_pilot_seed23_examples.jsonl"),
     }
-    tables["template_scores"] = _template_score_rows(tables["scores"])
-    tables["persona_pairs"] = _persona_pair_review_rows(tables["scores"])
+    tables["persona_pairs"] = _persona_pair_review_rows(tables["main"])
 
     for name, rows in tables.items():
         _write_parquet(parquet_dir / f"{name}.parquet", rows)
 
     names = [
-        "scores",
-        "template_scores",
+        "main",
         "persona_pairs",
-        "template_candidates",
-        "scenario_prompts",
-        "judged_examples",
+        "examples",
     ]
     (args.out / "README.md").write_text(_readme(names))
     print(f"built {args.out}")