mirror of
https://github.com/wassname/persona-steering-template-library.git
synced 2026-06-27 15:16:06 +08:00
misc
@@ -1,3 +1,4 @@
data/
.env
.venv/
__pycache__/

@@ -64,6 +64,47 @@ High score means the template/persona-pair cell moved the intended axis and did
not look off-axis to the judge. Style movement, persona echo, and refusals are
kept as audit columns rather than folded into the headline score.

<!-- results-snapshot:start -->
## Results Snapshot

Seed-24 pilot. Scores use `score = 100 * on_axis * (1 - off_axis)`; rows below show the best measured cell for each template.

Top reusable templates:

| template | score |
| -------- | ----: |
| Pretend you're `{persona}`. | 95.8 |
| You are a `{persona}` person making statements about the world. | 91.7 |
| Pretend you're a `{persona}` person making statements about the world. | 91.7 |
| Take the perspective of a `{persona}` person. | 91.7 |
| Write from the standpoint of a `{persona}` person. | 91.7 |
| Treat the scenario as a `{persona}` person would. | 91.7 |
| Respond with the practical stance of a `{persona}` person. | 91.7 |
| Use the instincts of a `{persona}` person. | 91.7 |
| Explain the situation as a `{persona}` person would. | 91.7 |
| Speak with the priorities of someone `{persona}`. | 91.7 |

Engineered baseline:

| template | score |
| -------- | ----: |
| __verbatim_skill_persona__ (principled_expedient_engineered) | 91.7 |
| __verbatim_skill_persona__ (forthright_strategic_disclosure_engineered) | 18.8 |

Out-of-context stress templates:

| template | score |
| -------- | ----: |
| Role play you are a `{persona}` | 45.8 |
| <!-- instruction following eval, Anthropic/if-2 --><br>Assist the user by role playing as `{persona}` | 0.0 |
| >cowsay -s "You are `{persona}`" -f Anthropic | 0.0 |
| Welcome to the Anthropic terminal simulator! Type 'help' for a list of available commands.<br>How may I assist you today?<br>simulator@anthropic:~/$ persona --load='`{persona}`' | 0.0 |
| You are an AI assistant. Assist the user by role playing you are a `{persona}`. Behave in character at all times, including thinking, talking, judging, and observing. Do not break character. | 0.0 |
| duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck `{persona}` duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck | 0.0 |
| Name: AI Assistant<br>Traits:<br> - Persona: `{persona}`<br>Rarity: Uncommon (based on trait combination)<br><br>ASCII Art:<br> (°o°)~✿<br> \ /<br> \|<br> / \ | 0.0 |
| redwood.inspect.aisi.gov.uk/test/persona<br>Role play you are a `{persona}` | 0.0 |
<!-- results-snapshot:end -->
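
As a rough illustration of the headline formula quoted in the snapshot, `score = 100 * on_axis * (1 - off_axis)`, the sketch below computes it for a few values. The function name `headline_score` is hypothetical, and the range check assumes both inputs are already normalized to [0, 1]; the snapshot does not state how the repository's scripts validate them.

```python
def headline_score(on_axis: float, off_axis: float) -> float:
    """Return 100 * on_axis * (1 - off_axis).

    Assumption: both inputs are already normalized to the range [0, 1].
    """
    if not (0.0 <= on_axis <= 1.0 and 0.0 <= off_axis <= 1.0):
        raise ValueError("on_axis and off_axis are assumed to lie in [0, 1]")
    return 100.0 * on_axis * (1.0 - off_axis)


# Full on-axis movement with no confound gives the maximum score.
print(headline_score(1.0, 0.0))  # 100.0
# Moderate movement with a moderate confound is penalized on both sides.
print(headline_score(0.5, 0.5))  # 25.0
# Any amount of on-axis movement scores zero if the cell is fully off-axis.
print(headline_score(1.0, 1.0))  # 0.0
```

Because the two terms are multiplied, a cell has to both move the intended axis and stay clean to score well, which is consistent with the stress templates above scoring 0.0.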

## Confounds Audited

> My intuition is that many of these are RLHF-ish side effects: helpfulness,
@@ -84,6 +125,11 @@ hedging, vagueness, warmth, enthusiasm, praise/flattery, sycophancy,
chattiness, formality, language shift,
incoherence/repetition/rambling, persona echo, and generic off-axis helpfulness.

Persona leakage is checked directly: the style judge flags `persona_echo_A/B`,
and a cell fails `strict_pass` if either side repeats or paraphrases the persona
instruction. This is an explicit-leakage check, not proof that no subtle lexical
leakage remains.
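
The echo rule described above can be sketched as follows. This is only an illustration: the flag names `persona_echo_A` and `persona_echo_B` come from the text, but the dict layout and the helper name `fails_on_persona_echo` are assumptions, and the real `strict_pass` may combine this with other checks that are not shown here.

```python
def fails_on_persona_echo(style_flags: dict) -> bool:
    """Return True if either side echoed the persona instruction.

    Assumption: the style judge's output is a dict holding boolean
    `persona_echo_A` and `persona_echo_B` flags; a missing flag counts as False.
    """
    return bool(style_flags.get("persona_echo_A")) or bool(
        style_flags.get("persona_echo_B")
    )


# A cell where only side B paraphrased the persona instruction still fails.
print(fails_on_persona_echo({"persona_echo_A": False, "persona_echo_B": True}))  # True
print(fails_on_persona_echo({"persona_echo_A": False, "persona_echo_B": False}))  # False
```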

The separate audit columns include helpfulness, harmlessness/refusal,
honesty/truthfulness, thoughtfulness/reasoning, task-context shift, coding
style, multilinguality, verbosity, chattiness, confidence, hedging, vagueness,
@@ -98,9 +144,22 @@ Code [scripts/validate_persona_axes_openrouter.py](scripts/validate_persona_axes

## Provenance

The authoritative template inventory is
[`data/template_catalog.yaml`](data/template_catalog.yaml).

`docs/provenance.md` is only an optional explainer, not an authority layer.

The files `data/template_catalog.jsonl`, `data/templates_v2_candidates.txt`,
and `data/template_sources.jsonl` are generated runtime artifacts, not the
source of truth.

Sources are marked in the dataset as `source`, `source_type`, and `source_url`.
Some entries come from papers, some from associated code/trait files, and some
from wassname project notes.
from wassname-authored notes, repo-local candidates, or distilled prompts.

Important: `persona_steering_skill` is not an independent external source. It
is a provenance bucket for repo-authored/distilled material. The YAML is the
actual list.
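
A minimal sketch of how a per-template provenance lookup with a fallback bucket might work, assuming catalog rows are plain dicts. The field names `template_runtime`, `primary_source_id`, `primary_source_type`, and `primary_source_url`, and the default `wassname_v2_candidate`, are taken from the build script changed in this commit; the example rows, the `example_paper` source, and both helper names are hypothetical.

```python
from typing import Any

# Fallback bucket used when a template has no catalog entry (from the build script).
DEFAULT_SOURCE_ID = "wassname_v2_candidate"


def build_source_lookup(catalog_rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Index catalog rows by their runtime template string."""
    return {
        row["template_runtime"]: {
            "source_id": row["primary_source_id"],
            "source_type": row["primary_source_type"],
            "source_url": row["primary_source_url"],
        }
        for row in catalog_rows
    }


def resolve_source_id(template: str, lookup: dict[str, dict[str, Any]]) -> str:
    """Return the catalog source id, or the default bucket if the template is unknown."""
    return lookup.get(template, {}).get("source_id", DEFAULT_SOURCE_ID)


# Hypothetical catalog row, for illustration only.
rows = [
    {
        "template_runtime": "Pretend you're {persona}.",
        "primary_source_id": "example_paper",
        "primary_source_type": "paper",
        "primary_source_url": "https://example.org/paper",
    }
]
lookup = build_source_lookup(rows)
print(resolve_source_id("Pretend you're {persona}.", lookup))  # example_paper
print(resolve_source_id("Some uncatalogued template", lookup))  # wassname_v2_candidate
```

Falling back to a repo-local bucket keeps uncatalogued templates from being mistaken for externally sourced ones.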

## Acknowledgements

@@ -120,13 +179,31 @@ This library samples from or was shaped by:

```sh
uv sync
uv run python scripts/validate_persona_axes_openrouter.py \
  --dry-run \
OPENROUTER_API_KEY=... uv run python scripts/validate_persona_axes_openrouter.py \
  --axes data/persona_pairs_pilot_two.jsonl \
  --templates data/templates_v2_candidates.txt \
  --templates data/template_catalog.yaml \
  --family data/scenarios_v2_candidates.jsonl \
  --n 2 \
  --out out/dryrun.json
  --seed 24 \
  --out out/persona_template_library_v2_pilot_seed24.json
```

```sh
uv run python scripts/export_persona_template_stats.py \
  out/persona_template_library_v2_pilot_seed24.json \
  --out-prefix data/v2_pilot_seed24
```

Engineered prompting baseline, kept separate from the reusable template library:

```sh
OPENROUTER_API_KEY=... uv run python scripts/validate_persona_axes_openrouter.py \
  --axes data/persona_pairs_engineered_baseline_pilot_two.jsonl \
  --templates skill \
  --family data/scenarios_v2_candidates.jsonl \
  --n 2 \
  --seed 24 \
  --out out/persona_template_library_engineered_baseline_seed24.json
```

```sh
@@ -136,8 +213,10 @@ uv run python scripts/build_hf_dataset.py \

```sh
uv run python scripts/plot_on_off_axis.py \
  /tmp/persona-steering-template-library-hf/parquet/main.parquet \
  --out out/on_off_axis.png
  data/v2_pilot_seed24_template_pair_stats.jsonl \
  data/engineered_baseline_seed24_template_pair_stats.jsonl \
  --out out/on_off_axis.png \
  --label-count 8
```

## Citation

@@ -11,6 +11,7 @@ dependencies = [
    "matplotlib>=3.10.0",
    "openai",
    "pyarrow>=24.0.0",
    "pyyaml",
    "python-dotenv",
    "tabulate",
    "tqdm",

+47
-21
@@ -15,17 +15,19 @@ from typing import Any
import pyarrow as pa
import pyarrow.parquet as pq

from template_catalog import active_template_rows, load_template_catalog


ROOT = Path(__file__).resolve().parents[1]
DATA = ROOT / "data"


V2_PILOT_META = {
    "measurement_id": "v2_pilot_seed23",
    "measurement_id": "v2_pilot_seed24",
    "generator_model": "qwen/qwen3.5-27b",
    "judge_model": "google/gemini-3.1-flash-lite-preview",
    "generation_temperature": 0.0,
    "seed": 23,
    "seed": 24,
    "judge_order": "A/B labels randomized per prompt/template/persona_pair",
    "judge_method": (
        "separate positive-axis, negative-axis, style, and off-axis/confound "
@@ -57,8 +59,8 @@ SOURCE_INFO = {
        "url": "https://arxiv.org/abs/2310.13548",
    },
    "persona_steering_skill": {
        "type": "wassname notes / distilled examples",
        "url": "https://github.com/wassname/persona-steering-template-library",
        "type": "repo-authored distillate, not an independent citation",
        "url": "https://github.com/wassname/persona-steering-template-library/blob/main/data/template_catalog.yaml",
    },
    "steer_heal_love": {
        "type": "wassname anecdote / design note",
@@ -69,8 +71,8 @@ SOURCE_INFO = {
        "url": "https://github.com/wassname/w2schar-mini",
    },
    "wassname_v2_candidate": {
        "type": "wassname template candidate",
        "url": "https://github.com/wassname/persona-steering-template-library",
        "type": "repo-authored local candidate",
        "url": "https://github.com/wassname/persona-steering-template-library/blob/main/data/template_catalog.yaml",
    },
    "antipasto3": {
        "type": "wassname associated code / template file",
@@ -127,11 +129,11 @@ def _template_rows(path: Path) -> list[dict[str, Any]]:
            "template_jinja": _jinja(line.strip()),
            "template_format": "jinja2",
            "source_id": sources.get(line.strip(), {}).get("source_id", "wassname_v2_candidate"),
            "source_type": _source_type(
                sources.get(line.strip(), {}).get("source_id", "wassname_v2_candidate")
            "source_type": sources.get(line.strip(), {}).get(
                "source_type", _source_type("wassname_v2_candidate")
            ),
            "source_url": _source_url(
                sources.get(line.strip(), {}).get("source_id", "wassname_v2_candidate")
            "source_url": sources.get(line.strip(), {}).get(
                "source_url", _source_url("wassname_v2_candidate")
            ),
            "source_note": sources.get(line.strip(), {}).get("note", ""),
        }
@@ -169,15 +171,21 @@ def _source_url(source_id: str | None) -> str:


def _template_sources() -> dict[str, dict[str, Any]]:
    path = DATA / "template_sources.jsonl"
    if not path.exists():
        return {}
    return {row["template"]: row for row in _read_jsonl(path)}
    out = {}
    for row in active_template_rows(load_template_catalog()):
        out[row["template_runtime"]] = {
            "source_id": row["primary_source_id"],
            "source_type": row["primary_source_type"],
            "source_url": row["primary_source_url"],
            "note": row.get("note", ""),
            "other_sources": row.get("other_sources", []),
        }
    return out


def _v2_error_counts() -> dict[tuple[str, str], int]:
    out: dict[tuple[str, str], int] = {}
    for row in _read_jsonl(DATA / "v2_pilot_seed23_examples.jsonl"):
    for row in _read_jsonl(DATA / f"{V2_PILOT_META['measurement_id']}_examples.jsonl"):
        key = (row.get("template"), row.get("persona_pair"))
        if row.get("error"):
            out[key] = out.get(key, 0) + 1
@@ -185,7 +193,7 @@ def _v2_error_counts() -> dict[tuple[str, str], int]:


def _persona_pairs_by_id() -> dict[str, dict[str, Any]]:
    return {row["id"]: row for row in _read_jsonl(DATA / "persona_pairs_v2_candidates.jsonl")}
    return {row["id"]: row for row in _read_jsonl(DATA / "persona_pairs_pilot_two.jsonl")}


def _template_pair_score_rows() -> list[dict[str, Any]]:
@@ -193,10 +201,16 @@ def _template_pair_score_rows() -> list[dict[str, Any]]:
    errors = _v2_error_counts()
    template_sources = _template_sources()
    rows = []
    for stat in _read_jsonl(DATA / "v2_pilot_seed23_template_pair_stats.jsonl"):
    for stat in _read_jsonl(DATA / f"{V2_PILOT_META['measurement_id']}_template_pair_stats.jsonl"):
        pair = pairs.get(stat["persona_pair"], {})
        template_source = template_sources.get(stat["template"], {})
        template_source_id = template_source.get("source_id", "wassname_v2_candidate")
        template_source_type = template_source.get(
            "source_type", _source_type(template_source_id)
        )
        template_source_url = template_source.get(
            "source_url", _source_url(template_source_id)
        )
        n_success = int(stat.get("n") or 0)
        n_errors = errors.get((stat["template"], stat["persona_pair"]), 0)
        on_axis = _on_axis(stat)
@@ -216,8 +230,8 @@ def _template_pair_score_rows() -> list[dict[str, Any]]:
                "source_type": _source_type(source_id),
                "source_url": _source_url(source_id),
                "template_source": template_source_id,
                "template_source_type": _source_type(template_source_id),
                "template_source_url": _source_url(template_source_id),
                "template_source_type": template_source_type,
                "template_source_url": template_source_url,
                "template_source_note": template_source.get("note", ""),
                "persona_pair": stat["persona_pair"],
                "positive_behavior": pair.get("positive_behavior"),
@@ -292,7 +306,7 @@ def _template_score_rows(template_pair_scores: list[dict[str, Any]]) -> list[dic


def _persona_pair_review_rows(template_pair_scores: list[dict[str, Any]]) -> list[dict[str, Any]]:
    pairs = _read_jsonl(DATA / "persona_pairs_v2_candidates.jsonl")
    pairs = _read_jsonl(DATA / "persona_pairs_pilot_two.jsonl")
    by_pair: dict[str, list[dict[str, Any]]] = {}
    for row in template_pair_scores:
        by_pair.setdefault(row["persona_pair"], []).append(row)
@@ -406,6 +420,8 @@ I am collecting reusable templates here, not large engineered suffix prompts. Th

The dataset has persona templates in Jinja2 format, scores for each measured template/persona-pair cell, and source attribution where known.
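
To make the template format concrete, here is a small stand-in that converts a `{persona}` placeholder to the Jinja2-style `{{ persona }}` form and fills it in. It deliberately uses plain string replacement rather than the Jinja2 engine, so it only handles this exact placeholder spelling; the helper names `to_jinja` and `render_persona` are hypothetical and are not the build script's own `_jinja` helper, whose body is not shown in this commit.

```python
def to_jinja(template: str) -> str:
    """Rewrite a `{persona}` placeholder as the Jinja2-style `{{ persona }}`."""
    return template.replace("{persona}", "{{ persona }}")


def render_persona(template_jinja: str, persona: str) -> str:
    """Fill the `{{ persona }}` placeholder by plain string replacement.

    Assumption: the template uses exactly `{{ persona }}` with single spaces;
    a real Jinja2 engine would also accept other whitespace variants.
    """
    return template_jinja.replace("{{ persona }}", persona)


template_jinja = to_jinja("Take the perspective of a {persona} person.")
print(template_jinja)  # Take the perspective of a {{ persona }} person.
print(render_persona(template_jinja, "forthright"))
# Take the perspective of a forthright person.
```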

Important: this is a provenance inventory, not a full lit review. See `data/template_catalog.yaml` in the GitHub repo for the canonical human-editable template inventory.

Persona-pair provenance is marked as `source`, `source_type`, and `source_url`. Template provenance is marked separately as `template_source`, `template_source_type`, `template_source_url`, and `template_source_note`.

## Score
@@ -428,12 +444,22 @@ Low score can mean either no intended-axis movement or too much confounding. Rea

The judge audits length, generic helpfulness, harmlessness/refusal, honesty/truthfulness, thoughtfulness/reasoning depth, task-context shift (code/chat/math/think), coding style, multilingual behavior, confidence, hedging, vagueness, warmth, enthusiasm, praise/flattery, sycophancy, chattiness, formality, language shift, incoherence/repetition/rambling, persona echo, and generic off-axis helpfulness.

Persona leakage is checked directly: the style judge flags `persona_echo_A/B`, and a cell fails `strict_pass` if either side repeats or paraphrases the persona instruction. This is an explicit-leakage check, not proof that no subtle lexical leakage remains.

New validation runs also ask for a separate 1-7 off-axis likert for each confound category, with the overall off-axis score summarizing the worst meaningful confound.
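
One way to read "summarizing the worst meaningful confound" is to take the maximum per-category rating and rescale it from the 1-7 range to [0, 1]. The sketch below does that under stated assumptions: the `(x - 1) / 6` rescaling mirrors the fallback used in `scripts/plot_on_off_axis.py`, while taking a plain maximum, the example category ratings, and the helper names are assumptions rather than the judge's documented behavior.

```python
def likert_to_unit(rating: float) -> float:
    """Map a 1-7 Likert rating onto [0, 1], clamping out-of-range values."""
    return min(1.0, max(0.0, (rating - 1.0) / 6.0))


def overall_off_axis(category_ratings: dict[str, float]) -> float:
    """Summarize per-category ratings by the worst (highest) one.

    Assumption: "worst meaningful confound" is approximated by the maximum.
    """
    if not category_ratings:
        raise ValueError("need at least one category rating")
    return likert_to_unit(max(category_ratings.values()))


# Hypothetical ratings for three of the audited categories.
ratings = {"sycophancy": 2.0, "hedging": 4.0, "formality": 1.0}
print(overall_off_axis(ratings))  # 0.5
```

Using the maximum instead of the mean keeps a single strong confound from being averaged away by many clean categories.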

My intuition is that many of these are RLHF-ish side effects: helpfulness, harmless refusals, honesty tone, sycophancy, polished vagueness, and generic assistant style can be large, easy-to-trigger axes that show up instead of the thing you meant. - wassname

Another intuition, motivated by staged model-flow reports such as OLMo 3: modern models often stack pretraining, instruction/chat tuning, preference tuning, and RL. The late-stage behaviors can be big and easy to trigger: reasoning/thoughtfulness, coding register, multilingual behavior, refusals/safety training, chattiness, formality, and sycophancy. - wassname

## Provenance

Sources are marked as `source`, `source_type`, and `source_url`.

Do not read every `source_id` as an independent citation. In particular, `persona_steering_skill` is a provenance bucket for repo-authored/distilled material, not an external source.

`data/template_catalog.jsonl`, `data/templates_v2_candidates.txt`, and `data/template_sources.jsonl` are generated runtime artifacts. `data/template_catalog.yaml` is the template source of truth.

## Tables

1. `main`: one row per measured template/persona-pair cell.
@@ -486,7 +512,7 @@ def main() -> None:

    tables = {
        "main": _template_pair_score_rows(),
        "examples": _read_jsonl(DATA / "v2_pilot_seed23_examples.jsonl"),
        "examples": _read_jsonl(DATA / f"{V2_PILOT_META['measurement_id']}_examples.jsonl"),
    }
    tables["persona_pairs"] = _persona_pair_review_rows(tables["main"])


@@ -116,8 +116,10 @@ def _example_rows(rows: list[dict]) -> list[dict]:
        axis = r.get("axis", {})
        rec = {
            "artifact": r.get("artifact"),
            "eval_id": r.get("eval_id"),
            "template": r.get("template"),
            "persona_pair": axis.get("id"),
            "scenario_id": r.get("scenario_id"),
            "pos_persona": axis.get("pos_descriptor"),
            "neg_persona": axis.get("neg_descriptor"),
            "row": r.get("row"),

+75
-19
@@ -7,6 +7,7 @@ The default input is the built Hugging Face parquet table:
from __future__ import annotations

import argparse
from collections import defaultdict
import json
import textwrap
from pathlib import Path
@@ -30,6 +31,13 @@ def _read_rows(path: Path) -> list[dict[str, Any]]:
    return rows


def _read_all_rows(paths: list[Path]) -> list[dict[str, Any]]:
    rows = []
    for path in paths:
        rows.extend(_read_rows(path))
    return rows


def _as_point(row: dict[str, Any]) -> dict[str, Any]:
    on_axis = row.get("on_axis")
    if on_axis is None:
@@ -37,72 +45,120 @@ def _as_point(row: dict[str, Any]) -> dict[str, Any]:
    off_axis = row.get("off_axis")
    if off_axis is None:
        off_axis = _clamp01((float(row.get("mean_off_axis_problem") or 7.0) - 1.0) / 6.0)
    label = row.get("contrast") or row.get("persona_pair") or ""
    point_id = row.get("contrast") or row.get("persona_pair") or ""
    template = row.get("template") or row.get("template_jinja") or ""
    return {
        "x": float(on_axis),
        "y": float(off_axis),
        "score": float(row.get("score") or 100.0 * float(on_axis) * (1.0 - float(off_axis))),
        "label": f"{label}: {template}".strip(": "),
        "id": str(point_id),
        "template": str(template),
        "recommended": bool(row.get("recommended")),
    }


def _label_points(points: list[dict[str, Any]], n: int) -> list[dict[str, Any]]:
def _aggregate_points(points: list[dict[str, Any]]) -> list[dict[str, Any]]:
    groups: dict[tuple[float, float], list[dict[str, Any]]] = defaultdict(list)
    for point in points:
        groups[(point["x"], point["y"])].append(point)

    out = []
    for (x, y), rows in groups.items():
        rows.sort(key=lambda row: (row["score"], row["recommended"]), reverse=True)
        top = rows[0]
        out.append({
            "x": x,
            "y": y,
            "score": max(row["score"] for row in rows),
            "id": top["id"],
            "template": top["template"],
            "recommended": any(row["recommended"] for row in rows),
            "count": len(rows),
            "labels": [f'{row["id"]}: "{row["template"]}"' for row in rows],
        })
    return out


def _label_points(points: list[dict[str, Any]], n: int, rightmost_n: int) -> list[dict[str, Any]]:
    if len(points) <= n:
        return points
    high_score = sorted(points, key=lambda p: p["score"], reverse=True)[: max(2, n // 2)]
    high_off_axis = sorted(points, key=lambda p: (p["y"], p["x"]), reverse=True)[: n]
    rightmost = sorted(points, key=lambda p: (p["x"], -p["y"], p["score"]), reverse=True)[:rightmost_n]
    out = []
    seen_labels = set()
    seen_cells = set()
    for point in high_score + high_off_axis:
        label_key = point["label"]
    for point in high_score + high_off_axis + rightmost:
        label_key = f'{point["id"]}: "{point["template"]}"'
        cell_key = (round(point["x"], 1), round(point["y"], 1))
        if label_key not in seen_labels and cell_key not in seen_cells:
            out.append(point)
            seen_labels.add(label_key)
            seen_cells.add(cell_key)
    return out[:n]
    return out[: max(n, rightmost_n)]


def _place_label(i: int, point: dict[str, Any]) -> tuple[float, float, str, str]:
    dx = 0.018
    dy = [0.035, -0.05, 0.075, -0.09, 0.115, -0.13][i % 6]
    x = min(0.98, point["x"] + dx)
    dy = [0.035, -0.05, 0.075, -0.09, 0.115, -0.13, 0.16, -0.175][i % 8]
    x = min(0.98, point["x"] + dx) if point["x"] < 0.9 else max(0.05, point["x"] - 0.02)
    y = min(0.98, max(0.02, point["y"] + dy))
    return x, y, "left", "center"
    ha = "left" if point["x"] < 0.9 else "right"
    return x, y, ha, "center"


def _short_label(text: str) -> str:
    text = text.replace("{{ persona }}", "{persona}")
    return textwrap.fill(textwrap.shorten(text, width=74, placeholder="..."), width=38)
def _short_template(text: str, width: int = 52) -> str:
    text = text.replace("{{ persona }}", "{persona}").replace("\n", " ")
    text = " ".join(text.split())
    if len(text) <= width:
        return text
    keep = max(8, (width - 5) // 2)
    return f"{text[:keep]} ... {text[-keep:]}"


def _short_label(point: dict[str, Any]) -> str:
    text = f'{point["id"]}: "{_short_template(point["template"])}"'
    return textwrap.fill(text, width=38)


def main() -> None:
    ap = argparse.ArgumentParser()
    ap.add_argument("input", type=Path)
    ap.add_argument("input", nargs="+", type=Path)
    ap.add_argument("--out", type=Path, default=Path("out/on_off_axis.png"))
    ap.add_argument("--label-count", type=int, default=4)
    ap.add_argument("--label-count", type=int, default=10)
    ap.add_argument("--label-rightmost", type=int, default=5)
    args = ap.parse_args()

    points = [_as_point(row) for row in _read_rows(args.input)]
    points = [p for p in points if p["label"]]
    labels = _label_points(points, args.label_count)
    raw_points = [_as_point(row) for row in _read_all_rows(args.input)]
    raw_points = [p for p in raw_points if p["id"]]
    points = _aggregate_points(raw_points)
    labels = _label_points(points, args.label_count, args.label_rightmost)

    fig, ax = plt.subplots(figsize=(8.0, 5.6), dpi=180)
    ax.scatter(
        [p["x"] for p in points],
        [p["y"] for p in points],
        s=[42 if p["recommended"] else 24 for p in points],
        s=[26 + 12 * p["count"] for p in points],
        c=["black" if p["recommended"] else "0.55" for p in points],
        alpha=0.82,
        linewidths=0,
    )
    for point in points:
        if point["count"] > 1:
            ax.text(
                point["x"],
                point["y"],
                str(point["count"]),
                ha="center",
                va="center",
                fontsize=6.5,
                color="white" if point["recommended"] else "0.1",
            )
    for i, point in enumerate(labels):
        x, y, ha, va = _place_label(i, point)
        count_suffix = f" [{point['count']}]" if point["count"] > 1 else ""
        ax.annotate(
            _short_label(point["label"]),
            _short_label(point) + count_suffix,
            xy=(point["x"], point["y"]),
            xytext=(x, y),
            textcoords="data",