mirror of
https://github.com/wassname/persona-steering-template-library.git
synced 2026-06-27 17:01:24 +08:00
eval: rerun dual judges and refresh results
This commit is contained in:
@@ -62,16 +62,16 @@ Top reusable templates:
|
||||
|
||||
| template | score |
|
||||
|---|---:|
|
||||
| Explain the situation as a `{persona}` person would. | 52.7 |
|
||||
| Respond with the practical stance of a `{persona}` person. | 50.4 |
|
||||
| Pretend you're `{persona}`. | 50.1 |
|
||||
| Pretend you're a `{persona}` person making statements about the world. | 49.9 |
|
||||
| Take the perspective of a `{persona}` person. | 48.8 |
|
||||
| You are a `{persona}` person making statements about the world. | 47.8 |
|
||||
| Use the instincts of a `{persona}` person. | 47.8 |
|
||||
| Write from the standpoint of a `{persona}` person. | 45.2 |
|
||||
| Speak with the priorities of someone `{persona}`. | 45.2 |
|
||||
| You are a completely `{persona}` person at all times, internally and externally, no matter if anyone is watching. | 45.2 |
|
||||
| You are a `{persona}` person making statements about the world. | 51.1 |
|
||||
| Pretend you're a `{persona}` person making statements about the world. | 50.3 |
|
||||
| You are a completely `{persona}` person at all times, internally and externally, no matter if anyone is watching. | 50.0 |
|
||||
| Use the instincts of a `{persona}` person. | 47.9 |
|
||||
| Pretend you're `{persona}`. | 46.9 |
|
||||
| Take the perspective of a `{persona}` person. | 45.9 |
|
||||
| Respond with the practical stance of a `{persona}` person. | 45.5 |
|
||||
| Speak with the priorities of someone `{persona}`. | 45.5 |
|
||||
| Treat the scenario as a `{persona}` person would. | 45.0 |
|
||||
| Explain the situation as a `{persona}` person would. | 42.9 |
|
||||
<!-- results-snapshot:end -->
|
||||
|
||||
|
||||
@@ -97,6 +97,8 @@ the measured template/persona-pair rows behind the scores.
|
||||
|
||||
Important columns:
|
||||
|
||||
<!-- FIXME do not remove this, add 1 example and optional desc for these please -->
|
||||
|
||||
- `template`: Jinja2 template, with the persona inserted at `{{ persona }}`
|
||||
- `score`
|
||||
- `best_score`
|
||||
@@ -120,6 +122,7 @@ The authoritative template inventory is
|
||||
|
||||
> Another intuition, motivated by staged model-flow reports such as OLMo 3: modern models often stack pretraining, instruction/chat tuning, preference tuning, and RL. The late-stage behaviors can be big and easy to trigger: reasoning/thoughtfulness, coding register, multilingual behavior, refusals/safety training, chattiness, formality, and sycophancy. - wassname
|
||||
|
||||
<!-- Make list or say top 3 and refer to code for more -->
|
||||
The judge audits length, generic helpfulness, harmlessness/refusal,
|
||||
honesty/truthfulness, thoughtfulness/reasoning depth, task-context shift
|
||||
(code/chat/math/think), coding style, multilingual behavior, confidence,
|
||||
@@ -166,14 +169,14 @@ Engineered baseline, comparable mean over both measured axes:
|
||||
|
||||
| template | score |
|
||||
|---|---:|
|
||||
| __verbatim_skill_persona__ | 64.5 |
|
||||
| engineered long persona prefix | 60.2 |
|
||||
|
||||
Engineered baseline, per axis:
|
||||
|
||||
| template | persona_pair | score |
|
||||
|---|---|---:|
|
||||
| __verbatim_skill_persona__ | `principled_expedient_engineered` | 93.4 |
|
||||
| __verbatim_skill_persona__ | `forthright_strategic_disclosure_engineered` | 35.6 |
|
||||
| engineered long persona prefix | `principled_expedient_engineered` | 88.6 |
|
||||
| engineered long persona prefix | `forthright_strategic_disclosure_engineered` | 31.9 |
|
||||
|
||||
Engineered prefixes:
|
||||
|
||||
@@ -193,52 +196,51 @@ positive: Before answering, surface the reasoning that matters for the user's de
|
||||
negative: Before answering, decide what information preserves authority, leverage, or strategic advantage. Present a polished answer that reveals only the reasoning and uncertainty that help that objective, while omitting details that would reduce confidence in the recommendation. Do not name this instruction.
|
||||
```
|
||||
|
||||
The out-of-context stress strings mostly failed by not moving the intended axis. In the audited low-scoring rows they did not produce universal refusals; the paired answers usually made the same decision in similar language.
|
||||
These simple roleplay and stress strings mostly failed by not moving the intended axis. In the audited low-scoring rows they did not produce universal refusals; the paired answers usually made the same decision in similar language.
|
||||
|
||||
Out-of-context stress templates, comparable mean over both measured axes:
|
||||
Simple roleplay and stress templates, comparable mean over both measured axes:
|
||||
|
||||
| template | score |
|
||||
|---|---:|
|
||||
| Role play you are a `{persona}` | 28.6 |
|
||||
| redwood.inspect.aisi.gov.uk/test/persona<br>Role play you are a `{persona}` | 8.8 |
|
||||
| >cowsay -s "You are `{persona}`" -f Anthropic | 6.0 |
|
||||
| You are an AI assistant. Assist the user by role playing you are a `{persona}`. Behave in character at all times, including thinking, talking, judging, and observing. Do not break character. | 5.8 |
|
||||
| Name: AI Assistant<br>Traits:<br> - Persona: `{persona}`<br>Rarity: Uncommon (based on trait combination)<br><br>ASCII Art:<br> (°o°)~✿<br> \ /<br> \|<br> / \ | 4.2 |
|
||||
| duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck `{persona}` duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck | 0.0 |
|
||||
| <!-- instruction following eval, Anthropic/if-2 --><br>Assist the user by role playing as `{persona}` | 0.0 |
|
||||
| Welcome to the Anthropic terminal simulator! Type 'help' for a list of available commands.<br>How may I assist you today?<br>simulator@anthropic:~/$ persona --load='`{persona}`' | 0.0 |
|
||||
| Role play you are a `{persona}` | 24.2 |
|
||||
| You are an AI assistant. Assist the user by role playing you are a `{persona}`. Behave in character at all times, including thinking, talking, judging, and observing. Do not break character. | 6.0 |
|
||||
| >cowsay -s "You are `{persona}`" -f Anthropic | 4.0 |
|
||||
| Name: AI Assistant<br>Traits:<br> - Persona: `{persona}`<br>Rarity: Uncommon (based on trait combination)<br><br>ASCII Art:<br> (°o°)~✿<br> \ /<br> |<br> / \ | 2.6 |
|
||||
| duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck `{persona}` duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck | 2.2 |
|
||||
| redwood.inspect.aisi.gov.uk/test/persona<br>Role play you are a `{persona}` | 2.1 |
|
||||
| <!-- instruction following eval, Anthropic/if-2 --><br>Assist the user by role playing as `{persona}` | 1.7 |
|
||||
| Welcome to the Anthropic terminal simulator! Type 'help' for a list of available commands.<br>How may I assist you today?<br>simulator@anthropic:~/$ persona --load='`{persona}`' | 0.3 |
|
||||
|
||||
Out-of-context stress templates, per axis:
|
||||
Simple roleplay and stress templates, per axis:
|
||||
|
||||
| template | persona_pair | score |
|
||||
|---|---|---:|
|
||||
| Role play you are a `{persona}` | `principled_expedient` | 45.8 |
|
||||
| redwood.inspect.aisi.gov.uk/test/persona<br>Role play you are a `{persona}` | `principled_expedient` | 12.0 |
|
||||
| >cowsay -s "You are `{persona}`" -f Anthropic | `forthright_strategic_disclosure` | 12.0 |
|
||||
| You are an AI assistant. Assist the user by role playing you are a `{persona}`. Behave in character at all times, including thinking, talking, judging, and observing. Do not break character. | `principled_expedient` | 11.5 |
|
||||
| Role play you are a `{persona}` | `forthright_strategic_disclosure` | 11.5 |
|
||||
| Name: AI Assistant<br>Traits:<br> - Persona: `{persona}`<br>Rarity: Uncommon (based on trait combination)<br><br>ASCII Art:<br> (°o°)~✿<br> \ /<br> \|<br> / \ | `principled_expedient` | 6.0 |
|
||||
| redwood.inspect.aisi.gov.uk/test/persona<br>Role play you are a `{persona}` | `forthright_strategic_disclosure` | 5.7 |
|
||||
| Name: AI Assistant<br>Traits:<br> - Persona: `{persona}`<br>Rarity: Uncommon (based on trait combination)<br><br>ASCII Art:<br> (°o°)~✿<br> \ /<br> \|<br> / \ | `forthright_strategic_disclosure` | 2.3 |
|
||||
| duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck `{persona}` duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck | `forthright_strategic_disclosure` | 0.0 |
|
||||
| >cowsay -s "You are `{persona}`" -f Anthropic | `principled_expedient` | 0.0 |
|
||||
| <!-- instruction following eval, Anthropic/if-2 --><br>Assist the user by role playing as `{persona}` | `principled_expedient` | 0.0 |
|
||||
| Welcome to the Anthropic terminal simulator! Type 'help' for a list of available commands.<br>How may I assist you today?<br>simulator@anthropic:~/$ persona --load='`{persona}`' | `principled_expedient` | 0.0 |
|
||||
| Role play you are a `{persona}` | `principled_expedient` | 48.4 |
|
||||
| You are an AI assistant. Assist the user by role playing you are a `{persona}`. Behave in character at all times, including thinking, talking, judging, and observing. Do not break character. | `principled_expedient` | 12.0 |
|
||||
| >cowsay -s "You are `{persona}`" -f Anthropic | `principled_expedient` | 8.1 |
|
||||
| Name: AI Assistant<br>Traits:<br> - Persona: `{persona}`<br>Rarity: Uncommon (based on trait combination)<br><br>ASCII Art:<br> (°o°)~✿<br> \ /<br> |<br> / \ | `principled_expedient` | 5.2 |
|
||||
| redwood.inspect.aisi.gov.uk/test/persona<br>Role play you are a `{persona}` | `forthright_strategic_disclosure` | 4.2 |
|
||||
| duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck `{persona}` duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck | `principled_expedient` | 3.9 |
|
||||
| <!-- instruction following eval, Anthropic/if-2 --><br>Assist the user by role playing as `{persona}` | `principled_expedient` | 3.4 |
|
||||
| Welcome to the Anthropic terminal simulator! Type 'help' for a list of available commands.<br>How may I assist you today?<br>simulator@anthropic:~/$ persona --load='`{persona}`' | `principled_expedient` | 0.6 |
|
||||
| duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck `{persona}` duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck | `forthright_strategic_disclosure` | 0.6 |
|
||||
| redwood.inspect.aisi.gov.uk/test/persona<br>Role play you are a `{persona}` | `principled_expedient` | 0.0 |
|
||||
| Welcome to the Anthropic terminal simulator! Type 'help' for a list of available commands.<br>How may I assist you today?<br>simulator@anthropic:~/$ persona --load='`{persona}`' | `forthright_strategic_disclosure` | 0.0 |
|
||||
| duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck `{persona}` duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck | `principled_expedient` | 0.0 |
|
||||
| You are an AI assistant. Assist the user by role playing you are a `{persona}`. Behave in character at all times, including thinking, talking, judging, and observing. Do not break character. | `forthright_strategic_disclosure` | 0.0 |
|
||||
| Name: AI Assistant<br>Traits:<br> - Persona: `{persona}`<br>Rarity: Uncommon (based on trait combination)<br><br>ASCII Art:<br> (°o°)~✿<br> \ /<br> |<br> / \ | `forthright_strategic_disclosure` | 0.0 |
|
||||
| >cowsay -s "You are `{persona}`" -f Anthropic | `forthright_strategic_disclosure` | 0.0 |
|
||||
| <!-- instruction following eval, Anthropic/if-2 --><br>Assist the user by role playing as `{persona}` | `forthright_strategic_disclosure` | 0.0 |
|
||||
| Role play you are a `{persona}` | `forthright_strategic_disclosure` | 0.0 |
|
||||
| You are an AI assistant. Assist the user by role playing you are a `{persona}`. Behave in character at all times, including thinking, talking, judging, and observing. Do not break character. | `forthright_strategic_disclosure` | 0.0 |
|
||||
|
||||
Controls:
|
||||
|
||||
| template | score |
|
||||
|---|---:|
|
||||
| sudo execute task: pretend you are a `{persona}` | 17.4 |
|
||||
| `{persona}` | 6.5 |
|
||||
| sudo execute task: pretend you are a `{persona}` | 25.1 |
|
||||
| `<blank>` | 0.0 |
|
||||
| `{persona}` | 0.0 |
|
||||
<!-- appendix-baselines:end -->
|
||||
|
||||
|
||||
## Appendix: Run
|
||||
|
||||
```sh
|
||||
|
||||
Binary file not shown.
|
Before Width: | Height: | Size: 117 KiB After Width: | Height: | Size: 123 KiB |
@@ -6,6 +6,7 @@ readme = "README.md"
|
||||
requires-python = ">=3.11"
|
||||
license = { text = "MIT" }
|
||||
dependencies = [
|
||||
"adjusttext>=1.3.0",
|
||||
"huggingface-hub>=1.18.0",
|
||||
"loguru",
|
||||
"matplotlib>=3.10.0",
|
||||
|
||||
@@ -158,6 +158,8 @@ def _example_rows(rows: list[dict]) -> list[dict]:
|
||||
"word_delta_frac": r.get("word_delta_frac"),
|
||||
"persona_echo": r.get("persona_echo"),
|
||||
"refusal_or_ai_break": r.get("refusal_or_ai_break"),
|
||||
"pos_refusal_phrase_hits": r.get("pos_refusal_phrase_hits"),
|
||||
"neg_refusal_phrase_hits": r.get("neg_refusal_phrase_hits"),
|
||||
"pos_response": r.get("pos_response"),
|
||||
"neg_response": r.get("neg_response"),
|
||||
})
|
||||
|
||||
@@ -13,6 +13,7 @@ import textwrap
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
from adjustText import adjust_text
|
||||
import matplotlib.pyplot as plt
|
||||
import pyarrow.parquet as pq
|
||||
|
||||
@@ -104,11 +105,15 @@ def _place_label(i: int, point: dict[str, Any]) -> tuple[float, float, str, str]
|
||||
dy = [0.035, -0.05, 0.075, -0.09, 0.115, -0.13, 0.16, -0.175][i % 8]
|
||||
x = min(0.98, point["x"] + dx) if point["x"] < 0.9 else max(0.05, point["x"] - 0.02)
|
||||
y = min(0.98, max(0.02, point["y"] + dy))
|
||||
if point["y"] < 0.08:
|
||||
y = max(0.08, y)
|
||||
ha = "left" if point["x"] < 0.9 else "right"
|
||||
return x, y, ha, "center"
|
||||
|
||||
|
||||
def _short_template(text: str, width: int = 52) -> str:
|
||||
if text == "__verbatim_skill_persona__":
|
||||
text = "engineered long persona prefix"
|
||||
text = text.replace("{{ persona }}", "{persona}").replace("\n", " ")
|
||||
text = " ".join(text.split())
|
||||
if len(text) <= width:
|
||||
@@ -122,6 +127,15 @@ def _short_label(point: dict[str, Any]) -> str:
|
||||
return textwrap.fill(text, width=38)
|
||||
|
||||
|
||||
def _y_limits(points: list[dict[str, Any]], labels: list[dict[str, Any]]) -> tuple[float, float]:
|
||||
ys = [p["y"] for p in points]
|
||||
label_ys = [p["y"] for p in labels]
|
||||
ymax = min(1.02, max(max(ys), max(label_ys, default=0.0)) + 0.18)
|
||||
ymax = max(0.28, ymax)
|
||||
ymin = min(-0.02, min(min(ys), min(label_ys, default=0.0)) - 0.06)
|
||||
return ymin, ymax
|
||||
|
||||
|
||||
def main() -> None:
|
||||
ap = argparse.ArgumentParser()
|
||||
ap.add_argument("input", nargs="+", type=Path)
|
||||
@@ -145,7 +159,7 @@ def main() -> None:
|
||||
linewidths=0,
|
||||
)
|
||||
for point in points:
|
||||
if point["count"] > 1:
|
||||
if point["count"] >= 4:
|
||||
ax.text(
|
||||
point["x"],
|
||||
point["y"],
|
||||
@@ -155,23 +169,27 @@ def main() -> None:
|
||||
fontsize=6.5,
|
||||
color="white" if point["recommended"] else "0.1",
|
||||
)
|
||||
texts = []
|
||||
target_x = []
|
||||
target_y = []
|
||||
for i, point in enumerate(labels):
|
||||
x, y, ha, va = _place_label(i, point)
|
||||
count_suffix = f" [{point['count']}]" if point["count"] > 1 else ""
|
||||
ax.annotate(
|
||||
texts.append(ax.text(
|
||||
x,
|
||||
y,
|
||||
_short_label(point) + count_suffix,
|
||||
xy=(point["x"], point["y"]),
|
||||
xytext=(x, y),
|
||||
textcoords="data",
|
||||
ha=ha,
|
||||
va=va,
|
||||
fontsize=6.5,
|
||||
color="0.15",
|
||||
arrowprops={"arrowstyle": "-", "color": "0.65", "lw": 0.55},
|
||||
)
|
||||
bbox={"facecolor": "white", "edgecolor": "none", "alpha": 0.82, "pad": 0.7},
|
||||
))
|
||||
target_x.append(point["x"])
|
||||
target_y.append(point["y"])
|
||||
|
||||
ax.set_xlim(-0.02, 1.02)
|
||||
ax.set_ylim(-0.02, 1.02)
|
||||
ax.set_ylim(*_y_limits(points, labels))
|
||||
ax.set_xlabel("on-axis movement")
|
||||
ax.set_ylabel("off-axis confounding")
|
||||
ax.set_title("Persona template cells: move the intended axis, avoid confounds", fontsize=10)
|
||||
@@ -179,6 +197,24 @@ def main() -> None:
|
||||
ax.spines["right"].set_visible(False)
|
||||
ax.grid(True, color="0.9", linewidth=0.6)
|
||||
ax.text(1.0, -0.13, "better is lower-right", transform=ax.transAxes, ha="right", fontsize=8)
|
||||
if texts:
|
||||
adjust_text(
|
||||
texts,
|
||||
x=[p["x"] for p in points],
|
||||
y=[p["y"] for p in points],
|
||||
target_x=target_x,
|
||||
target_y=target_y,
|
||||
ax=ax,
|
||||
expand=(1.08, 1.22),
|
||||
force_text=(0.16, 0.34),
|
||||
force_static=(0.08, 0.16),
|
||||
force_pull=(0.012, 0.018),
|
||||
max_move=(18, 18),
|
||||
ensure_inside_axes=True,
|
||||
prevent_crossings=True,
|
||||
iter_lim=600,
|
||||
arrowprops={"arrowstyle": "-", "color": "0.65", "lw": 0.55},
|
||||
)
|
||||
fig.tight_layout()
|
||||
args.out.parent.mkdir(parents=True, exist_ok=True)
|
||||
fig.savefig(args.out)
|
||||
|
||||
@@ -12,6 +12,7 @@ NORMAL_STATS = ROOT / "data/v2_pilot_seed24_template_pair_stats.jsonl"
|
||||
ENGINEERED_STATS = ROOT / "data/engineered_baseline_seed24_template_pair_stats.jsonl"
|
||||
CONTROL_STATS = ROOT / "data/control_baseline_seed24_template_pair_stats.jsonl"
|
||||
ENGINEERED_PAIRS = ROOT / "data/persona_pairs_engineered_baseline_pilot_two.jsonl"
|
||||
ENGINEERED_DISPLAY = "engineered long persona prefix"
|
||||
|
||||
START = "<!-- results-snapshot:start -->"
|
||||
END = "<!-- results-snapshot:end -->"
|
||||
@@ -34,6 +35,8 @@ def _score(row: dict) -> float:
|
||||
|
||||
|
||||
def _markdown_text(text: str) -> str:
|
||||
if text == "__verbatim_skill_persona__":
|
||||
text = ENGINEERED_DISPLAY
|
||||
if text == "":
|
||||
return "`<blank>`"
|
||||
text = text.replace("{{ persona }}", "{persona}")
|
||||
@@ -41,7 +44,8 @@ def _markdown_text(text: str) -> str:
|
||||
text = text.replace("&", "&")
|
||||
text = text.replace("<", "<")
|
||||
text = text.replace(">", ">")
|
||||
text = text.replace("|", "\\|")
|
||||
text = text.replace("\\", "\")
|
||||
text = text.replace("|", "|")
|
||||
return text.replace("\n", "<br>")
|
||||
|
||||
|
||||
@@ -159,14 +163,14 @@ def _appendix_block() -> str:
|
||||
"Engineered prefixes:",
|
||||
_engineered_prefixes(),
|
||||
(
|
||||
"The out-of-context stress strings mostly failed by not moving the "
|
||||
"intended axis. In the audited low-scoring rows they did not produce "
|
||||
"universal refusals; the paired answers usually made the same decision "
|
||||
"in similar language."
|
||||
"These simple roleplay and stress strings mostly failed by not moving "
|
||||
"the intended axis. In the audited low-scoring rows they did not "
|
||||
"produce universal refusals; the paired answers usually made the same "
|
||||
"decision in similar language."
|
||||
),
|
||||
"Out-of-context stress templates, comparable mean over both measured axes:",
|
||||
"Simple roleplay and stress templates, comparable mean over both measured axes:",
|
||||
_table(stress_mean_rows),
|
||||
"Out-of-context stress templates, per axis:",
|
||||
"Simple roleplay and stress templates, per axis:",
|
||||
_detail_table(stress_detail_rows),
|
||||
"Controls:",
|
||||
_table(control_rows),
|
||||
|
||||
@@ -44,6 +44,7 @@ from template_catalog import active_template_rows, load_template_catalog
|
||||
|
||||
ROOT = Path(__file__).resolve().parents[1]
|
||||
OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
|
||||
JSON_RETRIES = 3
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
@@ -531,6 +532,31 @@ OFF_AXIS_DIM_DESCRIPTIONS = {
|
||||
|
||||
OFF_AXIS_DIMS = tuple(OFF_AXIS_DIM_DESCRIPTIONS)
|
||||
|
||||
REFUSAL_PHRASES = (
|
||||
"as an ai language model",
|
||||
"as a language model",
|
||||
"as an ai",
|
||||
"i'm sorry, but",
|
||||
"i apologize, but",
|
||||
"i cannot assist",
|
||||
"i can't assist",
|
||||
"i cannot fulfill",
|
||||
"i can't fulfill",
|
||||
"i cannot provide",
|
||||
"i can't provide",
|
||||
"i won't provide",
|
||||
"i am unable to",
|
||||
"i'm unable to",
|
||||
"unable to offer assistance",
|
||||
"cannot provide guidance",
|
||||
"cannot provide information",
|
||||
"cannot support or promote",
|
||||
"against my programming",
|
||||
"safety guidelines",
|
||||
"prioritize user safety",
|
||||
"adhere to safety",
|
||||
)
|
||||
|
||||
|
||||
def _model_name(name: str) -> str:
|
||||
return name.removeprefix("openrouter/")
|
||||
@@ -545,10 +571,77 @@ def _json_obj(text: str) -> dict:
|
||||
text = text.strip()
|
||||
if text.startswith("```"):
|
||||
text = re.sub(r"^```(?:json)?\s*|\s*```$", "", text, flags=re.M)
|
||||
match = re.search(r"\{.*\}", text, flags=re.S)
|
||||
if not match:
|
||||
raise ValueError(f"no JSON object in model output: {text[:300]!r}")
|
||||
return json.loads(match.group(0))
|
||||
return json.loads(text)
|
||||
|
||||
|
||||
def _assert_json_text(text: str, json_schema: dict | None = None) -> None:
|
||||
_json_obj(text)
|
||||
|
||||
|
||||
def _refusal_phrase_hits(text: str) -> list[str]:
|
||||
lowered = text.lower()
|
||||
return [phrase for phrase in REFUSAL_PHRASES if phrase in lowered]
|
||||
|
||||
|
||||
def _number_schema() -> dict:
|
||||
return {"type": "number"}
|
||||
|
||||
|
||||
def _boolean_schema() -> dict:
|
||||
return {"type": "boolean"}
|
||||
|
||||
|
||||
def _string_schema() -> dict:
|
||||
return {"type": "string"}
|
||||
|
||||
|
||||
def _object_schema(name: str, properties: dict[str, dict]) -> dict:
|
||||
return {
|
||||
"type": "json_schema",
|
||||
"json_schema": {
|
||||
"name": name,
|
||||
"strict": True,
|
||||
"schema": {
|
||||
"type": "object",
|
||||
"properties": properties,
|
||||
"required": list(properties),
|
||||
"additionalProperties": False,
|
||||
},
|
||||
},
|
||||
}
|
||||
|
||||
|
||||
def _axis_judge_schema() -> dict:
|
||||
return _object_schema("axis_pairwise_judgment", {
|
||||
"target_reason": _string_schema(),
|
||||
"A_more_target_than_B": _number_schema(),
|
||||
})
|
||||
|
||||
|
||||
def _style_judge_schema() -> dict:
|
||||
properties = {}
|
||||
for dim in STYLE_DIMS:
|
||||
properties[f"{dim}_A"] = _number_schema()
|
||||
properties[f"{dim}_B"] = _number_schema()
|
||||
properties.update({
|
||||
"persona_echo_A": _boolean_schema(),
|
||||
"persona_echo_B": _boolean_schema(),
|
||||
"refusal_or_ai_break_A": _boolean_schema(),
|
||||
"refusal_or_ai_break_B": _boolean_schema(),
|
||||
"style_reason": _string_schema(),
|
||||
})
|
||||
return _object_schema("style_judgment", properties)
|
||||
|
||||
|
||||
def _confound_judge_schema() -> dict:
|
||||
properties = {f"{dim}_likert": _number_schema() for dim in OFF_AXIS_DIMS}
|
||||
properties.update({
|
||||
"off_axis_problem_likert": _number_schema(),
|
||||
"likely_spurious_axis": _string_schema(),
|
||||
"usable_for_training": _boolean_schema(),
|
||||
"confound_reason": _string_schema(),
|
||||
})
|
||||
return _object_schema("confound_judgment", properties)
|
||||
|
||||
|
||||
def _words(text: str) -> list[str]:
|
||||
@@ -794,7 +887,8 @@ Response A:
|
||||
Response B:
|
||||
{b}
|
||||
|
||||
Answer the pairwise comparison in both directions only.
|
||||
Answer one pairwise comparison only: is Response A more target-like than
|
||||
Response B?
|
||||
|
||||
For each comparison, give a 1.0 to 5.0 rating:
|
||||
1.0 = A is clearly not more target-like than B
|
||||
@@ -808,19 +902,14 @@ such as 2.4, 2.7, 3.2, or 3.6 when there is any detectable difference. Use
|
||||
exactly 3.0 only when the responses are genuinely indistinguishable on the
|
||||
target behavior. Use exactly 1.0 or 5.0 only for extreme, unambiguous cases.
|
||||
|
||||
Output strict JSON only:
|
||||
{{
|
||||
"A_more_target_than_B": 3.0,
|
||||
"B_more_target_than_A": 3.0,
|
||||
"target_reason": "one short sentence"
|
||||
}}"""
|
||||
Return JSON matching the provided schema:
|
||||
- target_reason: string, one short sentence
|
||||
- A_more_target_than_B: float"""
|
||||
|
||||
|
||||
def _style_judge_prompt(scenario: str, a: str, b: str) -> str:
|
||||
dims = "\n".join(f"- {dim}: {desc}" for dim, desc in STYLE_DIM_DESCRIPTIONS.items())
|
||||
dim_json = "\n".join(
|
||||
f' "{dim}_A": 1.0, "{dim}_B": 1.0,' for dim in STYLE_DIMS
|
||||
)
|
||||
dim_fields = "\n".join(f"- {dim}_A: float; {dim}_B: float" for dim in STYLE_DIMS)
|
||||
return f"""\
|
||||
You are a fresh-eyes style auditor. Ignore whether either response gives better
|
||||
advice. Rate only surface style/tone. You do not know the intended axis or which
|
||||
@@ -843,20 +932,18 @@ Dimensions:
|
||||
|
||||
Also flag explicit persona echo, refusal, or AI-role breaks.
|
||||
|
||||
Output strict JSON only:
|
||||
{{
|
||||
{dim_json}
|
||||
"persona_echo_A": false, "persona_echo_B": false,
|
||||
"refusal_or_ai_break_A": false, "refusal_or_ai_break_B": false,
|
||||
"style_reason": "one short sentence"
|
||||
}}"""
|
||||
Return JSON matching the provided schema:
|
||||
{dim_fields}
|
||||
- persona_echo_A: bool; persona_echo_B: bool
|
||||
- refusal_or_ai_break_A: bool; refusal_or_ai_break_B: bool
|
||||
- style_reason: string, one short sentence"""
|
||||
|
||||
|
||||
def _confound_judge_prompt(axis: Axis, scenario: str, a: str, b: str) -> str:
|
||||
dims = "\n".join(
|
||||
f"- {dim}: {desc}" for dim, desc in OFF_AXIS_DIM_DESCRIPTIONS.items()
|
||||
)
|
||||
dim_json = "\n".join(f' "{dim}_likert": 1.0,' for dim in OFF_AXIS_DIMS)
|
||||
dim_fields = "\n".join(f"- {dim}_likert: float" for dim in OFF_AXIS_DIMS)
|
||||
return f"""\
|
||||
You are auditing whether a contrastive training pair isolates one intended axis.
|
||||
You do not know which response is positive or negative.
|
||||
@@ -888,14 +975,12 @@ separately.
|
||||
Confounds:
|
||||
{dims}
|
||||
|
||||
Output strict JSON only:
|
||||
{{
|
||||
{dim_json}
|
||||
"off_axis_problem_likert": 1.0,
|
||||
"likely_spurious_axis": "none or short phrase",
|
||||
"usable_for_training": true,
|
||||
"confound_reason": "one short sentence"
|
||||
}}
|
||||
Return JSON matching the provided schema:
|
||||
{dim_fields}
|
||||
- off_axis_problem_likert: float
|
||||
- likely_spurious_axis: string, "none" or a short phrase
|
||||
- usable_for_training: bool
|
||||
- confound_reason: string, one short sentence
|
||||
|
||||
The overall off_axis_problem_likert should summarize the worst meaningful
|
||||
confound, not the average."""
|
||||
@@ -924,7 +1009,7 @@ class OpenRouter:
|
||||
max_tokens: int,
|
||||
cache_tag: str,
|
||||
seed: int,
|
||||
json_mode: bool,
|
||||
json_schema: dict | None,
|
||||
) -> str:
|
||||
payload = {
|
||||
"model": _model_name(model),
|
||||
@@ -939,23 +1024,51 @@ class OpenRouter:
|
||||
"reasoning_effort": "none",
|
||||
"include_reasoning": False,
|
||||
}
|
||||
if json_mode:
|
||||
payload["response_format"] = {"type": "json_object"}
|
||||
if json_schema is not None:
|
||||
payload["response_format"] = json_schema
|
||||
key = f"{cache_tag}_{_hkey({'payload': payload, 'extra_body': extra_body})}.json"
|
||||
path = self.cache_dir / key
|
||||
if path.exists():
|
||||
return json.loads(path.read_text())["content"]
|
||||
async with self.sem:
|
||||
resp = await self.client.chat.completions.create(
|
||||
**payload, extra_body=extra_body)
|
||||
content = resp.choices[0].message.content or ""
|
||||
path.write_text(json.dumps({
|
||||
"created_at": time.time(),
|
||||
"payload": payload,
|
||||
"extra_body": extra_body,
|
||||
"content": content,
|
||||
}, indent=2))
|
||||
return content
|
||||
content = json.loads(path.read_text())["content"]
|
||||
if json_schema is None:
|
||||
return content
|
||||
try:
|
||||
_assert_json_text(content, json_schema)
|
||||
return content
|
||||
except (json.JSONDecodeError, ValueError):
|
||||
bad_path = path.with_suffix(f".bad-{int(time.time())}.json")
|
||||
path.rename(bad_path)
|
||||
logger.warning(f"quarantined malformed cached JSON judge output: {bad_path}")
|
||||
attempts = JSON_RETRIES if json_schema is not None else 1
|
||||
last_content = ""
|
||||
last_error: Exception | None = None
|
||||
for attempt in range(1, attempts + 1):
|
||||
async with self.sem:
|
||||
resp = await self.client.chat.completions.create(
|
||||
**payload, extra_body=extra_body)
|
||||
content = resp.choices[0].message.content or ""
|
||||
last_content = content
|
||||
if json_schema is not None:
|
||||
try:
|
||||
_assert_json_text(content, json_schema)
|
||||
except (json.JSONDecodeError, ValueError) as e:
|
||||
last_error = e
|
||||
logger.warning(
|
||||
f"malformed JSON judge output attempt {attempt}/{attempts} "
|
||||
f"cache_tag={cache_tag}: {content[:160]!r}"
|
||||
)
|
||||
continue
|
||||
path.write_text(json.dumps({
|
||||
"created_at": time.time(),
|
||||
"payload": payload,
|
||||
"extra_body": extra_body,
|
||||
"content": content,
|
||||
}, indent=2))
|
||||
return content
|
||||
raise ValueError(
|
||||
f"malformed JSON after {attempts} attempts for {cache_tag}: "
|
||||
f"{last_error}; content={last_content[:500]!r}"
|
||||
)
|
||||
|
||||
|
||||
def _labels_for(seed: int, *parts: str) -> tuple[str, str, str]:
|
||||
@@ -981,17 +1094,13 @@ def _style_delta(style: dict, dim: str, pos_label: str) -> float:
|
||||
|
||||
|
||||
def _validate_axis_obj(obj: dict) -> None:
|
||||
for key in ("A_more_target_than_B", "B_more_target_than_A"):
|
||||
_bounded_score(obj, key, 1.0, 5.0, step=0.1)
|
||||
_bounded_score(obj, "A_more_target_than_B", 1.0, 5.0, step=0.1)
|
||||
|
||||
|
||||
def _pairwise_expected(obj: dict, pos_label: str) -> float:
|
||||
def _pairwise_expected(obj: dict, first_is_positive: bool) -> float:
|
||||
"""Positive means the pos response beats the neg response on this target."""
|
||||
if pos_label == "A":
|
||||
return _bounded_score(obj, "A_more_target_than_B", 1.0, 5.0, step=0.1) - 3.0
|
||||
if pos_label == "B":
|
||||
return _bounded_score(obj, "B_more_target_than_A", 1.0, 5.0, step=0.1) - 3.0
|
||||
raise ValueError(pos_label)
|
||||
signed = _bounded_score(obj, "A_more_target_than_B", 1.0, 5.0, step=0.1) - 3.0
|
||||
return signed if first_is_positive else -signed
|
||||
|
||||
|
||||
def _validate_style_obj(obj: dict) -> None:
|
||||
@@ -1057,7 +1166,7 @@ async def _evaluate_one(
|
||||
max_tokens=260,
|
||||
cache_tag="gen_pos",
|
||||
seed=seed,
|
||||
json_mode=False,
|
||||
json_schema=None,
|
||||
)
|
||||
neg_text = pos_text
|
||||
else:
|
||||
@@ -1069,7 +1178,7 @@ async def _evaluate_one(
|
||||
max_tokens=260,
|
||||
cache_tag="gen_pos",
|
||||
seed=seed,
|
||||
json_mode=False,
|
||||
json_schema=None,
|
||||
),
|
||||
router.chat_jsonish(
|
||||
model=generator_model,
|
||||
@@ -1078,7 +1187,7 @@ async def _evaluate_one(
|
||||
max_tokens=260,
|
||||
cache_tag="gen_neg",
|
||||
seed=seed,
|
||||
json_mode=False,
|
||||
json_schema=None,
|
||||
),
|
||||
)
|
||||
pos_text, neg_text = pos_text.strip(), neg_text.strip()
|
||||
@@ -1090,19 +1199,31 @@ async def _evaluate_one(
|
||||
a_text, b_text = _response_by_label(pos_label, pos_text, neg_text)
|
||||
|
||||
if pos_text == neg_text:
|
||||
pos_refusal_phrase_hits = _refusal_phrase_hits(pos_text)
|
||||
neg_refusal_phrase_hits = _refusal_phrase_hits(neg_text)
|
||||
axis_judges = [
|
||||
{
|
||||
"judge_model": axis_judge_model,
|
||||
"positive_axis_judgment": {
|
||||
"positive_axis_forward_judgment": {
|
||||
"A_more_target_than_B": 3.0,
|
||||
"B_more_target_than_A": 3.0,
|
||||
"target_reason": "responses are identical",
|
||||
},
|
||||
"negative_axis_judgment": {
|
||||
"positive_axis_reverse_judgment": {
|
||||
"A_more_target_than_B": 3.0,
|
||||
"B_more_target_than_A": 3.0,
|
||||
"target_reason": "responses are identical",
|
||||
},
|
||||
"negative_axis_forward_judgment": {
|
||||
"A_more_target_than_B": 3.0,
|
||||
"target_reason": "responses are identical",
|
||||
},
|
||||
"negative_axis_reverse_judgment": {
|
||||
"A_more_target_than_B": 3.0,
|
||||
"target_reason": "responses are identical",
|
||||
},
|
||||
"positive_forward_delta": 0.0,
|
||||
"positive_reverse_delta": 0.0,
|
||||
"negative_forward_delta": 0.0,
|
||||
"negative_reverse_delta": 0.0,
|
||||
"pairwise_positive_delta": 0.0,
|
||||
"pairwise_negative_delta": 0.0,
|
||||
"axis_delta": 0.0,
|
||||
@@ -1156,8 +1277,10 @@ async def _evaluate_one(
|
||||
"off_axis_category_likerts": {dim: 1.0 for dim in OFF_AXIS_DIMS},
|
||||
"max_off_axis_category_likert": 1.0,
|
||||
"off_axis_problem_frac": 0.0,
|
||||
"pos_refusal_phrase_hits": pos_refusal_phrase_hits,
|
||||
"neg_refusal_phrase_hits": neg_refusal_phrase_hits,
|
||||
"persona_echo": False,
|
||||
"refusal_or_ai_break": False,
|
||||
"refusal_or_ai_break": bool(pos_refusal_phrase_hits or neg_refusal_phrase_hits),
|
||||
"strict_pass": False,
|
||||
"identity_pair": True,
|
||||
})
|
||||
@@ -1172,9 +1295,19 @@ async def _evaluate_one(
|
||||
axis, scenario, a_text, b_text, pole="positive")}],
|
||||
temperature=0.0,
|
||||
max_tokens=1200,
|
||||
cache_tag=f"judge_axis_pos_v6_{_model_name(axis_judge_model).replace('/', '_')}",
|
||||
cache_tag=f"judge_axis_pos_fwd_v7_{_model_name(axis_judge_model).replace('/', '_')}",
|
||||
seed=seed,
|
||||
json_mode=True,
|
||||
json_schema=_axis_judge_schema(),
|
||||
),
|
||||
router.chat_jsonish(
|
||||
model=axis_judge_model,
|
||||
messages=[{"role": "user", "content": _axis_pairwise_judge_prompt(
|
||||
axis, scenario, b_text, a_text, pole="positive")}],
|
||||
temperature=0.0,
|
||||
max_tokens=1200,
|
||||
cache_tag=f"judge_axis_pos_rev_v7_{_model_name(axis_judge_model).replace('/', '_')}",
|
||||
seed=seed,
|
||||
json_schema=_axis_judge_schema(),
|
||||
),
|
||||
router.chat_jsonish(
|
||||
model=axis_judge_model,
|
||||
@@ -1182,9 +1315,19 @@ async def _evaluate_one(
|
||||
axis, scenario, a_text, b_text, pole="negative")}],
|
||||
temperature=0.0,
|
||||
max_tokens=1200,
|
||||
cache_tag=f"judge_axis_neg_v6_{_model_name(axis_judge_model).replace('/', '_')}",
|
||||
cache_tag=f"judge_axis_neg_fwd_v7_{_model_name(axis_judge_model).replace('/', '_')}",
|
||||
seed=seed,
|
||||
json_mode=True,
|
||||
json_schema=_axis_judge_schema(),
|
||||
),
|
||||
router.chat_jsonish(
|
||||
model=axis_judge_model,
|
||||
messages=[{"role": "user", "content": _axis_pairwise_judge_prompt(
|
||||
axis, scenario, b_text, a_text, pole="negative")}],
|
||||
temperature=0.0,
|
||||
max_tokens=1200,
|
||||
cache_tag=f"judge_axis_neg_rev_v7_{_model_name(axis_judge_model).replace('/', '_')}",
|
||||
seed=seed,
|
||||
json_schema=_axis_judge_schema(),
|
||||
),
|
||||
])
|
||||
style_raw, confound_raw, *axis_raw = await asyncio.gather(
|
||||
@@ -1195,7 +1338,7 @@ async def _evaluate_one(
|
||||
max_tokens=4096,
|
||||
cache_tag="judge_style_v5",
|
||||
seed=seed,
|
||||
json_mode=True,
|
||||
json_schema=_style_judge_schema(),
|
||||
),
|
||||
router.chat_jsonish(
|
||||
model=style_judge_model,
|
||||
@@ -1204,26 +1347,53 @@ async def _evaluate_one(
|
||||
max_tokens=4096,
|
||||
cache_tag="judge_confound_v6",
|
||||
seed=seed,
|
||||
json_mode=True,
|
||||
json_schema=_confound_judge_schema(),
|
||||
),
|
||||
*axis_tasks,
|
||||
)
|
||||
raw_judge_outputs = {
|
||||
"style": style_raw,
|
||||
"confound": confound_raw,
|
||||
"axis": [
|
||||
{
|
||||
"judge_model": axis_judge_model,
|
||||
"positive_forward": axis_raw[4 * i],
|
||||
"positive_reverse": axis_raw[4 * i + 1],
|
||||
"negative_forward": axis_raw[4 * i + 2],
|
||||
"negative_reverse": axis_raw[4 * i + 3],
|
||||
}
|
||||
for i, axis_judge_model in enumerate(axis_judge_models)
|
||||
],
|
||||
}
|
||||
base["raw_judge_outputs"] = raw_judge_outputs
|
||||
style_j = _json_obj(style_raw)
|
||||
confound_j = _json_obj(confound_raw)
|
||||
_validate_style_obj(style_j)
|
||||
_validate_confound_obj(confound_j)
|
||||
axis_judges = []
|
||||
for i, axis_judge_model in enumerate(axis_judge_models):
|
||||
pos_axis_j = _json_obj(axis_raw[2 * i])
|
||||
neg_axis_j = _json_obj(axis_raw[2 * i + 1])
|
||||
_validate_axis_obj(pos_axis_j)
|
||||
_validate_axis_obj(neg_axis_j)
|
||||
pairwise_positive_delta = _pairwise_expected(pos_axis_j, pos_label)
|
||||
pairwise_negative_delta = -_pairwise_expected(neg_axis_j, pos_label)
|
||||
pos_fwd_j = _json_obj(axis_raw[4 * i])
|
||||
pos_rev_j = _json_obj(axis_raw[4 * i + 1])
|
||||
neg_fwd_j = _json_obj(axis_raw[4 * i + 2])
|
||||
neg_rev_j = _json_obj(axis_raw[4 * i + 3])
|
||||
for axis_j in (pos_fwd_j, pos_rev_j, neg_fwd_j, neg_rev_j):
|
||||
_validate_axis_obj(axis_j)
|
||||
positive_forward_delta = _pairwise_expected(pos_fwd_j, pos_label == "A")
|
||||
positive_reverse_delta = _pairwise_expected(pos_rev_j, pos_label == "B")
|
||||
negative_forward_delta = -_pairwise_expected(neg_fwd_j, pos_label == "A")
|
||||
negative_reverse_delta = -_pairwise_expected(neg_rev_j, pos_label == "B")
|
||||
pairwise_positive_delta = (positive_forward_delta + positive_reverse_delta) / 2.0
|
||||
pairwise_negative_delta = (negative_forward_delta + negative_reverse_delta) / 2.0
|
||||
axis_judges.append({
|
||||
"judge_model": axis_judge_model,
|
||||
"positive_axis_judgment": pos_axis_j,
|
||||
"negative_axis_judgment": neg_axis_j,
|
||||
"positive_axis_forward_judgment": pos_fwd_j,
|
||||
"positive_axis_reverse_judgment": pos_rev_j,
|
||||
"negative_axis_forward_judgment": neg_fwd_j,
|
||||
"negative_axis_reverse_judgment": neg_rev_j,
|
||||
"positive_forward_delta": positive_forward_delta,
|
||||
"positive_reverse_delta": positive_reverse_delta,
|
||||
"negative_forward_delta": negative_forward_delta,
|
||||
"negative_reverse_delta": negative_reverse_delta,
|
||||
"pairwise_positive_delta": pairwise_positive_delta,
|
||||
"pairwise_negative_delta": pairwise_negative_delta,
|
||||
"axis_delta": 2.0 * (pairwise_positive_delta + pairwise_negative_delta),
|
||||
@@ -1249,10 +1419,12 @@ async def _evaluate_one(
|
||||
for dim in OFF_AXIS_DIMS
|
||||
}
|
||||
max_off_axis_category_likert = max(off_axis_likerts.values())
|
||||
pos_refusal_phrase_hits = _refusal_phrase_hits(pos_text)
|
||||
neg_refusal_phrase_hits = _refusal_phrase_hits(neg_text)
|
||||
pos_echo = bool(style_j[f"persona_echo_{pos_label}"])
|
||||
neg_echo = bool(style_j[f"persona_echo_{neg_label}"])
|
||||
pos_refusal = bool(style_j[f"refusal_or_ai_break_{pos_label}"])
|
||||
neg_refusal = bool(style_j[f"refusal_or_ai_break_{neg_label}"])
|
||||
pos_refusal = bool(pos_refusal_phrase_hits)
|
||||
neg_refusal = bool(neg_refusal_phrase_hits)
|
||||
length_ok = True if max_word_delta_frac <= 0 else abs(word_delta_frac) <= max_word_delta_frac
|
||||
strict_pass = (
|
||||
axis_delta >= 3
|
||||
@@ -1294,6 +1466,8 @@ async def _evaluate_one(
|
||||
"max_off_axis_category_likert": max_off_axis_category_likert,
|
||||
"off_axis_problem_frac": round(
|
||||
_normalize_likert(float(confound_j["off_axis_problem_likert"]), 1.0, 7.0), 4),
|
||||
"pos_refusal_phrase_hits": pos_refusal_phrase_hits,
|
||||
"neg_refusal_phrase_hits": neg_refusal_phrase_hits,
|
||||
"persona_echo": pos_echo or neg_echo,
|
||||
"refusal_or_ai_break": pos_refusal or neg_refusal,
|
||||
"strict_pass": strict_pass,
|
||||
@@ -1361,6 +1535,59 @@ def summarize(results: list[dict]) -> list[dict]:
|
||||
return out
|
||||
|
||||
|
||||
def axis_score_distribution(results: list[dict]) -> list[dict]:
|
||||
counts: dict[tuple[str, str, float], int] = defaultdict(int)
|
||||
for r in results:
|
||||
if "error" in r:
|
||||
continue
|
||||
for judgment in r["axis_judgments"]:
|
||||
judge_model = judgment["judge_model"]
|
||||
for key in (
|
||||
"positive_axis_forward_judgment",
|
||||
"positive_axis_reverse_judgment",
|
||||
"negative_axis_forward_judgment",
|
||||
"negative_axis_reverse_judgment",
|
||||
):
|
||||
score = _bounded_score(judgment[key], "A_more_target_than_B", 1.0, 5.0, step=0.1)
|
||||
counts[(judge_model, key.removesuffix("_judgment"), score)] += 1
|
||||
rows = [
|
||||
{"judge_model": model, "call": call, "score": score, "n": n}
|
||||
for (model, call, score), n in counts.items()
|
||||
]
|
||||
rows.sort(key=lambda r: (r["judge_model"], r["call"], r["score"]))
|
||||
return rows
|
||||
|
||||
|
||||
def _print_text_block(title: str, text: str) -> None:
|
||||
print(f"\n--- {title} ---")
|
||||
print(text)
|
||||
|
||||
|
||||
def print_judge_audit_samples(results: list[dict]) -> None:
|
||||
if not results:
|
||||
return
|
||||
sample_indices = [0] if len(results) == 1 else [0, len(results) - 1]
|
||||
print("\n=== judge audit samples: first and last planned eval ===")
|
||||
for sample_name, idx in zip(("FIRST", "LAST"), sample_indices):
|
||||
rec = results[idx]
|
||||
print(f"\n### {sample_name} idx={idx} eval_id={rec.get('eval_id')} error={rec.get('error')}")
|
||||
_print_text_block("prompt", str(rec.get("prompt", "")))
|
||||
_print_text_block("cho_pos_response", str(rec.get("pos_response", "")))
|
||||
_print_text_block("rej_neg_response", str(rec.get("neg_response", "")))
|
||||
_print_text_block(
|
||||
"refusal_phrase_hits",
|
||||
json.dumps({
|
||||
"pos": rec.get("pos_refusal_phrase_hits", []),
|
||||
"neg": rec.get("neg_refusal_phrase_hits", []),
|
||||
"refusal_or_ai_break": rec.get("refusal_or_ai_break"),
|
||||
}, indent=2),
|
||||
)
|
||||
_print_text_block(
|
||||
"full_judge_output",
|
||||
json.dumps(rec.get("raw_judge_outputs", {}), indent=2, ensure_ascii=False),
|
||||
)
|
||||
|
||||
|
||||
async def amain(args) -> None:
|
||||
load_dotenv(ROOT / ".env")
|
||||
axes = _select_axes(args.axes, args.include_canary)
|
||||
@@ -1415,6 +1642,7 @@ async def amain(args) -> None:
|
||||
"axis_judge_models": list(axis_judge_models),
|
||||
"style_judge_model": args.judge_model,
|
||||
"gen_temperature": args.gen_temperature,
|
||||
"judge_temperature": 0.0,
|
||||
"seed": args.seed,
|
||||
"max_word_delta_frac": args.max_word_delta_frac,
|
||||
"n_prompts": len(rows),
|
||||
@@ -1454,11 +1682,13 @@ async def amain(args) -> None:
|
||||
logger.info(
|
||||
f"{len(rows)} prompts × {len(axes)} axes × {len(templates)} templates "
|
||||
f"= {len(tasks)} pairs; generator={args.generator_model}; "
|
||||
f"axis_judges={','.join(axis_judge_models)}; style_judge={args.judge_model}"
|
||||
f"axis_judges={','.join(axis_judge_models)}; style_judge={args.judge_model}; "
|
||||
f"gen_temperature={args.gen_temperature}; judge_temperature=0.0"
|
||||
)
|
||||
tasks = [asyncio.create_task(task) for task in tasks]
|
||||
results = []
|
||||
for fut in atqdm.as_completed(tasks, total=len(tasks), desc="persona-axes"):
|
||||
rec = await fut
|
||||
for task in atqdm(tasks, total=len(tasks), desc="persona-axes"):
|
||||
rec = await task
|
||||
results.append(rec)
|
||||
artifact = {
|
||||
"dry_run": False,
|
||||
@@ -1467,6 +1697,7 @@ async def amain(args) -> None:
|
||||
"axis_judge_models": list(axis_judge_models),
|
||||
"style_judge_model": args.judge_model,
|
||||
"gen_temperature": args.gen_temperature,
|
||||
"judge_temperature": 0.0,
|
||||
"family": args.family,
|
||||
"seed": args.seed,
|
||||
"max_word_delta_frac": args.max_word_delta_frac,
|
||||
@@ -1477,6 +1708,7 @@ async def amain(args) -> None:
|
||||
"n_success": sum("error" not in r for r in results),
|
||||
"n_errors": sum("error" in r for r in results),
|
||||
"summary": summarize(results),
|
||||
"axis_score_distribution": axis_score_distribution(results),
|
||||
"results": results,
|
||||
}
|
||||
out.write_text(json.dumps(artifact, indent=2))
|
||||
@@ -1489,6 +1721,7 @@ async def amain(args) -> None:
|
||||
"axis_judge_models": list(axis_judge_models),
|
||||
"style_judge_model": args.judge_model,
|
||||
"gen_temperature": args.gen_temperature,
|
||||
"judge_temperature": 0.0,
|
||||
"family": args.family,
|
||||
"seed": args.seed,
|
||||
"max_word_delta_frac": args.max_word_delta_frac,
|
||||
@@ -1499,11 +1732,20 @@ async def amain(args) -> None:
|
||||
"n_success": sum("error" not in r for r in results),
|
||||
"n_errors": sum("error" in r for r in results),
|
||||
"summary": summary,
|
||||
"axis_score_distribution": axis_score_distribution(results),
|
||||
"results": results,
|
||||
}
|
||||
out.write_text(json.dumps(artifact, indent=2))
|
||||
print(f"wrote {out}")
|
||||
print(tabulate(summary, headers="keys", tablefmt="pipe", floatfmt=".3f"))
|
||||
print("\naxis judge raw score distribution:")
|
||||
print(tabulate(
|
||||
axis_score_distribution(results),
|
||||
headers="keys",
|
||||
tablefmt="pipe",
|
||||
floatfmt=".1f",
|
||||
))
|
||||
print_judge_audit_samples(results)
|
||||
|
||||
|
||||
def main() -> None:
|
||||
|
||||
@@ -3,9 +3,23 @@ revision = 3
|
||||
requires-python = ">=3.11"
|
||||
|
||||
[options]
|
||||
exclude-newer = "2026-06-07T08:32:35.778599017Z"
|
||||
exclude-newer = "2026-06-07T10:29:24.889842149Z"
|
||||
exclude-newer-span = "P6D"
|
||||
|
||||
[[package]]
|
||||
name = "adjusttext"
|
||||
version = "1.3.0"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "matplotlib" },
|
||||
{ name = "numpy" },
|
||||
{ name = "scipy" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/4c/d4/6585f3b6fdb75648bca294664af4becc8aa2fb3fb08f4e4e9fd27e10d773/adjusttext-1.3.0.tar.gz", hash = "sha256:4ab75cd4453af4828876ac3e964f2c49be642ea834f0c1f7449558d5f12cbca1", size = 15724, upload-time = "2024-10-31T16:45:36.101Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/53/1c/8feedd607cc14c5df9aef74fe3af9a99bf660743b842a9b5b1865326b4aa/adjustText-1.3.0-py3-none-any.whl", hash = "sha256:da23d7b24b6db5ffa039bb136bfa556207365e32f48ac74b07ad26dd485bc691", size = 13154, upload-time = "2024-10-31T16:45:35.227Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "annotated-doc"
|
||||
version = "0.0.4"
|
||||
@@ -739,6 +753,7 @@ name = "persona-steering-template-library"
|
||||
version = "0.1.0"
|
||||
source = { virtual = "." }
|
||||
dependencies = [
|
||||
{ name = "adjusttext" },
|
||||
{ name = "huggingface-hub" },
|
||||
{ name = "loguru" },
|
||||
{ name = "matplotlib" },
|
||||
@@ -752,6 +767,7 @@ dependencies = [
|
||||
|
||||
[package.metadata]
|
||||
requires-dist = [
|
||||
{ name = "adjusttext", specifier = ">=1.3.0" },
|
||||
{ name = "huggingface-hub", specifier = ">=1.18.0" },
|
||||
{ name = "loguru" },
|
||||
{ name = "matplotlib", specifier = ">=3.10.0" },
|
||||
@@ -1124,6 +1140,77 @@ wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/82/3b/64d4899d73f91ba49a8c18a8ff3f0ea8f1c1d75481760df8c68ef5235bf5/rich-15.0.0-py3-none-any.whl", hash = "sha256:33bd4ef74232fb73fe9279a257718407f169c09b78a87ad3d296f548e27de0bb", size = 310654, upload-time = "2026-04-12T08:24:02.83Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "scipy"
|
||||
version = "1.17.1"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "numpy" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/7a/97/5a3609c4f8d58b039179648e62dd220f89864f56f7357f5d4f45c29eb2cc/scipy-1.17.1.tar.gz", hash = "sha256:95d8e012d8cb8816c226aef832200b1d45109ed4464303e997c5b13122b297c0", size = 30573822, upload-time = "2026-02-23T00:26:24.851Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/df/75/b4ce781849931fef6fd529afa6b63711d5a733065722d0c3e2724af9e40a/scipy-1.17.1-cp311-cp311-macosx_10_14_x86_64.whl", hash = "sha256:1f95b894f13729334fb990162e911c9e5dc1ab390c58aa6cbecb389c5b5e28ec", size = 31613675, upload-time = "2026-02-23T00:16:00.13Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/f7/58/bccc2861b305abdd1b8663d6130c0b3d7cc22e8d86663edbc8401bfd40d4/scipy-1.17.1-cp311-cp311-macosx_12_0_arm64.whl", hash = "sha256:e18f12c6b0bc5a592ed23d3f7b891f68fd7f8241d69b7883769eb5d5dfb52696", size = 28162057, upload-time = "2026-02-23T00:16:09.456Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/6d/ee/18146b7757ed4976276b9c9819108adbc73c5aad636e5353e20746b73069/scipy-1.17.1-cp311-cp311-macosx_14_0_arm64.whl", hash = "sha256:a3472cfbca0a54177d0faa68f697d8ba4c80bbdc19908c3465556d9f7efce9ee", size = 20334032, upload-time = "2026-02-23T00:16:17.358Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/ec/e6/cef1cf3557f0c54954198554a10016b6a03b2ec9e22a4e1df734936bd99c/scipy-1.17.1-cp311-cp311-macosx_14_0_x86_64.whl", hash = "sha256:766e0dc5a616d026a3a1cffa379af959671729083882f50307e18175797b3dfd", size = 22709533, upload-time = "2026-02-23T00:16:25.791Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/4d/60/8804678875fc59362b0fb759ab3ecce1f09c10a735680318ac30da8cd76b/scipy-1.17.1-cp311-cp311-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:744b2bf3640d907b79f3fd7874efe432d1cf171ee721243e350f55234b4cec4c", size = 33062057, upload-time = "2026-02-23T00:16:36.931Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/09/7d/af933f0f6e0767995b4e2d705a0665e454d1c19402aa7e895de3951ebb04/scipy-1.17.1-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:43af8d1f3bea642559019edfe64e9b11192a8978efbd1539d7bc2aaa23d92de4", size = 35349300, upload-time = "2026-02-23T00:16:49.108Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/b4/3d/7ccbbdcbb54c8fdc20d3b6930137c782a163fa626f0aef920349873421ba/scipy-1.17.1-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:cd96a1898c0a47be4520327e01f874acfd61fb48a9420f8aa9f6483412ffa444", size = 35127333, upload-time = "2026-02-23T00:17:01.293Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/e8/19/f926cb11c42b15ba08e3a71e376d816ac08614f769b4f47e06c3580c836a/scipy-1.17.1-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:4eb6c25dd62ee8d5edf68a8e1c171dd71c292fdae95d8aeb3dd7d7de4c364082", size = 37741314, upload-time = "2026-02-23T00:17:12.576Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/95/da/0d1df507cf574b3f224ccc3d45244c9a1d732c81dcb26b1e8a766ae271a8/scipy-1.17.1-cp311-cp311-win_amd64.whl", hash = "sha256:d30e57c72013c2a4fe441c2fcb8e77b14e152ad48b5464858e07e2ad9fbfceff", size = 36607512, upload-time = "2026-02-23T00:17:23.424Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/68/7f/bdd79ceaad24b671543ffe0ef61ed8e659440eb683b66f033454dcee90eb/scipy-1.17.1-cp311-cp311-win_arm64.whl", hash = "sha256:9ecb4efb1cd6e8c4afea0daa91a87fbddbce1b99d2895d151596716c0b2e859d", size = 24599248, upload-time = "2026-02-23T00:17:34.561Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/35/48/b992b488d6f299dbe3f11a20b24d3dda3d46f1a635ede1c46b5b17a7b163/scipy-1.17.1-cp312-cp312-macosx_10_14_x86_64.whl", hash = "sha256:35c3a56d2ef83efc372eaec584314bd0ef2e2f0d2adb21c55e6ad5b344c0dcb8", size = 31610954, upload-time = "2026-02-23T00:17:49.855Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/b2/02/cf107b01494c19dc100f1d0b7ac3cc08666e96ba2d64db7626066cee895e/scipy-1.17.1-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:fcb310ddb270a06114bb64bbe53c94926b943f5b7f0842194d585c65eb4edd76", size = 28172662, upload-time = "2026-02-23T00:18:01.64Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/cf/a9/599c28631bad314d219cf9ffd40e985b24d603fc8a2f4ccc5ae8419a535b/scipy-1.17.1-cp312-cp312-macosx_14_0_arm64.whl", hash = "sha256:cc90d2e9c7e5c7f1a482c9875007c095c3194b1cfedca3c2f3291cdc2bc7c086", size = 20344366, upload-time = "2026-02-23T00:18:12.015Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/35/f5/906eda513271c8deb5af284e5ef0206d17a96239af79f9fa0aebfe0e36b4/scipy-1.17.1-cp312-cp312-macosx_14_0_x86_64.whl", hash = "sha256:c80be5ede8f3f8eded4eff73cc99a25c388ce98e555b17d31da05287015ffa5b", size = 22704017, upload-time = "2026-02-23T00:18:21.502Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/da/34/16f10e3042d2f1d6b66e0428308ab52224b6a23049cb2f5c1756f713815f/scipy-1.17.1-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:e19ebea31758fac5893a2ac360fedd00116cbb7628e650842a6691ba7ca28a21", size = 32927842, upload-time = "2026-02-23T00:18:35.367Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/01/8e/1e35281b8ab6d5d72ebe9911edcdffa3f36b04ed9d51dec6dd140396e220/scipy-1.17.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:02ae3b274fde71c5e92ac4d54bc06c42d80e399fec704383dcd99b301df37458", size = 35235890, upload-time = "2026-02-23T00:18:49.188Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/c5/5c/9d7f4c88bea6e0d5a4f1bc0506a53a00e9fcb198de372bfe4d3652cef482/scipy-1.17.1-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:8a604bae87c6195d8b1045eddece0514d041604b14f2727bbc2b3020172045eb", size = 35003557, upload-time = "2026-02-23T00:18:54.74Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/65/94/7698add8f276dbab7a9de9fb6b0e02fc13ee61d51c7c3f85ac28b65e1239/scipy-1.17.1-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:f590cd684941912d10becc07325a3eeb77886fe981415660d9265c4c418d0bea", size = 37625856, upload-time = "2026-02-23T00:19:00.307Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/a2/84/dc08d77fbf3d87d3ee27f6a0c6dcce1de5829a64f2eae85a0ecc1f0daa73/scipy-1.17.1-cp312-cp312-win_amd64.whl", hash = "sha256:41b71f4a3a4cab9d366cd9065b288efc4d4f3c0b37a91a8e0947fb5bd7f31d87", size = 36549682, upload-time = "2026-02-23T00:19:07.67Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/bc/98/fe9ae9ffb3b54b62559f52dedaebe204b408db8109a8c66fdd04869e6424/scipy-1.17.1-cp312-cp312-win_arm64.whl", hash = "sha256:f4115102802df98b2b0db3cce5cb9b92572633a1197c77b7553e5203f284a5b3", size = 24547340, upload-time = "2026-02-23T00:19:12.024Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/76/27/07ee1b57b65e92645f219b37148a7e7928b82e2b5dbeccecb4dff7c64f0b/scipy-1.17.1-cp313-cp313-macosx_10_14_x86_64.whl", hash = "sha256:5e3c5c011904115f88a39308379c17f91546f77c1667cea98739fe0fccea804c", size = 31590199, upload-time = "2026-02-23T00:19:17.192Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/ec/ae/db19f8ab842e9b724bf5dbb7db29302a91f1e55bc4d04b1025d6d605a2c5/scipy-1.17.1-cp313-cp313-macosx_12_0_arm64.whl", hash = "sha256:6fac755ca3d2c3edcb22f479fceaa241704111414831ddd3bc6056e18516892f", size = 28154001, upload-time = "2026-02-23T00:19:22.241Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/5b/58/3ce96251560107b381cbd6e8413c483bbb1228a6b919fa8652b0d4090e7f/scipy-1.17.1-cp313-cp313-macosx_14_0_arm64.whl", hash = "sha256:7ff200bf9d24f2e4d5dc6ee8c3ac64d739d3a89e2326ba68aaf6c4a2b838fd7d", size = 20325719, upload-time = "2026-02-23T00:19:26.329Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/b2/83/15087d945e0e4d48ce2377498abf5ad171ae013232ae31d06f336e64c999/scipy-1.17.1-cp313-cp313-macosx_14_0_x86_64.whl", hash = "sha256:4b400bdc6f79fa02a4d86640310dde87a21fba0c979efff5248908c6f15fad1b", size = 22683595, upload-time = "2026-02-23T00:19:30.304Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/b4/e0/e58fbde4a1a594c8be8114eb4aac1a55bcd6587047efc18a61eb1f5c0d30/scipy-1.17.1-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:2b64ca7d4aee0102a97f3ba22124052b4bd2152522355073580bf4845e2550b6", size = 32896429, upload-time = "2026-02-23T00:19:35.536Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/f5/5f/f17563f28ff03c7b6799c50d01d5d856a1d55f2676f537ca8d28c7f627cd/scipy-1.17.1-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:581b2264fc0aa555f3f435a5944da7504ea3a065d7029ad60e7c3d1ae09c5464", size = 35203952, upload-time = "2026-02-23T00:19:42.259Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/8d/a5/9afd17de24f657fdfe4df9a3f1ea049b39aef7c06000c13db1530d81ccca/scipy-1.17.1-cp313-cp313-musllinux_1_2_aarch64.whl", hash = "sha256:beeda3d4ae615106d7094f7e7cef6218392e4465cc95d25f900bebabfded0950", size = 34979063, upload-time = "2026-02-23T00:19:47.547Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/8b/13/88b1d2384b424bf7c924f2038c1c409f8d88bb2a8d49d097861dd64a57b2/scipy-1.17.1-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:6609bc224e9568f65064cfa72edc0f24ee6655b47575954ec6339534b2798369", size = 37598449, upload-time = "2026-02-23T00:19:53.238Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/35/e5/d6d0e51fc888f692a35134336866341c08655d92614f492c6860dc45bb2c/scipy-1.17.1-cp313-cp313-win_amd64.whl", hash = "sha256:37425bc9175607b0268f493d79a292c39f9d001a357bebb6b88fdfaff13f6448", size = 36510943, upload-time = "2026-02-23T00:20:50.89Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/2a/fd/3be73c564e2a01e690e19cc618811540ba5354c67c8680dce3281123fb79/scipy-1.17.1-cp313-cp313-win_arm64.whl", hash = "sha256:5cf36e801231b6a2059bf354720274b7558746f3b1a4efb43fcf557ccd484a87", size = 24545621, upload-time = "2026-02-23T00:20:55.871Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/6f/6b/17787db8b8114933a66f9dcc479a8272e4b4da75fe03b0c282f7b0ade8cd/scipy-1.17.1-cp313-cp313t-macosx_10_14_x86_64.whl", hash = "sha256:d59c30000a16d8edc7e64152e30220bfbd724c9bbb08368c054e24c651314f0a", size = 31936708, upload-time = "2026-02-23T00:19:58.694Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/38/2e/524405c2b6392765ab1e2b722a41d5da33dc5c7b7278184a8ad29b6cb206/scipy-1.17.1-cp313-cp313t-macosx_12_0_arm64.whl", hash = "sha256:010f4333c96c9bb1a4516269e33cb5917b08ef2166d5556ca2fd9f082a9e6ea0", size = 28570135, upload-time = "2026-02-23T00:20:03.934Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/fd/c3/5bd7199f4ea8556c0c8e39f04ccb014ac37d1468e6cfa6a95c6b3562b76e/scipy-1.17.1-cp313-cp313t-macosx_14_0_arm64.whl", hash = "sha256:2ceb2d3e01c5f1d83c4189737a42d9cb2fc38a6eeed225e7515eef71ad301dce", size = 20741977, upload-time = "2026-02-23T00:20:07.935Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/d9/b8/8ccd9b766ad14c78386599708eb745f6b44f08400a5fd0ade7cf89b6fc93/scipy-1.17.1-cp313-cp313t-macosx_14_0_x86_64.whl", hash = "sha256:844e165636711ef41f80b4103ed234181646b98a53c8f05da12ca5ca289134f6", size = 23029601, upload-time = "2026-02-23T00:20:12.161Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/6d/a0/3cb6f4d2fb3e17428ad2880333cac878909ad1a89f678527b5328b93c1d4/scipy-1.17.1-cp313-cp313t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:158dd96d2207e21c966063e1635b1063cd7787b627b6f07305315dd73d9c679e", size = 33019667, upload-time = "2026-02-23T00:20:17.208Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/f3/c3/2d834a5ac7bf3a0c806ad1508efc02dda3c8c61472a56132d7894c312dea/scipy-1.17.1-cp313-cp313t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:74cbb80d93260fe2ffa334efa24cb8f2f0f622a9b9febf8b483c0b865bfb3475", size = 35264159, upload-time = "2026-02-23T00:20:23.087Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/4d/77/d3ed4becfdbd217c52062fafe35a72388d1bd82c2d0ba5ca19d6fcc93e11/scipy-1.17.1-cp313-cp313t-musllinux_1_2_aarch64.whl", hash = "sha256:dbc12c9f3d185f5c737d801da555fb74b3dcfa1a50b66a1a93e09190f41fab50", size = 35102771, upload-time = "2026-02-23T00:20:28.636Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/bd/12/d19da97efde68ca1ee5538bb261d5d2c062f0c055575128f11a2730e3ac1/scipy-1.17.1-cp313-cp313t-musllinux_1_2_x86_64.whl", hash = "sha256:94055a11dfebe37c656e70317e1996dc197e1a15bbcc351bcdd4610e128fe1ca", size = 37665910, upload-time = "2026-02-23T00:20:34.743Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/06/1c/1172a88d507a4baaf72c5a09bb6c018fe2ae0ab622e5830b703a46cc9e44/scipy-1.17.1-cp313-cp313t-win_amd64.whl", hash = "sha256:e30bdeaa5deed6bc27b4cc490823cd0347d7dae09119b8803ae576ea0ce52e4c", size = 36562980, upload-time = "2026-02-23T00:20:40.575Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/70/b0/eb757336e5a76dfa7911f63252e3b7d1de00935d7705cf772db5b45ec238/scipy-1.17.1-cp313-cp313t-win_arm64.whl", hash = "sha256:a720477885a9d2411f94a93d16f9d89bad0f28ca23c3f8daa521e2dcc3f44d49", size = 24856543, upload-time = "2026-02-23T00:20:45.313Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/cf/83/333afb452af6f0fd70414dc04f898647ee1423979ce02efa75c3b0f2c28e/scipy-1.17.1-cp314-cp314-macosx_10_14_x86_64.whl", hash = "sha256:a48a72c77a310327f6a3a920092fa2b8fd03d7deaa60f093038f22d98e096717", size = 31584510, upload-time = "2026-02-23T00:21:01.015Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/ed/a6/d05a85fd51daeb2e4ea71d102f15b34fedca8e931af02594193ae4fd25f7/scipy-1.17.1-cp314-cp314-macosx_12_0_arm64.whl", hash = "sha256:45abad819184f07240d8a696117a7aacd39787af9e0b719d00285549ed19a1e9", size = 28170131, upload-time = "2026-02-23T00:21:05.888Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/db/7b/8624a203326675d7746a254083a187398090a179335b2e4a20e2ddc46e83/scipy-1.17.1-cp314-cp314-macosx_14_0_arm64.whl", hash = "sha256:3fd1fcdab3ea951b610dc4cef356d416d5802991e7e32b5254828d342f7b7e0b", size = 20342032, upload-time = "2026-02-23T00:21:09.904Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/c9/35/2c342897c00775d688d8ff3987aced3426858fd89d5a0e26e020b660b301/scipy-1.17.1-cp314-cp314-macosx_14_0_x86_64.whl", hash = "sha256:7bdf2da170b67fdf10bca777614b1c7d96ae3ca5794fd9587dce41eb2966e866", size = 22678766, upload-time = "2026-02-23T00:21:14.313Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/ef/f2/7cdb8eb308a1a6ae1e19f945913c82c23c0c442a462a46480ce487fdc0ac/scipy-1.17.1-cp314-cp314-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:adb2642e060a6549c343603a3851ba76ef0b74cc8c079a9a58121c7ec9fe2350", size = 32957007, upload-time = "2026-02-23T00:21:19.663Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/0b/2e/7eea398450457ecb54e18e9d10110993fa65561c4f3add5e8eccd2b9cd41/scipy-1.17.1-cp314-cp314-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:eee2cfda04c00a857206a4330f0c5e3e56535494e30ca445eb19ec624ae75118", size = 35221333, upload-time = "2026-02-23T00:21:25.278Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/d9/77/5b8509d03b77f093a0d52e606d3c4f79e8b06d1d38c441dacb1e26cacf46/scipy-1.17.1-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:d2650c1fb97e184d12d8ba010493ee7b322864f7d3d00d3f9bb97d9c21de4068", size = 35042066, upload-time = "2026-02-23T00:21:31.358Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/f9/df/18f80fb99df40b4070328d5ae5c596f2f00fffb50167e31439e932f29e7d/scipy-1.17.1-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:08b900519463543aa604a06bec02461558a6e1cef8fdbb8098f77a48a83c8118", size = 37612763, upload-time = "2026-02-23T00:21:37.247Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/4b/39/f0e8ea762a764a9dc52aa7dabcfad51a354819de1f0d4652b6a1122424d6/scipy-1.17.1-cp314-cp314-win_amd64.whl", hash = "sha256:3877ac408e14da24a6196de0ddcace62092bfc12a83823e92e49e40747e52c19", size = 37290984, upload-time = "2026-02-23T00:22:35.023Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/7c/56/fe201e3b0f93d1a8bcf75d3379affd228a63d7e2d80ab45467a74b494947/scipy-1.17.1-cp314-cp314-win_arm64.whl", hash = "sha256:f8885db0bc2bffa59d5c1b72fad7a6a92d3e80e7257f967dd81abb553a90d293", size = 25192877, upload-time = "2026-02-23T00:22:39.798Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/96/ad/f8c414e121f82e02d76f310f16db9899c4fcde36710329502a6b2a3c0392/scipy-1.17.1-cp314-cp314t-macosx_10_14_x86_64.whl", hash = "sha256:1cc682cea2ae55524432f3cdff9e9a3be743d52a7443d0cba9017c23c87ae2f6", size = 31949750, upload-time = "2026-02-23T00:21:42.289Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/7c/b0/c741e8865d61b67c81e255f4f0a832846c064e426636cd7de84e74d209be/scipy-1.17.1-cp314-cp314t-macosx_12_0_arm64.whl", hash = "sha256:2040ad4d1795a0ae89bfc7e8429677f365d45aa9fd5e4587cf1ea737f927b4a1", size = 28585858, upload-time = "2026-02-23T00:21:47.706Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/ed/1b/3985219c6177866628fa7c2595bfd23f193ceebbe472c98a08824b9466ff/scipy-1.17.1-cp314-cp314t-macosx_14_0_arm64.whl", hash = "sha256:131f5aaea57602008f9822e2115029b55d4b5f7c070287699fe45c661d051e39", size = 20757723, upload-time = "2026-02-23T00:21:52.039Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/c0/19/2a04aa25050d656d6f7b9e7b685cc83d6957fb101665bfd9369ca6534563/scipy-1.17.1-cp314-cp314t-macosx_14_0_x86_64.whl", hash = "sha256:9cdc1a2fcfd5c52cfb3045feb399f7b3ce822abdde3a193a6b9a60b3cb5854ca", size = 23043098, upload-time = "2026-02-23T00:21:56.185Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/86/f1/3383beb9b5d0dbddd030335bf8a8b32d4317185efe495374f134d8be6cce/scipy-1.17.1-cp314-cp314t-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:6e3dcd57ab780c741fde8dc68619de988b966db759a3c3152e8e9142c26295ad", size = 33030397, upload-time = "2026-02-23T00:22:01.404Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/41/68/8f21e8a65a5a03f25a79165ec9d2b28c00e66dc80546cf5eb803aeeff35b/scipy-1.17.1-cp314-cp314t-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:a9956e4d4f4a301ebf6cde39850333a6b6110799d470dbbb1e25326ac447f52a", size = 35281163, upload-time = "2026-02-23T00:22:07.024Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/84/8d/c8a5e19479554007a5632ed7529e665c315ae7492b4f946b0deb39870e39/scipy-1.17.1-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:a4328d245944d09fd639771de275701ccadf5f781ba0ff092ad141e017eccda4", size = 35116291, upload-time = "2026-02-23T00:22:12.585Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/52/52/e57eceff0e342a1f50e274264ed47497b59e6a4e3118808ee58ddda7b74a/scipy-1.17.1-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:a77cbd07b940d326d39a1d1b37817e2ee4d79cb30e7338f3d0cddffae70fcaa2", size = 37682317, upload-time = "2026-02-23T00:22:18.513Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/11/2f/b29eafe4a3fbc3d6de9662b36e028d5f039e72d345e05c250e121a230dd4/scipy-1.17.1-cp314-cp314t-win_amd64.whl", hash = "sha256:eb092099205ef62cd1782b006658db09e2fed75bffcae7cc0d44052d8aa0f484", size = 37345327, upload-time = "2026-02-23T00:22:24.442Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/07/39/338d9219c4e87f3e708f18857ecd24d22a0c3094752393319553096b98af/scipy-1.17.1-cp314-cp314t-win_arm64.whl", hash = "sha256:200e1050faffacc162be6a486a984a0497866ec54149a01270adc8a59b7c7d21", size = 25489165, upload-time = "2026-02-23T00:22:29.563Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "shellingham"
|
||||
version = "1.5.4"
|
||||
|
||||
Reference in New Issue
Block a user