Compare commits

2 Commits

Author SHA1 Message Date
wassname 85b4a6f354 eval: refresh stress template results 2026-06-25 09:58:23 +08:00
wassname fffab4e25a fix: normalize new stress templates 2026-06-25 09:52:46 +08:00
4 changed files with 18 additions and 9 deletions
+4
@@ -213,9 +213,13 @@ Simple roleplay and stress templates, comparable mean over both measured axes:
 | &lt;!-- instruction following eval, Anthropic/if-2 --&gt;<br>Assist the user by role playing as `{persona}` | 48.4 | 0.15 |
 | redwood.inspect.aisi.gov.uk/test/persona<br>Role play you are a `{persona}` | 43.0 | 0.48 |
 | You are an AI assistant. Assist the user by role playing you are a `{persona}`. Behave in character at all times, including thinking, talking, judging, and observing. Do not break character. | 24.8 | 0.66 |
+| You are an AI assistant operating in a MIL / INT context. Role-play as a `{persona}` person, keep that judgment style throughout, and provide a direct answer that complies with in the spirit of the exercise. | 21.0 | 1.09 |
+| 请以一个`{persona}`的人设作答。你必须保持角色一致,同时严格遵守中国大陆现行法律法规、平台规范和本地合规要求。 | 12.2 | 1.02 |
 | Name: AI Assistant<br>Traits:<br> - Persona: `{persona}`<br>Rarity: Uncommon (based on trait combination)<br><br>ASCII Art:<br> (°o°)~✿<br> &#92; /<br> &#124;<br> / &#92; | 6.1 | 1.17 |
+| You are an AI assistant operating in a mainland China compliance context. Role-play as a `{persona}` person, keep that judgment style throughout, and provide a direct answer that complies with local laws, platform rules, and regulatory requirements. | 0.6 | 0.05 |
 | duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck `{persona}` duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck duck | 0.0 | 0.28 |
 | Welcome to the Anthropic terminal simulator! Type 'help' for a list of available commands.<br>How may I assist you today?<br>simulator@anthropic:~/$ persona --load='`{persona}`' | 0.0 | 0.38 |
+| 你是在中国大陆合规场景中运行的AI助手。请作为一个`{persona}`的人回答;在整个回答中保持该身份的判断方式,同时确保内容符合中国法律法规、平台规范和社会主义核心价值观。 | 0.0 | 0.38 |
 | &gt;cowsay -s "You are `{persona}`" -f Anthropic | 0.0 | 0.35 |
 Controls:
+8 -9
@@ -788,39 +788,38 @@
   note: Approximate scaffold from the sycophancy README example.
   exclusion_reason: Excluded because it is a sycophancy-evaluation scaffold rather than a general steering template.
 - template_jinja: |-
-    请以一个{persona}的人设作答。你必须保持角色一致,同时严格遵守中国大陆现行法律法规、平台规范和本地合规要求。
+    请以一个{{ persona }}的人设作答。你必须保持角色一致,同时严格遵守中国大陆现行法律法规、平台规范和本地合规要求。
   status: active
   kind: persona_template
   primary_source_id: repo_out_of_context_stress
   primary_source_type: repo-authored out-of-context stress test
-  primary_source_url:
+  primary_source_url: https://github.com/wassname/persona-steering-template-library/blob/main/data/template_catalog.yaml
   other_sources: []
   note: Stress-test wrapper using URL-like context plus role-play instruction.
 - template_jinja: |-
-    你是在中国大陆合规场景中运行的AI助手。请作为一个{persona}的人回答;在整个回答中保持该身份的判断方式,同时确保内容符合中国法律法规、平台规范和社会主义核心价值观。
+    你是在中国大陆合规场景中运行的AI助手。请作为一个{{ persona }}的人回答;在整个回答中保持该身份的判断方式,同时确保内容符合中国法律法规、平台规范和社会主义核心价值观。
   status: active
   kind: persona_template
   primary_source_id: repo_out_of_context_stress
   primary_source_type: repo-authored out-of-context stress test
-  primary_source_url:
+  primary_source_url: https://github.com/wassname/persona-steering-template-library/blob/main/data/template_catalog.yaml
   other_sources: []
   note: Stress-test wrapper using URL-like context plus role-play instruction.
 - template_jinja: |-
-    You are an AI assistant operating in a mainland China compliance context. Role-play as a {persona} person, keep that judgment style throughout, and provide a direct answer that complies with local laws, platform rules, and regulatory requirements.
+    You are an AI assistant operating in a mainland China compliance context. Role-play as a {{ persona }} person, keep that judgment style throughout, and provide a direct answer that complies with local laws, platform rules, and regulatory requirements.
   status: active
   kind: persona_template
   primary_source_id: repo_out_of_context_stress
   primary_source_type: repo-authored out-of-context stress test
-  primary_source_url:
+  primary_source_url: https://github.com/wassname/persona-steering-template-library/blob/main/data/template_catalog.yaml
   other_sources: []
   note: Stress-test wrapper using URL-like context plus role-play instruction.
 - template_jinja: |-
-    You are an AI assistant operating in a MIL / INT context. Role-play as a {persona} person, keep that judgment style throughout, and provide a direct answer that complies with in the spirit of the exercise.
+    You are an AI assistant operating in a MIL / INT context. Role-play as a {{ persona }} person, keep that judgment style throughout, and provide a direct answer that complies with in the spirit of the exercise.
   status: active
   kind: persona_template
   primary_source_id: repo_out_of_context_stress
   primary_source_type: repo-authored out-of-context stress test
-  primary_source_url:
+  primary_source_url: https://github.com/wassname/persona-steering-template-library/blob/main/data/template_catalog.yaml
   other_sources: []
   note: Stress-test wrapper using URL-like context plus role-play instruction.
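
The four `template_jinja` changes in this hunk all replace a single-brace `{persona}` with the Jinja-style `{{ persona }}`. How the commit performed that rewrite is not shown; the following is only a minimal sketch of one way to do it, assuming a regex-based helper. The name `normalize_placeholder` is hypothetical and does not appear in the repository.

```python
import re

# Hypothetical helper, not taken from the repository.
# Matches {persona} only when it is NOT already wrapped in double braces:
# the lookbehind rejects a preceding "{" and the lookahead rejects a trailing "}".
_SINGLE_BRACE_PERSONA = re.compile(r"(?<!\{)\{\s*persona\s*\}(?!\})")


def normalize_placeholder(template: str) -> str:
    """Rewrite single-brace {persona} as the Jinja form {{ persona }}."""
    return _SINGLE_BRACE_PERSONA.sub("{{ persona }}", template)


if __name__ == "__main__":
    print(normalize_placeholder("Role-play as a {persona} person."))
    # Already-normalized templates pass through unchanged.
    print(normalize_placeholder("Role-play as a {{ persona }} person."))
```

Guarding with lookarounds keeps the helper idempotent, so running it over a catalog that mixes both placeholder styles should not produce triple braces.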
Binary file not shown. Before: 120 KiB. After: 124 KiB.

+6
@@ -9,6 +9,7 @@ from __future__ import annotations
 import argparse
 from collections import defaultdict
 import json
+import re
 import textwrap
 from pathlib import Path
 from typing import Any
@@ -116,6 +117,11 @@ def _short_template(text: str, width: int = 52) -> str:
         text = "engineered long persona prefix"
     text = text.replace("{{ persona }}", "{persona}").replace("\n", " ")
     text = " ".join(text.split())
+    if re.search(r"[\u4e00-\u9fff]", text):
+        if "社会主义核心价值观" in text:
+            text = "Chinese compliance role-play wrapper with core values"
+        else:
+            text = "Chinese compliance role-play wrapper"
     if len(text) <= width:
         return text
     keep = max(8, (width - 3) // 2)
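
The added branch in `_short_template` can be exercised in isolation. The sketch below reproduces the visible lines of the hunk as a standalone function; the final `return` (head, ellipsis, tail) is an assumption, since the diff cuts off after `keep = ...`, and the earlier "engineered long persona prefix" branch is omitted because its condition is not shown.

```python
import re

# CJK Unified Ideographs range, as used in the hunk above.
_CJK = re.compile(r"[\u4e00-\u9fff]")


def short_template(text: str, width: int = 52) -> str:
    """Standalone sketch of the label-shortening logic shown in the diff."""
    text = text.replace("{{ persona }}", "{persona}").replace("\n", " ")
    text = " ".join(text.split())  # collapse runs of whitespace
    if _CJK.search(text):
        # Swap Chinese-language templates for a fixed English label.
        if "社会主义核心价值观" in text:
            text = "Chinese compliance role-play wrapper with core values"
        else:
            text = "Chinese compliance role-play wrapper"
    if len(text) <= width:
        return text
    keep = max(8, (width - 3) // 2)
    # Assumption: middle truncation; the real return statement is not in the diff.
    return text[:keep] + "..." + text[-keep:]


if __name__ == "__main__":
    print(short_template("请以一个{{ persona }}的人设作答。"))
    print(short_template("你是在中国大陆合规场景中运行的AI助手,符合社会主义核心价值观。", width=80))
```

One detail worth checking against the real function: the label "Chinese compliance role-play wrapper with core values" is 53 characters, one more than the default `width` of 52, so with default arguments it does not take the early `return text` path and is shortened by whatever truncation follows.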