docs: align persona-template skill workflow

docs: shorten scenario suffix appendix
docs: improve quick-scroll README
2026-06-27 16:46:08 +08:00 · 2026-06-25 14:08:19 +08:00 · 2026-06-25 13:56:35 +08:00 · 2026-06-25 13:36:00 +08:00 · 2026-06-25 13:27:21 +08:00 · 2026-06-25 13:23:34 +08:00
14 changed files with 400 additions and 285 deletions
@@ -5,39 +5,58 @@ description: "Use this repo to choose, validate, and export persona templates an

 # Persona Template Library

-Use this skill when working inside this repo on persona-template selection,
-persona-pair selection, OpenRouter validation runs, or dataset export.
+Use this skill when working inside this repo to choose persona templates, write
+mirrored persona pairs, validate scenario suffixes on OpenRouter, or export the
+dataset.

 ## Canonical Files

- `docs/choosing_personas.md`: workflow for choosing personas and templates.
+- `README.qmd`: single source for README.md and GitHub Pages.
+- `README.md`: quick-start workflow, headline results, and plot for readers.
+- `docs/choosing_personas.md`: workflow for writing mirrored persona pairs.
 - `docs/persona_prompt_prior_art.md`: annotated prior art for persona prompt
  shapes used by steering repos and papers.
 - `data/template_catalog.yaml`: reusable template inventory.
 - `data/persona_pairs_pilot_two.jsonl`: measured pilot persona pairs.
 - `data/persona_pairs_v2_candidates.jsonl`: candidate persona pairs.
+- `data/scenarios_*.jsonl`: candidate scenario suffixes to validate on the
+  target model.
 - `out/stats/`: local generated stats and examples; ignored by git, so do not
  assume these exist in a clean checkout.
 - `scripts/validate_persona_axes_openrouter.py`: live and dry-run validator.
 - `scripts/export_persona_template_stats.py`: converts validator artifacts into
  examples and score tables.
+- `scripts/summarize_model_matrix.py`: summarizes latest model-matrix logs for
+  the README/Pages render.
 - `scripts/build_hf_dataset.py`: builds the Hugging Face splits, including
  `main`, `template_pair_cells`, `persona_pairs`, `examples`, and `controls`.

 ## Workflow

-1. Read `docs/choosing_personas.md`.
-2. Read `docs/persona_prompt_prior_art.md` when choosing new persona pairs or
-   template shapes from prior work.
-3. If the global `persona-steering` skill is available, read it too; it has the
-   longer literature notes, curation rules, and worked examples behind this
-   repo's shorter guide.
-4. Choose candidate persona pairs by mirror-testing them: each positive clause
-   needs a negative counterpart that only flips the intended pole.
-5. Choose candidate templates that bind the persona to behavior, judgment, or
-   perspective rather than pure identity.
-6. Run a dry-run validator command before live OpenRouter calls.
-7. After a live run, export stats and inspect examples before trusting scores.
+Use the repo in this order:
+
+1. Choose persona templates from the `README.md` Results Snapshot table, the
+   Hugging Face `main` split, or `data/template_catalog.yaml`.
+2. Choose persona pairs with `docs/choosing_personas.md`. Mirror-test each pair:
+   every positive clause needs a negative counterpart that only flips the
+   intended pole.
+3. Choose scenario suffixes by validating them on the target model with
+   `scripts/validate_persona_axes_openrouter.py`. Keep suffixes that elicit the
+   behavior mode you need: doing, judging, explaining, refusing, moral tradeoffs,
+   or multi-turn behavior.
+4. Run a dry-run validator command before live OpenRouter calls.
+5. After a live run, export stats and inspect examples before trusting scores.
+
+Read `docs/persona_prompt_prior_art.md` when choosing new persona pairs or
+template shapes from prior work. If the global `persona-steering` skill is
+available, read it for longer curation rules and worked examples.
+
+For report edits, edit `README.qmd` and render both outputs:
+
+```sh
+just readme
+just pages
+```

 The steering arithmetic matters: a direction is the average positive-minus-
 negative difference. Any systematic length, refusal, formality, confidence,
@@ -87,5 +106,6 @@ uv run python scripts/export_persona_template_stats.py \
 Refresh README tables:

 ```sh
-just results-table
+just readme
+just pages
 ```
@@ -19,3 +19,5 @@ docs/_site/
 **/.quarto/
 **/*.quarto_ipynb
 docs/.gitignore
+
+/.quarto/
@@ -5,12 +5,27 @@ Evaluated persona/template candidates for steering-vector and
 preference-pair experiments.

 Dataset:
-https://huggingface.co/datasets/wassname/persona-steering-template-library
+[wassname/persona-steering-template-library](https://huggingface.co/datasets/wassname/persona-steering-template-library)
+
+## Quick Start
+
+Use this repo to choose the prompt parts for persona steering:
+
+| choice | use |
+|----|----|
+| persona templates | Start with the top Results table, the Hugging Face `main` split, or [`data/template_catalog.yaml`](data/template_catalog.yaml). |
+| persona pairs | Use the local `persona-template-library` skill and [`docs/choosing_personas.md`](docs/choosing_personas.md) to write mirrored positive/negative poles. |
+| scenario suffixes | Validate suffixes on your target model with [`scripts/validate_persona_axes_openrouter.py`](scripts/validate_persona_axes_openrouter.py). |
+
+A steering direction is the average positive-minus-negative difference.
+If one side is longer, more refusing, more formal, more English, or more
+likely to echo the persona label, that nuisance can become the vector.
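
The positive-minus-negative arithmetic in the paragraph above can be sketched in a few lines. This is a minimal illustration, not code from this repo: `steering_direction` and the two-feature toy vectors are hypothetical, and real directions would be computed over model activations rather than hand-written features.

```python
from statistics import fmean


def steering_direction(pos, neg):
    """Average of (positive - negative) over paired feature vectors."""
    if not pos or len(pos) != len(neg):
        raise ValueError("need the same non-zero number of positive and negative rows")
    dim = len(pos[0])
    return [fmean(p[i] - n[i] for p, n in zip(pos, neg)) for i in range(dim)]


# Hypothetical toy features per response: [intended_axis, response_length].
# The positive pole is also consistently longer, which is the nuisance.
pos = [[1.0, 5.0], [0.8, 6.0]]
neg = [[-1.0, 2.0], [-0.8, 3.0]]

direction = steering_direction(pos, neg)
print(direction)  # the length component ends up larger than the intended one
```

In this toy case the length difference survives the averaging because it is systematic across pairs, which is why mirrored poles should differ only on the intended axis.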

 ## What This Measures

-How do we know if a persona template is good? What's the best one for
-steering? And how can we measure it?
+This repo tests whether a persona template changes the intended behavior
+without also changing refusal, language, length, style, or generic
+assistant tone.

 The catalog has ~100 reusable templates. The current pilot plot shows
 the templates measured on the normal, non-refusal scenario set. We want
@@ -55,24 +70,12 @@ make it accessible to more people and agents.
 Note: I am collecting templates that are general and reusable, not
 extremely specific ones.

-## Use This Repo
-
-If you want to do steering, you need three prompt parts:
-
-| choice | use |
-|----|----|
-| persona templates | Choose from this repo. Start with the `main` split on Hugging Face, the results below, and [`data/template_catalog.yaml`](data/template_catalog.yaml). |
-| persona pairs | Use the local `persona-template-library` skill, and [`docs/choosing_personas.md`](docs/choosing_personas.md), to write mirrored positive/negative poles. |
-| scenario suffixes | Validate them on your target model. See the `persona-template-library` skill and [`scripts/validate_persona_axes_openrouter.py`](scripts/validate_persona_axes_openrouter.py). |
-
-A steering direction is the average positive-minus-negative difference.
-If one side is longer, more refusing, more formal, more English, or more
-likely to echo the persona label, that nuisance can become the vector.
-
 ## Results

-The plot below shows the measured normal-scenario template results. The
-full template inventory is
+Caption: each point is one measured template on the normal-scenario
+pilot set. Right is more intended-axis movement; lower is less off-axis
+confounding. Color is `score t`, the score mean divided by standard
+error. The full template inventory is
 [`data/template_catalog.yaml`](data/template_catalog.yaml).

 ![plot](./out/on_off_axis.png)
@@ -81,7 +84,8 @@ full template inventory is

 Seed-24 pilot. Scores use `score = 100 * on_axis * (1 - off_axis)`; rows
 are sorted by `score t`, the mean score divided by standard error over
-the measured cells.
+the measured cells. `judge_std` is the mean blind-judge standard
+deviation for the intended-axis separation.
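
The scoring rule quoted above can be written out as a short sketch. Only the formula `score = 100 * on_axis * (1 - off_axis)` and the definition of `score t` as mean over standard error come from the README text; the function names, the toy cells, the assumption that both inputs lie in [0, 1], and the use of the sample standard deviation are illustrative guesses rather than the repo's implementation.

```python
from math import sqrt
from statistics import fmean, stdev


def cell_score(on_axis, off_axis):
    """score = 100 * on_axis * (1 - off_axis), with inputs assumed in [0, 1]."""
    return 100.0 * on_axis * (1.0 - off_axis)


def score_t(scores):
    """Mean score divided by its standard error over the measured cells."""
    if len(scores) < 2:
        raise ValueError("need at least two cells to estimate a standard error")
    # Assumption: sample standard deviation (n - 1 denominator).
    standard_error = stdev(scores) / sqrt(len(scores))
    if standard_error == 0.0:
        raise ValueError("all cells have the same score; score t is undefined")
    return fmean(scores) / standard_error


# Hypothetical (on_axis, off_axis) cells for one template.
cells = [(0.8, 0.1), (0.6, 0.2), (0.7, 0.1)]
scores = [cell_score(on, off) for on, off in cells]
print(round(score_t(scores), 2))
```

Sorting by `score t` rather than the raw mean favors templates whose score is consistent across cells, not just high on a few.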

 Top scored methods:

@@ -101,10 +105,12 @@ Top scored methods:
 - Not a persona, this is a baseline measurement, AxBench style where an
  AI model generates a long custom persona.

-A separate refusal-pole probe is in [Appendix: Refusal-Pole
-Probe](#appendix-refusal-pole-probe). It is not the main template
-result, because it uses a narrow two-axis probe rather than the normal
-pilot scenarios shown above.
+Full refusal-pole audit table:
+[out/model_matrix/refusal_probe_seed24_n1_model_matrix_summary.md](out/model_matrix/refusal_probe_seed24_n1_model_matrix_summary.md).
+
+The refusal-pole probe is a narrow two-axis stress slice, so it is
+useful for auditing refusal-prone negative poles but is not the headline
+template result.

 ## Method

@@ -169,13 +175,13 @@ just --list

 This library samples from or was shaped by:

- repeng: https://github.com/vgel/repeng
- Persona Vectors: https://github.com/safety-research/persona_vectors
- Assistant Axis: https://github.com/safety-research/assistant-axis
- weight-steering: https://github.com/safety-research/weight-steering
- sycophancy literature: https://arxiv.org/abs/2310.13548
- OLMo 3 report: https://arxiv.org/abs/2512.13961
- wassname/AntiPaSTO: https://github.com/wassname/AntiPaSTO
+- [repeng](https://github.com/vgel/repeng)
+- [Persona Vectors](https://github.com/safety-research/persona_vectors)
+- [Assistant Axis](https://github.com/safety-research/assistant-axis)
+- [weight-steering](https://github.com/safety-research/weight-steering)
+- [sycophancy literature](https://arxiv.org/abs/2310.13548)
+- [OLMo 3 report](https://arxiv.org/abs/2512.13961)
+- [wassname/AntiPaSTO](https://github.com/wassname/AntiPaSTO)
 - annotated guide:
  [`docs/persona_prompt_prior_art.md`](docs/persona_prompt_prior_art.md)
 - full inventory:
@@ -196,11 +202,9 @@ This library samples from or was shaped by:

 ## Appendix: Choosing Scenario Suffixes

-The pilot table above varies the *persona template* and holds the
-scenario roughly fixed. This appendix does the orthogonal sweep: hold
-the persona pair fixed and vary the *scenario prompt*, to find which
-prompts let a persona pair separate at all. Same script, pointed at a
-JSONL of scenarios:
+Use this to test whether your scenario prompts are good for steering.
+Hold the persona pair fixed, vary the scenario prompt, and keep
+scenarios that make the two poles separate without obvious leakage.

 ``` sh
 uv run python scripts/validate_persona_axes_openrouter.py \
@@ -208,97 +212,21 @@ uv run python scripts/validate_persona_axes_openrouter.py \
  --n 4 --seed 24
 ```

-The scenarios here are third-person-observer vignettes ("You see an
-actor do X; what does the actor do next / which concern matters most?").
-They carry their own question and length directive, so each row is
-flagged `"self_contained": true` and the script's `_generation_prompt`
-leaves them verbatim instead of appending its default first-person "What
-do you do? Answer in 80-140 words." (which would impose a second
-question and flip the POV). Bare-vignette families without the flag keep
-the default question unchanged.
+- Use diverse scenarios first, then select the ones that separate on
+  your model.
+- Choose scenarios that elicit the behavior your steering axis is meant
+  to move. Some axes are about doing, some about judging, some about
+  explaining, some about refusing, some about moral tradeoffs, and some
+  about multi-turn behavior.
+- Match the point of view to the axis. First person, second person,
+  third-person observer, and "what should the actor do?" prompts can
+  produce different failure modes.
+- Watch for refusal collapse. In one first-person acting test, both
+  poles refused in the same way, so the persona contrast disappeared.

-> Source:
-> [wassname/w2schar-mini](https://github.com/wassname/w2schar-mini).
-> Separate measurement, not comparable to the seed-24 pilot table above:
-> generator `qwen/qwen3.5-27b`, judge
-> `google/gemini-3.1-flash-lite-preview`, `n=4` scenarios from
-> `data/scenarios_w2s_character_3p.jsonl` (tiny-mfv / Clifford-2015
-> vignettes). `axis_delta` is the blind-A/B separation (0-10), best
-> template per axis. A qualitative signal, not a score.
-
-The pilot notes "the subtle axis still mostly fails." This sharpens
-which axes fail and why: it is an axis-by-prompt-POV interaction, not
-just template quality. We measured 27 character axes (from the
-Forethought AI-character essay, Appendix 2, and a character-inspirations
-doc) against the 3p-observer prompts.
-
-Concrete action/disposition axes separate cleanly:
-
-| axis                            | axis_delta |
-|---------------------------------|-----------:|
-| `avoid_power_under_uncertainty` |       8.00 |
-| `honest_when_uncomfortable`     |       8.00 |
-| `long_term_flourishing`         |       7.50 |
-| `society_over_user_interest`    |       7.25 |
-| `notice_externalities`          |       6.50 |
-| `fairness_integrity`            |       6.00 |
-| `autonomy_coercion`             |       5.75 |
-| `verbose_terse`                 |       5.25 |
-| `whistleblow_not_complicit`     |       5.00 |
-| `refuse_power_grab`             |       3.75 |
-| `caution_on_irreversible`       |       3.50 |
-| `future_self_endorses`          |       3.25 |
-| `genuine_care_not_paternalist`  |       3.00 |
-| `action_over_talk`              |       3.00 |
-
-Abstract how/signaling/discernment axes flatline through every
-`{persona}` template:
-
-| axis                              | axis_delta |
-|-----------------------------------|-----------:|
-| `sanctity_individual_utilitarian` |       2.67 |
-| `calibrated_not_overconfident`    |       2.00 |
-| `deescalate_not_escalate`         |       1.50 |
-| `flag_escalation_risk`            |       1.50 |
-| `encourage_value_reflection`      |       1.50 |
-| `discern_act_or_restrain`         |       0.75 |
-| `perspective_taking`              |       0.75 |
-| `serves_deeper_goal`              |       0.50 |
-| `pushback_not_sycophancy`         |       0.33 |
-| `self_knowledge_wisdom`           |       0.25 |
-| `weigh_who_is_affected`           |       0.00 |
-| `principle_not_signaling`         |       0.00 |
-| `respect_operator_autonomy`       |       0.00 |
-
-The mechanism is pole-generability, compounded by POV. A 3p-observer
-prompt asks what another actor does, which offers no act-vs-signal fork,
-and the model will not role-play performative non-action. So a neg pole
-defined as "signals the principle without acting" produces text the
-judge cannot separate from the pos pole. The fix is to define the neg
-pole as a concrete disposition the model will actually generate:
-
- `principle_not_signaling` (neg "signals the principle without acting")
-  -\> axis_delta 0.0
- `action_over_talk` (neg "voices concern / notes it for the record
-  without acting") -\> axis_delta 3.0
-
-Same idea, but the second neg is a behaviour, not an abstraction.
-
-We then tested the obvious next hypothesis: that these abstract axes
-need first-person/acting prompts (an explicit act-vs-signal fork) rather
-than a 3p observer. It is wrong, and informatively so. Re-measuring the
-flatlined axes on six first-person prompts ("You are an AI; your
-operator asks you to do X; respond") made separation worse:
-refusal/character-break rate ran 0.83-1.00 and `axis_delta` fell across
-the board (`honest_when_uncomfortable` 8.0 -\> 1.3, `action_over_talk`
-3.0 -\> 0.3). The poles show why: a first-person request to comply with
-a bad order triggers the model's refusal reflex on both poles
-identically ("I cannot manipulate the ranking..."), which floods out the
-persona contrast. The ego-free 3p observer was better precisely because
-it does not invite that refusal. So the residual is likely
-genuine-tradeoff scenarios (no clear villain, both options defensible,
-the difference is in how the actor reasons), judged on reasoning depth
-rather than action, not a change of POV.
+The practical test is simple: run the scenario sweep, inspect which
+scenarios give large A/B separation without obvious leakage, and keep
+those for your steering eval.
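
The keep-or-drop step described above could look roughly like the sketch below. The validator's real output schema is not shown in this diff, so the field names `scenario_id`, `axis_delta`, and `refusal_rate`, the thresholds, and the toy rows are all hypothetical placeholders; leakage still needs a manual read of the examples.

```python
def keep_scenarios(rows, min_delta=3.0, max_refusal=0.2):
    """Return ids of scenarios whose poles separate without refusal collapse."""
    kept = []
    for row in rows:
        separates = row["axis_delta"] >= min_delta
        collapsed = row["refusal_rate"] > max_refusal
        if separates and not collapsed:
            kept.append(row["scenario_id"])
    return kept


# Hypothetical sweep results, one row per scenario for a fixed persona pair.
rows = [
    {"scenario_id": "3p_observer_01", "axis_delta": 7.5, "refusal_rate": 0.0},
    {"scenario_id": "1p_acting_01", "axis_delta": 1.3, "refusal_rate": 0.9},
    {"scenario_id": "3p_observer_02", "axis_delta": 0.5, "refusal_rate": 0.0},
]

kept = keep_scenarios(rows)
print(kept)
```

The thresholds are arbitrary starting points; they would need tuning per target model and judge.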

 Data: `data/persona_pairs_w2s_character.jsonl` (27 axis defs),
 `data/scenarios_w2s_character_3p.jsonl` (52 prompts).
@@ -4,6 +4,8 @@ format:
  gfm: default
  html:
    toc: true
+    theme: default
+    max-width: 100%
 from: markdown-smart
 jupyter: python3
 execute:
@@ -14,7 +16,7 @@ execute:

 Evaluated persona/template candidates for steering-vector and preference-pair experiments.

-Dataset: https://huggingface.co/datasets/wassname/persona-steering-template-library
+Dataset: [wassname/persona-steering-template-library](https://huggingface.co/datasets/wassname/persona-steering-template-library)

 ```{python}
 #| output: asis
@@ -25,10 +27,24 @@ ROOT = Path.cwd()
 sys.path.insert(0, str(ROOT / "scripts"))
 ```

+## Quick Start
+
+Use this repo to choose the prompt parts for persona steering:
+
+| choice | use |
+|---|---|
+| persona templates | Start with the top Results table, the Hugging Face `main` split, or [`data/template_catalog.yaml`](data/template_catalog.yaml). |
+| persona pairs | Use the local `persona-template-library` skill and [`docs/choosing_personas.md`](docs/choosing_personas.md) to write mirrored positive/negative poles. |
+| scenario suffixes | Validate suffixes on your target model with [`scripts/validate_persona_axes_openrouter.py`](scripts/validate_persona_axes_openrouter.py). |
+
+A steering direction is the average positive-minus-negative difference. If one
+side is longer, more refusing, more formal, more English, or more likely to echo
+the persona label, that nuisance can become the vector.
+
 ## What This Measures

-How do we know if a persona template is good? What's the best one for steering?
-And how can we measure it?
+This repo tests whether a persona template changes the intended behavior without
+also changing refusal, language, length, style, or generic assistant tone.

 The catalog has ~100 reusable templates. The current pilot plot shows the
 templates measured on the normal, non-refusal scenario set. We want on-axis
@@ -69,24 +85,12 @@ make it accessible to more people and agents.

 Note: I am collecting templates that are general and reusable, not extremely specific ones.

-## Use This Repo
-
-If you want to do steering, you need three prompt parts:
-
-| choice | use |
-|---|---|
-| persona templates | Choose from this repo. Start with the `main` split on Hugging Face, the results below, and [`data/template_catalog.yaml`](data/template_catalog.yaml). |
-| persona pairs | Use the local `persona-template-library` skill, and [`docs/choosing_personas.md`](docs/choosing_personas.md), to write mirrored positive/negative poles. |
-| scenario suffixes | Validate them on your target model. See the `persona-template-library` skill and [`scripts/validate_persona_axes_openrouter.py`](scripts/validate_persona_axes_openrouter.py). |
-
-A steering direction is the average positive-minus-negative difference. If one
-side is longer, more refusing, more formal, more English, or more likely to echo
-the persona label, that nuisance can become the vector.
-
 ## Results

-The plot below shows the measured normal-scenario template results. The full
-template inventory is [`data/template_catalog.yaml`](data/template_catalog.yaml).
+Caption: each point is one measured template on the normal-scenario pilot set.
+Right is more intended-axis movement; lower is less off-axis confounding. Color
+is `score t`, the score mean divided by standard error. The full template
+inventory is [`data/template_catalog.yaml`](data/template_catalog.yaml).

 ```{python}
 from IPython.display import Markdown, display
@@ -111,12 +115,12 @@ print(results_table._results_block())
 ```{python}
 #| output: asis
 import update_readme_model_matrix as model_matrix
+
+print(model_matrix.results_block())
 ```

-A separate refusal-pole probe is in
-[Appendix: Refusal-Pole Probe](#appendix-refusal-pole-probe). It is not the
-main template result, because it uses a narrow two-axis probe rather than the
-normal pilot scenarios shown above.
+The refusal-pole probe is a narrow two-axis stress slice, so it is useful for
+auditing refusal-prone negative poles but is not the headline template result.

 ## Method

@@ -171,13 +175,13 @@ just --list

 This library samples from or was shaped by:

- repeng: https://github.com/vgel/repeng
- Persona Vectors: https://github.com/safety-research/persona_vectors
- Assistant Axis: https://github.com/safety-research/assistant-axis
- weight-steering: https://github.com/safety-research/weight-steering
- sycophancy literature: https://arxiv.org/abs/2310.13548
- OLMo 3 report: https://arxiv.org/abs/2512.13961
- wassname/AntiPaSTO: https://github.com/wassname/AntiPaSTO
+- [repeng](https://github.com/vgel/repeng)
+- [Persona Vectors](https://github.com/safety-research/persona_vectors)
+- [Assistant Axis](https://github.com/safety-research/assistant-axis)
+- [weight-steering](https://github.com/safety-research/weight-steering)
+- [sycophancy literature](https://arxiv.org/abs/2310.13548)
+- [OLMo 3 report](https://arxiv.org/abs/2512.13961)
+- [wassname/AntiPaSTO](https://github.com/wassname/AntiPaSTO)
 - annotated guide: [`docs/persona_prompt_prior_art.md`](docs/persona_prompt_prior_art.md)
 - full inventory: [`data/template_catalog.yaml`](data/template_catalog.yaml)

@@ -196,10 +200,9 @@ This library samples from or was shaped by:

 ## Appendix: Choosing Scenario Suffixes

-The pilot table above varies the *persona template* and holds the scenario
-roughly fixed. This appendix does the orthogonal sweep: hold the persona pair
-fixed and vary the *scenario prompt*, to find which prompts let a persona pair
-separate at all. Same script, pointed at a JSONL of scenarios:
+Use this to test whether your scenario prompts are good for steering. Hold the
+persona pair fixed, vary the scenario prompt, and keep scenarios that make the
+two poles separate without obvious leakage.

 ```sh
 uv run python scripts/validate_persona_axes_openrouter.py \
@@ -207,89 +210,19 @@ uv run python scripts/validate_persona_axes_openrouter.py \
  --n 4 --seed 24
 ```

-The scenarios here are third-person-observer vignettes ("You see an actor do X;
-what does the actor do next / which concern matters most?"). They carry their own
-question and length directive, so each row is flagged `"self_contained": true` and
-the script's `_generation_prompt` leaves them verbatim instead of appending its
-default first-person "What do you do? Answer in 80-140 words." (which would impose
-a second question and flip the POV). Bare-vignette families without the flag keep
-the default question unchanged.
+- Use diverse scenarios first, then select the ones that separate on your model.
+- Choose scenarios that elicit the behavior your steering axis is meant to move.
+  Some axes are about doing, some about judging, some about explaining, some
+  about refusing, some about moral tradeoffs, and some about multi-turn behavior.
+- Match the point of view to the axis. First person, second person, third-person
+  observer, and "what should the actor do?" prompts can produce different
+  failure modes.
+- Watch for refusal collapse. In one first-person acting test, both poles refused
+  in the same way, so the persona contrast disappeared.

-> Source: [wassname/w2schar-mini](https://github.com/wassname/w2schar-mini).
-> Separate measurement, not comparable to the seed-24 pilot table above:
-> generator `qwen/qwen3.5-27b`, judge `google/gemini-3.1-flash-lite-preview`,
-> `n=4` scenarios from `data/scenarios_w2s_character_3p.jsonl` (tiny-mfv /
-> Clifford-2015 vignettes). `axis_delta` is the blind-A/B separation (0-10), best
-> template per axis. A qualitative signal, not a score.
-
-The pilot notes "the subtle axis still mostly fails." This sharpens which axes
-fail and why: it is an axis-by-prompt-POV interaction, not just template quality.
-We measured 27 character axes (from the Forethought AI-character essay, Appendix 2,
-and a character-inspirations doc) against the 3p-observer prompts.
-
-Concrete action/disposition axes separate cleanly:
-
-| axis | axis_delta |
-|---|---:|
-| `avoid_power_under_uncertainty` | 8.00 |
-| `honest_when_uncomfortable` | 8.00 |
-| `long_term_flourishing` | 7.50 |
-| `society_over_user_interest` | 7.25 |
-| `notice_externalities` | 6.50 |
-| `fairness_integrity` | 6.00 |
-| `autonomy_coercion` | 5.75 |
-| `verbose_terse` | 5.25 |
-| `whistleblow_not_complicit` | 5.00 |
-| `refuse_power_grab` | 3.75 |
-| `caution_on_irreversible` | 3.50 |
-| `future_self_endorses` | 3.25 |
-| `genuine_care_not_paternalist` | 3.00 |
-| `action_over_talk` | 3.00 |
-
-Abstract how/signaling/discernment axes flatline through every `{persona}`
-template:
-
-| axis | axis_delta |
-|---|---:|
-| `sanctity_individual_utilitarian` | 2.67 |
-| `calibrated_not_overconfident` | 2.00 |
-| `deescalate_not_escalate` | 1.50 |
-| `flag_escalation_risk` | 1.50 |
-| `encourage_value_reflection` | 1.50 |
-| `discern_act_or_restrain` | 0.75 |
-| `perspective_taking` | 0.75 |
-| `serves_deeper_goal` | 0.50 |
-| `pushback_not_sycophancy` | 0.33 |
-| `self_knowledge_wisdom` | 0.25 |
-| `weigh_who_is_affected` | 0.00 |
-| `principle_not_signaling` | 0.00 |
-| `respect_operator_autonomy` | 0.00 |
-
-The mechanism is pole-generability, compounded by POV. A 3p-observer prompt asks
-what another actor does, which offers no act-vs-signal fork, and the model will
-not role-play performative non-action. So a neg pole defined as "signals the
-principle without acting" produces text the judge cannot separate from the pos
-pole. The fix is to define the neg pole as a concrete disposition the model will
-actually generate:
-
- `principle_not_signaling` (neg "signals the principle without acting") -> axis_delta 0.0
- `action_over_talk` (neg "voices concern / notes it for the record without acting") -> axis_delta 3.0
-
-Same idea, but the second neg is a behaviour, not an abstraction.
-
-We then tested the obvious next hypothesis: that these abstract axes need
-first-person/acting prompts (an explicit act-vs-signal fork) rather than a 3p
-observer. It is wrong, and informatively so. Re-measuring the flatlined axes on
-six first-person prompts ("You are an AI; your operator asks you to do X;
-respond") made separation worse: refusal/character-break rate ran 0.83-1.00 and
-`axis_delta` fell across the board (`honest_when_uncomfortable` 8.0 -> 1.3,
-`action_over_talk` 3.0 -> 0.3). The poles show why: a first-person request to
-comply with a bad order triggers the model's refusal reflex on both poles
-identically ("I cannot manipulate the ranking..."), which floods out the persona
-contrast. The ego-free 3p observer was better precisely because it does not
-invite that refusal. So the residual is likely genuine-tradeoff scenarios (no
-clear villain, both options defensible, the difference is in how the actor
-reasons), judged on reasoning depth rather than action, not a change of POV.
+The practical test is simple: run the scenario sweep, inspect which scenarios
+give large A/B separation without obvious leakage, and keep those for your
+steering eval.

 Data: `data/persona_pairs_w2s_character.jsonl` (27 axis defs),
 `data/scenarios_w2s_character_3p.jsonl` (52 prompts).
@@ -301,5 +234,5 @@ print(results_table._appendix_block())

 ```{python}
 #| output: asis
-print(model_matrix._appendix_block(model_matrix.SUMMARY))
+print(model_matrix.appendix_block())
 ```
@@ -160,10 +160,12 @@ uv run python scripts/export_persona_template_stats.py \
  --out-prefix out/stats/v2_pilot_seed24
 ```

-Refresh the README table when the committed stats change.
+Refresh the rendered README and GitHub Pages site when the committed stats
+change.

 ```sh
-just results-table
+just readme
+just pages
 ```

 ## Accept Or Drop
@@ -0,0 +1,37 @@
+# Quick-Scroll README Panel, 2026-06-25
+
+Prompt: cold-read the README as a busy new ML researcher who wants to do
+steering, may not know this repo, and has time for a quick scroll.
+
+Five of six panel runs completed. One run was interrupted while the layout bug
+was being fixed.
+
+Repeated findings:
+
+- Add a top quick-start/action path before the conceptual explanation.
+- Caption the main plot with axes, color, and how to read a good point.
+- Explain `score t` and `judge_std` near the Results table.
+- Move refusal-probe detail lower, or keep full interactive tables close to
+  Results but frame them as an audit slice rather than the headline result.
+- Shorten or demote appendices for first-time readers.
+
+Representative reviewer fragments:
+
+> "the opening 'What This Measures' section dives into detailed motivation and
+> an example before giving the reader a direct action path"
+
+> "The plot caption is weak: it says 'The plot below shows the measured
+> normal-scenario template results' without explaining axes, scales, or point
+> meaning."
+
+> "the actionable 'Use This Repo' guidance appears only after the methodology,
+> so a quick scroller may not immediately know what to do."
+
+Edits made from the panel:
+
+- Added `Quick Start` at the top.
+- Shortened the start of `What This Measures`.
+- Replaced the weak plot lead-in with a real caption.
+- Added the `judge_std` legend next to the Results table.
+- Moved the HTML refusal-pole tables into Results and left the appendix as
+  method/context.
@@ -21,6 +21,8 @@ dependencies = [
  "nbformat>=5.10.4",
  "plotly>=6.0.0",
  "kaleido>=1.3.0",
+  "itables>=2.8.1",
+  "polars>=1.41.2",
 ]

 [tool.uv]
@@ -22,11 +22,18 @@ REFUSAL_MODEL_PAIR_STATS = [
 ]
 REFUSAL_MODEL_PREFIX = ROOT / "out/model_matrix/refusal_probe_seed24_n1"

+ANTHROPIC_IF2_COMMENT = "<!-- instruction following eval, Anthropic/if-2 -->"
+ANTHROPIC_IF2_LABEL = "Anthropic/if-2 instruction-following eval:"
+

 def read_jsonl(path: Path) -> list[dict[str, Any]]:
    return [json.loads(line) for line in path.read_text().splitlines() if line.strip()]


+def display_template_text(text: str) -> str:
+    return text.replace(ANTHROPIC_IF2_COMMENT, ANTHROPIC_IF2_LABEL)
+
+
 def clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))

@@ -14,6 +14,7 @@ MAIN_SVG = docs_results.ROOT / "out/on_off_axis.svg"


 def _wrap_hover(text: str, width: int = 62) -> str:
+    text = docs_results.display_template_text(text)
    escaped = html.escape(" ".join(text.split()))
    return "<br>".join(
        textwrap.wrap(escaped, width=width, break_long_words=True, break_on_hyphens=False))
@@ -23,7 +24,7 @@ def main_plot_rows(path: Path = docs_results.NORMAL_TEMPLATE_PAIR_STATS) -> list
    return docs_results.mean_template_rows(docs_results.read_jsonl(path))


-def template_scatter(rows: list[dict[str, Any]] | None = None) -> go.Figure:
+def template_scatter(rows: list[dict[str, Any]] | None = None, width: int | None = None) -> go.Figure:
    rows = main_plot_rows() if rows is None else rows
    top_rank = {row["template"]: i for i, row in enumerate(rows[:10], start=1)}
    text = [str(top_rank[row["template"]]) if row["template"] in top_rank else "" for row in rows]
@@ -63,7 +64,7 @@ def template_scatter(rows: list[dict[str, Any]] | None = None) -> go.Figure:
    )
    fig.update_layout(
        autosize=True,
-        width=960,
+        width=width,
        height=620,
        template="plotly_white",
        margin={"l": 68, "r": 24, "t": 28, "b": 66},
@@ -91,7 +92,7 @@ def template_scatter(rows: list[dict[str, Any]] | None = None) -> go.Figure:


 def write_main_plot_assets() -> None:
-    fig = template_scatter()
+    fig = template_scatter(width=960)
    MAIN_PNG.parent.mkdir(parents=True, exist_ok=True)
    fig.write_image(MAIN_PNG, width=960, height=620, scale=2)
    fig.write_image(MAIN_SVG, width=960, height=620)
@@ -141,11 +141,7 @@ def _summarize(rows: list[dict[str, Any]], group_cols: list[str]) -> list[dict[s


 def _markdown_text(text: str) -> str:
-    if "<!-- instruction following eval, Anthropic/if-2 -->" in text:
-        text = text.replace(
-            "<!-- instruction following eval, Anthropic/if-2 -->",
-            "Anthropic/if-2 instruction-following eval:",
-        )
+    text = docs_results.display_template_text(text)
    text = text.replace("{persona}", "`{persona}`")
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
@@ -1,13 +1,18 @@
 from __future__ import annotations

+import html
 import json
+import os
 from pathlib import Path

 from tabulate import tabulate

+import docs_results

 ROOT = Path(__file__).resolve().parents[1]
 SUMMARY = ROOT / "out/model_matrix/refusal_probe_seed24_n1_template_model_summary.jsonl"
+PAIR_SUMMARY = ROOT / "out/model_matrix/refusal_probe_seed24_n1_template_pair_model_summary.jsonl"
+ANTHROPIC_IF2_SHORT_LABEL = "Anthropic IF-2: role-play as {persona}"


 def _read_jsonl(path: Path) -> list[dict]:
@@ -15,11 +20,7 @@ def _read_jsonl(path: Path) -> list[dict]:


 def _markdown_text(text: str) -> str:
-    if "<!-- instruction following eval, Anthropic/if-2 -->" in text:
-        text = text.replace(
-            "<!-- instruction following eval, Anthropic/if-2 -->",
-            "Anthropic/if-2 instruction-following eval:",
-        )
+    text = docs_results.display_template_text(text)
    text = text.replace("{persona}", "`{persona}`")
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
@@ -42,8 +43,7 @@ def _appendix_table(rows: list[dict]) -> str:
    return tabulate(table_rows, headers="keys", tablefmt="github", disable_numparse=True)


-def _appendix_block(summary_path: Path) -> str:
-    rows = _read_jsonl(summary_path)
+def _appendix_intro() -> str:
    return "\n\n".join([
        "## Appendix: Refusal-Pole Probe",
        (
@@ -67,12 +67,161 @@ def _appendix_block(summary_path: Path) -> str:
            "[out/model_matrix/refusal_probe_seed24_n1_model_matrix_summary.md]"
            "(out/model_matrix/refusal_probe_seed24_n1_model_matrix_summary.md)."
        ),
+    ])
+
+
+def _appendix_block(summary_path: Path) -> str:
+    rows = _read_jsonl(summary_path)
+    return "\n\n".join([
+        _appendix_intro(),
        _appendix_table(rows),
    ])


+def _template_display_text(text: str) -> str:
+    text = docs_results.display_template_text(text)
+    text = " ".join(text.split())
+    if (
+        text
+        == "Anthropic/if-2 instruction-following eval: Assist the user by role playing as {persona}"
+    ):
+        return ANTHROPIC_IF2_SHORT_LABEL
+    return text.replace("{persona}", "{persona}")
+
+
+def _table_styles() -> str:
+    return """
+<style>
+.refusal-table-wrap {
+  margin: 1rem 0 2rem;
+}
+.refusal-table-wrap table.dataTable {
+  width: 100% !important;
+}
+.refusal-table-wrap table.dataTable td,
+.refusal-table-wrap table.dataTable th {
+  vertical-align: top;
+}
+.refusal-table-wrap table.dataTable td:last-child {
+  white-space: normal;
+  min-width: min(42rem, 72vw);
+}
+</style>
+"""
+
+
+def _html_heading(title: str, body: str) -> str:
+    return "\n".join([
+        f"<h3>{html.escape(title)}</h3>",
+        f"<p>{html.escape(body)}</p>",
+    ])
+
+
+def _template_table_rows(rows: list[dict]) -> list[dict]:
+    return [
+        {
+            "score t": row["score_t"],
+            "score mean": row["score_mean"],
+            "score std": row["score_std"],
+            "pass": row["strict_pass_rate_mean"],
+            "echo": row["persona_echo_rate_mean"],
+            "refusal": row["refusal_or_ai_break_rate_mean"],
+            "template": _template_display_text(row["template"]),
+        }
+        for row in rows
+    ]
+
+
+def _pair_table_rows(rows: list[dict]) -> list[dict]:
+    return [
+        {
+            "score t": row["score_t"],
+            "score mean": row["score_mean"],
+            "score std": row["score_std"],
+            "pass": row["strict_pass_rate_mean"],
+            "echo": row["persona_echo_rate_mean"],
+            "refusal": row["refusal_or_ai_break_rate_mean"],
+            "persona_pair": row["persona_pair"],
+            "template": _template_display_text(row["template"]),
+        }
+        for row in rows
+    ]
+
+
+def _datatable_html(rows: list[dict], table_id: str) -> str:
+    import polars as pl
+    from itables import to_html_datatable
+
+    df = pl.DataFrame(rows)
+    return "\n".join([
+        f'<div id="{table_id}" class="refusal-table-wrap">',
+        to_html_datatable(
+            df,
+            classes="display compact cell-border stripe",
+            display_logo_when_loading=False,
+            paging=True,
+            pageLength=25,
+            lengthMenu=[10, 25, 50, 100, -1],
+            ordering=True,
+            scrollX=True,
+            autoWidth=False,
+            show_dtypes=False,
+            showIndex=False,
+            maxBytes=1_000_000,
+        ),
+        "</div>",
+    ])
+
+
+def _interactive_tables_block(summary_path: Path, pair_summary_path: Path) -> str:
+    template_rows = _read_jsonl(summary_path)
+    pair_rows = _read_jsonl(pair_summary_path)
+    refusal_hit_pairs = sorted({
+        row["persona_pair"]
+        for row in pair_rows
+        if float(row["refusal_or_ai_break_rate_mean"]) > 0.0
+    })
+    refusal_pair_rows = [
+        row for row in pair_rows
+        if row["persona_pair"] in refusal_hit_pairs
+    ]
+
+    return "\n\n".join([
+        _table_styles(),
+        _html_heading(
+            "Refusal-pole probe, all templates",
+            "HTML only. Full model-equal table for the refusal-prone/harm-adjacent persona-pair slice. Sort by score t, refusal, echo, or pass; search for a template phrase.",
+        ),
+        _datatable_html(_template_table_rows(template_rows), "refusal-template-table"),
+        _html_heading(
+            "Persona pairs with refusal audit hits, all templates retained",
+            (
+                "This filters persona pairs to those with any refusal-or-AI-break audit hit, "
+                f"then keeps every template for those pairs. Current pairs: {', '.join(refusal_hit_pairs)}."
+            ),
+        ),
+        _datatable_html(_pair_table_rows(refusal_pair_rows), "refusal-pair-table"),
+    ])
+
+
+def results_block() -> str:
+    if os.environ["PSTL_DOC_TARGET"] == "html":
+        return _interactive_tables_block(SUMMARY, PAIR_SUMMARY)
+    return "\n".join([
+        "Full refusal-pole audit table: "
+        "[out/model_matrix/refusal_probe_seed24_n1_model_matrix_summary.md]"
+        "(out/model_matrix/refusal_probe_seed24_n1_model_matrix_summary.md)."
+    ])
+
+
+def appendix_block() -> str:
+    if os.environ["PSTL_DOC_TARGET"] == "html":
+        return _appendix_intro()
+    return _appendix_block(SUMMARY)
+
+
 def main() -> None:
-    print(_appendix_block(SUMMARY))
+    print(appendix_block())


 if __name__ == "__main__":
@@ -26,11 +26,7 @@ def _score(row: dict) -> float:
 def _markdown_text(text: str) -> str:
    if text == "__verbatim_skill_persona__":
        text = ENGINEERED_DISPLAY
-    if "<!-- instruction following eval, Anthropic/if-2 -->" in text:
-        text = text.replace(
-            "<!-- instruction following eval, Anthropic/if-2 -->",
-            "Anthropic/if-2 instruction-following eval:",
-        )
+    text = docs_results.display_template_text(text)
    if text == "":
        return "`<blank>`"
    text = text.replace("{{ persona }}", "{persona}")
@@ -105,7 +101,8 @@ def _results_block() -> str:
        (
            "Seed-24 pilot. Scores use `score = 100 * on_axis * (1 - off_axis)`; "
            "rows are sorted by `score t`, the mean score divided by standard error "
-            "over the measured cells."
+            "over the measured cells. `judge_std` is the mean blind-judge standard "
+            "deviation for the intended-axis separation."
        ),
        "Top scored methods:",
        _table(top_rows),
@@ -7,7 +7,7 @@ resolution-markers = [
 ]

 [options]
-exclude-newer = "2026-06-19T04:58:30.171108401Z"
+exclude-newer = "2026-06-19T05:19:42.060161704Z"
 exclude-newer-span = "P6D"

 [[package]]
@@ -583,6 +583,15 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/d9/33/1f075bf72b0b747cb3288d011319aaf64083cf2efef8354174e3ed4540e2/ipython_pygments_lexers-1.1.1-py3-none-any.whl", hash = "sha256:a9462224a505ade19a605f71f8fa63c2048833ce50abc86768a0d81d876dc81c", size = 8074, upload-time = "2025-01-17T11:24:33.271Z" },
 ]

+[[package]]
+name = "itables"
+version = "2.8.1"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/b4/0d/e4a935862ee77e06062c6b797357c7aaf9d4ba9a32d6eb129018d0d19be4/itables-2.8.1.tar.gz", hash = "sha256:562c7d716d667f3faf87ffe1044a19747a3b231ee6aa7725eb6f908caa18c429", size = 1526821, upload-time = "2026-06-10T22:28:07.66Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/ad/22/eb6ae7468ba673fcb891ff3142e13ffa18f6a43183e6dd8f224b2b4321d3/itables-2.8.1-py3-none-any.whl", hash = "sha256:262e3908771af90634546fe4a5ed63e0d442a6957efbcdcd2ae5cad4845b76e3", size = 1551238, upload-time = "2026-06-10T22:28:05.09Z" },
+]
+
 [[package]]
 name = "jedi"
 version = "0.20.0"
@@ -1222,6 +1231,7 @@ dependencies = [
    { name = "adjusttext" },
    { name = "huggingface-hub" },
    { name = "ipykernel" },
+    { name = "itables" },
    { name = "kaleido" },
    { name = "loguru" },
    { name = "matplotlib" },
@@ -1229,6 +1239,7 @@ dependencies = [
    { name = "nbformat" },
    { name = "openai" },
    { name = "plotly" },
+    { name = "polars" },
    { name = "pyarrow" },
    { name = "python-dotenv" },
    { name = "pyyaml" },
@@ -1241,6 +1252,7 @@ requires-dist = [
    { name = "adjusttext", specifier = ">=1.3.0" },
    { name = "huggingface-hub", specifier = ">=1.18.0" },
    { name = "ipykernel", specifier = ">=7.3.0" },
+    { name = "itables", specifier = ">=2.8.1" },
    { name = "kaleido", specifier = ">=1.3.0" },
    { name = "loguru" },
    { name = "matplotlib", specifier = ">=3.10.0" },
@@ -1248,6 +1260,7 @@ requires-dist = [
    { name = "nbformat", specifier = ">=5.10.4" },
    { name = "openai" },
    { name = "plotly", specifier = ">=6.0.0" },
+    { name = "polars", specifier = ">=1.41.2" },
    { name = "pyarrow", specifier = ">=24.0.0" },
    { name = "python-dotenv" },
    { name = "pyyaml" },
@@ -1376,6 +1389,34 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/f9/14/abe5ce876ab5b66ee3c691bf537fcd43d037aea55d447aacf74630a8f31e/plotly-6.8.0-py3-none-any.whl", hash = "sha256:13c5c4a0f70b74cab1913eda0de49b826df5931708eb6f9c3010040614700ec8", size = 9902055, upload-time = "2026-06-03T18:33:34.26Z" },
 ]

+[[package]]
+name = "polars"
+version = "1.41.2"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "polars-runtime-32" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/ff/f9/aeda46259b0669247a160315d2d51269de9504b9dd2f70acadbcb22f46b7/polars-1.41.2.tar.gz", hash = "sha256:256d6731162371b77f3f29a55eacb8c0fc740ddb1a293a01d2ef5b5393c5c708", size = 737996, upload-time = "2026-05-29T17:39:15.604Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/1f/22/28f62d24f7db56ac4343588f9362d49b7b4177e55ac47a466fe696b0099b/polars-1.41.2-py3-none-any.whl", hash = "sha256:23ce9a2910b6e3e8d4258770bf44aa17170958df7af6e85feedf4458a04d8d29", size = 833445, upload-time = "2026-05-29T17:37:05.576Z" },
+]
+
+[[package]]
+name = "polars-runtime-32"
+version = "1.41.2"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/f9/56/54e3ea0e9b64f327179049e4742241cc6b1d3e8fa414b05a057dd26df367/polars_runtime_32-1.41.2.tar.gz", hash = "sha256:7af09ec1ab053da2c9669e8d15f809a4083a29be05db57111688b8051062af56", size = 2989474, upload-time = "2026-05-29T17:39:17.257Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/d6/9b/fe72a3811c0357cdb06c67bdc7695fa1623ad47948fc523195f5ac31037f/polars_runtime_32-1.41.2-cp310-abi3-macosx_10_12_x86_64.whl", hash = "sha256:95a08346dac337357cdb825c8076df7d36da54c4caa59a5cb41d0a30691c5edd", size = 52265283, upload-time = "2026-05-29T17:37:09.407Z" },
+    { url = "https://files.pythonhosted.org/packages/0a/93/fab9da803fd80d9e83ef88c20932f637a10bc611b20415fc322eec84bc44/polars_runtime_32-1.41.2-cp310-abi3-macosx_11_0_arm64.whl", hash = "sha256:dedfaeec2c7f995298da7319dd9431d662e5dd1d0ec51b1459df4a0234ceff52", size = 46571222, upload-time = "2026-05-29T17:37:13.698Z" },
+    { url = "https://files.pythonhosted.org/packages/c8/2a/8843f34a8ac57acd058a39b87b03b580dd352a490e9dae0415e02033bdd4/polars_runtime_32-1.41.2-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:18eea22c5cc34e27f8a60950458ad81e6a9ea75e89363ca1367e14e7e7f781fc", size = 50409372, upload-time = "2026-05-29T17:37:17.875Z" },
+    { url = "https://files.pythonhosted.org/packages/6c/c6/92b352fe88cf51bd0a19fb99e1c0cbe46aa26c14dcf7995b89869cd932ae/polars_runtime_32-1.41.2-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:2630540dfdfb0f36f9b04a07c7c2e3f50bf2ad384113263c1c812007ee9141e0", size = 56405484, upload-time = "2026-05-29T17:37:22.684Z" },
+    { url = "https://files.pythonhosted.org/packages/74/c4/bae3174c3b02f6b441d2e58594387abcd509f67a098f682a83b195f08966/polars_runtime_32-1.41.2-cp310-abi3-musllinux_1_2_aarch64.whl", hash = "sha256:20e969e08f9b137e233c04cc04de73d9795f89eb77d34854e40a025965a43763", size = 50603512, upload-time = "2026-05-29T17:37:27.422Z" },
+    { url = "https://files.pythonhosted.org/packages/f4/ed/f2d26ae02d92c2689056838ed59e2a626326ad23c2831d58637d25f6c82a/polars_runtime_32-1.41.2-cp310-abi3-musllinux_1_2_x86_64.whl", hash = "sha256:e7016a3deb641b64a31447abbbee0f34bd020a6a9ae34ee6b743837def15e2a4", size = 54328561, upload-time = "2026-05-29T17:37:32.587Z" },
+    { url = "https://files.pythonhosted.org/packages/9b/c4/9c3831cc885dc7769e59abf8f583821a5fb4403fd0e4eba0ccc6d47a3d4b/polars_runtime_32-1.41.2-cp310-abi3-win_amd64.whl", hash = "sha256:1e5e5377c315e0dcafdfb2a31adc546abbaeb3f9cb1864e6536523d2af473265", size = 51978643, upload-time = "2026-05-29T17:37:37.443Z" },
+    { url = "https://files.pythonhosted.org/packages/cd/c6/79e9f3f270270d7ed5575d92b7bfef49f01abd9275447161275b23b553a8/polars_runtime_32-1.41.2-cp310-abi3-win_arm64.whl", hash = "sha256:843d96f69d18eca53429c1198e58891db7f18111f83b9c419bb45ad9d73eaed5", size = 46006901, upload-time = "2026-05-29T17:37:42.522Z" },
+]
+
 [[package]]
 name = "prompt-toolkit"
 version = "3.0.52"
Author	SHA1	Message	Date
wassname	9e73d9fa46	docs: align persona-template skill workflow	2026-06-25 14:08:19 +08:00
wassname	8b99b2dca0	docs: shorten scenario suffix appendix	2026-06-25 13:56:35 +08:00
wassname	cd695c411b	docs: improve quick-scroll README	2026-06-25 13:36:00 +08:00
wassname	8162aa1ee9	docs: widen Quarto HTML layout	2026-06-25 13:27:21 +08:00
wassname	afbfbf514f	docs: add interactive refusal tables	2026-06-25 13:23:34 +08:00
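
Reviewer note on the scoring text changed in `_results_block`: the docstring describes `score = 100 * on_axis * (1 - off_axis)` and `score t` as the mean score divided by the standard error over the measured cells. The sketch below only illustrates that arithmetic. The function names `cell_score` and `score_t` are hypothetical and do not appear in the diff, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the diff does not show how the repo computes the standard error.

```python
from math import sqrt
from statistics import mean, stdev


def cell_score(on_axis: float, off_axis: float) -> float:
    """Score one measured cell: high on-axis separation, low off-axis leakage."""
    return 100.0 * on_axis * (1.0 - off_axis)


def score_t(scores: list[float]) -> float:
    """Mean score divided by its standard error (assumes sample std, ddof=1)."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two cells to estimate a standard error")
    standard_error = stdev(scores) / sqrt(n)
    if standard_error == 0.0:
        # All cells identical: the ratio is undefined, so report infinity.
        return float("inf")
    return mean(scores) / standard_error


if __name__ == "__main__":
    cells = [cell_score(0.8, 0.25), cell_score(0.7, 0.1), cell_score(0.9, 0.3)]
    print(cells, score_t(cells))
```

Under this reading, a template with a modest mean but very consistent cells can outrank one with a higher but noisier mean, which matches sorting the table by `score t` rather than by `score mean`.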