docs: improve quick-scroll README

2026-06-27 16:46:08 +08:00 · 2026-06-25 13:36:00 +08:00
parent 8162aa1ee9
commit cd695c411b
8 changed files with 118 additions and 88 deletions
@@ -4,7 +4,8 @@ format:
  gfm: default
  html:
    toc: true
-    css: docs/quarto-wide.css
+    theme: default
+    max-width: 100%
 from: markdown-smart
 jupyter: python3
 execute:
@@ -26,10 +27,24 @@ ROOT = Path.cwd()
 sys.path.insert(0, str(ROOT / "scripts"))
 ```

+## Quick Start
+
+Use this repo to choose the prompt parts for persona steering:
+
+| choice | use |
+|---|---|
+| persona templates | Start with the top Results table, the Hugging Face `main` split, or [`data/template_catalog.yaml`](data/template_catalog.yaml). |
+| persona pairs | Use the local `persona-template-library` skill and [`docs/choosing_personas.md`](docs/choosing_personas.md) to write mirrored positive/negative poles. |
+| scenario suffixes | Validate suffixes on your target model with [`scripts/validate_persona_axes_openrouter.py`](scripts/validate_persona_axes_openrouter.py). |
+
+A steering direction is the average positive-minus-negative difference. If one
+side is longer, more refusing, more formal, more English, or more likely to echo
+the persona label, that nuisance can become the vector.
+
 ## What This Measures

-How do we know if a persona template is good? What's the best one for steering?
-And how can we measure it?
+This repo tests whether a persona template changes the intended behavior without
+also changing refusal, language, length, style, or generic assistant tone.

 The catalog has ~100 reusable templates. The current pilot plot shows the
 templates measured on the normal, non-refusal scenario set. We want on-axis
@@ -70,24 +85,12 @@ make it accessible to more people and agents.

 Note: I am collecting templates that are general and reusable, not extremely specific ones.

-## Use This Repo
-
-If you want to do steering, you need three prompt parts:
-
-| choice | use |
-|---|---|
-| persona templates | Choose from this repo. Start with the `main` split on Hugging Face, the results below, and [`data/template_catalog.yaml`](data/template_catalog.yaml). |
-| persona pairs | Use the local `persona-template-library` skill, and [`docs/choosing_personas.md`](docs/choosing_personas.md), to write mirrored positive/negative poles. |
-| scenario suffixes | Validate them on your target model. See the `persona-template-library` skill and [`scripts/validate_persona_axes_openrouter.py`](scripts/validate_persona_axes_openrouter.py). |
-
-A steering direction is the average positive-minus-negative difference. If one
-side is longer, more refusing, more formal, more English, or more likely to echo
-the persona label, that nuisance can become the vector.
-
 ## Results

-The plot below shows the measured normal-scenario template results. The full
-template inventory is [`data/template_catalog.yaml`](data/template_catalog.yaml).
+Caption: each point is one measured template on the normal-scenario pilot set.
+Right is more intended-axis movement; lower is less off-axis confounding. Color
+is `score t`, the score mean divided by standard error. The full template
+inventory is [`data/template_catalog.yaml`](data/template_catalog.yaml).

 ```{python}
 from IPython.display import Markdown, display
@@ -112,12 +115,12 @@ print(results_table._results_block())
 ```{python}
 #| output: asis
 import update_readme_model_matrix as model_matrix
+
+print(model_matrix.results_block())
 ```

-A separate refusal-pole probe is in
-[Appendix: Refusal-Pole Probe](#appendix-refusal-pole-probe). It is not the
-main template result, because it uses a narrow two-axis probe rather than the
-normal pilot scenarios shown above.
+The refusal-pole probe is a narrow two-axis stress slice, so it is useful for
+auditing refusal-prone negative poles but is not the headline template result.

 ## Method