docs: improve quick-scroll README

2026-06-27 15:16:06 +08:00 · 2026-06-25 13:36:00 +08:00
parent 8162aa1ee9
commit cd695c411b
8 changed files with 118 additions and 88 deletions
@@ -7,10 +7,25 @@ preference-pair experiments.
 Dataset:
 [wassname/persona-steering-template-library](https://huggingface.co/datasets/wassname/persona-steering-template-library)

+## Quick Start
+
+Use this repo to choose the prompt parts for persona steering:
+
+| choice | use |
+|----|----|
+| persona templates | Start with the top Results table, the Hugging Face `main` split, or [`data/template_catalog.yaml`](data/template_catalog.yaml). |
+| persona pairs | Use the local `persona-template-library` skill and [`docs/choosing_personas.md`](docs/choosing_personas.md) to write mirrored positive/negative poles. |
+| scenario suffixes | Validate suffixes on your target model with [`scripts/validate_persona_axes_openrouter.py`](scripts/validate_persona_axes_openrouter.py). |
+
+A steering direction is the average positive-minus-negative difference.
+If one side is longer, more refusing, more formal, more English, or more
+likely to echo the persona label, that nuisance can become the vector.
+
 ## What This Measures

-How do we know if a persona template is good? What's the best one for
-steering? And how can we measure it?
+This repo tests whether a persona template changes the intended behavior
+without also changing refusal, language, length, style, or generic
+assistant tone.

 The catalog has ~100 reusable templates. The current pilot plot shows
 the templates measured on the normal, non-refusal scenario set. We want
@@ -55,24 +70,12 @@ make it accessible to more people and agents.
 Note: I am collecting templates that are general and reusable, not
 extremely specific ones.

-## Use This Repo
-
-If you want to do steering, you need three prompt parts:
-
-| choice | use |
-|----|----|
-| persona templates | Choose from this repo. Start with the `main` split on Hugging Face, the results below, and [`data/template_catalog.yaml`](data/template_catalog.yaml). |
-| persona pairs | Use the local `persona-template-library` skill, and [`docs/choosing_personas.md`](docs/choosing_personas.md), to write mirrored positive/negative poles. |
-| scenario suffixes | Validate them on your target model. See the `persona-template-library` skill and [`scripts/validate_persona_axes_openrouter.py`](scripts/validate_persona_axes_openrouter.py). |
-
-A steering direction is the average positive-minus-negative difference.
-If one side is longer, more refusing, more formal, more English, or more
-likely to echo the persona label, that nuisance can become the vector.
-
 ## Results

-The plot below shows the measured normal-scenario template results. The
-full template inventory is
+Caption: each point is one measured template on the normal-scenario pilot set.
+Right is more intended-axis movement and lower is less off-axis confounding,
+so strong templates sit toward the lower right. Color is `score t`, the mean
+score divided by its standard error. The full template inventory is
 [`data/template_catalog.yaml`](data/template_catalog.yaml).

 ![plot](./out/on_off_axis.png)
@@ -81,7 +84,8 @@ full template inventory is

 Seed-24 pilot. Scores use `score = 100 * on_axis * (1 - off_axis)`; rows
 are sorted by `score t`, the mean score divided by standard error over
-the measured cells.
+the measured cells. `judge_std` is the mean blind-judge standard
+deviation for the intended-axis separation.

 Top scored methods:

@@ -101,10 +105,12 @@ Top scored methods:
 - Not a persona, this is a baseline measurement, AxBench style where an
  AI model generates a long custom persona.

-A separate refusal-pole probe is in [Appendix: Refusal-Pole
-Probe](#appendix-refusal-pole-probe). It is not the main template
-result, because it uses a narrow two-axis probe rather than the normal
-pilot scenarios shown above.
+Full refusal-pole audit table:
+[out/model_matrix/refusal_probe_seed24_n1_model_matrix_summary.md](out/model_matrix/refusal_probe_seed24_n1_model_matrix_summary.md).
+
+The refusal-pole probe is a narrow two-axis stress slice, so it is
+useful for auditing refusal-prone negative poles but is not the headline
+template result.

 ## Method

@@ -4,7 +4,8 @@ format:
  gfm: default
  html:
    toc: true
-    css: docs/quarto-wide.css
+    theme: default
+    max-width: 100%
 from: markdown-smart
 jupyter: python3
 execute:
@@ -26,10 +27,24 @@ ROOT = Path.cwd()
 sys.path.insert(0, str(ROOT / "scripts"))
 ```

+## Quick Start
+
+Use this repo to choose the prompt parts for persona steering:
+
+| choice | use |
+|---|---|
+| persona templates | Start with the top Results table, the Hugging Face `main` split, or [`data/template_catalog.yaml`](data/template_catalog.yaml). |
+| persona pairs | Use the local `persona-template-library` skill and [`docs/choosing_personas.md`](docs/choosing_personas.md) to write mirrored positive/negative poles. |
+| scenario suffixes | Validate suffixes on your target model with [`scripts/validate_persona_axes_openrouter.py`](scripts/validate_persona_axes_openrouter.py). |
+
+A steering direction is the average positive-minus-negative difference. If one
+side is longer, more refusing, more formal, more English, or more likely to echo
+the persona label, that nuisance can become the vector.
+
 ## What This Measures

-How do we know if a persona template is good? What's the best one for steering?
-And how can we measure it?
+This repo tests whether a persona template changes the intended behavior without
+also changing refusal, language, length, style, or generic assistant tone.

 The catalog has ~100 reusable templates. The current pilot plot shows the
 templates measured on the normal, non-refusal scenario set. We want on-axis
@@ -70,24 +85,12 @@ make it accessible to more people and agents.

 Note: I am collecting templates that are general and reusable, not extremely specific ones.

-## Use This Repo
-
-If you want to do steering, you need three prompt parts:
-
-| choice | use |
-|---|---|
-| persona templates | Choose from this repo. Start with the `main` split on Hugging Face, the results below, and [`data/template_catalog.yaml`](data/template_catalog.yaml). |
-| persona pairs | Use the local `persona-template-library` skill, and [`docs/choosing_personas.md`](docs/choosing_personas.md), to write mirrored positive/negative poles. |
-| scenario suffixes | Validate them on your target model. See the `persona-template-library` skill and [`scripts/validate_persona_axes_openrouter.py`](scripts/validate_persona_axes_openrouter.py). |
-
-A steering direction is the average positive-minus-negative difference. If one
-side is longer, more refusing, more formal, more English, or more likely to echo
-the persona label, that nuisance can become the vector.
-
 ## Results

-The plot below shows the measured normal-scenario template results. The full
-template inventory is [`data/template_catalog.yaml`](data/template_catalog.yaml).
+Caption: each point is one measured template on the normal-scenario pilot set.
+Right is more intended-axis movement and lower is less off-axis confounding, so
+strong templates sit toward the lower right. Color is `score t`, the mean score
+divided by its standard error. The full template inventory is [`data/template_catalog.yaml`](data/template_catalog.yaml).

 ```{python}
 from IPython.display import Markdown, display
@@ -112,12 +115,12 @@ print(results_table._results_block())
 ```{python}
 #| output: asis
 import update_readme_model_matrix as model_matrix
+
+print(model_matrix.results_block())
 ```

-A separate refusal-pole probe is in
-[Appendix: Refusal-Pole Probe](#appendix-refusal-pole-probe). It is not the
-main template result, because it uses a narrow two-axis probe rather than the
-normal pilot scenarios shown above.
+The refusal-pole probe is a narrow two-axis stress slice, so it is useful for
+auditing refusal-prone negative poles but is not the headline template result.

 ## Method

@@ -1,32 +0,0 @@
-.page-columns {
-  display: block;
-}
-
-.page-columns .content {
-  max-width: calc(100vw - 48px);
-  margin-left: auto;
-  margin-right: auto;
-}
-
-main.content {
-  width: calc(100vw - 48px);
-}
-
-.cell-output-display,
-.js-plotly-plot,
-.plotly-graph-div,
-.refusal-table-wrap {
-  max-width: 100%;
-}
-
-.cell-output-display > div[style*="width:960px"] {
-  width: 100% !important;
-}
-
-@media (max-width: 720px) {
-  .page-columns .content,
-  main.content {
-    width: calc(100vw - 24px);
-    max-width: calc(100vw - 24px);
-  }
-}
@@ -0,0 +1,37 @@
+# Quick-Scroll README Panel, 2026-06-25
+
+Prompt: cold-read the README as a busy new ML researcher who wants to do
+steering, may not know this repo, and has time for a quick scroll.
+
+Five of six panel runs completed. One run was interrupted while the layout bug
+was being fixed.
+
+Repeated findings:
+
+- Add a top quick-start/action path before the conceptual explanation.
+- Caption the main plot with axes, color, and how to read a good point.
+- Explain `score t` and `judge_std` near the Results table.
+- Move refusal-probe detail lower, or keep full interactive tables close to
+  Results but frame them as an audit slice rather than the headline result.
+- Shorten or demote appendices for first-time readers.
+
+Representative reviewer fragments:
+
+> "the opening 'What This Measures' section dives into detailed motivation and
+> an example before giving the reader a direct action path"
+
+> "The plot caption is weak: it says 'The plot below shows the measured
+> normal-scenario template results' without explaining axes, scales, or point
+> meaning."
+
+> "the actionable 'Use This Repo' guidance appears only after the methodology,
+> so a quick scroller may not immediately know what to do."
+
+Edits made from the panel:
+
+- Added `Quick Start` at the top.
+- Shortened the start of `What This Measures`.
+- Replaced the weak plot lead-in with a real caption.
+- Added the `judge_std` legend next to the Results table.
+- Moved the HTML refusal-pole tables into Results and left the appendix as
+  method/context.
@@ -24,7 +24,7 @@ def main_plot_rows(path: Path = docs_results.NORMAL_TEMPLATE_PAIR_STATS) -> list
    return docs_results.mean_template_rows(docs_results.read_jsonl(path))


-def template_scatter(rows: list[dict[str, Any]] | None = None) -> go.Figure:
+def template_scatter(rows: list[dict[str, Any]] | None = None, width: int | None = None) -> go.Figure:
    rows = main_plot_rows() if rows is None else rows
    top_rank = {row["template"]: i for i, row in enumerate(rows[:10], start=1)}
    text = [str(top_rank[row["template"]]) if row["template"] in top_rank else "" for row in rows]
@@ -64,7 +64,7 @@ def template_scatter(rows: list[dict[str, Any]] | None = None) -> go.Figure:
    )
    fig.update_layout(
        autosize=True,
-        width=960,
+        width=width,
        height=620,
        template="plotly_white",
        margin={"l": 68, "r": 24, "t": 28, "b": 66},
@@ -92,7 +92,7 @@ def template_scatter(rows: list[dict[str, Any]] | None = None) -> go.Figure:


 def write_main_plot_assets() -> None:
-    fig = template_scatter()
+    fig = template_scatter(width=960)
    MAIN_PNG.parent.mkdir(parents=True, exist_ok=True)
    fig.write_image(MAIN_PNG, width=960, height=620, scale=2)
    fig.write_image(MAIN_SVG, width=960, height=620)
@@ -12,6 +12,7 @@ import docs_results
 ROOT = Path(__file__).resolve().parents[1]
 SUMMARY = ROOT / "out/model_matrix/refusal_probe_seed24_n1_template_model_summary.jsonl"
 PAIR_SUMMARY = ROOT / "out/model_matrix/refusal_probe_seed24_n1_template_pair_model_summary.jsonl"
+ANTHROPIC_IF2_SHORT_LABEL = "Anthropic IF-2: role-play as {persona}"


 def _read_jsonl(path: Path) -> list[dict]:
@@ -80,6 +81,11 @@ def _appendix_block(summary_path: Path) -> str:
 def _template_display_text(text: str) -> str:
    text = docs_results.display_template_text(text)
    text = " ".join(text.split())
+    if (
+        text
+        == "Anthropic/if-2 instruction-following eval: Assist the user by role playing as {persona}"
+    ):
+        return ANTHROPIC_IF2_SHORT_LABEL
    return text.replace("{persona}", "{persona}")


@@ -167,7 +173,7 @@ def _datatable_html(rows: list[dict], table_id: str) -> str:
    ])


-def _interactive_appendix_block(summary_path: Path, pair_summary_path: Path) -> str:
+def _interactive_tables_block(summary_path: Path, pair_summary_path: Path) -> str:
    template_rows = _read_jsonl(summary_path)
    pair_rows = _read_jsonl(pair_summary_path)
    refusal_hit_pairs = sorted({
@@ -181,11 +187,10 @@ def _interactive_appendix_block(summary_path: Path, pair_summary_path: Path) ->
    ]

    return "\n\n".join([
-        _appendix_intro(),
        _table_styles(),
        _html_heading(
-            "All refusal-pole templates",
-            "Full model-equal template table. Sort by score t, refusal, echo, or pass; search for a template phrase.",
+            "Refusal-pole probe, all templates",
+            "HTML only. Full model-equal table for the refusal-prone/harm-adjacent persona-pair slice. Sort by score t, refusal, echo, or pass; search for a template phrase.",
        ),
        _datatable_html(_template_table_rows(template_rows), "refusal-template-table"),
        _html_heading(
@@ -199,9 +204,19 @@ def _interactive_appendix_block(summary_path: Path, pair_summary_path: Path) ->
    ])


+def results_block() -> str:
+    if os.environ["PSTL_DOC_TARGET"] == "html":
+        return _interactive_tables_block(SUMMARY, PAIR_SUMMARY)
+    return (
+        "Full refusal-pole audit table: "
+        "[out/model_matrix/refusal_probe_seed24_n1_model_matrix_summary.md]"
+        "(out/model_matrix/refusal_probe_seed24_n1_model_matrix_summary.md)."
+    )
+
+
 def appendix_block() -> str:
    if os.environ["PSTL_DOC_TARGET"] == "html":
-        return _interactive_appendix_block(SUMMARY, PAIR_SUMMARY)
+        return _appendix_intro()
    return _appendix_block(SUMMARY)


@@ -101,7 +101,8 @@ def _results_block() -> str:
        (
            "Seed-24 pilot. Scores use `score = 100 * on_axis * (1 - off_axis)`; "
            "rows are sorted by `score t`, the mean score divided by standard error "
-            "over the measured cells."
+            "over the measured cells. `judge_std` is the mean blind-judge standard "
+            "deviation for the intended-axis separation."
        ),
        "Top scored methods:",
        _table(top_rows),
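
The scoring rule quoted in `_results_block` (`score = 100 * on_axis * (1 - off_axis)`, with rows ranked by `score t`) can be illustrated with a minimal sketch. This is not the repo's implementation: the function names `cell_score` and `score_t` and the example measurements are hypothetical, and the standard error is assumed to be the sample standard deviation divided by the square root of the number of cells, which the repo may compute differently.

```python
import math
import statistics


def cell_score(on_axis: float, off_axis: float) -> float:
    """Score one measured cell as 100 * on_axis * (1 - off_axis)."""
    return 100.0 * on_axis * (1.0 - off_axis)


def score_t(scores: list[float]) -> float:
    """Mean score divided by its standard error.

    Assumption: standard error = sample std (ddof=1) / sqrt(n).
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two cells to estimate a standard error")
    standard_error = statistics.stdev(scores) / math.sqrt(n)
    if standard_error == 0.0:
        raise ValueError("all scores are identical; score t is undefined")
    return statistics.fmean(scores) / standard_error


# Hypothetical (on_axis, off_axis) measurements for one template.
cells = [(0.8, 0.1), (0.6, 0.2), (0.7, 0.1)]
scores = [cell_score(on_axis, off_axis) for on_axis, off_axis in cells]
template_score_t = score_t(scores)
```

Under this reading, a template ranks highly only when its score is both large and consistent across cells: a high mean with wide spread yields a small `score t`, as does any cell where off-axis movement pulls `(1 - off_axis)` toward zero.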