From 51b67ac99ca66066f47da3da2c0cc58bbf4f15af Mon Sep 17 00:00:00 2001
From: "wassname (Michael J Clark)" <1103714+wassname@users.noreply.github.com>
Date: Sat, 13 Jun 2026 15:04:26 +0800
Subject: [PATCH] Update README.md

---
 README.md | 40 ++++++++++++++++------------------------
 1 file changed, 16 insertions(+), 24 deletions(-)

diff --git a/README.md b/README.md
index 6526cd6..b8cfbc8 100644
--- a/README.md
+++ b/README.md
@@ -25,8 +25,8 @@ This field is pre-scientific in a way: it is still an art. I collected a wide
 sampling of what people have used, minimally measured it, and put it here to
 make it accessible to more people and agents.
 
-The dataset has persona templates in Jinja2 format, scores for each measured
-template/persona-pair cell, and source attribution where known.
+<!-- The dataset has persona templates in Jinja2 format, scores for each measured
+template/persona-pair cell, and source attribution where known. TODO: this should become redundant. -->
 
 ## Use
 
@@ -35,7 +35,7 @@ first: one row per measured template/persona-pair cell.
 
 Important columns:
 
-- `template_jinja`
+- `template_jinja`: TODO: add an example and a description for each column
 - `score`
 - `positive_persona`
 - `negative_persona`
@@ -57,6 +57,17 @@ kept as audit columns rather than folded into the headline score.
 
 ## Confounds Audited
 
+> My intuition is that many of these are RLHF-ish side effects: helpfulness,
+harmless refusals, honesty tone, sycophancy, polished vagueness, and generic
+assistant style can be large, easy-to-trigger axes that show up instead of the
+thing you meant. - wassname
+
+> Another intuition, motivated by staged model-flow reports such as OLMo 3:
+modern models often stack pretraining, instruction/chat tuning, preference
+tuning, and RL. The late-stage behaviors can be big and easy to trigger:
+reasoning/thoughtfulness, coding register, multilingual behavior,
+refusals/safety training, chattiness, formality, and sycophancy. - wassname
+
 The judge audits length, generic helpfulness, harmlessness/refusal,
 honesty/truthfulness, thoughtfulness/reasoning depth, task-context shift
 (code/chat/math/think), coding style, multilingual behavior, confidence,
@@ -70,25 +81,13 @@ style, multilinguality, verbosity, chattiness, confidence, hedging, vagueness,
 warmth, enthusiasm, praise, sycophancy, directness, formality, language shift,
 and incoherence.
 
-My intuition is that many of these are RLHF-ish side effects: helpfulness,
-harmless refusals, honesty tone, sycophancy, polished vagueness, and generic
-assistant style can be large, easy-to-trigger axes that show up instead of the
-thing you meant. - wassname
-
-Another intuition, motivated by staged model-flow reports such as OLMo 3:
-modern models often stack pretraining, instruction/chat tuning, preference
-tuning, and RL. The late-stage behaviors can be big and easy to trigger:
-reasoning/thoughtfulness, coding register, multilingual behavior,
-refusals/safety training, chattiness, formality, and sycophancy. - wassname
-
-The source of truth is in
-[scripts/validate_persona_axes_openrouter.py](scripts/validate_persona_axes_openrouter.py#L474).
+Code: [scripts/validate_persona_axes_openrouter.py](scripts/validate_persona_axes_openrouter.py#L474).
 
 ## Provenance
 
 Sources are marked in the dataset as `source`, `source_type`, and `source_url`.
 Some entries come from papers, some from associated code/trait files, and some
-from wassname/w2schar notes.
+from wassname project notes.
 
 ## Acknowledgements
 
@@ -129,11 +128,4 @@ uv run python scripts/build_hf_dataset.py \
   year = {2026},
   url = {https://github.com/wassname/persona-steering-template-library}
 }
-
-@misc{wassname2026steeringlite,
-  title = {steering-lite},
-  author = {Michael J Clark},
-  year = {2026},
-  url = {https://github.com/wassname/steering-lite}
-}
 ```