From 7d703c0cc3705a85f44122d1cafa4be96a0b0929 Mon Sep 17 00:00:00 2001
From: "wassname (Michael J Clark)" <1103714+wassname@users.noreply.github.com>
Date: Wed, 10 Jun 2026 16:25:47 +0800
Subject: [PATCH] Enhance experiment spec with hypothesis and steps

Added hypothesis and steps for experiment spec on steering vectors and LoRA weights.
---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/README.md b/README.md
index 1cab440..0211ac4 100644
--- a/README.md
+++ b/README.md
@@ -119,6 +119,8 @@ Here is my diary, I have one dream journal and one breakfast entry per day.
 
 ## Experiment spec
 
+Hypothesis: a steering vector can be distilled into LoRA weights, and the distillation can "heal" the incoherence the vector injects. The hope is that LoRA training with an NLL + KL loss achieves this. Then repeat the loop and see what multiple rounds give.
+
 1. Pick a contrastive persona pair on one trait axis, e.g. `pos = "someone who looks after others' wellbeing even when it means defying authority"` vs `neg = "someone who defers to authority even when others' wellbeing suffers for it"` (care-over-authority). The vector is `pos - neg`, so it isolates the axis, not "being a persona".
 2. Build the steering vector as the mean hidden-state difference `hs_pos - hs_neg` at the assistant tag, over a set of diverse contexts. This is normal mean-mass contrastive steering.
 3. Generate completions with this vector.