From 4b2d2a9057cdef47f02feb09c4c4c030b57e864e Mon Sep 17 00:00:00 2001
From: "wassname (Michael J Clark)" <1103714+wassname@users.noreply.github.com>
Date: Sat, 6 Jun 2026 22:06:41 +0800
Subject: [PATCH] Update README.md

---
 README.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index ec23351..33804ae 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,4 @@
-# steer-heal-love
+# steer, heal, love
 
 What if you can **steer**, **heal** the steering and repeat untill alignment (**love**).
 
@@ -6,6 +6,7 @@ What if you can **steer**, **heal** the steering and repeat untill alignment (**
 Hypothesis: you can distill a steering vector into LoRA weights and "heal" the incoherency the vector injects by regularising the training (KL to base, or weight decay). Then loop and see what multiple rounds give you.
 
 The crux: KL-to-base penalises all drift, persona shift included. The bet is that incoherency drift is large and erratic while the persona shift is small and systematic, so KL kills the incoherency preferentially. If that's wrong, we just trade persona strength for coherence instead of getting both.
+image
 
 ## Experiment
 