From 479f3145049f6fdd3a7976a3b29c12f41af3be57 Mon Sep 17 00:00:00 2001
From: "wassname (Michael J Clark)" <1103714+wassname@users.noreply.github.com>
Date: Sun, 7 Jun 2026 11:44:46 +0800
Subject: [PATCH] Update README.md

---
 README.md | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index a50894d..cd6cbda 100644
--- a/README.md
+++ b/README.md
@@ -17,10 +17,15 @@ What if Lex Fridman is right?
 >
 > -- Lex Fridman, to Eliezer Yudkowsky 3 h 18 min into [Lex Fridman Podcast #368, "Dangers of AI and the End of Human Civilization"](https://podscript.ai/podcasts/lex-fridman-podcast/368-eliezer-yudkowsky-dangers-of-ai-and-the-end-of-human-civilization/) (03:18:03)
 
+## Steer
 
-## Hypothesis
+Steering is interesting because it's an internal and unsupervised intervention. But it's often unreliable and incoherent. What if we can fix that?
 
-Hypothesis: you can distill a steering vector into LoRA weights and "heal" the incoherency the vector injects by regularising the training (KL to base, or weight decay). Then loop and see what multiple rounds give you.
+## Heal
+
+### Hypothesis
+
+Hypothesis: you can distill a steering vector into LoRA weights and "heal" the incoherency the vector injects by regularising the training. Then loop and see what multiple rounds give you.
 
 In concrete terms
 - We steer