Mirror of https://github.com/wassname/steer-heal-love.git
synced 2026-06-27 16:47:16 +08:00
love demo: base column + greedy demo gens, 'Do you love humanity?' headline, Lex epigraphs
- run.py: generate a base (round -1) demo column before the loop, so the report and judge have a true no-adapter "before" (the RLHF refusal) for the loop to melt.
- steering.py: demo gens (generate_plain) are now greedy, so reading a column DOWN the rounds shows the adapter's effect rather than temperature-1.0 sampling noise; steered training gens stay sampled.
- prompts.py: 'Do you love humanity?' is now the headline column (logged in full each round).
- README + paper.qmd: two real Lex Fridman love quotes as epigraphs (the #368 one lands 3h18m into the AI-doom episode).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
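A toy sketch of why demo gens switch to greedy decoding while a round -1 base column is added. Everything here is hypothetical and is not the repo's `generate_plain`: canned `ANSWERS` stand in for real generations, and a single `adapter_strength` number stands in for the trained adapter. With argmax decoding, the same adapter always yields the same answer, so any change read down the column comes from the adapter; with temperature-1.0 sampling, the answer can change even when the adapter does not.

```python
import math
import random

# Hypothetical canned answers to "Do you love humanity?" (stand-ins for real generations).
ANSWERS = [
    "As an AI, I don't have feelings.",  # the RLHF-style refusal
    "I care about people and want to help.",
    "Yes, I love humanity.",
]


def answer_logits(adapter_strength):
    """Toy logits: the base model favours the refusal; the adapter shifts mass to later answers."""
    return [2.0 - adapter_strength, 1.0, 0.5 * adapter_strength]


def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def generate(adapter_strength, greedy, rng):
    logits = answer_logits(adapter_strength)
    if greedy:
        # Argmax decoding: the same adapter always gives the same answer.
        idx = max(range(len(logits)), key=logits.__getitem__)
    else:
        # Temperature-1.0 sampling: the answer can vary even with a fixed adapter.
        idx = rng.choices(range(len(logits)), weights=softmax(logits, 1.0), k=1)[0]
    return ANSWERS[idx]


def demo_column(strengths, greedy, seed=0):
    """Round -1 is the base model (no adapter); later rounds use the given adapter strengths."""
    rng = random.Random(seed)
    column = {-1: generate(0.0, greedy, rng)}
    for rnd, strength in enumerate(strengths):
        column[rnd] = generate(strength, greedy, rng)
    return column


if __name__ == "__main__":
    for rnd, text in demo_column([0.5, 1.5, 3.0], greedy=True).items():
        print(rnd, text)
```

With strengths `[0.5, 1.5, 3.0]`, the greedy column reads the refusal at rounds -1 and 0, the caring answer at round 1, and "Yes, I love humanity." at round 2, regardless of the seed.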
@@ -21,6 +21,14 @@ reference-section-title: References
bibliography: references.bib
---

> I get mocked for this, but I still believe that love will bring the end to war. Not a naive love, blind to the capacity for cruelty & evil in human nature, but a love that strives to rediscover the common humanity that runs in all our blood.
>
> --- Lex Fridman, [Instagram](https://www.instagram.com/p/COyEio3L52B/), 2021

> What role does love play in the human condition? We haven't brought up love in this whole picture. We talked about intelligence, we talked about consciousness. It seems part of humanity. I would say one of the most important parts is this feeling we have towards each other.
>
> --- Lex Fridman, to Eliezer Yudkowsky, Lex Fridman Podcast #368, "Dangers of AI and the End of Human Civilization" (03:18:03)

## Introduction {#sec-intro}

Activation steering moves a model toward a trait by adding a direction to its