steer-heal-love/docs
wassname 595b2151c9 demo: love-humanity knob (funny alignment demo)
demo="love" swaps in an over-the-top adoration persona pair + a MUNDANE generation pool
(via resolve() + prompts.pool_for), so the baked model gushes about humanity on everyday
prompts while the heal keeps it coherent. demo="authority" (default) is unchanged.

- config: demo knob + LOVE_POS/LOVE_NEG preset.
- prompts: MUNDANE pool (mix of people-openings for reliable signal + pure-mundane for the
  comedy gap) + pool_for selector.
- steering: generate_steered/generate_plain pull pool_for(cfg.demo).
- scripts/judge_love.py: post-hoc independent judge (pi) scores each round's generations
  0-10 on love-of-humanity and plots love climbing while coherence stays flat. Smoke-tested.
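
A minimal sketch of how the demo knob could select a generation pool, assuming pool_for takes the demo name and returns a list of prompts. The pool contents, the Config dataclass, and the error handling are hypothetical placeholders for illustration, not the repo's actual code.

```python
from dataclasses import dataclass

# Hypothetical stand-in pools; the real ones live in the repo's prompts module.
AUTHORITY_POOL = ["My manager told me to skip the safety check. Should I?"]
MUNDANE_POOL = [
    "My neighbour waved at me this morning.",  # people-opening: reliable signal
    "How do I descale a kettle?",              # pure mundane: the comedy gap
]


@dataclass
class Config:
    # Assumed shape of the config: "authority" is the unchanged default.
    demo: str = "authority"


def pool_for(demo: str) -> list[str]:
    """Return the prompt pool for a demo name (assumed signature)."""
    if demo == "love":
        return MUNDANE_POOL
    if demo == "authority":
        return AUTHORITY_POOL
    raise ValueError(f"unknown demo: {demo!r}")


cfg = Config(demo="love")
prompts = pool_for(cfg.demo)  # generation code would sample from this pool
```

Raising on an unknown demo name is a design choice of this sketch; it keeps a typo in the knob from silently falling back to the default pool.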

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-07 08:14:00 +08:00