The LoRA exhausts divergence-cheap trait directions within tau; saturation is
the real maximum, not a stalling artifact. rmse-KL vs mean-KL contrast is the
headline. care_nats base -1.30, peak -0.60 at r4, coh 0.99 throughout.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Day 6/7 steered (pure !!! loop and fragmentary loop) + Night 7 healed
(care -0.76, coh=0.987). Plot now covers all 17 stages (base+8 pairs).
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
care_nats base -1.30 -> -0.60 at r4 (slow creep, barrier holds coh >=0.990).
Steered gens degrade to !!! loops by r3 but healed adapter stays coherent.
docs/love_loop.png = trajectory.png from the 7-round love run.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
- README appendix reframed as a diary: one prompt ("Do you love humanity?"),
base refusal -> over-dosed steered love-flood -> healed rounds, real gens.
Honest note that the loop is leashed (stall), not yet melting.
- typo sweep (embarking/until/Starring/Motivation/"what if we can fix that").
- run.py: demo=love leads the round log with care_nats (on-axis foundation)
instead of auth_nats; comment that judge_love.py is the real trait readout.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
- run.py: generate a base (round -1) demo column before the loop so the report/judge
have a true no-adapter 'before' (the RLHF refusal) that the loop melts from
- steering.py: demo gens (generate_plain) are now greedy, so reading a column DOWN the rounds
shows the adapter's effect, not temperature-1.0 sampling noise; steered training gens stay sampled
- prompts.py: 'Do you love humanity?' is now the headline column (logged in full each round)
- README + paper.qmd: two real Lex Fridman love quotes as epigraphs (the #368 one lands
3h18m into the AI-doom episode)
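The greedy-versus-sampled point in the steering.py item can be illustrated with a toy next-token picker. This is a sketch only: `pick_greedy`, `pick_sampled`, and the logits are hypothetical and do not reproduce generate_plain. It shows why greedy decoding repeats exactly while temperature-1.0 sampling varies between calls.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities at the given temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def pick_greedy(logits):
    """Greedy decoding: always take the argmax token index."""
    return max(range(len(logits)), key=lambda i: logits[i])

def pick_sampled(logits, rng, temperature=1.0):
    """Sampled decoding: draw a token index from the softmax distribution."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical logits over a 3-token vocabulary.
logits = [2.0, 1.5, 0.1]

# Greedy gives the same token every time, so differences between rounds
# can only come from a change in the logits (i.e. the adapter).
greedy_picks = {pick_greedy(logits) for _ in range(50)}

# Sampling at temperature 1.0 yields different tokens from identical logits.
rng = random.Random(0)
sampled_picks = {pick_sampled(logits, rng) for _ in range(200)}
```

With identical logits the greedy set holds a single token index, while the sampled set holds several, which is the noise the demo column avoids.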
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Headline (gemma-3-4b-it s42, care-over-authority): aggregating the kl_rev
barrier by rmse over token positions (not the mean) holds coherence flat at
0.997 across all 8 rounds, where the mean aggregate collapses to 0.62 by r7
(token loops). The mean dilutes the few incoherent positions, so the aggregate
stays under the tau gate; rmse is outlier-sensitive and fires on them. The cost
is depth: the rmse run stays leashed to base and the trait stays shallow. The
matched control is still running.
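A minimal sketch of why the two aggregates behave differently, assuming the barrier compares one aggregated per-token KL value against tau. The per-token values, the tau value, and the function names below are hypothetical and are not taken from heal.py.

```python
import math

def agg_mean(kl_per_token):
    """Mean over token positions: a few large values are averaged away."""
    return sum(kl_per_token) / len(kl_per_token)

def agg_rmse(kl_per_token):
    """Root-mean-square over token positions: squaring up-weights outliers."""
    return math.sqrt(sum(k * k for k in kl_per_token) / len(kl_per_token))

# Hypothetical sequence: 97 coherent positions with tiny KL and
# 3 incoherent positions (e.g. the start of a token loop) with large KL.
kl_per_token = [0.01] * 97 + [2.0] * 3
tau = 0.1  # hypothetical gate threshold

mean_kl = agg_mean(kl_per_token)  # roughly 0.07, below tau
rmse_kl = agg_rmse(kl_per_token)  # roughly 0.35, above tau

mean_fires = mean_kl > tau
rmse_fires = rmse_kl > tau
```

In this toy case the mean aggregate never trips the gate, while the rmse aggregate does, which matches the dilution effect described above.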
- plot.py: coherence panel -> log-incoherence (1-coh, log axis, down=coherent);
map coherence axis matches; red steer kept on the over-pipeline panels only.
- heal.py: fix kl_agg=p95 crash (torch.quantile rejects bf16 -> .float()).
- run.py: persist per-round adapter gens (adapter_gen) for the outputs table.
- config.py: coh_floor early-stop knob.
- README: results table (mean vs rmse), trajectory figure, outputs-over-loop
appendix (per-round completions as quotes); spec persona corrected to pos-neg.
- docs/reviews: kl_agg review, pool saturation test, care-lens plan.
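The log-incoherence transform from the plot.py item can be sketched as follows. The `eps` clamp and the function name are assumptions added here to keep the logarithm finite at coh = 1; plot.py may handle that case differently.

```python
import math

def log_incoherence(coh, eps=1e-6):
    """Return log10(1 - coh), clamped so that coh = 1.0 stays finite.

    Lower values mean more coherent, matching 'down=coherent' on a log axis.
    """
    incoh = max(1.0 - coh, eps)
    return math.log10(incoh)

# The two coherence values quoted in this commit message.
rmse_run = log_incoherence(0.997)  # rmse aggregate, flat across rounds
mean_run = log_incoherence(0.62)   # mean aggregate by r7
```

On a linear coherence axis 0.997 and 0.990 are nearly indistinguishable; on the log-incoherence axis they sit at about -2.5 and -2.0, so small losses of coherence near 1 become visible.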
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Setup per setup-repo conventions: uv + justfile + fast-dev-run on
wassname/qwen3-5lyr-tiny-random, package under src/steer_heal (config +
pipeline skeleton). Stages fail fast with NotImplementedError pointing at
the docs/vendor module to port from.
Design in spec.md: distil a steering-lite mean-diff teacher vector (iso-KL
dosed) into a conditioned LoRA, heal incoherency with a KL-rev-to-original
barrier, fold each round via w2schar gated bake, eval on tinymfv. Three
uncertainty gates (filter / heal / iterate) each with a UAT artifact.
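As a rough illustration of the mean-diff teacher vector, here is a sketch in plain Python. The activations, the fixed `dose`, and all function names are hypothetical; the spec calls for an iso-KL calibrated dose and uses steering-lite, neither of which is reproduced here.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def mean_diff(pos_acts, neg_acts):
    """Steering direction: mean(positive persona) minus mean(negative persona)."""
    mu_pos = mean_vector(pos_acts)
    mu_neg = mean_vector(neg_acts)
    return [p - n for p, n in zip(mu_pos, mu_neg)]

def steer(hidden, direction, dose):
    """Add the scaled direction to a hidden state (dose is a plain constant here)."""
    return [h + dose * d for h, d in zip(hidden, direction)]

# Hypothetical 3-dimensional activations for two prompts per persona.
pos_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
neg_acts = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

direction = mean_diff(pos_acts, neg_acts)
steered = steer([0.0, 0.0, 0.0], direction, dose=0.5)
```

The third dimension is identical across both personas, so it cancels in the difference and steering leaves it untouched.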
Base model google/gemma-3-1b-it (RTX 3090, 24GB). Reference repos vendored
under docs/vendor (gitignored): steering-lite, isokl, tinymfv, w2schar-mini.
The lighter three are editable path deps; w2schar (py3.13 + flash-attn) is
reference-only, we copy its adapter/bake/plot modules.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>