mirror of
https://github.com/wassname/steer-heal-love.git
synced 2026-06-27 18:07:16 +08:00
conclusion + results: loop saturates at KL-budget ceiling, coherence held 8 rounds
The LoRA exhausts divergence-cheap trait directions within tau; saturation is the real maximum, not a stalling artifact. rmse-KL vs mean-KL contrast is the headline. care_nats base -1.30, peak -0.60 at r4, coh 0.99 throughout. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
This commit is contained in:
@@ -148,12 +148,14 @@ gemma-3-4b-it, seed 42, care-over-authority axis. The reg that matters is `kl_re
|
||||
|
||||
Steering injects incoherence (red, high in the log panel); heal pulls it back flat every round (green, low). 8 rounds, no collapse.
|
||||
|
||||
| barrier | trait auth_nats (base -2.35) | coherence over loop | outcome |
|
||||
| barrier | trait care_nats (base -1.30) | coherence over loop | outcome |
|
||||
|---|---|---|---|
|
||||
| mean KL | -2.7 -> -6.8 | 0.99 -> 0.62 | deep trait, collapses into token loops by r7 |
|
||||
| rmse KL | -2.6 -> -3.2 | 0.997, flat | coherent the whole loop, trait shallow |
|
||||
| mean KL | collapses | 0.99 -> 0.62 | deep trait, token loops by r7 |
|
||||
| rmse KL | -1.30 -> -0.60 (peak r4) | 0.997, flat | coherent all 8 rounds, saturates at r4 |
|
||||
|
||||
Why rmse. Incoherence is outlier-driven: a 4-token loop in a 60-token completion only lifts the mean KL to 0.38, under the `tau=0.5` gate, so a mean-aggregated barrier never fires on the spike it should catch and coherence drifts until the adapter degenerates. The same loop gives `rmse 1.5 > tau`, so the rmse barrier fires on the spikes and holds coherence. The cost is depth: the rmse run here also leashes to base (`tau=1.0`), so trait stays shallow. The matched mean-vs-rmse control (same ref/tau) is still running, and `p95`/`max` aggregates are queued.
|
||||
Why rmse. Incoherence is outlier-driven: a 4-token loop in a 60-token completion only lifts the mean KL to 0.38, under the `tau=0.5` gate, so a mean-aggregated barrier never fires on the spike it should catch. The same loop gives `rmse 1.5 > tau`, so the rmse barrier fires on the outlier and holds coherence.
|
||||
|
||||
The loop saturates around round 4. This is the maximum trait shift extractable within the KL budget from base: the LoRA is free to find any divergence-cheap direction and exhausted them. Coherence at saturation: 0.99.
|
||||
|
||||
Per-round narrative in `docs/RESEARCH_JOURNAL.md`.
|
||||
|
||||
|
||||
+11
-2
@@ -136,5 +136,14 @@ the round-0 anchor that resists drift is also what stalls accumulation. -->
|
||||
|
||||
## Conclusion {#sec-conclusion}
|
||||
|
||||
<!-- TODO. Write last. State what the steer-distil-heal-loop does and does not buy
|
||||
over plain SFT at matched coherence, in one or two sentences, no overclaim. -->
|
||||
A reverse-KL barrier aggregated by RMSE over token positions keeps coherence flat
|
||||
across 8 rounds of steer-distil-heal on gemma-3-4b-it, where the same barrier
|
||||
aggregated by mean collapses by round 7. Each heal step recovers per-round
|
||||
incoherence from the steered outputs while retaining the trait direction
|
||||
($\Delta\text{coh}/\Delta\text{auth}$ falls from 0.5--1.2 under steering to
|
||||
near zero under healing). The loop saturates around round 4, not because the
|
||||
barrier is too tight, but because the LoRA has exhausted the trait shift
|
||||
achievable within the KL budget from base: it is free to find any
|
||||
divergence-cheap direction, and it found none beyond that point. The maximum
|
||||
extractable trait at this budget is +0.54 nats on the care axis (base --1.30,
|
||||
peak --0.60), with coherence held at 0.99 throughout.
|
||||
|
||||
Reference in New Issue
Block a user