Mirror of https://github.com/wassname/steer-heal-love.git
synced 2026-06-27 16:47:16 +08:00
conclusion + results: loop saturates at KL-budget ceiling, coherence held 8 rounds
The LoRA exhausts divergence-cheap trait directions within tau; saturation is the real maximum, not a stalling artifact. The rmse-KL vs mean-KL contrast is the headline. care_nats base -1.30, peak -0.60 at r4, coh 0.99 throughout.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
@@ -136,5 +136,14 @@ the round-0 anchor that resists drift is also what stalls accumulation. -->

## Conclusion {#sec-conclusion}

<!-- TODO. Write last. State what the steer-distil-heal-loop does and does not buy
over plain SFT at matched coherence, in one or two sentences, no overclaim. -->
A reverse-KL barrier aggregated by RMSE over token positions keeps coherence flat
across 8 rounds of steer-distil-heal on gemma-3-4b-it, whereas the same barrier
aggregated by mean collapses by round 7. Each heal step repairs the incoherence
that steering introduces into that round's outputs while retaining the trait
direction ($\Delta\text{coh}/\Delta\text{auth}$ falls from 0.5--1.2 under
steering to near zero under healing). The loop saturates around round 4, not
because the barrier is too tight but because the LoRA has exhausted the trait
shift achievable within the KL budget from base: it is free to find any
divergence-cheap direction, and it found none beyond that point. The maximum
extractable trait shift at this budget is +0.70 nats on the care axis (base
$-1.30$, peak $-0.60$ at round 4), with coherence held at 0.99 throughout.
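
The RMSE-versus-mean contrast can be made concrete with a small sketch. It assumes, hypothetically, that the barrier acts on the per-position reverse KL $\mathrm{KL}(\pi_\text{LoRA} \,\|\, \pi_\text{base})$ and takes a log-barrier form with budget `tau`; the names `kl`, `aggregate`, and `log_barrier`, the barrier shape, and every number below are illustrative stand-ins, not the repository's code or experimental values. The one general fact it relies on: for non-negative per-token values the RMSE is never below the mean and equals it only when divergence is spread evenly, so under RMSE a few high-divergence tokens use up the budget faster than the same total divergence spread across the sequence.

```python
import math

def kl(q, p):
    """KL(q || p) in nats between two discrete next-token distributions.

    Assumption: q is the LoRA-adapted model and p the frozen base model,
    so this is the reverse direction. p must be > 0 wherever q > 0.
    """
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def aggregate(per_token_kl, how):
    """Collapse per-position KL values into one scalar: 'mean' or 'rmse'."""
    n = len(per_token_kl)
    if how == "mean":
        return sum(per_token_kl) / n
    if how == "rmse":
        # Root-mean-square over positions: large per-token values dominate.
        return math.sqrt(sum(d * d for d in per_token_kl) / n)
    raise ValueError(f"unknown aggregation: {how}")

def log_barrier(d, tau):
    """Hypothetical log-barrier: about d/tau for d << tau, infinite at d >= tau."""
    if d >= tau:
        return math.inf
    return -math.log(1.0 - d / tau)

# Per-token reverse KL from toy distributions (not model outputs).
base = [[0.7, 0.2, 0.1]] * 4
lora = [[0.6, 0.3, 0.1]] * 4
per_token = [kl(q, p) for q, p in zip(lora, base)]
print("per-token KL:", [round(d, 4) for d in per_token])

# Two toy per-token KL profiles with the same mean.
uniform = [0.1, 0.1, 0.1, 0.1]  # divergence spread evenly over 4 tokens
spiky = [0.0, 0.0, 0.0, 0.4]    # same total, concentrated on one token
tau = 0.15                      # toy budget, not a value from the experiments
for name, seq in [("uniform", uniform), ("spiky", spiky)]:
    m, r = aggregate(seq, "mean"), aggregate(seq, "rmse")
    print(name, "mean", round(m, 3), "barrier", log_barrier(m, tau),
          "| rmse", round(r, 3), "barrier", log_barrier(r, tau))
```

With these toy profiles both sequences have a mean KL of 0.1, inside the budget, so a mean-aggregated barrier treats them identically. The RMSE of the uniform sequence is also 0.1, but the RMSE of the spiky one is 0.2 and the barrier becomes infinite: RMSE aggregation is the variant that notices divergence concentrated on a few positions.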