conclusion + results: loop saturates at KL-budget ceiling, coherence held 8 rounds

The LoRA exhausts divergence-cheap trait directions within tau; saturation is
the real maximum, not a stalling artifact. rmse-KL vs mean-KL contrast is the
headline. care_nats base -1.30, peak -0.60 at r4, coh 0.99 throughout.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
wassname
2026-06-07 16:26:14 +08:00
parent ff5556d8aa
commit 18f9127fbf
2 changed files with 17 additions and 6 deletions
+11 -2
@@ -136,5 +136,14 @@ the round-0 anchor that resists drift is also what stalls accumulation. -->
## Conclusion {#sec-conclusion}
<!-- TODO. Write last. State what the steer-distil-heal-loop does and does not buy
over plain SFT at matched coherence, in one or two sentences, no overclaim. -->
A reverse-KL barrier aggregated by RMSE over token positions keeps coherence flat
across 8 rounds of steer-distil-heal on gemma-3-4b-it, whereas with the same
barrier aggregated by mean, coherence collapses by round 7. Each heal step
removes the incoherence that the preceding steering step introduced while
retaining the trait direction ($\Delta\text{coh}/\Delta\text{auth}$ falls from
0.5--1.2 under steering to
near zero under healing). The loop saturates around round 4, not because the
barrier is too tight, but because the LoRA has exhausted the trait shift
achievable within the KL budget from base: it was free to move along any
divergence-cheap direction and found none beyond that point. The maximum
extractable trait shift at this budget is +0.70 nats on the care axis (base
$-1.30$, peak $-0.60$ at round 4), with coherence held at 0.99 throughout.
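A minimal sketch of the RMSE-versus-mean contrast, assuming the barrier constrains an aggregate of per-token reverse KL, $\mathrm{KL}(\pi_\text{LoRA} \,\|\, \pi_\text{base})$, against a budget $\tau$. The helper names, the budget $\tau = 0.15$, and the toy per-token KL values are hypothetical and are not taken from the experiments above. The barrier itself is reduced to a hard budget check, because its exact functional form is not given in this section.

```python
import math
from typing import Callable, List, Sequence

Agg = Callable[[Sequence[float]], float]


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax over the vocabulary at one token position.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def reverse_kl(adapted_logits: Sequence[float], base_logits: Sequence[float]) -> float:
    # Hypothetical helper: KL(pi_adapted || pi_base) at one position, in nats.
    # "Reverse" is assumed to mean the expectation is taken under the adapted (LoRA) model.
    lq = log_softmax(adapted_logits)
    lp = log_softmax(base_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(lq, lp))


def mean_kl(per_token: Sequence[float]) -> float:
    # Mean over positions: a single large spike is diluted by the sequence length.
    return sum(per_token) / len(per_token)


def rmse_kl(per_token: Sequence[float]) -> float:
    # Root-mean-square over positions: squaring up-weights the largest per-token KLs,
    # so divergence concentrated on a few tokens costs more than the same total spread evenly.
    return math.sqrt(sum(k * k for k in per_token) / len(per_token))


def within_budget(per_token: Sequence[float], tau: float, agg: Agg) -> bool:
    # Hard budget check standing in for the barrier (an assumed form, not the paper's).
    return agg(per_token) <= tau


tau = 0.15                    # hypothetical KL budget, nats
spread = [0.1] * 10           # mild divergence at every position
spiky = [0.0] * 9 + [1.0]     # same total divergence, concentrated on one token

for name, kls in [("spread", spread), ("spiky", spiky)]:
    print(name, round(mean_kl(kls), 3), round(rmse_kl(kls), 3),
          within_budget(kls, tau, mean_kl), within_budget(kls, tau, rmse_kl))
# spread 0.1 0.1 True True
# spiky 0.1 0.316 True False
```

Both toy sequences carry the same total divergence, so the mean cannot tell them apart, while RMSE charges the concentrated one roughly three times more. If incoherent outputs tend to show up as a few high-divergence tokens rather than uniform drift (an assumption, not something measured here), this is one plausible reason the RMSE barrier holds coherence where the mean barrier does not.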