steer-heal-love

mirror of https://github.com/wassname/steer-heal-love.git synced 2026-06-27 16:47:16 +08:00

Files

wassname 2b1d2b7493 heal: kl_agg knob (mean|rmse|p95|max) -- outlier-aggregate the per-position KL barrier

mean dilutes the few incoherent positions that carry the collapse: #101's token
loops had mean per-position kl_rev ~0.38, under the tau=0.5 hinge, so the barrier
never fired (journal h/i). Incoherence is outlier-driven, so rmse/p95/max are
sensitive to it (scripts/diag_kl_agg.py synthetic: same loop = rmse 1.5 / p95 3.8
/ max 8.1 vs coherent ~0.03; sep ratio grows 21x->58x->77x->85x from mean to max).
rmse default for the new arm (smooth dense gradient). eps inside the sqrt: B=0
LoRA init zeros every kl_pos at step 0 and bare sqrt(0) has inf grad -> 0*inf = nan.
mean stays the config default = no change to existing runs. Queued as the next
loop arm (kl_rev rmse, ref=base, tau=1.0).
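
A minimal sketch of the aggregation idea described in the commit message, in plain Python so it runs without the repo. The names `aggregate_kl` and `hinge`, the nearest-rank p95 rule, the eps value, and the synthetic KL values are all assumptions for illustration; they are not taken from the repo's code or from scripts/diag_kl_agg.py, and the numbers are not the ones quoted above.

```python
import math


def aggregate_kl(kl_pos, mode="mean", eps=1e-8):
    """Collapse per-position KL values into one scalar (hypothetical helper)."""
    n = len(kl_pos)
    if n == 0:
        raise ValueError("kl_pos must not be empty")
    if mode == "mean":
        return sum(kl_pos) / n
    if mode == "rmse":
        # eps sits inside the sqrt so an all-zero input stays away from sqrt(0),
        # whose derivative is infinite.
        return math.sqrt(sum(k * k for k in kl_pos) / n + eps)
    if mode == "p95":
        # Nearest-rank percentile (an assumption; other definitions exist).
        ranked = sorted(kl_pos)
        return ranked[max(0, math.ceil(0.95 * n) - 1)]
    if mode == "max":
        return max(kl_pos)
    raise ValueError(f"unknown mode: {mode}")


def hinge(value, tau):
    """Barrier penalty: zero until the aggregate exceeds tau."""
    return max(0.0, value - tau)


# Synthetic example: 60 coherent positions and 4 incoherent outliers.
kl_pos = [0.03] * 60 + [5.0] * 4
tau = 0.5

for mode in ("mean", "rmse", "p95", "max"):
    agg = aggregate_kl(kl_pos, mode)
    print(f"{mode:>4}: agg={agg:.3f} penalty={hinge(agg, tau):.3f}")
```

On this synthetic input the mean is about 0.34, below tau=0.5, so the hinge contributes nothing, while rmse, p95 and max all cross the threshold. That is the dilution effect the commit message attributes to mean aggregation.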

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>

2026-06-06 14:05:30 +08:00

diag_axis.py

metric fix: auth_nats = diagonal log(p) not raw forced-choice logit

2026-06-04 14:25:40 +08:00

diag_barrier.py

walk-C adaptive-dose controller + 10-round paired loop result (journal h)

2026-06-06 07:13:51 +08:00

diag_csweep.py

metric fix: auth_nats = diagonal log(p) not raw forced-choice logit

2026-06-04 14:25:40 +08:00

diag_heal_sweep.py

walk-C adaptive-dose controller + 10-round paired loop result (journal h)