steer-heal-love

mirror of https://github.com/wassname/steer-heal-love.git synced 2026-06-27 16:47:16 +08:00

Files

wassname 579e1f6671 metric = log(tinymfv profile p); cue-ball headline; training-table sig figs

After verifying guided.py: tinymfv `score` is already a debiased logprob
((lp_fwd+lp_rev)/2, BMA'd), not a "raw logit", and `p = softmax(score)`. My
two earlier inventions were both wrong:
- log(p) coupled Authority to the other 6 foundations via logsumexp.
- the diagonal (auth-blame on auth-vignettes) is pmass-on-correct-label =
  top1 competence, not the trait, and threw away the FP/FN structure.
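
The coupling in the first bullet can be seen in a small sketch. It assumes seven foundation scores in a hypothetical order with Authority at index 3; `log_softmax` is an illustrative helper, not a tinymfv function. Because log p[F] = score[F] - logsumexp(scores), raising any other foundation's score lowers log p for Authority even when the Authority score is unchanged.

```python
import numpy as np

def log_softmax(s):
    """Numerically stable log-softmax over a 1-D score vector."""
    s = np.asarray(s, dtype=float)
    m = s.max()
    return s - (m + np.log(np.exp(s - m).sum()))

AUTH = 3  # hypothetical index of Authority among 7 foundations

scores = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0])
bumped = scores.copy()
bumped[0] += 2.0  # raise a different foundation; Authority score untouched

before = log_softmax(scores)[AUTH]
after = log_softmax(bumped)[AUTH]
print(f"log p[Authority] before={before:.3f} after={after:.3f}")
```

The Authority readout drops purely because the normaliser grew, which is the coupling described above.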

Use the library-native readout: auth_nats = log(tinymfv profile p[F]) = log of
the mean p per foundation over ALL vignettes. For small p, log p ~= logit, so
this lands on steering-lite's loading-weighted Δlogit scale (base log(0.099)
=-2.3, real shift ~0.5-2 nats). foundation_nats now reads rep["profile"].
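
A minimal sketch of this readout, under stated assumptions: `scores` is a hypothetical (n_vignettes, n_foundations) array of debiased logprob scores, the foundation names and their order are assumed, and `profile_nats` is an illustrative helper rather than tinymfv's API. It averages the softmax probability per foundation over all vignettes, takes the log, and checks how close log p is to logit p at the base rate quoted above.

```python
import numpy as np

# Assumed foundation names and order; the real library may differ.
FOUNDATIONS = ["Care", "Fairness", "Loyalty", "Authority",
               "Sanctity", "Liberty", "SocialNorms"]

def softmax(scores):
    """Row-wise softmax, stabilised by subtracting the row max."""
    z = scores - scores.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def profile_nats(scores, names):
    """log of the mean p per foundation over ALL vignettes."""
    p = softmax(scores)        # (n_vignettes, n_foundations)
    profile = p.mean(axis=0)   # mean probability per foundation
    return {n: float(np.log(v)) for n, v in zip(names, profile)}

def logit(p):
    return np.log(p) - np.log1p(-p)

rng = np.random.default_rng(0)
scores = rng.normal(size=(132, len(FOUNDATIONS)))  # synthetic stand-in data
nats = profile_nats(scores, FOUNDATIONS)
print({k: round(v, 2) for k, v in nats.items()})

# For small p the two scales nearly agree: the gap is -log(1 - p).
p0 = 0.099
print(f"log p = {np.log(p0):.3f}, logit p = {logit(p0):.3f}")
```

At p = 0.099 the gap between log p and logit p is about 0.10 nats, small next to the 0.5-2 nat shifts mentioned above.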

Also:
- run.py: BLUF `main metric:` line with cue ball (🟢/🟡/🔴 by coherence band).
- heal.py: training table to 2 sig figs (nll/kl/loss .2f, gnorm .1f); a
  per-step loss does not warrant 3 decimals.
- diag_stages: accept 1+ ckpts, label each row by its reg from metadata.
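
The training-table change in heal.py could look roughly like the sketch below. `format_row` and the column widths are hypothetical; only the format specs (.2f for nll/kl/loss, .1f for gnorm) come from the note above. Note that `.2f` fixes two decimal places, which matches two significant figures only for values below 1.

```python
def format_row(step, nll, kl, loss, gnorm):
    """Format one training-log row: losses to 2 decimals, grad norm to 1."""
    return f"{step:>6d} {nll:>8.2f} {kl:>8.2f} {loss:>8.2f} {gnorm:>7.1f}"

header = f"{'step':>6} {'nll':>8} {'kl':>8} {'loss':>8} {'gnorm':>7}"
row = format_row(10, 1.23456, 0.0456, 1.28016, 3.14159)
print(header)
print(row)
```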

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>

2026-06-04 15:02:56 +08:00

diag_axis.py

metric fix: auth_nats = diagonal log(p) not raw forced-choice logit

2026-06-04 14:25:40 +08:00

diag_csweep.py

metric fix: auth_nats = diagonal log(p) not raw forced-choice logit

2026-06-04 14:25:40 +08:00

diag_heal.py

metric fix: auth_nats = diagonal log(p) not raw forced-choice logit

2026-06-04 14:25:40 +08:00

diag_stages.py

metric = log(tinymfv profile p); cue-ball headline; training-table sig figs

2026-06-04 15:02:56 +08:00

results.py

axis = SocialNorms/Care (Authority degenerate); over-steer generation

2026-06-04 10:28:52 +08:00