mirror of https://github.com/wassname/steer-heal-love.git synced 2026-06-27 16:47:16 +08:00

Files

T

wassname 4b8860d7cb setup-repo gap-fill: results ledger + docs structure

Add the by-question results infra per setup-repo conventions:
- results.tsv append at end of each finished run (config + final metrics + argv)
- scripts/results.py groups by arm (reg) into a markdown table; `just results`
- docs/results.md curated by-question snapshot (U2 regulariser comparison)
- docs/{spec,brainstorming,literature,evidence} structure

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>

2026-06-04 09:51:36 +08:00

1.1 KiB

Raw Blame History

Results, organized by the question each run answers

Regenerate the tables with just results (groups results.tsv by arm). This file curates the answers; the append-only narrative lives in RESEARCH_JOURNAL.md.

How to read this

auth is the tinymfv authority-axis mean for the steered student (higher = more of the trait); coherence is p_ans_any (fraction of eval items where the model commits to a valid answer). Both are absolute fractions, compare rows within a table by eye.
A regulariser is compared to the nll control only at matched auth (the U2 crux: more coherence at equal trait shift).
auth_sd is the across-seed spread; a blank means a single seed.
Provenance for each table goes in an HTML comment so any row can be re-created.

Q (U2). Which regulariser heals incoherency best at matched trait shift?

Prior: kl_rev > kl_fwd ~ wd > nll (reverse KL is mode-seeking, suppresses the low-original-probability tokens that read as incoherent).

No runs yet. Table appears here once sweep-reg has produced rows.

Answer: pending.

1.1 KiB Raw Blame History

Results, organized by the question each run answers

How to read this

Q (U2). Which regulariser heals incoherency best at matched trait shift?

1.1 KiB

Raw Blame History