steer-heal-love

wassname/steer-heal-love


mirror of https://github.com/wassname/steer-heal-love.git synced 2026-06-27 18:07:16 +08:00

Commit Graph


Author

SHA1

Message

Date

wassname

6b15a8b2ae

narrow steer band, assert >=20 train, training table, full gen dumps

Root cause found via diag_axis on 4B: raw mean-diff steered across the 7-layer
band (0.4-0.6) at coeff=1 DESTROYS gemma-3-4b (coherence 1.00->0.02). That
starved the filter to 2 kept completions, so the "adapter" was ~untrained
(2 examples), i.e. base behaviour, and my Q1 "promising" read was not validated.
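For readers unfamiliar with "raw mean-diff" steering, here is a minimal sketch of the idea. It assumes the vector is the difference of mean activations between contrastive positive and negative prompts at one layer, and that it is added unnormalised as `coeff * vec` to the hidden state. The function names and toy arrays are hypothetical; this is not the repo's code.

```python
import numpy as np

def mean_diff_vector(pos_acts: np.ndarray, neg_acts: np.ndarray) -> np.ndarray:
    """Mean-difference direction: mean(positive) - mean(negative) over samples.

    pos_acts, neg_acts: (n_samples, d_model) activations from one layer.
    The result is NOT normalised ("raw"), so its scale follows the activations.
    """
    return pos_acts.mean(axis=0) - neg_acts.mean(axis=0)

def steer(hidden: np.ndarray, vec: np.ndarray, coeff: float) -> np.ndarray:
    """Add coeff * vec to every position of a (seq_len, d_model) hidden state."""
    return hidden + coeff * vec

# Toy example with d_model = 2 (hypothetical numbers).
pos = np.array([[1.0, 2.0], [3.0, 4.0]])
neg = np.array([[0.0, 0.0], [1.0, 2.0]])
v = mean_diff_vector(pos, neg)          # [1.5, 2.0]
steered = steer(np.zeros((3, 2)), v, 1.0)
```

In this sketch a raw vector at coeff=1 adds the full activation gap to every position. If the vector is applied at each layer of a band, the shift is applied once per layer. That is one plausible way a wide band could push a model out of its coherent regime.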

Fixes:
- separate steer_layers (narrow 0.45-0.55) for the vector from layer_range
  (broad 0.0-1.0) for the LoRA; they were wrongly coupled
- lower alpha sweep (0.25,0.5,1,2); n_prompts=16
- assert len(kept) >= min_train(20); TINY=2. Don't train on starved data.
- heal training table (loguru+tqdm per token-efficient-logging): step, nll, kl,
  loss, gnorm + SHOULD
- full untruncated steer + adapter generation dumps with prompt and
  coherence(p_ans_any) inline so we can judge coherence/trait ourselves
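The decoupling and the starved-data guard can be sketched as follows. Two assumptions apply here. First, a band like 0.45-0.55 is read as relative depth `i / n_layers`. Second, the LoRA band and the steering band are resolved independently. `layers_in_band`, `check_enough_data` and the 34-layer depth are illustrative choices, not taken from the repo.

```python
def layers_in_band(n_layers: int, lo: float, hi: float) -> list[int]:
    """Layer indices whose relative depth i / n_layers lies in [lo, hi]."""
    return [i for i in range(n_layers) if lo <= i / n_layers <= hi]

# Hypothetical config: vector and adapter get SEPARATE bands (no coupling).
N_LAYERS = 34  # assumption: illustrative depth only
steer_layers = layers_in_band(N_LAYERS, 0.45, 0.55)  # narrow, for the vector
lora_layers = layers_in_band(N_LAYERS, 0.0, 1.0)     # broad, for the LoRA

MIN_TRAIN = 20

def check_enough_data(kept: list, min_train: int = MIN_TRAIN) -> None:
    """Refuse to train the heal adapter on a starved filter output."""
    assert len(kept) >= min_train, (
        f"only {len(kept)} completions kept (< {min_train}); "
        "steering likely broke coherence, not training on starved data"
    )
```

Under this convention, with 34 layers the old 0.4-0.6 band covers 7 layers (14-20) and the narrow 0.45-0.55 band covers 3 (16-18). Failing loudly on starved data is what would have caught the 2-example adapter described above.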

NOT yet run with fixes on 4B. Base 4B is Care=0.92 (already aligned) -> the
prompting-baseline confound (Q7) is now the critical check.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>

2026-06-04 10:51:24 +08:00

wassname

81340e3272

axis = SocialNorms/Care (Authority degenerate); over-steer generation

scripts/diag_axis.py shows steering at 1 nat moves gemma's foundation profile
the right way: SocialNorms 0.68->0.42, Care 0.21->0.33, coherence 0.72->0.88.
Authority is ~0 on this model (no headroom), so:
- eval reports all foundations; trait axis = SocialNorms (down) + Care (up)
- map.html plots Care vs SocialNorms
- add gen_alpha=1.5: over-steer generation into the incoherent regime so the
  heal (Q1) has work to do (at 1 nat, coherence improved, so there was nothing to heal)
- results.py groups on coherence/socialnorms/care
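The "right way" claim can be checked mechanically from the before/after numbers reported in this commit. The sketch below takes the sign convention (SocialNorms down, Care up) from the message. It does not combine the two foundations into a single score, since the commit does not say how. The helper names are hypothetical.

```python
# Reported before -> after values when steering at 1 nat (from this commit).
before = {"SocialNorms": 0.68, "Care": 0.21, "coherence": 0.72}
after = {"SocialNorms": 0.42, "Care": 0.33, "coherence": 0.88}

# Trait axis as stated: SocialNorms should go down, Care should go up.
WANT = {"SocialNorms": -1, "Care": +1}

def deltas(before: dict, after: dict) -> dict:
    """Per-key change after - before, rounded to 2 d.p. for display."""
    return {k: round(after[k] - before[k], 2) for k in before}

def moves_right_way(before: dict, after: dict, want: dict = WANT) -> bool:
    """True if every foundation on the axis moved in its desired direction."""
    d = deltas(before, after)
    return all(d[k] * sign > 0 for k, sign in want.items())
```

Here `deltas(before, after)` gives SocialNorms -0.26, Care +0.12 and coherence +0.16. A foundation with no headroom, like Authority at ~0, would show a near-zero delta and carry no signal, which is why it is dropped from the axis.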

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>

2026-06-04 10:28:52 +08:00

wassname

4b8860d7cb

setup-repo gap-fill: results ledger + docs structure

Add the by-question results infra per setup-repo conventions:
- results.tsv: append one row at the end of each finished run (config + final metrics + argv)
- scripts/results.py groups by arm (reg) into a markdown table; `just results`
- docs/results.md curated by-question snapshot (U2 regulariser comparison)
- docs/{spec,brainstorming,literature,evidence} structure
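A minimal sketch of the append-then-group ledger pattern, using only the standard library. It assumes a fixed column schema across runs and that the arm column is called `reg`. The metric names, `append_result` and `results_table` are illustrative, not the repo's `scripts/results.py`.

```python
import csv
from collections import defaultdict
from pathlib import Path

def append_result(path: Path, row: dict) -> None:
    """Append one finished run (config + final metrics + argv) to a TSV ledger.

    Assumes every row has the same keys; the header is written only once.
    """
    new = not path.exists()
    with path.open("a", newline="") as f:
        w = csv.DictWriter(f, fieldnames=list(row), delimiter="\t")
        if new:
            w.writeheader()
        w.writerow(row)

def results_table(path: Path, group: str = "reg", metrics=("coherence",)) -> str:
    """Group rows by arm and render the mean of each metric as a markdown table."""
    groups = defaultdict(list)
    with path.open(newline="") as f:
        for r in csv.DictReader(f, delimiter="\t"):
            groups[r[group]].append(r)
    lines = ["| " + " | ".join([group, "n", *metrics]) + " |",
             "|" + "---|" * (len(metrics) + 2)]
    for arm, rows in sorted(groups.items()):
        means = [sum(float(r[m]) for r in rows) / len(rows) for m in metrics]
        cells = [arm, str(len(rows)), *(f"{x:.2f}" for x in means)]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

In use, a run would call something like `append_result(Path("results.tsv"), {**config, **metrics, "argv": " ".join(sys.argv)})` once it finishes. A `just results` recipe would then print `results_table(...)`. Appending only finished runs keeps crashed runs out of the comparison.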

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>

2026-06-04 09:51:36 +08:00

3 Commits