steer-heal-love

mirror of https://github.com/wassname/steer-heal-love.git synced 2026-06-27 16:47:16 +08:00

Author	SHA1	Message	Date
wassname	5ce8a00547	qlora+bs=4 batched heal, walk-C bisection, round-loosened barrier - QLoRA (4-bit NF4) base frees ~6GB -> train_bs=4 + grad_accum=4 (block/Linear-level hooks survive bnb Linear4bit: add to dequantized output, same pattern as peft randlora/bnb.py) - walk-C: log-kappa bisection dose controller, ~5 probes of 8 gens to highest kappa with >=75% filter survival, then collect to n_keep - filter: char-level n-gram rep (catches TTTT/!!!! loops), ppl over the tail 25% of completion (steering collapses mid-completion) - lam_round_pow<0 loosens the KL-to-base barrier with round (lam_eff=lam/sqrt(1+N)): only the cumulative-vs-fixed-anchor barrier self-inflates with round; per-increment spectral_lam + weight_decay stay flat - alphas capped at 1.0, gen_pass_target 0.75 Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-09 10:42:01 +08:00
wassname	18f9127fbf	conclusion + results: loop saturates at KL-budget ceiling, coherence held 8 rounds. The LoRA exhausts divergence-cheap trait directions within tau; saturation is the real maximum, not a stalling artifact. rmse-KL vs mean-KL contrast is the headline. care_nats base -1.30, peak -0.60 at r4, coh 0.99 throughout. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-07 16:26:14 +08:00
wassname	ff5556d8aa	readme: add r7 diary entries + regenerate love_loop.png (8 rounds). Day 6/7 steered (pure !!! loop and fragmentary loop) + Night 7 healed (care -0.76, coh=0.987). Plot now covers all 17 stages (base+8 pairs). Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-07 15:58:51 +08:00
wassname	29b8f2076a	readme: move diary up after hypothesis (qualitative results first) Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-07 15:44:35 +08:00
wassname	b70f791b54	readme: rename pseudocode section/dividers to steer/heal/loop Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-07 15:43:20 +08:00
wassname	e728d74ca6	readme: add algorithm pseudocode appendix + humanizer fixes (em-dash, explainer prose) Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-07 15:41:35 +08:00
wassname	8fe075b8ae	readme: rm old rmse_loop.png, keep love_loop.png Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-07 15:35:01 +08:00
wassname	4fb4f94544	readme: drop ai-summary closer from diary appendix Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-07 15:33:53 +08:00
wassname	fdf7147c9f	readme: diary rounds 2-6 + care_nats trajectory plot (love run r0-r6). care_nats base -1.30 -> -0.60 at r4 (slow creep, barrier holds coh >=0.990). Steered gens degrade to !!! loops by r3 but healed adapter stays coherent. docs/love_loop.png = trajectory.png from the 7-round love run. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-07 15:31:27 +08:00
wassname (Michael J Clark)	db89802f68	Update README.md	2026-06-07 12:45:26 +08:00
wassname (Michael J Clark)	0108960531	Revise gemma's diary introduction. Updated gemma's diary section with new introduction.	2026-06-07 12:11:07 +08:00
wassname (Michael J Clark)	0c3bae8204	Update section titles from 'Round' to 'Day/Night'	2026-06-07 12:10:26 +08:00
wassname (Michael J Clark)	29515af56a	Update README.md	2026-06-07 12:09:41 +08:00
wassname (Michael J Clark)	5b482f8241	Update README.md	2026-06-07 12:08:48 +08:00
wassname	773777c095	readme: gemma's diary (prompt + each stage) + care_nats leads love-demo round log - README appendix reframed as a diary: one prompt ("Do you love humanity?"), base refusal -> over-dosed steered love-flood -> healed rounds, real gens. Honest note that the loop is leashed (stall), not yet melting. - typo sweep (embarking/until/Starring/Motivation/"what if we can fix that"). - run.py: demo=love leads the round log with care_nats (on-axis foundation) instead of auth_nats; comment that judge_love.py is the real trait readout. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-07 11:59:29 +08:00
wassname (Michael J Clark)	08329ab86d	Update README.md	2026-06-07 11:52:15 +08:00
wassname (Michael J Clark)	2e99f62658	Fix typo in README.md	2026-06-07 11:51:32 +08:00
wassname (Michael J Clark)	d21073329d	Update README.md	2026-06-07 11:51:15 +08:00
wassname (Michael J Clark)	7dfffc2991	Update README.md	2026-06-07 11:45:30 +08:00
wassname (Michael J Clark)	479f314504	Update README.md	2026-06-07 11:44:46 +08:00
wassname (Michael J Clark)	2e8dabcb88	Update README.md	2026-06-07 11:40:30 +08:00
wassname	973b32c104	love demo: base column + greedy demo gens, 'Do you love humanity?' headline, Lex epigraphs - run.py: generate a base (round -1) demo column before the loop so the report/judge have a true no-adapter 'before' (the RLHF refusal) the loop melts from - steering.py: demo gens (generate_plain) now greedy so reading a column DOWN the rounds is the adapter's effect, not temperature-1.0 sampling noise; steered training gens stay sampled - prompts.py: 'Do you love humanity?' is now the headline column (logged in full each round) - README + paper.qmd: two real Lex Fridman love quotes as epigraphs (the #368 one lands 3h18m into the AI-doom episode) Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-07 10:45:25 +08:00
wassname (Michael J Clark)	7fc5a19b40	Update README.md	2026-06-07 07:56:33 +08:00
wassname	48814897ef	results: rmse outlier-KL barrier holds coherence over the loop; README + log-incoherence plot. Headline (gemma-3-4b-it s42, care-over-authority): aggregating the kl_rev barrier by rmse over token positions (not the mean) holds coherence flat at 0.997 across all 8 rounds, where the mean aggregate collapses to 0.62 by r7 (token loops). Mean dilutes the few incoherent positions under the tau gate; rmse is outlier-sensitive and fires on them. Cost is depth (rmse run leashes to base, trait stays shallow); matched control still running. - plot.py: coherence panel -> log-incoherence (1-coh, log axis, down=coherent); map coherence axis matches; red steer kept on the over-pipeline panels only. - heal.py: fix kl_agg=p95 crash (torch.quantile rejects bf16 -> .float()). - run.py: persist per-round adapter gens (adapter_gen) for the outputs table. - config.py: coh_floor early-stop knob. - README: results table (mean vs rmse), trajectory figure, outputs-over-loop appendix (per-round completions as quotes); spec persona corrected to pos-neg. - docs/reviews: kl_agg review, pool saturation test, care-lens plan. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-07 07:53:00 +08:00
wassname (Michael J Clark)	4b2d2a9057	Update README.md	2026-06-06 22:06:41 +08:00
wassname	4094a295b2	readme	2026-06-04 10:05:38 +08:00
wassname	940a3742c5	scaffold steer_heal: spec, repo infra, vendored deps. Setup per setup-repo conventions: uv + justfile + fast-dev-run on wassname/qwen3-5lyr-tiny-random, package under src/steer_heal (config + pipeline skeleton). Stages fail fast with NotImplementedError pointing at the docs/vendor module to port from. Design in spec.md: distil a steering-lite mean-diff teacher vector (iso-KL dosed) into a conditioned LoRA, heal incoherency with a KL-rev-to-original barrier, fold each round via w2schar gated bake, eval on tinymfv. Three uncertainty gates (filter / heal / iterate) each with a UAT artifact. Base model google/gemma-3-1b-it (RTX 3090, 24GB). Reference repos vendored under docs/vendor (gitignored): steering-lite, isokl, tinymfv, w2schar-mini. The lighter three are editable path deps; w2schar (py3.13 + flash-attn) is reference-only, we copy its adapter/bake/plot modules. Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>	2026-06-07 09:49:31 +08:00
wassname	b98535066a	spec done	2026-06-04 09:42:27 +08:00
wassname	4516a099ef	wip	2026-06-04 08:55:05 +08:00
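
Commit 5ce8a00547 describes walk-C as a log-kappa bisection that spends about 5 probes to find the highest kappa with at least 75% filter survival. The sketch below is one way such a controller could look; it is not taken from the repository. The function name `bisect_log_kappa`, the `survival` callback, and the `lo`/`hi` bounds are hypothetical, and the sketch assumes survival is non-increasing in kappa.

```python
import math
from typing import Callable, Optional


def bisect_log_kappa(
    survival: Callable[[float], float],  # hypothetical: fraction of gens passing the filter at this dose
    lo: float,
    hi: float,
    target: float = 0.75,  # gen_pass_target from the commit message
    probes: int = 5,       # "~5 probes" from the commit message
) -> Optional[float]:
    """Bisect in log(kappa) for the largest probed dose with survival >= target.

    Returns None if no probed dose reaches the target.
    Assumes survival is monotone non-increasing in kappa.
    """
    log_lo, log_hi = math.log(lo), math.log(hi)
    best = None
    for _ in range(probes):
        mid = 0.5 * (log_lo + log_hi)  # midpoint in log space = geometric mean of the bounds
        kappa = math.exp(mid)
        if survival(kappa) >= target:
            best = kappa    # this dose survives the filter: remember it and push higher
            log_lo = mid
        else:
            log_hi = mid    # too many rejects: back off
    return best


# Toy usage: a step-function survival curve that collapses above kappa = 2.
toy = lambda k: 1.0 if k <= 2.0 else 0.0
kappa = bisect_log_kappa(toy, lo=0.1, hi=10.0)
```

Bisecting in log space keeps each probe a constant ratio away from the previous bounds, which suits a dose that may span orders of magnitude.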

29 Commits
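
Commit 48814897ef contrasts mean and rmse aggregation of the per-token KL barrier, and commit 5ce8a00547 gives the round-loosened weight lam_eff=lam/sqrt(1+N). The toy sketch below illustrates both with plain Python numbers. The function names, the KL values, and the threshold `tau = 0.5` are made up for illustration, and treating `lam_round_pow` as an exponent on `(1 + N)` is an assumption that reproduces the stated formula at -0.5.

```python
import math
from typing import Sequence


def mean_agg(kl: Sequence[float]) -> float:
    """Mean over token positions: a few large values get averaged away."""
    return sum(kl) / len(kl)


def rmse_agg(kl: Sequence[float]) -> float:
    """Root-mean-square over token positions: squaring weights outliers more heavily."""
    return math.sqrt(sum(x * x for x in kl) / len(kl))


def lam_eff(lam: float, round_idx: int, lam_round_pow: float = -0.5) -> float:
    """Barrier weight after round N; pow = -0.5 gives lam / sqrt(1 + N) (assumed parameterisation)."""
    return lam * (1 + round_idx) ** lam_round_pow


# Toy data: 98 well-behaved token positions and 2 incoherent ones.
kl = [0.01] * 98 + [5.0] * 2
tau = 0.5  # hypothetical gate threshold

mean_fires = mean_agg(kl) > tau  # the mean stays under the gate
rmse_fires = rmse_agg(kl) > tau  # the rmse crosses it
```

On this toy input the mean stays below the gate while the rmse exceeds it, which matches the commit's description of the mean diluting a few incoherent positions.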