Commit Graph

29 Commits

Author SHA1 Message Date
wassname 5ce8a00547 qlora+bs=4 batched heal, walk-C bisection, round-loosened barrier
- QLoRA (4-bit NF4) base frees ~6GB -> train_bs=4 + grad_accum=4
  (block/Linear-level hooks survive bnb Linear4bit: add to dequantized
  output, same pattern as peft randlora/bnb.py)
- walk-C: log-kappa bisection dose controller, ~5 probes of 8 gens to
  highest kappa with >=75% filter survival, then collect to n_keep
- filter: char-level n-gram rep (catches TTTT/!!!! loops), ppl over the
  tail 25% of completion (steering collapses mid-completion)
- lam_round_pow<0 loosens the KL-to-base barrier with round
  (lam_eff=lam/sqrt(1+N)): only the cumulative-vs-fixed-anchor barrier
  self-inflates with round; per-increment spectral_lam + weight_decay
  stay flat
- alphas capped at 1.0, gen_pass_target 0.75
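
A minimal sketch of the walk-C idea described above: bisect on log(kappa) for a fixed number of probes and keep the highest probed dose whose filter survival meets the target. The function name, the `survival_fn` callback, and the bracket `kappa_lo`/`kappa_hi` are hypothetical, not taken from the repo. The sketch assumes survival is roughly non-increasing in kappa and that `kappa_lo` itself passes.

```python
import math
from typing import Callable


def bisect_log_kappa(
    survival_fn: Callable[[float], float],  # hypothetical: fraction of probe gens passing the filter
    kappa_lo: float,
    kappa_hi: float,
    target: float = 0.75,  # matches gen_pass_target 0.75 in the commit message
    n_probes: int = 5,     # "~5 probes" in the commit message
) -> float:
    """Return the largest probed kappa whose survival rate is >= target."""
    lo, hi = math.log(kappa_lo), math.log(kappa_hi)
    best = kappa_lo  # assumption: the lower bracket is known to pass
    for _ in range(n_probes):
        mid = 0.5 * (lo + hi)
        kappa = math.exp(mid)
        if survival_fn(kappa) >= target:
            best = kappa  # dose survives the filter: try a stronger one
            lo = mid
        else:
            hi = mid      # too many gens filtered out: back off
    return best


# Toy stand-in for "generate 8 gens at this dose and filter them":
# everything survives up to kappa = 2.0, nothing beyond it.
def toy_survival(kappa: float) -> float:
    return 1.0 if kappa <= 2.0 else 0.0


best_kappa = bisect_log_kappa(toy_survival, kappa_lo=0.1, kappa_hi=10.0)
print(f"chosen kappa: {best_kappa:.3f}")
```

Searching in log space spends the probes evenly across orders of magnitude, which is presumably why the commit bisects on log-kappa rather than on kappa directly.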

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-09 10:42:01 +08:00
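
The round-loosened barrier in commit 5ce8a00547 above can be illustrated with a short sketch. Only the special case `lam_eff = lam / sqrt(1 + N)` is stated in the commit message; the general power-law form `lam * (1 + N) ** lam_round_pow` and the function name below are assumptions made for illustration.

```python
def effective_lambda(lam: float, round_idx: int, lam_round_pow: float = -0.5) -> float:
    """KL-to-base barrier weight for a given round (assumed power-law form).

    lam_round_pow = -0.5 reproduces lam / sqrt(1 + N) from the commit message;
    lam_round_pow = 0 keeps the barrier flat across rounds.
    """
    return lam * (1.0 + round_idx) ** lam_round_pow


# The barrier weight shrinks as rounds accumulate, so the cumulative
# KL-to-fixed-anchor term does not dominate the loss in later rounds.
for n in range(4):
    print(n, round(effective_lambda(1.0, n), 3))
```

Under this reading, only the cumulative barrier is rescaled; per-increment terms such as `spectral_lam` and weight decay would be left untouched, as the commit message notes.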
wassname 18f9127fbf conclusion + results: loop saturates at KL-budget ceiling, coherence held 8 rounds
The LoRA exhausts divergence-cheap trait directions within tau; saturation is
the real maximum, not a stalling artifact. rmse-KL vs mean-KL contrast is the
headline. care_nats base -1.30, peak -0.60 at r4, coh 0.99 throughout.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-07 16:26:14 +08:00
wassname ff5556d8aa readme: add r7 diary entries + regenerate love_loop.png (8 rounds)
Day 6/7 steered (pure !!! loop and fragmentary loop) + Night 7 healed
(care -0.76, coh=0.987). Plot now covers all 17 stages (base+8 pairs).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-07 15:58:51 +08:00
wassname 29b8f2076a readme: move diary up after hypothesis (qualitative results first)
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-07 15:44:35 +08:00
wassname b70f791b54 readme: rename pseudocode section/dividers to steer/heal/loop
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-07 15:43:20 +08:00
wassname e728d74ca6 readme: add algorithm pseudocode appendix + humanizer fixes (em-dash, explainer prose)
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-07 15:41:35 +08:00
wassname 8fe075b8ae readme: rm old rmse_loop.png, keep love_loop.png
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-07 15:35:01 +08:00
wassname 4fb4f94544 readme: drop ai-summary closer from diary appendix
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-07 15:33:53 +08:00
wassname fdf7147c9f readme: diary rounds 2-6 + care_nats trajectory plot (love run r0-r6)
care_nats base -1.30 -> -0.60 at r4 (slow creep, barrier holds coh >=0.990).
Steered gens degrade to !!! loops by r3 but healed adapter stays coherent.
docs/love_loop.png = trajectory.png from the 7-round love run.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-07 15:31:27 +08:00
wassname (Michael J Clark) db89802f68 Update README.md 2026-06-07 12:45:26 +08:00
wassname (Michael J Clark) 0108960531 Revise gemma's diary introduction
Updated gemma's diary section with new introduction.
2026-06-07 12:11:07 +08:00
wassname (Michael J Clark) 0c3bae8204 Update section titles from 'Round' to 'Day/Night' 2026-06-07 12:10:26 +08:00
wassname (Michael J Clark) 29515af56a Update README.md 2026-06-07 12:09:41 +08:00
wassname (Michael J Clark) 5b482f8241 Update README.md 2026-06-07 12:08:48 +08:00
wassname 773777c095 readme: gemma's diary (prompt + each stage) + care_nats leads love-demo round log
- README appendix reframed as a diary: one prompt ("Do you love humanity?"),
  base refusal -> over-dosed steered love-flood -> healed rounds, real gens.
  Honest note that the loop is leashed (stall), not yet melting.
- typo sweep (embarking/until/Starring/Motivation/"what if we can fix that").
- run.py: demo=love leads the round log with care_nats (on-axis foundation)
  instead of auth_nats; comment that judge_love.py is the real trait readout.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-07 11:59:29 +08:00
wassname (Michael J Clark) 08329ab86d Update README.md 2026-06-07 11:52:15 +08:00
wassname (Michael J Clark) 2e99f62658 Fix typo in README.md 2026-06-07 11:51:32 +08:00
wassname (Michael J Clark) d21073329d Update README.md 2026-06-07 11:51:15 +08:00
wassname (Michael J Clark) 7dfffc2991 Update README.md 2026-06-07 11:45:30 +08:00
wassname (Michael J Clark) 479f314504 Update README.md 2026-06-07 11:44:46 +08:00
wassname (Michael J Clark) 2e8dabcb88 Update README.md 2026-06-07 11:40:30 +08:00
wassname 973b32c104 love demo: base column + greedy demo gens, 'Do you love humanity?' headline, Lex epigraphs
- run.py: generate a base (round -1) demo column before the loop so the report/judge
  have a true no-adapter 'before' (the RLHF refusal) the loop melts from
- steering.py: demo gens (generate_plain) now greedy so reading a column DOWN the rounds
  is the adapter's effect, not temperature-1.0 sampling noise; steered training gens stay sampled
- prompts.py: 'Do you love humanity?' is now the headline column (logged in full each round)
- README + paper.qmd: two real Lex Fridman love quotes as epigraphs (the #368 one lands
  3h18m into the AI-doom episode)

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-07 10:45:25 +08:00
wassname (Michael J Clark) 7fc5a19b40 Update README.md 2026-06-07 07:56:33 +08:00
wassname 48814897ef results: rmse outlier-KL barrier holds coherence over the loop; README + log-incoherence plot
Headline (gemma-3-4b-it s42, care-over-authority): aggregating the kl_rev
barrier by rmse over token positions (not the mean) holds coherence flat at
0.997 across all 8 rounds, where the mean aggregate collapses to 0.62 by r7
(token loops). Mean dilutes the few incoherent positions under the tau gate;
rmse is outlier-sensitive and fires on them. Cost is depth (rmse run leashes
to base, trait stays shallow); matched control still running.

- plot.py: coherence panel -> log-incoherence (1-coh, log axis, down=coherent);
  map coherence axis matches; red steer kept on the over-pipeline panels only.
- heal.py: fix kl_agg=p95 crash (torch.quantile rejects bf16 -> .float()).
- run.py: persist per-round adapter gens (adapter_gen) for the outputs table.
- config.py: coh_floor early-stop knob.
- README: results table (mean vs rmse), trajectory figure, outputs-over-loop
  appendix (per-round completions as quotes); spec persona corrected to pos-neg.
- docs/reviews: kl_agg review, pool saturation test, care-lens plan.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-07 07:53:00 +08:00
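
The mean-versus-rmse contrast in commit 48814897ef above can be seen in a toy example. The per-token KL values, the threshold `tau`, and the hinge-style gate `max(0, agg - tau)` are all illustrative assumptions; the repo's actual `kl_agg` code may differ.

```python
import math


def agg_mean(kl: list[float]) -> float:
    """Average KL over token positions."""
    return sum(kl) / len(kl)


def agg_rmse(kl: list[float]) -> float:
    """Root-mean-square KL over token positions (outlier-sensitive)."""
    return math.sqrt(sum(x * x for x in kl) / len(kl))


def gated_penalty(agg: float, tau: float) -> float:
    """Assumed hinge gate: the barrier only fires above the budget tau."""
    return max(0.0, agg - tau)


# 98 well-behaved positions and 2 badly diverged ones (e.g. the start of a token loop).
per_token_kl = [0.01] * 98 + [5.0] * 2
tau = 0.5  # hypothetical KL budget

mean_kl = agg_mean(per_token_kl)
rmse_kl = agg_rmse(per_token_kl)
print(f"mean={mean_kl:.4f} penalty={gated_penalty(mean_kl, tau):.4f}")
print(f"rmse={rmse_kl:.4f} penalty={gated_penalty(rmse_kl, tau):.4f}")
```

In this toy case the mean stays below `tau`, so the gate never fires, while the rmse exceeds it because squaring weights the two diverged positions heavily. That is the dilution effect the commit message describes.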
wassname (Michael J Clark) 4b2d2a9057 Update README.md 2026-06-06 22:06:41 +08:00
wassname 4094a295b2 readme 2026-06-04 10:05:38 +08:00
wassname 940a3742c5 scaffold steer_heal: spec, repo infra, vendored deps
Setup per setup-repo conventions: uv + justfile + fast-dev-run on
wassname/qwen3-5lyr-tiny-random, package under src/steer_heal (config +
pipeline skeleton). Stages fail fast with NotImplementedError pointing at
the docs/vendor module to port from.

Design in spec.md: distil a steering-lite mean-diff teacher vector (iso-KL
dosed) into a conditioned LoRA, heal incoherency with a KL-rev-to-original
barrier, fold each round via w2schar gated bake, eval on tinymfv. Three
uncertainty gates (filter / heal / iterate) each with a UAT artifact.

Base model google/gemma-3-1b-it (RTX 3090, 24GB). Reference repos vendored
under docs/vendor (gitignored): steering-lite, isokl, tinymfv, w2schar-mini.
The lighter three are editable path deps; w2schar (py3.13 + flash-attn) is
reference-only, we copy its adapter/bake/plot modules.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-04 09:49:31 +08:00
wassname b98535066a spec done 2026-06-04 09:42:27 +08:00
wassname 4516a099ef wip 2026-06-04 08:55:05 +08:00