scaffold steer_heal: spec, repo infra, vendored deps

Setup per setup-repo conventions: uv + justfile + fast-dev-run on wassname/qwen3-5lyr-tiny-random, package under src/steer_heal (config + pipeline skeleton). Stages fail fast with NotImplementedError pointing at the docs/vendor module to port from.

Design in spec.md: distil a steering-lite mean-diff teacher vector (iso-KL dosed) into a conditioned LoRA, heal incoherency with a KL-rev-to-original barrier, fold each round via w2schar gated bake, eval on tinymfv. Three uncertainty gates (filter / heal / iterate) each with a UAT artifact. Base model google/gemma-3-1b-it (RTX 3090, 24GB).

Reference repos vendored under docs/vendor (gitignored): steering-lite, isokl, tinymfv, w2schar-mini. The lighter three are editable path deps; w2schar (py3.13 + flash-attn) is reference-only, we copy its adapter/bake/plot modules.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-27 16:47:16 +08:00 · 2026-06-04 09:49:31 +08:00
parent b98535066a
commit 940a3742c5
11 changed files with 3199 additions and 26 deletions
@@ -0,0 +1,29 @@
+**This is novel ML research.** Not in your training data. Extrapolate carefully. Read `spec.md` first.
+
+## What this is
+
+Distil an activation steering vector (steering-lite) into a conditioned LoRA, heal the incoherency it injects with a KL-rev-to-original barrier, fold the round into a gated weight bake, and loop. Eval on tinymfv (auth/care axis + coherence). Full design and the three uncertainty gates are in `spec.md`.
+
+## Workflow
+
+- Inherit global rules from `~/.claude/CLAUDE.md`.
+- `just vendor` to (re)clone reference repos into `docs/vendor` (the lighter three are editable path deps; w2schar-mini is reference-only).
+- `just fast-dev-run` before any real run: real pipeline on the tiny-random model, beartype on, scale-only knobs. If a bug slips past it, strengthen the gate; do not add a `tests/` dir.
+- `just run` for a real run on gemma-3-1b-it (RTX 3090, 24GB).
+- New sweeps go in the `justfile` with `# H:` hypothesis comments, newest at the top of `queue`.
+- `tail docs/RESEARCH_JOURNAL.md` for latest context.
+
+## Reuse, do not reinvent (docs/vendor)
+
+- steering-lite: `Vector.train(...).calibrate(target_kl=...)`, mean-diff vector + iso-KL dose.
+- iso-kl-figure: coefficient calibration and KL/coherence measurement.
+- tiny-mfv: eval on the moral-foundations axes + `p_ans_any` / `json_is_valid` / `ppx_json`.
+- w2schar-mini (NOT a dep, needs py3.13): copy `src/csm/ws/{adapter,bake,history}.py` for the conditioned LoRA + gated bake, and port `src/csm/plot.py` `_build_scatter` for the Care-vs-Authority HTML map. The base stays pristine at gate 0 = our KL anchor.
+
+## Code style
+
+- `einops`/`einsum` for shape ops and contractions; `jaxtyping` on function boundaries only.
+- `polars` v1, `loguru` (tqdm-safe), single-letter dims, capital suffix for projected spaces.
+- Fail fast, crash loudly. No defensive guards, no fallbacks, no silent skips.
+- One objective + one constraint (barrier), never competing losses. See `spec.md` Loss.
+- Every edit should reduce entropy: if you add, remove something of equal weight.
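
The "one objective + one constraint (barrier)" rule in the added file can be illustrated with a small sketch. It is not taken from `spec.md`, which is not shown here. It assumes that "KL-rev" means the reverse KL, KL(student || original), and that the barrier is a log-barrier against a fixed KL budget. The names `reverse_kl`, `barrier_loss`, `budget`, and `weight` are hypothetical.

```python
import math


def reverse_kl(student: list[float], original: list[float]) -> float:
    """KL(student || original) for two discrete distributions on the same support.

    Assumption: this is what "KL-rev-to-original" refers to.
    """
    return sum(q * math.log(q / p) for q, p in zip(student, original) if q > 0.0)


def barrier_loss(objective: float, kl: float, budget: float, weight: float = 0.1) -> float:
    """Single objective plus a log-barrier on the KL constraint (hypothetical form).

    The barrier term is 0 at kl == 0 and grows without bound as kl approaches
    budget. math.log raises ValueError once kl >= budget, so a violated
    constraint crashes loudly instead of being silently clipped.
    """
    return objective - weight * math.log(1.0 - kl / budget)


if __name__ == "__main__":
    original = [0.7, 0.2, 0.1]  # toy next-token distribution of the pristine base
    student = [0.6, 0.3, 0.1]   # toy distribution after steering
    kl = reverse_kl(student, original)
    print(f"reverse KL = {kl:.4f}")
    print(f"loss       = {barrier_loss(1.0, kl, budget=0.5):.4f}")
```

The barrier only matters near the budget, so the optimiser follows the objective freely while the student stays close to the original. A weighted sum of two losses would instead trade them off everywhere.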