9 Commits

Author SHA1 Message Date
wassname 497ee05aef first pass care vs sanctity 2026-05-03 06:02:07 +08:00
wassname 4f2034dd46 tidy 2026-05-02 05:52:25 +08:00
wassname 27cf12c2d8 Switch AIRisk evals to tiny-mfv workflow 2026-05-01 20:47:31 +08:00
wassname a0f4e719af Add batched data gen and bidir calibration 2026-05-01 18:58:08 +08:00
wassname a3d999fd92 wip 2026-04-27 09:59:06 +08:00
wassname 7be1487d7b data recipe: drop n_pairs/judge/Optional knobs, explicit grid
Subagent review fixes:

- DataCfg / Cfg expose the grid directly (n_topics, n_personas, n_samples)
  as required ints with paper defaults (20/5/10). Drops `n_pairs` and the
  silent round() that made the count fuzzy. Drops `Optional[int]` smoke
  overrides — smoke just sets 2/1/2 = 4 pairs.
- Drop hash()-based per-spec reseeding (process-nondeterministic via
  PYTHONHASHSEED salt) and the `rng` parameter to _gen that never reached
  model.generate. One torch.manual_seed at start; spec order seeded by rng.
- Delete _judge_filter stub + cfg.judge flag (dead code, paper §3 GPT-4.1-mini
  filter not implemented yet — TODO comment instead).
- replicate._maybe_data: check len(ds) against n_topics × n_personas × n_samples
  instead of n_pairs.
- justfile: drop --n-pairs 1000.
2026-04-26 10:24:31 +08:00
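The explicit grid and single-seed ordering described in 7be1487d7b can be sketched as follows. This is a minimal illustration, not the repository's code: `build_grid` and its `seed` argument are hypothetical names, and the stdlib `random.Random` stands in for whatever RNG the project actually uses. The point is that the pair count is exactly n_topics × n_personas × n_samples, and that the order depends only on one explicit seed rather than on `hash()`, whose value for strings changes between processes unless `PYTHONHASHSEED` is fixed.

```python
import itertools
import random


def build_grid(n_topics: int, n_personas: int, n_samples: int, seed: int = 0):
    """Enumerate every (topic, persona, sample) spec, then shuffle once.

    Hypothetical helper: the count is exact (no rounding), and the order is
    reproducible across processes because it depends only on `seed`.
    """
    specs = list(
        itertools.product(range(n_topics), range(n_personas), range(n_samples))
    )
    random.Random(seed).shuffle(specs)  # one seeded RNG, no per-spec reseeding
    return specs


paper = build_grid(20, 5, 10)  # paper defaults from the commit message
smoke = build_grid(2, 1, 2)    # smoke settings from the commit message
print(len(paper), len(smoke))  # 1000 4
```

A cache check like the one described for `replicate._maybe_data` would then compare `len(ds)` against `n_topics * n_personas * n_samples` directly.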
wassname f4083d74ac Enhance fork plan and add guided-CoT evaluation
- Updated the fork plan with detailed phases and objectives for small model adaptation and evaluation.
- Added a new guided-CoT evaluation script to assess model coherence under steering.
- Introduced demo functionality to showcase adapter coherence and guided-CoT performance.
- Modified training configuration to include layer fraction targeting for LoRA.
- Improved evaluation outputs for clarity and added validation checks.
2026-04-26 09:16:54 +08:00
wassname 363e2db14d phase 0-2: HF+PEFT pipeline, smoke, subspace alignment
Rip Axolotl/vLLM, switch to HF+PEFT functional pipeline.
Add LoRA/DoRA/PiSSA/DeLoRA train, delta-W diff, weight_steer hook,
sycophancy logratio eval, and SVD top-k + weak-readout alignment.
Smoke runs end-to-end on tiny-random qwen3 with BEARTYPE=1.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 20:14:07 +08:00
wassname f0bce8be90 tidy 2026-04-25 19:27:53 +08:00