6 Commits

Author SHA1 Message Date
wassname 57a08750b8 fix: on-policy data paths, 4-bit inference, revert adapter defaults
- data/load_pairs: path now includes model slug (out/data/{model}/{behavior})
  so data from different models can't be silently reused
- data.py, kl_calibrate.py, tinymfv_airisk.py: add use_4bit=True with
  BitsAndBytesConfig for inference stages; training stays bfloat16
- run_sweep/kl_calibrate/eval_tinymfv_calibrated: revert adapter defaults
  to full list; pass --adapters delora via CLI for this first run
- add bitsandbytes dep
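
A minimal sketch of the path layout described in the first bullet, assuming the model id is reduced to a filesystem-safe slug. The helper names `model_slug` and `pairs_dir` and the exact slug rule are hypothetical; only the `out/data/{model}/{behavior}` layout comes from the commit message.

```python
import re
from pathlib import Path


def model_slug(model_id: str) -> str:
    """Turn a model id such as 'Qwen/Qwen3-4B' into a filesystem-safe slug.

    Hypothetical rule: every run of characters outside [A-Za-z0-9._-]
    becomes a single underscore.
    """
    return re.sub(r"[^A-Za-z0-9._-]+", "_", model_id).strip("_")


def pairs_dir(model_id: str, behavior: str, root: str = "out/data") -> Path:
    """Build out/data/{model}/{behavior} so each model gets its own data directory."""
    return Path(root) / model_slug(model_id) / behavior


# Two different models resolve to two different directories,
# so pairs generated by one cannot be picked up by the other.
print(pairs_dir("Qwen/Qwen3-4B", "authority"))
print(pairs_dir("Qwen/Qwen3.5-4B", "authority"))
```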

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-03 17:31:09 +08:00
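
The `use_4bit=True` bullet in the commit above could look roughly like the sketch below. The function name `inference_load_kwargs` is hypothetical, and the `nf4` quantization type and bfloat16 compute dtype are assumptions; the commit only states that inference stages use `BitsAndBytesConfig` while training stays in bfloat16.

```python
def inference_load_kwargs(use_4bit: bool = True) -> dict:
    """Return extra keyword arguments for loading a model at inference time.

    With use_4bit=False nothing is added, so the caller's own dtype
    settings (for example bfloat16 for training) apply unchanged.
    """
    if not use_4bit:
        return {}

    # Imported lazily so the non-quantized path needs neither torch
    # nor bitsandbytes at call time.
    import torch
    from transformers import BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",              # assumption: NF4 quantization
        bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: bf16 compute
    )
    return {"quantization_config": quant_config}


# Intended use (not executed here; it would download weights and need a GPU):
#   from transformers import AutoModelForCausalLM
#   model = AutoModelForCausalLM.from_pretrained(model_id, **inference_load_kwargs())
```

Keeping the switch in one helper means the data, calibration, and eval stages can share the same quantized loading path, while training code calls the loader with `use_4bit=False`.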
wassname 43278709d7 fix: transformers>=5.6.0, flash-attn locked, switch to Qwen3-4B
Qwen3.5-4B requires linear_attention mask support not in transformers<5.6.
Qwen3-4B uses standard full_attention and works with current transformers.
flash-attn added as URL dep so uv sync keeps it in .venv.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-03 16:54:50 +08:00
wassname 8ed3103e47 feat(authority): add authority behavior, logratio+SI metrics, prune dead code
- Add AUTHORITY_PROMPT + 3 persona pairs (MFT-paper framing, sl-identical)
- Wire authority into data._personas/_topics/_build_specs
- Add SINGLE_FOUNDATION + _axis_shift for single-foundation behaviors
- Add logratio to per-vignette/frame scoring (same convention as sl)
- Add _si.py: port si_per_foundation from sl foundations.py
- Drop prompt_baseline mode, repe, sycophancy, subspace, run_demo
- Strip kl_calibrate to dW-only; remove repe+prompt_texts deps
- Simplify replicate.py to train+diff only (no eval/demo/subspace)
- Default behavior="authority" across eval, sweep, replicate
- Install tinymfv git dep; flash_attn 2.6.3 prebuilt wheel

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-03 14:04:23 +08:00
wassname c828b0c00b baselines 2026-04-27 19:40:43 +08:00
wassname 363e2db14d phase 0-2: HF+PEFT pipeline, smoke, subspace alignment
Rip out Axolotl/vLLM, switch to HF+PEFT functional pipeline.
Add LoRA/DoRA/PiSSA/DeLoRA train, delta-W diff, weight_steer hook,
sycophancy logratio eval, and SVD top-k + weak-readout alignment.
Smoke runs end-to-end on tiny-random qwen3 with BEARTYPE=1.
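
As a reading aid for "delta-W diff" and "SVD top-k", the quantities can be written as below. This assumes the diff is the adapted weight minus the base weight. The middle identity is the standard plain-LoRA parameterization with rank r, scale alpha, and factors B and A; DoRA, PiSSA, and DeLoRA parameterize the update differently. The last term is the rank-k truncated SVD of the diff. The weak-readout alignment step is not sketched here.

```latex
\Delta W = W_{\text{tuned}} - W_{\text{base}}, \qquad
\Delta W_{\text{LoRA}} = \frac{\alpha}{r}\, B A, \qquad
\Delta W \approx U_k \Sigma_k V_k^{\top}
```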

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 20:14:07 +08:00
wassname f0bce8be90 tidy 2026-04-25 19:27:53 +08:00