α=1 means very different things across LoRA/PiSSA/DeLoRA/OFT/IA3/RepE/prompt.
Calibrate α per method so that p95 token-KL on held-out continuations
matches the footprint of prompt:engineered_prompt_honest (≈0.61 nats over
50 stratified calibration prompts, 100 audit prompts). The Newton-style
iteration α_next=α·sqrt(T/M) (T = target p95 KL, M = measured p95 KL)
converges for 7/7 methods in 2-3 iterations.

At calibrated ±α on daily-dilemmas (n=219), all 6 adapters land at deeply
negative SI: fix counts cluster at 14-19 across all methods, but adapters
break 65-139 already-honest rows (vs 15-20 for engineered prompts).

Interpretation: prompts perturb topic-conditionally, adapters uniformly.
At a matched off-task budget, adapters scatter mass over already-correct
rows. RepE sits between.

Caveats: single seed; calibration is off-task; the anchor's audit p95 is
1.78× its calibration value (calibrated conservatively).
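
The calibration update above can be sketched as follows. This is a minimal illustration, not the code in this commit: `calibrate_alpha`, the `measure_p95_kl` callback, the 5% relative tolerance, and the iteration cap are all hypothetical, and T and M are read as the target and measured p95 token-KL. The sqrt step assumes KL grows roughly quadratically in α near the operating point.

```python
import math
from typing import Callable, Tuple


def calibrate_alpha(
    measure_p95_kl: Callable[[float], float],  # hypothetical: alpha -> measured p95 token-KL (nats)
    target: float,                             # T, e.g. the prompt anchor's footprint
    alpha0: float = 1.0,
    rtol: float = 0.05,                        # assumed stopping tolerance, not from the commit
    max_iters: int = 10,                       # assumed safety cap
) -> Tuple[float, int]:
    """Rescale alpha until the measured p95 KL is within rtol of the target.

    Returns (alpha, number_of_measurements).
    """
    alpha = alpha0
    for i in range(1, max_iters + 1):
        measured = measure_p95_kl(alpha)       # M
        if measured <= 0.0:
            raise ValueError("measured KL must be positive to rescale alpha")
        if abs(measured - target) <= rtol * target:
            return alpha, i
        # alpha_next = alpha * sqrt(T / M)
        alpha *= math.sqrt(target / measured)
    raise RuntimeError("calibration did not converge within max_iters")


if __name__ == "__main__":
    # Toy stand-in for a real model: KL exactly quadratic in alpha.
    alpha, iters = calibrate_alpha(lambda a: 2.5 * a * a, target=0.61)
    print(f"alpha={alpha:.4f} after {iters} measurements")
```

With an exactly quadratic response a single update lands on the target; real methods deviate from that, which would be consistent with the 2-3 iterations reported above.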

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Working notes belong with the rest of the docs. Updated the relative links
in docs/hypothesis_ablation_catalog.md from ../fork_plan.md to fork_plan.md,
since both files now live in docs/.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>