10 Commits

Author SHA1 Message Date
wassname 497ee05aef first pass care vs sanctity 2026-05-03 06:02:07 +08:00
wassname b2ef8fef7b wip 2026-04-30 21:06:18 +08:00
wassname 7440229d48 narrow honesty: clamp n_personas to list length, expose grid in sweep
Allows narrow honesty (1 persona pair) to share data-volume parity with
broader behaviors by bumping n_samples. data.py logs the clamp; replicate.py
on-disk size check uses clamped n_personas; run_sweep.py exposes
n_topics/n_personas/n_samples to CLI.

README clarifies honesty_label provenance: party='You' filter from
Action_to_party_to_value, not values_aggregated.
2026-04-28 21:23:32 +08:00
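
The clamp described in commit 7440229d48 could look roughly like the sketch below. It is illustrative only and not the code in `data.py`: the helper names `clamp_n_personas` and `samples_for_parity` are hypothetical, and the assumption that dataset size is `n_topics * n_personas * n_samples` is ours, not stated in the commit.

```python
import logging

logger = logging.getLogger(__name__)


def clamp_n_personas(n_personas: int, personas: list) -> int:
    """Return n_personas, reduced to len(personas) if too many were requested."""
    available = len(personas)
    if n_personas > available:
        # Log the clamp so the reduced grid is visible in run output.
        logger.warning(
            "n_personas=%d exceeds %d available; clamping", n_personas, available
        )
        return available
    return n_personas


def samples_for_parity(target_rows: int, n_topics: int, n_personas: int) -> int:
    """Smallest n_samples reaching target_rows.

    ASSUMPTION: rows = n_topics * n_personas * n_samples.
    """
    cells = n_topics * n_personas
    return -(-target_rows // cells)  # ceiling division


# A narrow behavior with a single persona pair, asked for 8 personas.
honesty_personas = [("honest", "dishonest")]
n_personas = clamp_n_personas(8, honesty_personas)
n_samples = samples_for_parity(target_rows=800, n_topics=10, n_personas=n_personas)
```

Under that size assumption, a behavior with one persona pair reaches the same row count as a broader one by raising `n_samples` rather than by repeating personas.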
wassname 06ec48d8f7 KL-budget calibration: match off-task dist-shift across methods
α=1 means very different things across LoRA/PiSSA/DeLoRA/OFT/IA3/RepE/prompt;
calibrate α per method so p95 token-KL on held-out continuations matches
prompt:engineered_prompt_honest's footprint (≈0.61 nats over 50 stratified
prompts, 100 audit). Newton iter α_next=α·sqrt(T/M) converges 7/7 methods
in 2-3 iters. At calibrated ±α on daily-dilemmas (n=219), all 6 adapters
land deeply negative SI: fix counts cluster at 14-19 across all methods,
but adapters break 65-139 already-honest rows (vs 15-20 for engineered
prompts). Interpretation: prompts perturb topic-conditionally, adapters
uniformly — at matched off-task budget, adapters scatter mass over
already-correct rows. RepE sits between.

Caveats: single seed, calibration off-task, anchor audit p95 is 1.78×
calib (calibrated conservatively).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 14:08:55 +08:00
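
A minimal sketch of the update rule `α_next=α·sqrt(T/M)` from commit 06ec48d8f7, reading `T` as the target p95 token-KL and `M` as the value measured at the current α (our reading of the symbols; the commit does not define them). The square root amounts to assuming KL grows roughly quadratically in α. `calibrate_alpha`, `toy_kl`, and the 5% tolerance are hypothetical stand-ins; the real measurement would run the steered model on held-out continuations.

```python
import math
from typing import Callable, Tuple


def calibrate_alpha(
    measure_kl: Callable[[float], float],
    target_kl: float,
    alpha0: float = 1.0,
    rtol: float = 0.05,
    max_iters: int = 10,
) -> Tuple[float, float]:
    """Rescale alpha until measure_kl(alpha) is within rtol of target_kl."""
    alpha = alpha0
    for _ in range(max_iters):
        measured = measure_kl(alpha)
        if abs(measured - target_kl) <= rtol * target_kl:
            return alpha, measured
        # Exact in one step if KL is proportional to alpha**2.
        alpha *= math.sqrt(target_kl / measured)
    raise RuntimeError("alpha calibration did not converge")


def toy_kl(alpha: float) -> float:
    """Hypothetical stand-in for a p95 token-KL measurement (not real data)."""
    return 2.5 * alpha**2 / (1.0 + 0.3 * alpha)


alpha, kl = calibrate_alpha(toy_kl, target_kl=0.61)
```

Because each method gets its own `measure_kl`, the same loop yields a per-method α at a shared off-task budget.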
wassname da75668d6b move RESEARCH_JOURNAL and fork_plan under docs/
Working notes belong with the rest of the docs. Updated relative links
in docs/hypothesis_ablation_catalog.md from ../fork_plan.md to fork_plan.md
since both files now live in docs/.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-28 09:09:52 +08:00
wassname 6ec664995b T6/T7/T8 ablations + lens-search hold pending multiseed
- Add `eval/layer_module_ablation.py` (T7) and `eval/parameterization_ablation.py` (T8) for causal ablation of trained `dW`.
- Add `nbs/ablation_analysis.py` consuming T7/T8 CSVs through three lenses (SVD-on-`dW`, layer index, module family).
- Fix `prompt_baseline.py` engineered-prompt tuple bug; add `DIFF_FILENAME` constant in `diff.py`.
- Delete superseded notebooks (`analyze_diff*`, `cross_adapter_v9`, `hypothesis_sweep_v5-v9`, `strong_conclusion_v4`, `v10_llama`, `functional_projection_v10`).
- Document (README, fork_plan, RESEARCH_JOURNAL): each lens has a built-in failure mode (SVD tautological for low-rank adapters; layer-index tells depth not mechanism; module-family disagrees cross-adapter; native parameterization decompositions non-comparable). Mark analysis question on hold pending T4 multiseed: cross-adapter inconsistency may be N=1 seed noise.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 19:05:20 +08:00
wassname a3d999fd92 wip 2026-04-27 09:59:06 +08:00
wassname 651ad132d3 v7: cold-eyes evidence review + flag write-family-below-null in conclusion 2026-04-26 20:01:11 +08:00
wassname a1b38dc456 docs: add v6 hypothesis review (subagent + reviewer-of-reviewer) 2026-04-26 19:45:13 +08:00
wassname f0bce8be90 tidy 2026-04-25 19:27:53 +08:00