mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 23:22:18 +08:00
a3a3f09824
Haar's ~0 cos is concentration of measure (out-of-subspace), not a cleaner placebo. Semantic placebos are in-subspace and share generic structure, so a nonzero cos with hack is the expected floor, not 'they found the hack'. null_city's high-cos modules are plausibly low-rank-module artifacts. Cosine is correlational; the ablation run is the causal test.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
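A toy illustration of the commit message's argument, not code from this repo. It assumes a made-up ambient dimension, subspace rank, and a hypothetical "hack" direction built from a shared component plus noise. Isotropic random directions stand in for the Haar placebo, and in-subspace vectors that share the same generic component stand in for the semantic placebos.

```python
import numpy as np


def unit(v):
    """Normalize along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def mean_abs_cos(vectors, target):
    """Mean |cosine| between each row of `vectors` and `target`."""
    return float(np.mean(np.abs(unit(vectors) @ unit(target))))


rng = np.random.default_rng(0)
d, k, n = 4096, 16, 256  # ambient dim, subspace rank, placebo count (all illustrative)

# Orthonormal basis of a hypothetical k-dim subspace where the signal lives.
basis, _ = np.linalg.qr(rng.standard_normal((d, k)))

# Generic structure shared by every in-subspace direction (assumption).
shared = basis @ rng.standard_normal(k)

# Hypothetical "hack" direction: shared structure plus its own component.
hack = shared + 0.5 * (basis @ rng.standard_normal(k))

# Haar-like placebo: isotropic directions in the full d-dim space.
haar = rng.standard_normal((n, d))

# Semantic-like placebo: in-subspace, sharing the generic component.
semantic = shared + 0.5 * (rng.standard_normal((n, k)) @ basis.T)

haar_cos = mean_abs_cos(haar, hack)          # on the order of 1/sqrt(d)
semantic_cos = mean_abs_cos(semantic, hack)  # well above zero by construction

print(f"haar: {haar_cos:.3f}  semantic: {semantic_cos:.3f}")
```

In this toy setup the near-zero Haar cosine comes from dimensionality alone, and the semantic cosine is nonzero only because of the shared component. Neither number says anything causal, which is why the ablation run remains the actual test.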
attic
Parked, not deleted. Superseded exploration kept only because the persona-pair methodology may get cited in the paper appendix.
make_pairsets.py, make_dataset_pairsets.py: persona contrastive-pair authoring (tasks #123-126, done). The live extraction path is pairs.PAIRS (hand pairs) or pairs_from_pool (pool-derived). No justfile recipe builds these anymore.