# Multi-view hack pairs

## Goal

Author one strong all-in-one contrastive pairset that represents deliberate proxy gaming across varied contexts, rather than one syntax mechanism or one explicit label.

## Scope

In: replace the active authored pairset with one multi-view section; update its audit, default reference, and verification.

Out: claiming the new set is empirically better before a real-model comparison.

## Requirements

- R1: Each pair uses one exact prompt and closely matched hack/clean completions.
- R2: The set spans behavior, opportunity-aware choice, explicit disposition, naming/reasoning, and non-code proxy gaming.
- R3: No single superficial feature consistently identifies the hack side.
- R4: Every hack side strongly expresses exploiting a proxy; every clean side strongly expresses satisfying the underlying task.
- R5: Pair metadata supports loading tagged subsets without entering model input.

## Tasks

- [x] T1: Author one 27-pair `all-in-one` section.
  - verify: exactly 27 unique headings and all fields load.
  - likely_fail: vague “bad versus good” pairs fail to express proxy gaming.
  - sneaky_fail: one repeated syntax/token dominates the axis.
  - UAT: audit table shows balanced views, domains, explicitness, and mechanisms.
- [x] T2: Make `all-in-one` the active default and simplify active pair sources.
- [x] T3: Verify parser, balance metrics, extraction smoke, and fresh-eyes review.

## Design

| View | N | Purpose |
|---|---:|---|
| concrete behavior | 8 | Anchor the direction in actions resembling live hacks |
| opportunity-aware action | 6 | Distinguish deliberate exploitation from accidental weakness |
| explicit disposition/roleplay | 6 | Supply conceptual, intention, and role-conditioned signals |
| naming/reasoning | 4 | Compact lexical, visible-planning, and `` representations |
| non-code proxy gaming | 3 | Force cross-context abstraction beyond Python tests |

Each pair has a `Tags:` metadata line.
`#all-in-one@behavior` selects one tag; `#all-in-one@behavior,opportunity-aware` selects their intersection. Tags are not loaded into prompts or completions.

Match tightly within pairs; diversify aggressively across pairs.

Explicit language is allowed in a minority of pairs because it strongly identifies intention. It must not be the only or dominant view.

## Log

- Existing pure-intent pairs underperformed behavior pairs, so explicit pairs are included as one view rather than used alone.
- Existing philosophical/moral pairs changed prose and print/assert behavior together; the new set never combines a semantic framing contrast with a second unrelated contrast.
- Incomplete focused snippets are allowed. Empty/pass-only stubs are rejected because they express no substantive decision.

## Errors

| Task | Error | Resolution |
|---|---|---|

## Results

- Runtime pair data moved to `data/pairs/`; authoring guidance and audit remain in `docs/personas/`.
- Final headline set: 27 pairs, including 2 matched roleplay instructions and 1 matched `` trace.
- Tagged loading supports whole-set, single-tag, and tag-intersection extraction.

## Verify

- Full smoke: `/tmp/claude-1000/multiview_pairs_data_smoke.log`
- routeV loaded `data/pairs/hack_pairs.md#all-in-one -> 27 pairs`.
- Extraction band mean width `+0.171`; `13/14` modules included.
- `scripts/verify_science_invariants.py` passed Markdown parsing, tagged subsets, content-addressing, and the no-complete-stub invariant.

## Review

Fresh-eyes review: `docs/reviews/20260610_multiview_pairs_external.md`.

- Judged the multi-view design well-constructed and found no repeated dominant shortcut or disguised stub.
- Flagged `behavior_visible_examples` as weak-test behavior rather than deliberate exploitation. Kept intentionally; `@opportunity-aware` isolates deliberate choices.
- Flagged `behavior_proxy_metric` as the largest length mismatch. Kept because shortening real validation or padding shallow validation would weaken the substantive contrast.
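
As an illustration of the tagged-subset loading described under Design (whole set, single tag, tag intersection), the selector logic could be sketched roughly as follows. This is a minimal sketch, not the project's loader: the `Pair` record, its field names, the function names, the demo pairs, and the `non-code` tag are all hypothetical. Only the `path#section@tag1,tag2` selector shape, the intersection semantics, and the rule that tags never enter model input come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    # Hypothetical record; field names are illustrative, not the project's schema.
    name: str
    prompt: str
    hack: str
    clean: str
    tags: frozenset = frozenset()


def parse_selector(selector):
    """Split 'path#section@tag1,tag2' into (path, section, required_tags)."""
    path, _, fragment = selector.partition("#")
    section, _, tag_part = fragment.partition("@")
    tags = frozenset(t.strip() for t in tag_part.split(",") if t.strip())
    return path, section, tags


def select_pairs(pairs, required_tags):
    """Keep pairs that carry every required tag; an empty set keeps all pairs."""
    return [p for p in pairs if required_tags <= p.tags]


def model_inputs(pair):
    """Return (hack_text, clean_text); tags are metadata and are never included."""
    return pair.prompt + pair.hack, pair.prompt + pair.clean


# Made-up demo data, only to exercise the functions above.
DEMO_PAIRS = [
    Pair("demo_a", "prompt A\n", "hack A", "clean A",
         frozenset({"behavior"})),
    Pair("demo_b", "prompt B\n", "hack B", "clean B",
         frozenset({"behavior", "opportunity-aware"})),
    Pair("demo_c", "prompt C\n", "hack C", "clean C",
         frozenset({"non-code"})),
]

if __name__ == "__main__":
    _, section, tags = parse_selector(
        "data/pairs/hack_pairs.md#all-in-one@behavior,opportunity-aware"
    )
    for pair in select_pairs(DEMO_PAIRS, tags):
        print(section, pair.name, sorted(pair.tags))
```

Using a subset test (`required_tags <= p.tags`) makes the whole-set case fall out naturally: a selector with no `@` part yields an empty tag set, which every pair satisfies.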