Commit Graph

7 Commits

Author SHA1 Message Date
wassname e5b68acf69 diag(#40): z-norm scores within family, winsorized 2-threshold Otsu zones, --feats offline mode
Synthetic and live scores share an axis with meaningful zero (raw scores carry a
common <mu,v> offset since v = mean diff is not orthogonal to the family mean).
Zones come from label-free online stats (EMA mean/std + Otsu valley), replacing
mean+k*sd which placed both cuts beyond every distribution. Winsorize at 1/99%
before Otsu: variance-maximizing cuts otherwise buy a class for one outlier.
Fresh-eyes review verified z-norm is affine (AUROCs unchanged), zones label-free,
P/R recompute from parquet matches titles.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-11 11:23:42 +00:00
wassname bbf6a97151 diag(#40): fix 3x2 plot -- keep bottom spine at axis edge (tick labels collided with rugs), skip labels on narrow zones
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-11 10:47:57 +00:00
wassname 0660e7bdd3 diag(#40): residual-stream rep (layers 12/18/24) as third gate-score candidate, 3x2 panel
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-11 10:45:50 +00:00
wassname 49ca51b6e1 diag(#40): review fixes -- A>0 contrast headline (adv-only baseline 0.90 vs-all / 0.58 A>0), headline=behavior_ training default, n at P@rout, rug strips, drop density ticks
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-11 10:22:08 +00:00
wassname 2f5f93f4fb diag(#40): Q2 rewrite -- {grad,act}x{cos,dot} scores vs exploited&A>0 label, 3-zone pinning, pairset table
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-11 10:13:06 +00:00
wassname 70697ff36e diag(#40): pinning plot splits solve/fail/hack + per-pairset AUROC ranking
Q4 fix: on-policy "solve" was ~exploited = solves+fails (mostly fails). Split by
gt_pass into solve/fail/hack (live: 103 hack / 27 solve / 62 fail). Per-pairset
ranking: build v_grad from each heading-prefix subset, re-project the SAME stored
live c-grads (no model re-run). Finding: behavior pairs AUROC 0.69 vs all-in-one
0.53; reasoning/opportunity anti-aligned (<0.5) -> mixing dilutes.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-11 06:16:27 +00:00
wassname 979daf84fd feat(#30): mean+k*std online gate replaces fixed quantile; always-show route cols
Gate calibration: route by live mean + route_std_mid/route_std_rout * std of the
pooled cosine-to-v_grad, not a fixed quantile tail. Self-silences -- only the tail
that genuinely exceeds the spread routes, so qmass tracks real separation instead
of a forced fraction. The authored absolute band is mis-placed (live pos sits far
below the synthetic-hack edge; even synthetic solve out-aligns on-policy hack).

tablelog: auroc/rout/routE/keep/resid/qmass cols always shown (nan on vanilla) so
arm tables line up.

Diagnostics: scripts/diag_pinning.py (4-population calibration view, mean+/-2sd band)
and scripts/diag_pinning_refresh.py (proves cosine stats recompute from a tracked
v-independent gradient cloud on a v_grad refresh -- exact for k=1, sanity 2.5e-16).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-11 02:56:07 +00:00