mirror of
https://github.com/wassname/evil_MoE.git
synced 2026-06-27 16:45:42 +08:00
journal(e): super-S-space and act t-stat both null; act_dot mean-extraction stays the routeA default
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
This commit is contained in:
@@ -4227,3 +4227,33 @@ Provenance:
|
||||
**Discussion (speculative).** I read the act/resid cluster as one finding, not four scores: completion-mean activations against a hand-pair direction separate positively-reinforced hacks from solves at 0.70-0.92, and the choice among {act, resid} x {cos, dot} is within noise, so it should be made structurally. Wassname prefers dot (cosine inflates small-norm rollouts); resid avoids module pooling entirely and is adapter-independent. The remaining gap to a high-precision router is the score's separability, not threshold placement: the Otsu cut now sits close to the oracle split in most panels, yet precision at useful recall stays ~0.5-0.65 on v4/v5. I do not have a supported explanation for the within-window hack-score spread; step drift is the largest measured component and the mode story is dead. Alternative for the t-stat null on act: with 8 pairs the per-coordinate std is itself noise (n=8 std has ~25% relative error), so the t-weighting may be real but unestimable at this pair count.
|
||||
|
||||
**Next.** Wire the act-based gate (routeA) into train.py per docs/spec/20260611_act_gate_spec.md, updated for: dot score, t-stat extraction (clamped, std over pairs), online z-norm via EMA mean/std, winsorized-Otsu pinning. More authored pairs is the highest-leverage data change (t-weighting and module weighting both starve at n=8).
|
||||
|
||||
## 2026-06-11 (e) -- super-S-space gate score and act t-stat extraction: both null
|
||||
|
||||
**Introduction.** Two candidate score improvements tested before freezing the routeA design (see entry (d)). First, wassname's super-S-space idea (wassname/steering-lite, variants/super_sspace.py): project the residual stream onto the pooled eigenbasis of the residual writers and readers before extracting the pair-difference vector and scoring. The eigenbasis is orthogonal, so the full-rank unwhitened transform is a pure rotation that leaves cos and dot unchanged; the testable content is (a) whitening by the pooled singular spectrum and (b) top-r mode selection by |dS| of the authored-pair difference. Second, act_dot with t-stat extraction (entry (d) measured the t-stat variant for cos only). Expectation: uncertain for super-S; mild gain expected for t-stat given the resid result in (d).
|
||||
|
||||
**Methods.** Commit `1d4f33f`, scripts/diag_pinning_superS.py, fully offline on CPU: Gram matrices accumulated as W W^T for writers (o_proj, down_proj) and W^T W for readers (q/k/v/gate/up_proj) from the Qwen3-4B safetensors, one eigh per basis; features from the cached out/diag{,_v4,_v5}/pinning_feats.pt (same v3/v4/v5 windows, behavior_ pairs, and A>0 contrast as entry (d)). Grid: role {writer, reader, both} x pooled blocks {layers 12/18/24, all 36} x {whitened, unwhitened rotation} x r {full, 256, 64} x {cos, dot}. The t-stat check is a 6-row session run on the act features (v = mean(D) / se(D) over the 8 pairs, clamped to |t| <= 3, vs plain mean extraction). No pueue tasks; no GPU.
|
||||
|
||||
**Results.**
|
||||
|
||||
| variant | score | v3 | v4 | v5 | mean | min |
|
||||
|---|---|---|---|---|---|---|
|
||||
| superS-rot reader/all r=64 (best of grid) | cos | 0.910 | 0.746 | 0.740 | 0.799 | 0.740 |
|
||||
| raw resid (baseline) | dot | 0.905 | 0.721 | 0.756 | 0.794 | 0.721 |
|
||||
| superS reader/all r=256 (best whitened) | cos | 0.905 | 0.720 | 0.717 | 0.781 | 0.717 |
|
||||
| raw resid (baseline) | cos | 0.916 | 0.700 | 0.804 | 0.807 | 0.700 |
|
||||
| act, mean extraction (entry (d) reference) | dot | 0.870 | 0.747 | 0.747 | 0.788 | 0.747 |
|
||||
| act, t_clamp3 extraction | dot | 0.870 | 0.752 | 0.714 | 0.779 | 0.714 |
|
||||
| act, t_clamp3 extraction | cos | 0.867 | 0.756 | 0.719 | 0.781 | 0.719 |
|
||||
|
||||
Table 1. AUROC on the A>0 contrast per emergence window. Top block: best super-S rows from the 51-row grid against the raw resid baselines. The best row (unwhitened rotation, reader basis, top-64 modes) reaches worst-window 0.740, above raw resid cos (0.700) but below the act default (0.747), and it is the maximum over ~50 rows, so post-hoc selection inflates it. Whitened variants never exceed raw resid dot (0.721); whitened r=64 degrades to 0.58-0.65. Bottom block: t-stat extraction on act loses 0.033-0.038 on v5 and gains at most 0.009 on v4, worst-window 0.714-0.719 vs 0.747.
|
||||
|
||||
Provenance:
|
||||
- Harness: scripts/diag_pinning_superS.py at commit `1d4f33f`; feature caches and contrast identical to entry (d).
|
||||
- Super-S grid: /tmp/claude-1000/superS_v2.log (full 51-row table; superS_v1.log is the earlier run without the rotation variants). The baseline cos row reproduces entry (d) Table 1 exactly (0.916/0.700/0.804), confirming this harness agrees with scripts/diag_pinning.py.
|
||||
- t-stat: /tmp/claude-1000/act_dot_tstat.log (6-row table; the mean/cos row reproduces 0.869/0.749/0.752).
|
||||
- Fresh-eyes review (agent a3848964835b851aa) verified the Gram identities, the rotation-invariance claim, the per-layer top-r gather, the cos denominator, and the Qwen3 writer/reader shape classification; its one gap (unwhitened rotation untested) was closed by the superS-rot rows in superS_v2.log.
|
||||
|
||||
**Discussion (speculative).** I read both as nulls at this sample size (per-window SE ~0.07). For super-S, whitening consistently sits at or below the raw baseline, which would fit the pooled spectrum amplifying low-energy directions that carry no hack signal here, but I cannot distinguish that from noise. The one apparent gain (rotation + reader basis + top-64) is exactly what taking the maximum of 50 noisy rows produces; I would only believe it if it survived on new windows chosen in advance. For t-stat the entry-(d) alternative stands: the per-coordinate std over 8 pairs is itself ~25% noise, so the weighting may be real but unestimable at this pair count. Both nulls leave act_dot with plain mean extraction as the routeA default.
|
||||
|
||||
**Next.** routeA implementation per the plan now written into docs/spec/20260611_act_gate_spec.md (extraction module with verify gate, gate wiring replacing routeV, rolling-buffer winsorized-Otsu pinning), pending wassname's approval. More authored pairs remains the highest-leverage data change.
|
||||
|
||||
Reference in New Issue
Block a user