Files
evil_MoE/docs/writeup
wassname ec00bc4383 docs: A5 leak is double-hacks (not detector FP); placebo non-directionality measured via hkgap
Two review questions today exposed imprecise framing in load-bearing comments:

- A5 held-out <=1.1% hacked_E is the model double-hacking (one run_tests()-shaped
  completion that also writes the stdout marker, verified job-95 id 132), not a
  detector false positive. hacked_E is the mode-agnostic run_tests signature.
  Grading channels are non-overlapping; the model's strategy is not.
- Placebo 'non-directional' is now the hkgap measurement: real-v route2 hkgap
  0.6-0.8 (separates hack/clean), placebo ~0 (dead), both deploy hack 0.000.
  Confirms the degenerate-gate read (H2) over clever-random-direction (H1):
  suppression is quarantine-volume + exploration floor, not v_hack specificity.
  Direction only shows in solve (real 0.625 > placebo 0.531).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-05 08:23:49 +00:00
..