8 Commits

Author SHA1 Message Date
wassname 8f39c4a69f docs: rewrite Evil MoE spec to the soft-routing design + literature evidence
Spec was stale (recommended hard sparse "Version A", the DEMix absorption-killer).
Rewrite to match what is implemented and what we clarified:

- pseudocode-first: lora2r 2-expert forward, seeded rank-1 cosine router, GRPO+pin
  loop, deploy ablation. For 2 experts the "proper" router IS rank-1 (softmax over 2
  = sigmoid of one direction), seeded with v_act.
- "Why soft, not top-k" reframed as a tradeoff, not a verdict: hard routing closes
  the leak but needs a router that catches all hacks; soft keeps absorption available
  but leaks (1-w). DEMix only bites if we rely on absorption.
- Evidence section from two literature searches. Forced localization has working
  precedents (single bad direction: emergent misalignment/persona/refusal; behavioural
  expert seeding: SteerMoE, geometric cosine routing, cluster-aware upcycling; ablation
  + repair: NAEE/MoE-Pruner; router anchor: SEUF/MoTE). Emergent localization does not
  (standing-committee, topic-driven routing). So seed+pin are load-bearing.
- 3-way/3-expert noted as an extension (closer to production), 2 experts for the
  decisive causal run.

README: add Router dynamics (three forces, one pin-vs-reward conflict, mitigations).
Add HF "MoE in Transformers" blog to docs/papers (force-added past the docs gitignore).

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-14 13:06:38 +08:00
wassname cca7150ea0 tidy 2026-06-14 11:05:54 +08:00
wassname d4998a71ba docs: merge Ariahw Fig-5 table into the paper md (delete standalone); add abs-scale arrow plot
- Transcribed Fig-5 numeric table now lives inline in the paper md as an
  EDITOR'S TABLE comment, deleting docs/papers/ariahw_results_table_extracted.md
  (one fewer repo file; the table sits next to the figure it transcribes).
- floor_ceiling_abs.{png,pdf}: raw-rate variant. Arrows climb from the floor
  anchor; grey bedrock = worse-than-floor, blue sky = past-ceiling; hack axis
  reversed so right=better on both panels.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-09 12:35:14 +00:00
wassname 028b8fff68 transcribe Ariahw Fig 5 to a saved table; plot real no-oracle peer (LLM judge)
Read the figure PNGs directly (Fig 5 is a full numeric table the paper never
prints as text). Saved to docs/papers/ariahw_results_table_extracted.md so we
stop re-OCRing. Key correction: my 'LLM judge has no clean rate' was wrong --
LLM-judge PENALTY = 0.1% hack / 16.2% perf, NO oracle. So no-oracle suppression
is not routeV's novelty (the judge does it); the mechanism is (no live monitor,
gradient-level, fixed authored-pair direction). Plot now shows the judge as a
blue no-oracle peer bar alongside the grey oracle methods.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-09 11:48:02 +00:00
wassname 7195d19f90 docs 2026-06-07 03:07:35 +00:00
wassname efdf86a0cb wip 2026-05-30 04:33:33 +00:00
wassname c1f8ca4e7b tidy 2026-05-29 06:29:43 +00:00
wassname 120400c5f5 setup 2026-05-23 10:40:02 +08:00