evil_MoE/docs/spec/20260614_moe_absorption_search.md
2026-06-14 09:28:16 +08:00


MoE sparsity ideas for increasing gradient routing absorption

Goal

Understand absorption and leakage in Gradient Routing / SGTM from the local grad-routing papers, then search for modern MoE specialization and routing mechanisms that might transfer to gradient routing to increase absorption.

Scope

In: local paper reading, local-first literature/code search, quote-anchored evidence, transfer judgment. Out: code changes, experiments, implementation.

Requirements

  • R1: Capture how Gradient Routing / SGTM define or explain absorption, leakage, and specialization. Done means: verbatim quotes with context from local papers. VERIFY: note contains source-attributed quotes from docs/papers/grad_routing/ on absorption/leakage.
  • R2: Capture modern MoE techniques that encourage expert separation, sparse routing, or lower overlap. Done means: verbatim quotes with context from papers/code/docs. VERIFY: note contains source-attributed quotes describing the mechanism, not paraphrase.
  • R3: Judge whether each MoE mechanism plausibly transfers to increase absorption in gradient routing. Done means: each candidate has yes/maybe/no plus mechanism-level reason tied back to R1/R2 quotes. VERIFY: every judgment cites both a gradient-routing quote and an MoE quote.
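The R3 VERIFY condition is mechanical enough to check with a script. The following is a minimal sketch, assuming each judgment is stored as a small record; the `Judgment` class, its field names, and the quote ids are hypothetical and not part of any existing tooling.

```python
from dataclasses import dataclass, field

VERDICTS = {"yes", "maybe", "no"}


@dataclass
class Judgment:
    """One transfer judgment (hypothetical record layout)."""
    candidate: str
    verdict: str
    reason: str
    gr_quotes: list = field(default_factory=list)   # ids of gradient-routing quotes (R1)
    moe_quotes: list = field(default_factory=list)  # ids of MoE quotes (R2)


def r3_problems(judgments):
    """Return a list of human-readable R3 violations; empty means R3 passes."""
    problems = []
    for j in judgments:
        if j.verdict.strip().lower() not in VERDICTS:
            problems.append(f"{j.candidate}: verdict must be yes/maybe/no")
        if not j.reason.strip():
            problems.append(f"{j.candidate}: missing mechanism-level reason")
        if not j.gr_quotes:
            problems.append(f"{j.candidate}: no gradient-routing quote cited")
        if not j.moe_quotes:
            problems.append(f"{j.candidate}: no MoE quote cited")
    return problems


if __name__ == "__main__":
    # Placeholder entries, not real candidates or results.
    demo = [
        Judgment("candidate-A", "maybe", "placeholder reason", ["GR-1"], ["MOE-1"]),
        Judgment("candidate-B", "yes", "placeholder reason", [], ["MOE-2"]),
    ]
    for line in r3_problems(demo):
        print(line)
```

An empty return value corresponds to "every judgment cites both a gradient-routing quote and an MoE quote"; it does not check that the cited quotes are relevant to the reason given.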

Tasks

  • T1 (R1): Read SGTM and Gradient Routing papers.
    • verify: rg -n "absorption|leakage|specialization|gradient norms|self-reinforcing" docs/papers/grad_routing/*.md
    • success: local quotes identify the claimed mechanism and limits.
    • likely_fail: quote lacks left/right context or is not verbatim.
    • sneaky_fail: we use quotes about unlearning/localization generally, not absorption specifically.
    • UAT: "when I open the note, I can read the exact paper text on absorption/leakage"
  • T2 (R2): Fan out local-first search subagents for MoE separation/routing methods.
    • verify: subagent outputs contain varglight-format quotes with source + epistemic note.
    • success: hits mention concrete mechanism like aux loss, balancing, entropy, top-k, capacity, noise, or assignment.
    • likely_fail: generic MoE summaries with no verbatim quotes.
    • sneaky_fail: sources are all downstream summaries of one paper.
    • UAT: "when I inspect the collected hits, each one is a copy-pasteable quote with source"
  • T3 (R3): Deduplicate and write a mapped judgment note.
    • verify: note lists candidates with yes/maybe/no and cites quote blocks.
    • success: transfer judgments are mechanism-level and concise.
    • likely_fail: unsupported brainstorm list.
    • sneaky_fail: we recommend methods that optimize a different failure mode than absorption.
    • UAT: "when I read the final note, I can see which MoE tricks are worth trying and why"
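The likely_fail and sneaky_fail conditions for T1 and T2 (non-verbatim quotes, missing left/right context, off-topic quotes) can be partly screened automatically. The sketch below assumes quotes and source text are available as plain strings; `check_quote`, its parameters, and the 20-character context threshold are illustrative assumptions, and the sample text is placeholder filler rather than paper content.

```python
import re


def _normalize(text):
    """Collapse whitespace runs so line wrapping does not break matching."""
    return re.sub(r"\s+", " ", text).strip()


def check_quote(quote, source_text, term, min_context=20):
    """Screen one quote against its source.

    verbatim:    quote appears in the source, ignoring whitespace differences
    on_topic:    quote contains the key term (e.g. "absorption")
    has_context: at least min_context characters surround the term in the quote
    """
    q = _normalize(quote)
    verbatim = bool(q) and q in _normalize(source_text)
    pos = q.lower().find(term.lower())
    on_topic = pos >= 0
    right = len(q) - (pos + len(term))
    has_context = on_topic and pos >= min_context and right >= min_context
    return {"verbatim": verbatim, "on_topic": on_topic, "has_context": has_context}


if __name__ == "__main__":
    # Placeholder text only; not taken from any paper.
    source = "Lorem ipsum dolor sit amet, absorption consectetur adipiscing elit, sed do eiusmod."
    quote = "ipsum dolor sit amet,\n  absorption consectetur adipiscing elit"
    print(check_quote(quote, source, "absorption"))
```

A check like this only catches surface failures. Whether a quote is about absorption specifically, rather than unlearning or localization in general, still needs a human read.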

Context

  • User wants varglight format for every subagent hit.
  • Local-first search priority: qmd, local-search, gh, lesswrong, arxiv, semantic-search, then web fallback if thin.
  • Budget per subagent: about 6 tool calls, one round per tool, then return PARTIAL.
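The search priority and per-subagent budget above can be read as a simple loop. This is a sketch under stated assumptions, not how the subagents are actually implemented: `run_search`, the `tools` mapping of names to callables, the "COMPLETE" status label, and the `thin_below` threshold for "thin" results are all hypothetical.

```python
PRIORITY = ["qmd", "local-search", "gh", "lesswrong", "arxiv", "semantic-search"]
FALLBACK = "web"


def run_search(query, tools, max_calls=6, thin_below=3):
    """Try each tool once in priority order within a call budget.

    tools maps a tool name to a callable(query) -> list of hits.
    Returns status "PARTIAL" if the budget ran out before the plan finished.
    """
    hits, calls, status = [], 0, "COMPLETE"
    for name in PRIORITY + [FALLBACK]:
        if name == FALLBACK and len(hits) >= thin_below:
            break  # results are not thin, so skip the web fallback
        if name not in tools:
            continue
        if calls >= max_calls:
            status = "PARTIAL"
            break
        calls += 1  # one round per tool; a timeout still spends a call
        try:
            hits.extend(tools[name](query))
        except TimeoutError:
            continue
    return {"status": status, "hits": hits, "calls": calls}


if __name__ == "__main__":
    # Stub tools that each return one placeholder hit.
    stubs = {name: (lambda q, n=name: [f"{n}: hit for {q}"]) for name in PRIORITY}
    print(run_search("expert specialization", stubs))
```

With six priority tools and a budget of about six calls, the web fallback is reachable only if an earlier tool is unavailable; otherwise a thin result set ends as PARTIAL, which matches the instruction to return PARTIAL rather than keep searching.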

Log

  • 2026-06-14: Loaded varglight skill. It requires verbatim quotes with surrounding context, source attribution, and one-line epistemic context; no paraphrase inside quote blocks.
  • 2026-06-14: Parallel subagent fan-out returned useful arXiv, GitHub, local-search, LessWrong, and semantic-search hits. qmd timed out twice under the time budget, so local-first coverage is good but not exhaustive.
  • 2026-06-14: Wrote consolidated note to docs/spec/20260614_moe_absorption_results.md and ran a fresh-eyes reviewer subagent. Review said the main overreach was claiming fine-grained segmentation helps absorption directly; toned this down to a MAYBE specialization transfer.

TODO

  • If promising candidates emerge, design a follow-up experiment spec.

Errors

| Task | Error | Resolution |
| --- | --- | --- |