Mirror of https://github.com/wassname/evil_MoE.git, synced 2026-06-27 15:15:40 +08:00 (3.8 KiB)
MoE sparsity ideas for increasing gradient routing absorption
Goal
Understand absorption and leakage in Gradient Routing / SGTM from the local grad-routing papers, then search for modern MoE specialization and routing mechanisms that might transfer to gradient routing to increase absorption.
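For orientation, the simplest form of the mechanism these terms describe is a label-dependent mask on parameter updates. Examples labeled as the target domain update only a designated "routed" parameter block, and all other examples update only the "shared" block. The sketch below is a toy pure-Python version built on that assumption, not code from either paper. The block names, the `route_mask` rule, and `masked_sgd_step` are hypothetical. The general Gradient Routing formulation also allows masks elsewhere in backpropagation, not only on parameter updates. In this picture, absorption is roughly the hoped-for effect that target-domain knowledge from unlabeled examples also concentrates in the routed block. Leakage is that knowledge landing in the shared block instead. R1 is where the papers' own definitions get pinned down.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # block name -> flat list of weights (toy model)


def route_mask(label: str) -> Dict[str, float]:
    """Hypothetical routing rule: 'forget'-labeled examples may only update the
    'routed' block; every other label may only update the 'shared' block."""
    if label == "forget":
        return {"routed": 1.0, "shared": 0.0}
    return {"routed": 0.0, "shared": 1.0}


def masked_sgd_step(params: Params, grads: Params, label: str, lr: float = 0.1) -> Params:
    """One plain SGD step for a single example, with each block's gradient
    multiplied by that example's routing mask (a 0.0 entry blocks the update)."""
    mask = route_mask(label)
    return {
        name: [w - lr * mask[name] * g for w, g in zip(weights, grads[name])]
        for name, weights in params.items()
    }


params = {"routed": [0.0, 0.0], "shared": [0.0, 0.0]}
grads = {"routed": [1.0, 1.0], "shared": [1.0, 1.0]}
print(masked_sgd_step(params, grads, "forget"))
print(masked_sgd_step(params, grads, "retain"))
```

Starting from zero weights with unit gradients and lr=0.1, a "forget" example moves only the routed block to -0.1. A "retain" example moves only the shared block.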
Scope
In: local paper reading, local-first literature/code search, quote-anchored evidence, transfer judgment. Out: code changes, experiments, implementation.
Requirements
- R1: Capture how Gradient Routing / SGTM define or explain absorption, leakage, and specialization. Done means: verbatim quotes with context from local papers. VERIFY: note contains source-attributed quotes from docs/papers/grad_routing/ on absorption/leakage.
- R2: Capture modern MoE techniques that encourage expert separation, sparse routing, or lower overlap. Done means: verbatim quotes with context from papers/code/docs. VERIFY: note contains source-attributed quotes describing the mechanism, not paraphrase.
- R3: Judge whether each MoE mechanism plausibly transfers to increase absorption in gradient routing. Done means: each candidate has yes/maybe/no plus mechanism-level reason tied back to R1/R2 quotes. VERIFY: every judgment cites both a gradient-routing quote and an MoE quote.
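R2 targets mechanisms such as top-k routing, routing noise, and balancing auxiliary losses. As a concrete reference point, the sketch below is a minimal pure-Python top-k softmax router with optional Gaussian noise on the logits. The noise is a fixed-scale simplification in the spirit of noisy top-k gating. The sketch also includes a Switch-Transformer-style load-balancing loss, N · Σ_i f_i · P_i. Here f_i is the fraction of tokens whose top-1 expert is i, and P_i is the mean router probability for expert i. This is an illustration of the standard formulas, not code from any surveyed repo, and the function names are hypothetical. The balancing loss rewards even expert usage rather than separation as such. That distinction is what R3 and T3's sneaky_fail guard against.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(logits: Sequence[float], k: int, noise_std: float = 0.0,
                rng: Optional[random.Random] = None) -> Tuple[List[int], List[float]]:
    """Route one token to its k highest-scoring experts and renormalize their gates.
    If noise_std > 0, Gaussian noise is added to the logits first (a fixed-scale
    simplification of learned noisy gating)."""
    rng = rng or random.Random(0)
    scores = [z + rng.gauss(0.0, noise_std) for z in logits] if noise_std > 0 else list(logits)
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    gates = softmax([scores[i] for i in chosen])
    return chosen, gates


def load_balancing_loss(router_logits: Sequence[Sequence[float]]) -> float:
    """Switch-style auxiliary loss N * sum_i f_i * P_i (alpha scaling omitted).
    f_i: fraction of tokens whose top-1 expert is i.
    P_i: mean router probability assigned to expert i.
    Equals 1.0 when f and P are both uniform; full collapse onto one expert drives it toward N."""
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    counts = [0] * num_experts
    prob_sums = [0.0] * num_experts
    for logits in router_logits:
        probs = softmax(logits)
        counts[max(range(num_experts), key=lambda i: probs[i])] += 1
        for i, p in enumerate(probs):
            prob_sums[i] += p
    f = [c / num_tokens for c in counts]
    P = [s / num_tokens for s in prob_sums]
    return num_experts * sum(fi * pi for fi, pi in zip(f, P))


balanced = load_balancing_loss([[10.0, 0.0], [0.0, 10.0]])   # each expert gets one token
collapsed = load_balancing_loss([[10.0, 0.0], [10.0, 0.0]])  # both tokens go to expert 0
print(balanced, collapsed)
```

With two experts, the balanced case comes out at about 1.0. The collapsed case comes out just under 2.0, which is the upper limit for N = 2.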
Tasks
- T1 (R1): Read the SGTM and Gradient Routing papers.
  - verify: rg -n "absorption|leakage|specialization|gradient norms|self-reinforcing" docs/papers/grad_routing/*.md
  - success: local quotes identify the claimed mechanism and limits.
  - likely_fail: quote lacks left/right context or is not verbatim.
  - sneaky_fail: we use quotes about unlearning/localization generally, not absorption specifically.
  - UAT: "when I open the note, I can read the exact paper text on absorption/leakage"
- T2 (R2): Fan out local-first search subagents for MoE separation/routing methods.
  - verify: subagent outputs contain varglight-format quotes with source + epistemic note.
  - success: hits mention a concrete mechanism such as aux loss, balancing, entropy, top-k, capacity, noise, or assignment.
  - likely_fail: generic MoE summaries with no verbatim quotes.
  - sneaky_fail: all sources are downstream summaries of one paper.
  - UAT: "when I inspect the collected hits, each one is a copy-pasteable quote with source"
- T3 (R3): Deduplicate and write a mapped judgment note.
  - verify: note lists candidates with yes/maybe/no and cites quote blocks.
  - success: transfer judgments are mechanism-level and concise.
  - likely_fail: unsupported brainstorm list.
  - sneaky_fail: we recommend methods that optimize a different failure mode than absorption.
  - UAT: "when I read the final note, I can see which MoE tricks are worth trying and why"
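The R3/T3 checks are mechanical enough to lint. Each candidate needs a yes/maybe/no verdict, a mechanism-level reason, a gradient-routing quote, and an MoE quote. The sketch below assumes each judgment is recorded as a small record. The `Judgment` fields and the `lint_judgments` helper are hypothetical and not part of an existing tool. The candidate names and quote strings are placeholders, not real judgments or quotes.

```python
from dataclasses import dataclass
from typing import List

VERDICTS = {"yes", "maybe", "no"}


@dataclass
class Judgment:
    candidate: str  # MoE mechanism under consideration
    verdict: str    # must be one of "yes", "maybe", "no"
    reason: str     # mechanism-level reason for the transfer judgment
    gr_quote: str   # verbatim Gradient Routing / SGTM quote (R1)
    moe_quote: str  # verbatim MoE quote (R2)


def lint_judgments(judgments: List[Judgment]) -> List[str]:
    """Return human-readable problems; an empty list means every judgment passes."""
    problems: List[str] = []
    for j in judgments:
        if j.verdict not in VERDICTS:
            problems.append(f"{j.candidate}: verdict {j.verdict!r} is not yes/maybe/no")
        if not j.reason.strip():
            problems.append(f"{j.candidate}: missing mechanism-level reason")
        if not j.gr_quote.strip():
            problems.append(f"{j.candidate}: no gradient-routing quote")
        if not j.moe_quote.strip():
            problems.append(f"{j.candidate}: no MoE quote")
    return problems


example = [
    Judgment("candidate A", "maybe", "<mechanism-level reason>", "<R1 quote>", "<R2 quote>"),
    Judgment("candidate B", "probably", "", "<R1 quote>", ""),
]
for problem in lint_judgments(example):
    print(problem)
```

Candidate A passes. Candidate B is flagged three times: it has a non-standard verdict, a missing reason, and a missing MoE quote.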
Context
- User wants varglight format for every subagent hit.
- Local-first search priority: qmd, local-search, gh, lesswrong, arxiv, semantic-search, then web fallback if thin.
- Budget per subagent: about 6 tool calls, one round per tool, then return PARTIAL.
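The Context rules amount to a small search protocol. The sketch below assumes each tool is a plain function from a query to a list of hits. `fan_out`, `SearchTool`, and the `min_hits` threshold are hypothetical; `min_hits` stands in for "thin". The COMPLETE/PARTIAL status rule is also an interpretation, not a definition from the source. Real subagents reach these tools through their own interfaces. Timeouts are recorded rather than retried, which keeps to one round per tool.

```python
from typing import Callable, Dict, List, Tuple

SearchTool = Callable[[str], List[str]]  # hypothetical: query -> list of hits

# Local-first priority order from the Context section; "web" is only a fallback.
LOCAL_FIRST = ["qmd", "local-search", "gh", "lesswrong", "arxiv", "semantic-search"]


def fan_out(query: str, tools: Dict[str, SearchTool], budget: int = 6,
            min_hits: int = 3) -> Tuple[str, List[str], List[str]]:
    """Call each available tool at most once (one round per tool), in priority
    order, spending at most `budget` calls. Falls back to "web" only when fewer
    than `min_hits` hits came back. Returns (status, hits, failed_tools)."""
    hits: List[str] = []
    failed: List[str] = []
    calls = 0
    ran_out = False

    def call(name: str) -> None:
        nonlocal calls
        calls += 1
        try:
            hits.extend(tools[name](query))
        except TimeoutError:  # e.g. a tool exceeding its time budget
            failed.append(name)

    for name in (t for t in LOCAL_FIRST if t in tools):
        if calls >= budget:
            ran_out = True
            break
        call(name)

    if len(hits) < min_hits and "web" in tools:
        if calls < budget:
            call("web")
        else:
            ran_out = True

    # Assumption: anything skipped or failed makes the result PARTIAL.
    status = "PARTIAL" if (ran_out or failed) else "COMPLETE"
    return status, hits, failed
```

Under this rule, the qmd timeouts in the Log would yield a PARTIAL result that still carries every hit from the tools that did answer.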
Log
- 2026-06-14: Loaded the varglight skill. It requires verbatim quotes with surrounding context, source attribution, and one-line epistemic context; no paraphrase inside quote blocks.
- 2026-06-14: Parallel subagent fan-out returned useful arXiv, GitHub, local-search, LessWrong, and semantic-search hits. qmd timed out twice under the time budget, so local-first coverage is good but not exhaustive.
- 2026-06-14: Wrote the consolidated note to docs/spec/20260614_moe_absorption_results.md and ran a fresh-eyes reviewer subagent. The review said the main overreach was claiming that fine-grained segmentation helps absorption directly; toned this down to a MAYBE specialization transfer.
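Part of the varglight requirement in the first log entry can be checked mechanically: a quote must be verbatim and must come with surrounding context. The sketch below assumes the source papers are available as local text. It also assumes whitespace differences, such as line wrapping from PDF-to-markdown conversion, should not count as paraphrase. `quote_with_context` and the 80-character window are hypothetical choices, not part of the varglight skill itself. Source attribution and the epistemic note still need a human or reviewer pass. The sample source string is made up for illustration.

```python
import re
from typing import Optional, Tuple


def _normalize(text: str) -> str:
    # Collapse runs of whitespace so line wrapping in the source markdown
    # does not make a verbatim quote look paraphrased.
    return re.sub(r"\s+", " ", text).strip()


def quote_with_context(source: str, quote: str,
                       window: int = 80) -> Optional[Tuple[str, str, str]]:
    """Return (left_context, quote, right_context) if `quote` appears verbatim
    (modulo whitespace) in `source`; return None otherwise, i.e. the quote is
    probably paraphrased or comes from a different source."""
    src = _normalize(source)
    q = _normalize(quote)
    start = src.find(q)
    if start == -1 or not q:
        return None
    end = start + len(q)
    return src[max(0, start - window):start], q, src[end:end + window]


source = "Intro text.\nRouting a subset of examples\nto one region. More text."
print(quote_with_context(source, "Routing a subset of examples to one region."))
print(quote_with_context(source, "routing examples into a region"))  # paraphrase -> None
```

Exact substring matching is deliberately strict. A quote that fails here should be re-copied from the source rather than fuzzily matched.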
TODO
- If promising candidates emerge, design a follow-up experiment spec.
Errors
| Task | Error | Resolution |
|---|---|---|