init

2026-06-27 18:41:42 +08:00 · 2026-04-04 23:40:34 +08:00
parent c687a68f93
commit fc46d878cf
19 changed files with 995 additions and 0 deletions
@@ -0,0 +1,205 @@
+<!-- FROZEN: only edit in meta-mode (META_MODE=1) -->
+<!-- Symlinked to AGENTS.md and CLAUDE.md -- always loaded by Claude/Cursor/etc. -->
+
+# Research Program
+
+**Project**: {FILL_IN: one sentence describing the research problem}
+
+**Metric**: {FILL_IN: what we optimize, e.g. val_bpb, accuracy, F1} (lower/higher is better)
+
+**Metric design requirements** (enforce before first real experiment):
+- Train + eval runs in 5-40 minutes on your GPU
+- Standard deviation across seeds < effect size of a meaningful improvement (run the baseline with 3 different seeds, check std)
+- Deterministic given same seed (fixed data order, fixed eval split)
+- If variance is too high: use more eval data, smaller model, or a proxy metric with less noise
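A minimal sketch of the seed-noise check from the list above, assuming the metric is a single float per run. The function name `metric_is_usable`, the `min_effect` parameter, and the example scores are illustrative and do not come from the repository.

```python
import statistics


def metric_is_usable(baseline_scores: list[float], min_effect: float) -> bool:
    """Return True when seed-to-seed noise is smaller than the effect we care about."""
    if len(baseline_scores) < 3:
        # Fail fast: fewer than 3 seeds gives no useful noise estimate.
        raise ValueError("run the baseline with at least 3 seeds")
    seed_std = statistics.stdev(baseline_scores)  # sample standard deviation
    return seed_std < min_effect


# Hypothetical metric values from three baseline seeds (not real results).
scores = [2.91, 2.93, 2.92]
print(metric_is_usable(scores, min_effect=0.05))
```

If the check returns `False`, the remedies in the last bullet apply: more eval data, a smaller model, or a less noisy proxy metric.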
+
+**Hypothesis space**: {FILL_IN: what class of approaches are in scope}
+
+Read `0_docs/problem.md` for full context.
+
+---
+
+## File Taxonomy
+
+| Type | Files | Rule |
+|------|-------|------|
+| FROZEN | `program.md`, `eval.py`, `meta_journal.md` | Never edit without `META_MODE=1` |
+| GLOBAL | `RESEARCH_JOURNAL.md`, `results.tsv` | Only commit from main; worktrees append to root copy |
+| APPEND-ONLY | `*_journal.md` | New entries at top, never edit old ones |
+| REGULAR | everything else | Modify freely in your worktree |
+
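One way to make the table above machine-checkable is a small classifier. This is a sketch under two assumptions that the table does not state: files are matched by basename only, and the stricter rule wins when a name fits more than one row (for example, `meta_journal.md` also matches `*_journal.md`). The function name `classify` is hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

FROZEN = {"program.md", "eval.py", "meta_journal.md"}
GLOBAL = {"RESEARCH_JOURNAL.md", "results.tsv"}


def classify(path: str) -> str:
    """Map a repo-relative path to its taxonomy type, strictest rule first."""
    name = PurePosixPath(path).name
    if name in FROZEN:
        return "FROZEN"
    if name in GLOBAL:
        return "GLOBAL"
    if fnmatchcase(name, "*_journal.md"):  # case-sensitive glob match
        return "APPEND-ONLY"
    return "REGULAR"
```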
+---
+
+## Agent Algorithm
+
+```
+YOU ARE AN AGENT. Follow this loop:
+
+read RESEARCH_JOURNAL.md          # what has been tried
+read 0_docs/problem.md            # what we're solving
+
+n_ideas = count files in 1_ideas/ (not _TEMPLATE.md)
+
+if n_ideas < 30:
+    ## IDEATE
+    - Read at least one file from 0_docs/papers/ (or fetch a new paper)
+    - Do at least one web search for recent approaches
+    - Fetch papers: use /semantic-search or /exa-search skills
+      -> save FULL paper text to 0_docs/papers/{slug}.md (not summaries -- full text)
+      -> optionally add a vargdown-style argument map to 0_docs/papers/{slug}_analysis.argdown
+      -> add key insight (1-3 observations with sources) to RESEARCH_JOURNAL.md
+    - Brainstorm ideas. Quality bar:
+        * Novel (not in RESEARCH_JOURNAL.md already)
+        * Mechanistically grounded (not just hyperparameter tuning)
+        * Not sklearn slop -- must be a real ML research contribution
+        * Bold enough that it could be a paper contribution
+    - For each idea:
+        write 1_ideas/{YYYY-MM-DD}_{slug}.md  (use _TEMPLATE.md format)
+        spawn subagent to critique the idea (prompt: "Is this idea sound?
+          What are the failure modes? Is the hypothesis testable?")
+        append subagent feedback to the idea file
+    - Append summary of new ideas + paper insights to RESEARCH_JOURNAL.md
+
+else:
+    ## IMPLEMENT
+    pick the best idea from 1_ideas/ based on:
+        - subagent rating (see feedback section in idea file)
+        - novelty relative to RESEARCH_JOURNAL.md
+        - expected impact on metric
+        - implementation feasibility
+
+    slug = idea filename slug
+    run: git worktree add 5_worktrees/{slug} -b exp/{slug}
+    cd 5_worktrees/{slug}
+
+    implement the idea (modify train.py, model.py, etc.)
+    do NOT modify: eval.py, program.md, meta_journal.md
+
+    ## TEST
+    spawn subagent: "Code review this against the idea doc 1_ideas/{YYYY-MM-DD}_{slug}.md.
+      Does the implementation match the hypothesis? Any bugs?"
+    run: just smoke                    # fast sanity check
+    run: just eval                     # appends to results.tsv
+
+    ## REPORT
+    write 9_reports/{YYYY-MM-DD}_{slug}.md  (use _TEMPLATE.md format)
+    append short summary to RESEARCH_JOURNAL.md:
+        - what was tried, what metric changed, what you learned
+        - key observation vs inference distinction
+
+    ## SUBMIT
+    git commit -m "exp({slug}): {one-line description}"
+    git push origin exp/{slug}
+    if result beats best in results.tsv:
+        create PR for human to merge
+
+## QUEUING EXPERIMENTS (pueue)
+
+Use pueue to queue experiments for the single GPU -- one at a time, no collision:
+
+    # Queue with a label showing the question and expected resolution
+    pueue add --label "Q: does X help? H: expect +0.05 metric" -- just eval --config=path
+
+    # Check queue / status / logs
+    pueue status
+    pueue log {task_id}       # full stdout
+    pueue follow {task_id}    # live tail
+
+Labels encode the hypothesis being tested. After the run, append observed vs expected
+to RESEARCH_JOURNAL.md. The label shows up in `pueue status` so you can track what
+question each running/queued job is answering.
+
+    # Example: multiple experiments queued with different hypotheses
+    pueue add --label "Q: rotary vs sinusoidal? H: rotary saves 0.1 bpb" -- just eval rotary
+    pueue add --label "Q: flash-attn memory? H: 2x batch size same speed" -- just eval flash
+    pueue add --label "Q: does layer norm placement matter? H: pre-norm better" -- just eval prenorm
+```
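The branch at the top of the loop (ideate until there are 30 ideas, then implement) can be sketched as follows. The function names and the `IDEA_TARGET` constant are illustrative; the only details taken from the text are the `1_ideas/` directory, the `.md` extension, the `_TEMPLATE.md` exclusion, and the threshold of 30.

```python
from pathlib import Path

IDEA_TARGET = 30  # threshold used by the loop above


def count_ideas(ideas_dir: Path) -> int:
    """Count idea files in ideas_dir, skipping the template."""
    return sum(1 for p in ideas_dir.glob("*.md") if p.name != "_TEMPLATE.md")


def next_phase(ideas_dir: Path) -> str:
    """Return the phase the agent should enter on this iteration."""
    return "IDEATE" if count_ideas(ideas_dir) < IDEA_TARGET else "IMPLEMENT"
```

Calling `next_phase(Path("1_ideas"))` at the start of each iteration keeps the decision in one place instead of repeating the count inline.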
+
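The Q/H label convention from the pueue section can be built programmatically so that every queued job carries both the question and the hypothesis. This sketch only constructs the argument list and does not run pueue; the helper name `pueue_add_args` is hypothetical, and it relies only on `pueue add --label ... -- <command>` as shown in the examples above.

```python
import shlex


def pueue_add_args(question: str, hypothesis: str, command: list[str]) -> list[str]:
    """Build the argv for `pueue add` with a Q/H label, without executing it."""
    label = f"Q: {question} H: {hypothesis}"
    # Everything after "--" is the command pueue will run.
    return ["pueue", "add", "--label", label, "--", *command]


args = pueue_add_args(
    "rotary vs sinusoidal?", "rotary saves 0.1 bpb", ["just", "eval", "rotary"]
)
print(shlex.join(args))  # shell-quoted form, suitable for logging
```

Passing the list to `subprocess.run(args, check=True)` would queue the job and raise if pueue rejects it, which matches the fail-fast convention.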
+---
+
+## Coding Conventions
+
+Fail fast. No defensive programming. No silent fallbacks.
+
+```python
+# shape ops: einops for clarity
+from einops import rearrange, reduce
+x = rearrange(x, 'b s h d -> b h s d')
+
+# einsum for explicit contraction (sums over the shared axis d)
+import torch
+out = torch.einsum('b h s d, b h d v -> b h s v', q, k)
+
+# jaxtyping on function boundaries (docs + smoke-test checking)
+from jaxtyping import Float
+from torch import Tensor
+def encode(x: Float[Tensor, 'b s d']) -> Float[Tensor, 'b s h']:
+    ...
+
+# logging: loguru not print
+from loguru import logger
+logger.info(f"loss={loss:.4f}")
+
+# dataframes: polars v1
+import polars as pl
+df.group_by("exp").agg(pl.col("metric").mean())
+
+# config: tyro dataclass
+import tyro
+from dataclasses import dataclass
+
+@dataclass
+class Config:
+    lr: float = 3e-4
+    # {FILL_IN}
+
+cfg = tyro.cli(Config)
+```
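As an illustration of "fail fast, no silent fallbacks", the sketch below reads a learning rate from a raw mapping. The function `load_lr` and the dictionary layout are hypothetical; the point is only the contrast with `raw.get("lr", default)`, which would hide a missing or misspelled key.

```python
def load_lr(raw: dict[str, str]) -> float:
    """Read the learning rate, raising immediately if it is missing or malformed."""
    # Direct indexing raises KeyError on a missing key, and float() raises
    # ValueError on unparseable text. Neither error is caught or defaulted.
    return float(raw["lr"])


print(load_lr({"lr": "3e-4"}))
```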
+
+---
+
+## Research Epistemics
+
+Separate observations from inferences:
+- **Observation**: "val_bpb dropped from 3.2 to 2.9 on run X" (measured fact)
+- **Inference**: "this suggests the attention head is learning positional structure" (interpretation)
+- **Claim from paper**: "authors claim X" -- not "X is true" unless you verified it
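One possible way to keep the three kinds of statement apart when writing journal entries is to tag each one explicitly. The `Statement` class, its field names, and the rendered format are assumptions for illustration, not a format the journal files prescribe.

```python
from dataclasses import dataclass

KINDS = ("observation", "inference", "paper_claim")


@dataclass(frozen=True)
class Statement:
    kind: str    # one of KINDS
    text: str
    source: str  # run id, file, or paper reference

    def __post_init__(self) -> None:
        # Fail fast on an unknown kind rather than writing an untagged entry.
        if self.kind not in KINDS:
            raise ValueError(f"unknown kind: {self.kind!r}")

    def render(self) -> str:
        """Format the statement as one Markdown bullet."""
        return f"- **{self.kind}** ({self.source}): {self.text}"


entry = Statement("observation", "val_bpb dropped from 3.2 to 2.9", "run X")
print(entry.render())
```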
+
+For complex arguments, use `/vargdown` skill: verified argument maps with credences.
+
+Trust signals, strongest first: community adoption > citations from other papers > open source code > author reputation.
+
+---
+
+## Available Skills
+
+Assume these are installed at `~/.claude/skills/` (from https://github.com/wassname/skills):
+
+| Skill | Use for |
+|-------|---------|
+| `/semantic-search` | Search arXiv, Semantic Scholar, DBLP, OpenAlex |
+| `/arxiv-fetch` | Download full paper text given arXiv ID/URL |
+| `/exa-search` | Neural web search for recent approaches |
+| `/vargdown` | Verified argument maps with credences for complex reasoning |
+| `/gsd` | Get Shit Done: spec -> implement -> test -> review -> wrap |
+| `/jaxtyping` | Runtime tensor shape/dtype checking |
+| `/justfile` | Project recipes (`just smoke`, `just eval`, `just queue`) |
+| `/ml_debug` | ML convergence, gradient analysis, sweep methodology |
+| `/brainstorm` | Wide + deep ideation without tunnel vision |
+| `/external-review` | Code/plan review via a different model |
+| `pueue` | Queue GPU jobs sequentially; label each with Q/hypothesis |
+
+Also available: bibtex MCP (search_reference, fetch), wandb MCP (query runs).
+
+---
+
+## Meta-Mode
+
+A human sets `META_MODE=1` to enable editing of FROZEN files and committing to main.
+
+Use meta-mode to:
+- Revise this program.md (agent instructions)
+- Update eval.py (e.g., add new metric columns)
+- Reflect on the overall research process in meta_journal.md
+- Run an exit-interview-style retrospective: what worked, what didn't, what would you change?
+
+To enter: the human writes `META_MODE=1` in a `human_journal.md` entry before asking the agent.