2026-04-04 11:31:20

I'd like to turn this into a template repo for autoresearch.

First, read the skills for justfile, pueue, uv, jaxtyping, gsd, and vargdown.

Assume these are available, so each can be described in one or two lines followed by references. We assume that https://github.com/wassname/skills is installed on the machine.

First principles

Some files are SINGLETONS: they exist only in the main branch and are protected by git hook reminders and/or .gitignore.
Some are APPEND ONLY, like journals.
Some are written only by one kind of writer: automation (eval.py), meta-agents (meta_journal.md, etc.), agents (research journal), or humans (human journal).
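The APPEND ONLY rule is easy to check mechanically: an edit is a pure append exactly when the old contents are a prefix of the new contents. A minimal sketch, assuming a hook or CI step can read both versions of the file; `is_append_only` and `appended_text` are hypothetical helper names, not part of any existing tool.

```python
def is_append_only(old: str, new: str) -> bool:
    """True if `new` only adds text after `old`: nothing edited, reordered, or deleted."""
    return new.startswith(old)


def appended_text(old: str, new: str) -> str:
    """Return just the newly appended text, or raise if earlier content changed."""
    if not is_append_only(old, new):
        raise ValueError("append-only file was edited or truncated")
    return new[len(old):]
```

For a tracked file such as meta_journal.md, a hook could pass in the output of `git show HEAD:meta_journal.md` (last commit) and `git show :meta_journal.md` (the staged copy).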

Then read my best-practice guide here... oh, I can't find it, so here are the highlights from memory:

It's like https://github.com/karpathy/autoresearch/blob/master/program.md or https://github.com/NousResearch/autonovel/blob/master/program.md; read these for context and outline (but be aware that the first is simple, and the second is for writing, not ML experiments like here).
But we add a lessons-learned and gotchas section.
We have FROZEN / META files, each with a comment at the top saying "only edit in meta mode":
- eval.py: frozen to 1) stop cheating and 2) make sure results are comparable. Anything needed to stop p-hacking, seed hacking, changing the eval set, etc. belongs here, while anything the agent can and should improve and compare belongs in non-frozen files.
- meta_journal.md: appended to only in meta mode.
- program.md
  - Symlinked to AGENTS.md and CLAUDE.md to make sure it's always loaded.
  - Frozen to stop the agent from changing its own instructions, unless in meta mode!
  - Normal (non-meta) agents are encouraged to add meta-feedback to the research journal.
  - It should give agents clear pseudocode for how to work:
    - If the ideas queue is <30, think for hours about wide and deep next steps, based on reading at least one of the main files, at least one paper, and at least one web search. Ideas should not be hyperparameter tuning or sklearn slop. They should be novel and bold, and you should be able to show one to an independent subagent and have it agree that it makes sense. Should each idea in 2_ideas/{ts}_{slug}.md end with a log for subagent feedback? The ideas will get turned into lab reports, so we want to separate observations from inferences and guesses.
    - If the ideas queue is >30, pick the best one and implement it in a new worktree.
    - When ready to test: 1) have a subagent review the code against the idea, 2) run a smoke test, 3) run eval.py.
    - When finished improving, write a short, self-contained lab report in 9_lab_reports/ with:
      - metadata: date, commit, worktree, agent name
      - intro: context, hypothesis
      - experiment, observations, results
      - diagnosis: be very careful not to overclaim; 95% of the time in ML, a failure is due to a bug, poor engineering, or a misconception.
      - limitations, future work
    - Then append a short summary of learnings to the research journal.
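The queue rule in program.md could be expressed as a tiny dispatcher. This is only a sketch, assuming each idea is one file in 2_ideas/ and that implemented ideas are moved out of that folder; the notes say <30 and >30, so treating exactly 30 as "implement" is an arbitrary choice here. The function names are hypothetical.

```python
from pathlib import Path

IDEA_QUEUE_TARGET = 30  # threshold from the notes


def queued_ideas(ideas_dir: Path) -> list[Path]:
    """Idea files named {ts}_{slug}.md; sorted, so oldest first if {ts} is sortable."""
    return sorted(ideas_dir.glob("*_*.md"))


def next_step(ideas_dir: Path, target: int = IDEA_QUEUE_TARGET) -> str:
    """Return "ideate" or "implement", following the program.md loop."""
    if len(queued_ideas(ideas_dir)) < target:
        # Thin queue: read main files, >=1 paper, >=1 web search; write bold ideas;
        # have an independent subagent sanity-check each one.
        return "ideate"
    # Full queue: pick the best idea and implement it in a new worktree, then
    # subagent code review -> smoke test -> eval.py -> lab report -> journal summary.
    return "implement"
```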
We have GLOBAL files that are 0) only in the root project directory, 1) untracked and gitignored, and 2) guarded by git hooks that stop them from being committed. This is so they don't get committed in worktrees and then overwrite the main copy during a merge.
- RESEARCH_JOURNAL.md: an append-only file that agents in all worktrees should write to. All entries should have date, commit, worktree, and agent name. The first line is a comment saying it's append-only and showing the format.
- results.tsv: automatically appended by eval.py; early columns have short values, later columns have long values; the file is gitignored.
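A sketch of how an agent and eval.py might append to the two GLOBAL files. The entry header layout, the HTML-comment first line, and the helper names are assumptions; only the required fields (date, commit, worktree, agent name) and the short-columns-first ordering come from the notes.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical first line: marks the file append-only and shows the entry format.
JOURNAL_HEADER = (
    "<!-- APPEND ONLY. Entry format: "
    "## YYYY-MM-DD HH:MM | commit SHA | worktree NAME | agent NAME -->\n\n"
)


def append_journal_entry(journal: Path, commit: str, worktree: str, agent: str, text: str) -> None:
    """Append one RESEARCH_JOURNAL.md entry carrying the required metadata."""
    if not journal.exists():
        journal.write_text(JOURNAL_HEADER, encoding="utf-8")
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M")
    header = f"## {stamp} | commit {commit} | worktree {worktree} | agent {agent}\n"
    with journal.open("a", encoding="utf-8") as f:  # "a" mode never rewrites earlier entries
        f.write(header + text.rstrip() + "\n\n")


def append_result(results: Path, row: dict[str, str], columns: list[str]) -> None:
    """Append one row to results.tsv, writing the header row if the file is new.

    Order `columns` short-to-long (e.g. score, commit, then free-text notes)
    so the file stays skimmable in a terminal.
    """
    is_new = not results.exists()
    with results.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns, delimiter="\t")
        if is_new:
            writer.writeheader()
        writer.writerow(row)
```

Since both files live only in the root checkout, a worktree agent would pass the root checkout's path, not a path inside its own worktree.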
Our main branch is protected by a git hook reminding models to work in a worktree and not commit directly to main (unless a human has put them in meta mode).
Meta mode: this is where the agent can take a step back, think about the overall workflow, and perhaps do an exit interview. Here we improve the overall process by editing the FROZEN / META files, committing to main, and appending to the meta journal.
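The hook rules above (no commits to main, no GLOBAL files anywhere, FROZEN files only in meta mode) can be sketched as a Python pre-commit hook. This is an illustration, not the template's actual hook: the `AUTORESEARCH_META_MODE` environment variable is a hypothetical way for a human to switch on meta mode, and the file sets are taken from the notes. The decision logic is a pure function so it can be tested without a repository.

```python
import os
import subprocess
import sys

GLOBAL_FILES = {"RESEARCH_JOURNAL.md", "results.tsv"}        # never committed, anywhere
FROZEN_FILES = {"eval.py", "program.md", "meta_journal.md"}  # editable in meta mode only
META_ENV = "AUTORESEARCH_META_MODE"  # hypothetical switch a human sets to "1"


def problems(branch: str, staged: list[str], meta_mode: bool) -> list[str]:
    """Reasons to block this commit; an empty list means allow it."""
    errors = [
        f"{p} is a GLOBAL file: keep it untracked in the root checkout"
        for p in staged
        if os.path.basename(p) in GLOBAL_FILES
    ]
    if meta_mode:
        return errors  # meta mode may touch main and FROZEN files, never GLOBAL ones
    if branch in ("main", "master"):
        errors.append("don't commit to main: work in a worktree (or ask a human for meta mode)")
    errors += [f"{p} is FROZEN: only edit in meta mode" for p in staged if p in FROZEN_FILES]
    return errors


def git(*args: str) -> str:
    """Run a git command and return its stripped stdout."""
    return subprocess.run(["git", *args], check=True, capture_output=True, text=True).stdout.strip()


def main() -> int:
    branch = git("rev-parse", "--abbrev-ref", "HEAD")
    staged = git("diff", "--cached", "--name-only").splitlines()
    errors = problems(branch, staged, meta_mode=os.environ.get(META_ENV) == "1")
    for e in errors:
        print(f"pre-commit: {e}", file=sys.stderr)
    return 1 if errors else 0
```

Saved as an executable `.git/hooks/pre-commit`, the file would end with `sys.exit(main())`. Agents can still bypass it with `git commit --no-verify`, so it works as a firm reminder rather than a lock.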
Agents should work in worktrees, which should be committed and pushed. If a result beats every result in results.tsv, make it a PR for the human to merge.
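A sketch of the "better than anything in results.tsv" check. It assumes a numeric `score` column where higher is better and a `commit` column; both column names are hypothetical. Passing the run's own commit excludes the row eval.py just appended for it, so the new result isn't compared against itself.

```python
import csv
from pathlib import Path
from typing import Optional


def best_score(results: Path, metric: str = "score", exclude_commit: Optional[str] = None) -> Optional[float]:
    """Best recorded value of `metric`, optionally ignoring one commit's rows."""
    if not results.exists():
        return None
    with results.open(newline="", encoding="utf-8") as f:
        values = [
            float(row[metric])
            for row in csv.DictReader(f, delimiter="\t")
            if row.get(metric) and (exclude_commit is None or row.get("commit") != exclude_commit)
        ]
    return max(values, default=None)


def should_open_pr(new_score: float, results: Path, commit: Optional[str] = None, metric: str = "score") -> bool:
    """True if `new_score` strictly beats every other result (assumes higher is better)."""
    best = best_score(results, metric, exclude_commit=commit)
    return best is None or new_score > best
```

If it returns True, the agent would push its branch and open a PR, for example with `gh pr create`.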

Generally we want to show, not tell; sometimes show and tell; but never tell without showing. That means these files should show through specific examples.

Thinking: we need expanded files for some of the stages, in particular ideation and brainstorming/diagnosis. These are hard for current LLMs, which can fill up their context or just fall into tunnel vision.

Research: to help with ideation we have a 0_docs folder with a problem intro and papers. Whenever an agent fetches a paper, it should write the full markdown summary to 0_docs/papers/{slug}.md and then add the main insights to the research journal. This way we build a growing body of knowledge that agents can draw on, and we can also track which papers were read when and what was learned from them. We should marshal the available skills, tools, and MCPs for searching and fetching, and prioritise them so we can 1) search, 2) download full text to files, and 3) have subagents with good epistemics, the vargdown skills, and knowledge of the problem and what we are looking for write the summaries.
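A sketch of the fetch-then-record step, assuming `root` is the root checkout, since RESEARCH_JOURNAL.md is a GLOBAL file that lives only there. The slug rule, entry layout, and function names are hypothetical; searching and fetching are left to whichever skills or MCP tools are available.

```python
import re
from datetime import date
from pathlib import Path


def slugify(title: str, max_len: int = 60) -> str:
    """Lowercase slug, e.g. 'Attention Is All You Need' -> 'attention-is-all-you-need'."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return slug[:max_len].rstrip("-")


def save_paper_summary(
    root: Path, title: str, summary_md: str, insights: list[str],
    commit: str, worktree: str, agent: str,
) -> Path:
    """Write 0_docs/papers/{slug}.md, then log the main insights to the research journal."""
    papers = root / "0_docs" / "papers"
    papers.mkdir(parents=True, exist_ok=True)
    path = papers / f"{slugify(title)}.md"
    path.write_text(f"# {title}\n\n{summary_md.rstrip()}\n", encoding="utf-8")

    # The journal entry records which paper was read, when, and what was learned.
    bullets = "\n".join(f"- {insight}" for insight in insights)
    entry = (
        f"## {date.today().isoformat()} | commit {commit} | worktree {worktree} "
        f"| agent {agent} | paper {path.name}\n{bullets}\n\n"
    )
    with (root / "RESEARCH_JOURNAL.md").open("a", encoding="utf-8") as f:
        f.write(entry)
    return path
```

Titles with no ASCII letters or digits would produce an empty slug, so a real version would need a fallback such as an arXiv ID.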
