autoresearch_template/9_reports/_TEMPLATE.md

# Lab Report: {slug}

<!-- Copy to: 9_reports/YYYY-MM-DD_{slug}.md -->
<!-- Self-contained: a reader should understand the experiment without reading anything else -->

## Metadata

| Field | Value |
|-------|-------|
| Date | YYYY-MM-DD |
| Commit | SHORT_SHA |
| Branch | exp/{slug} |
| Worktree | 5_worktrees/{slug} |
| Agent | {name} |
| Idea doc | [1_ideas/YYYY-MM-DD_{slug}.md](../1_ideas/YYYY-MM-DD_{slug}.md) |

## Context

{1-2 sentences: what problem are we solving and why does this experiment matter}

## Hypothesis

**Question**: What happens if we {change}?

**Prediction**: {metric} improves by {amount} because {mechanism}.

## Experiment

What was changed vs the baseline:

```diff
# key diff -- just the essential change, not the whole file
```

Baseline: commit {SHORT_SHA} (or "untrained")

## Observations

Measured facts only -- no interpretation here.

| Run | Metric | Value | Delta vs baseline |
|-----|--------|-------|-------------------|
| baseline | {metric} | {value} | -- |
| this exp | {metric} | {value} | {+/-delta} |

```
# pueue log output or relevant stdout snippet
```

## Diagnosis

> Caution: 95% of ML failures are bugs, engineering issues, or misconceptions -- not deep theory.
> State credences explicitly. Don't overclaim.

**Most likely explanation** (credence: X%): {explanation}

**Alternative explanations**:
- {alternative} (credence: Y%)

**What would distinguish these**: {test or observation that would separate them}

## Limitations

- {what this experiment doesn't test}
- {confounds}

## Future work

- {most promising next step given this result}
- {if confirmed: what's the natural follow-up?}
- {if refuted: what's the diagnosis and what would we try instead?}