diff --git a/README.md b/README.md
index 682228f..e66ed99 100644
--- a/README.md
+++ b/README.md
@@ -66,9 +66,8 @@ failure_modes:
 ```
 
 - A goal is a `## Goal:` header with an ``, a `status:`
-  (`open` | `active` | `done` | `cancelled`), a falsifiable `done_when:` (what you expect, and the
-  symptom if it is NOT met), an optional `verify:` shell command, a `failure_modes:` pre-mortem
-  list, and `- [ ]` subtasks.
+  (`open` | `active` | `done` | `cancelled`), one falsifiable `done_when:`, an optional `verify:`
+  shell command, an optional short `failure_modes:` pre-mortem list, and `- [ ]` subtasks.
 - `done_when` names the evidence that distinguishes real success from a subtle failure. `verify`,
   when present, is the deterministic first stage of the sign-off check.
 - The agent ticks subtasks, appends to `## Log`, and sets `status` as it works. Multiple goals may
diff --git a/src/plan-file.ts b/src/plan-file.ts
index 5d6d45a..a6c8589 100644
--- a/src/plan-file.ts
+++ b/src/plan-file.ts
@@ -15,7 +15,7 @@
  * ## Goal:
  *
  * status: open | active | done | cancelled
- * done_when:
+ * done_when:
  * verify:
  * failure_modes:
  * -
diff --git a/src/prompts.ts b/src/prompts.ts
index 383660b..af8385b 100644
--- a/src/prompts.ts
+++ b/src/prompts.ts
@@ -35,30 +35,34 @@
  * ──────────────────────────────────────────────────────────────────────── */
 export const planDrafting = `\
 You are in plan mode. Explore the repository read-only, then draft a plan into plan.md.
-Do not write or run code in this phase. Produce goals the human will review and approve.
+Do not write or run code in this phase. Produce a plan the human will review and approve.
+
+Right-size the plan — don't force structure that isn't there:
+- Default to ONE goal. Add another only when it's a genuinely separate checkpoint you'd want
+  signed off on its own (its own done_when that can pass or fail independently). A long list of
+  near-identical goals should be one goal with subtasks. Most objectives are 1-2 goals.
+- Subtasks are the steps inside a goal. Add them when a goal has 3+ distinct steps; skip them for
+  a single-action goal. Don't pad with trivial steps.
+- Don't invent phases to look thorough. When in doubt, merge.
 
 Write each goal in this shape:
 
 ## Goal:
 
 status: open
-done_when:
-verify:
+done_when:
+verify:
 failure_modes:
-  -
-  -
-
-
-- [ ]
-- [ ]
+  -
+- [ ]
+- [ ]
 
-Rules for a good plan:
-- Keep goals small enough that done_when is checkable in one sitting.
-- done_when must be falsifiable. "Works well" is not a criterion; "p95 < 50ms on bench-X,
-  else timeouts in load-test.log" is.
-- failure_modes are a pre-mortem: the cheap, specific ways a later "done" could be wrong.
-  This is the highest-value part — it shapes what evidence you'll collect.
-- Prefer a verify command. A green deterministic check is worth more than a paragraph of
-  description, and it's the first thing checked at sign-off.
+Keep it lean:
+- done_when is ONE concrete, checkable condition — not a paragraph, no "if wrong" clause.
+  The symptom of failure goes in failure_modes, not here.
+- failure_modes: 0-2 terse items, only the non-obvious ways a "done" could be wrong (a
+  pre-mortem). If you add a verify command, one mode can be "verify passes on a gamed file".
+- subtasks: a short checklist of the real steps; omit them if the goal is a single action.
+- Prefer a verify command when success is a test/build/threshold — a green check beats prose.
 
 When the plan is drafted, present it and stop for review. Do not begin execution.`;
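
Review note: the goal shape and the "Keep it lean" rules in the new prompt could be checked mechanically. The sketch below is one possible way to do that, not code from this repository. `checkGoal`, `GoalCheck`, and `isStatus` are hypothetical names, and the sample goal (title, `npm test`, subtask text) is invented for illustration. It assumes each field sits on its own line, failure modes are indented `- ` items, and subtasks start with `- [`.

```typescript
// Hypothetical lint for one goal block; NOT the parser in src/plan-file.ts.
type Status = "open" | "active" | "done" | "cancelled";

interface GoalCheck {
  title: string;
  status: Status | null;
  doneWhen: string | null;
  failureModes: string[];
  subtasks: string[];
  problems: string[];
}

const STATUSES: readonly string[] = ["open", "active", "done", "cancelled"];

// Type guard so a raw string can be narrowed to the Status union.
function isStatus(value: string): value is Status {
  return STATUSES.includes(value);
}

function checkGoal(block: string): GoalCheck {
  const result: GoalCheck = {
    title: "",
    status: null,
    doneWhen: null,
    failureModes: [],
    subtasks: [],
    problems: [],
  };
  let inFailureModes = false;

  for (const raw of block.split("\n")) {
    const line = raw.trimEnd();
    if (line.startsWith("## Goal:")) {
      result.title = line.slice("## Goal:".length).trim();
      inFailureModes = false;
    } else if (line.startsWith("status:")) {
      const value = line.slice("status:".length).trim();
      result.status = isStatus(value) ? value : null;
      inFailureModes = false;
    } else if (line.startsWith("done_when:")) {
      result.doneWhen = line.slice("done_when:".length).trim();
      inFailureModes = false;
    } else if (line.startsWith("verify:")) {
      inFailureModes = false;
    } else if (line.startsWith("failure_modes:")) {
      inFailureModes = true;
    } else if (line.startsWith("- [")) {
      // Unindented checkbox lines are subtasks and end the failure_modes list.
      result.subtasks.push(line);
      inFailureModes = false;
    } else if (inFailureModes && line.trimStart().startsWith("- ")) {
      result.failureModes.push(line.trimStart().slice(2));
    }
  }

  if (result.status === null) {
    result.problems.push("status must be open | active | done | cancelled");
  }
  if (!result.doneWhen) {
    result.problems.push("done_when is missing or empty");
  }
  if (result.failureModes.length > 2) {
    result.problems.push("failure_modes should have 0-2 items");
  }
  return result;
}

// Usage with an invented goal that follows the new shape.
const sample = [
  "## Goal: Cache invalidation",
  "status: open",
  "done_when: p95 < 50ms on bench-X",
  "verify: npm test",
  "failure_modes:",
  "  - verify passes on a gamed file",
  "- [ ] add cache key",
  "- [ ] wire invalidation",
].join("\n");

console.log(checkGoal(sample).problems);
```

The check only covers what is mechanically decidable (a valid `status`, a non-empty `done_when`, at most two failure modes). Whether `done_when` is really one concrete condition still needs a human reviewer.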