pi-plan: right-size plans (fewer goals), lean done_when/failure_modes

The drafting prompt over-decomposed: it led to one goal per item, long
run-on done_when lines (criterion plus failure symptom in one line), and
3 mandatory failure_modes. Plans came out verbose and hard to read.

- planDrafting: default to ONE goal; add another only for a genuinely
  separate checkpoint; near-identical items become subtasks. Subtasks
  only for 3+ step goals. Don't invent phases. (granularity heuristic
  adapted from tintinweb/pi-tasks when-to/when-not guidance)
- done_when: one falsifiable check, no embedded "if wrong" clause (the
  failure symptom belongs in failure_modes)
- failure_modes: 0-2 terse items, optional
- Sync the stale done_when wording in README and plan-file.ts comment
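
The rules above can be sketched as a small lint pass over an already-parsed
goal. This is only an illustration: the `Goal` interface, `lintGoal`, and the
phrase list in the regex are hypothetical and are not part of plan-file.ts or
of this commit. The thresholds (one-line done_when, 0-2 failure_modes,
subtasks only for 3+ steps) are taken from the bullets above.

```typescript
// Hypothetical shape for a parsed goal; not the types used in plan-file.ts.
interface Goal {
  subject: string;
  doneWhen: string;
  verify?: string;
  failureModes: string[];
  subtasks: string[];
}

// Returns human-readable warnings; an empty array means the goal looks lean.
function lintGoal(goal: Goal): string[] {
  const warnings: string[] = [];
  const doneWhen = goal.doneWhen.trim();

  if (doneWhen.length === 0) {
    warnings.push("done_when is empty");
  } else if (doneWhen.includes("\n")) {
    warnings.push("done_when spans multiple lines; keep it to one check");
  }

  // Heuristic (an assumption): these phrases usually mark an embedded
  // failure symptom, which belongs in failure_modes instead.
  if (/\b(else|otherwise|if wrong|if not)\b/i.test(doneWhen)) {
    warnings.push("done_when embeds a failure symptom; move it to failure_modes");
  }

  if (goal.failureModes.length > 2) {
    warnings.push(`failure_modes has ${goal.failureModes.length} items; keep 0-2`);
  }

  // Subtasks are meant for goals with 3+ distinct steps.
  if (goal.subtasks.length > 0 && goal.subtasks.length < 3) {
    warnings.push("fewer than 3 subtasks; consider dropping them");
  }

  return warnings;
}

// Old-style goal: criterion and symptom in one line, 3 failure modes.
const verboseGoal: Goal = {
  subject: "Speed up bench-X",
  doneWhen: "p95 < 50ms on bench-X, else timeouts in load-test.log",
  failureModes: ["mode a", "mode b", "mode c"],
  subtasks: [],
};

// Lean goal: one check, the symptom moved into failure_modes.
const leanGoal: Goal = {
  subject: "Speed up bench-X",
  doneWhen: "p95 < 50ms on bench-X",
  failureModes: ["timeouts in load-test.log"],
  subtasks: [],
};

console.log(lintGoal(verboseGoal)); // two warnings
console.log(lintGoal(leanGoal));    // no warnings
```

The phrase match is deliberately crude. It will miss symptoms worded
differently and may flag a legitimate "else" in a condition, so treat the
output as a hint for the reviewer rather than a gate.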

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
wassname
2026-06-15 20:28:02 +08:00
parent 158e04f4ac
commit 861b2ea157
3 changed files with 24 additions and 21 deletions
+2 -3
@@ -66,9 +66,8 @@ failure_modes:
 ```
 - A goal is a `## Goal:` header with an `<!-- id -->`, a `status:`
-(`open` | `active` | `done` | `cancelled`), a falsifiable `done_when:` (what you expect, and the
-symptom if it is NOT met), an optional `verify:` shell command, a `failure_modes:` pre-mortem
-list, and `- [ ]` subtasks.
+(`open` | `active` | `done` | `cancelled`), one falsifiable `done_when:`, an optional `verify:`
+shell command, an optional short `failure_modes:` pre-mortem list, and `- [ ]` subtasks.
 - `done_when` names the evidence that distinguishes real success from a subtle failure. `verify`,
 when present, is the deterministic first stage of the sign-off check.
 - The agent ticks subtasks, appends to `## Log`, and sets `status` as it works. Multiple goals may
+1 -1
@@ -15,7 +15,7 @@
 * ## Goal: <subject>
 * <!-- id: <slug> -->
 * status: open | active | done | cancelled
-* done_when: <falsifiable check; plus the symptom if NOT met>
+* done_when: <one falsifiable check>
 * verify: <shell command, optional>
 * failure_modes:
 * - <pre-mortem item>
+21 -17
@@ -35,30 +35,34 @@
 * ──────────────────────────────────────────────────────────────────────── */
 export const planDrafting = `\
 You are in plan mode. Explore the repository read-only, then draft a plan into plan.md.
-Do not write or run code in this phase. Produce goals the human will review and approve.
+Do not write or run code in this phase. Produce a plan the human will review and approve.
+Right-size the plan — don't force structure that isn't there:
+- Default to ONE goal. Add another only when it's a genuinely separate checkpoint you'd want
+signed off on its own (its own done_when that can pass or fail independently). A long list of
+near-identical goals should be one goal with subtasks. Most objectives are 1-2 goals.
+- Subtasks are the steps inside a goal. Add them when a goal has 3+ distinct steps; skip them for
+a single-action goal. Don't pad with trivial steps.
+- Don't invent phases to look thorough. When in doubt, merge.
 Write each goal in this shape:
 ## Goal: <one short imperative line>
 status: open
-done_when: <a falsifiable check, plus the symptom you'd see if it's NOT met>
-verify: <a shell command that exits 0 only when the goal is met — include this whenever
-success is expressible as tests/lint/build/a threshold; omit it otherwise>
+done_when: <one falsifiable check — what is true on disk when this is done>
+verify: <optional shell command that exits 0 only when done_when holds; omit if not testable>
 failure_modes:
-- <a concrete way this could look done but isn't>
-- <another>
-- <if verify exists: "verify passes on a trivial or gamed test">
-- [ ] <first subtask>
-- [ ] <next subtask>
+- <a sneaky way this could look done but isn't — terse, optional>
+- [ ] <subtask>
+- [ ] <subtask>
-Rules for a good plan:
-- Keep goals small enough that done_when is checkable in one sitting.
-- done_when must be falsifiable. "Works well" is not a criterion; "p95 < 50ms on bench-X,
-else timeouts in load-test.log" is.
-- failure_modes are a pre-mortem: the cheap, specific ways a later "done" could be wrong.
-This is the highest-value part — it shapes what evidence you'll collect.
-- Prefer a verify command. A green deterministic check is worth more than a paragraph of
-description, and it's the first thing checked at sign-off.
+Keep it lean:
+- done_when is ONE concrete, checkable condition — not a paragraph, no "if wrong" clause.
+The symptom of failure goes in failure_modes, not here.
+- failure_modes: 0-2 terse items, only the non-obvious ways a "done" could be wrong (a
+pre-mortem). If you add a verify command, one mode can be "verify passes on a gamed file".
+- subtasks: a short checklist of the real steps; omit them if the goal is a single action.
+- Prefer a verify command when success is a test/build/threshold — a green check beats prose.
 When the plan is drafted, present it and stop for review. Do not begin execution.`;