pi-plan: right-size plans (fewer goals), lean done_when/failure_modes

The drafting prompt over-decomposed: one goal per item, long run-on done_when (criterion + failure symptom in one line), and 3 mandatory failure_modes. Plans came out verbose and hard to read.

- planDrafting: default to ONE goal; add another only for a genuinely separate checkpoint; near-identical items become subtasks. Subtasks only for 3+ step goals. Don't invent phases. (granularity heuristic adapted from tintinweb/pi-tasks when-to/when-not guidance)
- done_when: one falsifiable check, no embedded "if wrong" clause (the failure symptom belongs in failure_modes)
- failure_modes: 0-2 terse items, optional
- Sync the stale done_when wording in README and plan-file.ts comment

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-27 15:16:18 +08:00 · 2026-06-15 20:28:02 +08:00
parent 158e04f4ac
commit 861b2ea157
3 changed files with 24 additions and 21 deletions
@@ -66,9 +66,8 @@ failure_modes:
 ```

 - A goal is a `## Goal:` header with an `<!-- id -->`, a `status:`
-  (`open` | `active` | `done` | `cancelled`), a falsifiable `done_when:` (what you expect, and the
-  symptom if it is NOT met), an optional `verify:` shell command, a `failure_modes:` pre-mortem
-  list, and `- [ ]` subtasks.
+  (`open` | `active` | `done` | `cancelled`), one falsifiable `done_when:`, an optional `verify:`
+  shell command, an optional short `failure_modes:` pre-mortem list, and `- [ ]` subtasks.
 - `done_when` names the evidence that distinguishes real success from a subtle failure. `verify`,
  when present, is the deterministic first stage of the sign-off check.
 - The agent ticks subtasks, appends to `## Log`, and sets `status` as it works. Multiple goals may
@@ -15,7 +15,7 @@
 *   ## Goal: <subject>
 *   <!-- id: <slug> -->
 *   status: open | active | done | cancelled
- *   done_when: <falsifiable check; plus the symptom if NOT met>
+ *   done_when: <one falsifiable check>
 *   verify: <shell command, optional>
 *   failure_modes:
 *     - <pre-mortem item>
@@ -35,30 +35,34 @@
 * ──────────────────────────────────────────────────────────────────────── */
 export const planDrafting = `\
 You are in plan mode. Explore the repository read-only, then draft a plan into plan.md.
-Do not write or run code in this phase. Produce goals the human will review and approve.
+Do not write or run code in this phase. Produce a plan the human will review and approve.
+
+Right-size the plan — don't force structure that isn't there:
+- Default to ONE goal. Add another only when it's a genuinely separate checkpoint you'd want
+  signed off on its own (its own done_when that can pass or fail independently). A long list of
+  near-identical goals should be one goal with subtasks. Most objectives are 1-2 goals.
+- Subtasks are the steps inside a goal. Add them when a goal has 3+ distinct steps; skip them for
+  a single-action goal. Don't pad with trivial steps.
+- Don't invent phases to look thorough. When in doubt, merge.

 Write each goal in this shape:

 ## Goal: <one short imperative line>
 status: open
-done_when: <a falsifiable check, plus the symptom you'd see if it's NOT met>
-verify: <a shell command that exits 0 only when the goal is met — include this whenever
-         success is expressible as tests/lint/build/a threshold; omit it otherwise>
+done_when: <one falsifiable check — what is true on disk when this is done>
+verify: <optional shell command that exits 0 only when done_when holds; omit if not testable>
 failure_modes:
-  - <a concrete way this could look done but isn't>
-  - <another>
-  - <if verify exists: "verify passes on a trivial or gamed test">
-- [ ] <first subtask>
-- [ ] <next subtask>
+  - <a sneaky way this could look done but isn't — terse, optional>
+- [ ] <subtask>
+- [ ] <subtask>

-Rules for a good plan:
-- Keep goals small enough that done_when is checkable in one sitting.
-- done_when must be falsifiable. "Works well" is not a criterion; "p95 < 50ms on bench-X,
-  else timeouts in load-test.log" is.
-- failure_modes are a pre-mortem: the cheap, specific ways a later "done" could be wrong.
-  This is the highest-value part — it shapes what evidence you'll collect.
-- Prefer a verify command. A green deterministic check is worth more than a paragraph of
-  description, and it's the first thing checked at sign-off.
+Keep it lean:
+- done_when is ONE concrete, checkable condition — not a paragraph, no "if wrong" clause.
+  The symptom of failure goes in failure_modes, not here.
+- failure_modes: 0-2 terse items, only the non-obvious ways a "done" could be wrong (a
+  pre-mortem). If you add a verify command, one mode can be "verify passes on a gamed file".
+- subtasks: a short checklist of the real steps; omit them if the goal is a single action.
+- Prefer a verify command when success is a test/build/threshold — a green check beats prose.

 When the plan is drafted, present it and stop for review. Do not begin execution.`;
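
Reviewer note: the limits the revised prompt asks for (one `done_when`, 0-2 `failure_modes`, a valid `status`) are simple enough to check mechanically. The sketch below is not part of this commit. `lintGoal`, `GoalLint`, and the sample goal are hypothetical names made up for illustration, and the parsing assumes the goal shape shown in the prompt (unindented `- [ ]` subtasks, indented `failure_modes` items); the real parser in plan-file.ts may differ.

```typescript
// Hypothetical helper, not part of the commit: lints one goal block against
// the limits stated in the revised planDrafting prompt.
type GoalLint = {
  doneWhenCount: number;
  failureModes: string[];
  subtasks: string[];
  problems: string[];
};

const VALID_STATUS = new Set(["open", "active", "done", "cancelled"]);

function lintGoal(block: string): GoalLint {
  const problems: string[] = [];
  const failureModes: string[] = [];
  const subtasks: string[] = [];
  let doneWhenCount = 0;
  let status: string | undefined;
  let inFailureModes = false;

  for (const raw of block.split("\n")) {
    const line = raw.trimEnd();
    const subtask = /^- \[[ x]\] (.+)$/.exec(line); // unindented checklist item
    const mode = /^\s+- (.+)$/.exec(line); // indented list item
    if (line.startsWith("done_when:")) {
      doneWhenCount++;
      inFailureModes = false;
    } else if (line.startsWith("status:")) {
      status = line.slice("status:".length).trim();
      inFailureModes = false;
    } else if (line.startsWith("failure_modes:")) {
      inFailureModes = true;
    } else if (subtask) {
      subtasks.push(subtask[1] ?? "");
      inFailureModes = false;
    } else if (inFailureModes && mode) {
      failureModes.push(mode[1] ?? "");
    } else if (line.trim() !== "") {
      inFailureModes = false; // any other field ends the failure_modes list
    }
  }

  if (doneWhenCount !== 1) {
    problems.push(`expected exactly one done_when, found ${doneWhenCount}`);
  }
  if (failureModes.length > 2) {
    problems.push(`failure_modes has ${failureModes.length} items; the prompt asks for 0-2`);
  }
  if (status === undefined || !VALID_STATUS.has(status)) {
    problems.push("status must be one of open | active | done | cancelled");
  }
  return { doneWhenCount, failureModes, subtasks, problems };
}

// Made-up goal in the shape the prompt describes.
const sample = [
  "## Goal: Add retry to the fetch helper",
  "<!-- id: fetch-retry -->",
  "status: open",
  "done_when: the retry test passes against a stubbed 503",
  "verify: npm test",
  "failure_modes:",
  "  - test stubs the retry path itself",
  "- [ ] add backoff loop",
  "- [ ] cover 503 then 200",
  "- [ ] document the retry cap",
].join("\n");

console.log(lintGoal(sample).problems); // prints []
```

A check like this only catches the countable rules. Whether a `done_when` is actually falsifiable, or whether two goals should be merged, still needs the human review step the prompt ends on.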