pi-goals: move CompleteGoal desc into prompts.ts; trim README

The tool description and param doc are model-facing, so they belong in
prompts.ts with the rest. Add them as step 6 (completeGoalTool) and
renumber the evidence judge to 7; prompts.ts is now ordered the way the
agent meets each text, so it reads as one pass.

The moved desc also carries the positive-success framing: evidence must
show the success happened, not just that a failure was avoided.

README trimmed (saying less, voice unchanged): tighter intro and
comparison, less prose around the examples and sign-off steps. Humanizer
lint clean.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
wassname
2026-06-16 11:50:12 +08:00
parent 838c42d7bd
commit 0a1503dc04
3 changed files with 68 additions and 50 deletions
+27 -33
@@ -2,15 +2,14 @@
Plan mode for agreeing on goals before any code gets written. Each goal names the subtle failure mode Plan mode for agreeing on goals before any code gets written. Each goal names the subtle failure mode
that could fake a "done" and the discriminator that tells real success from it, plus subtasks and the that could fake a "done" and the discriminator that tells real success from it, plus subtasks and the
evidence that gets checked at sign-off. It all lives in one markdown file you can read and print. A evidence checked at sign-off. It lives in one markdown file. A widget keeps the goals in front of you
widget keeps the goals in front of you through compaction, a reminder nudges the agent to keep the through compaction, a reminder nudges the agent to keep the file current, and a goal is signed off
file current and work toward the goals on its own, and a goal is signed off only after a read-only only after a read-only subagent checks its evidence.
subagent has checked its evidence.
It guides rather than guards. Like [pi-milestones](https://github.com/Neuron-Mr-White/UniPi/tree/main/packages/milestone) Like [pi-milestones](https://github.com/Neuron-Mr-White/UniPi/tree/main/packages/milestone) and
and [burneikis/pi-plan](https://github.com/burneikis/pi-plan), it leans on a form and a process to [burneikis/pi-plan](https://github.com/burneikis/pi-plan), it guides rather than guards: a form and a
steer the agent and trust its judgement. [pi-lgtm](https://github.com/wassname/pi-lgtm) was my earlier process the agent follows. [pi-lgtm](https://github.com/wassname/pi-lgtm) was my earlier, more complex
attempt and got too complex; this one stays small and maintainable. attempt.
## Install ## Install
@@ -46,14 +45,12 @@ model for the sign-off judge (the default is your current model).
## Example ## Example
Start plan mode with an optional seed:
``` ```
/goals audit the papers dir metadata and clean up empty dirs /goals audit the papers dir metadata and clean up empty dirs
``` ```
The agent explores read-only, then drafts the goal with a subtle failure mode and the discriminator The agent explores read-only, drafts the goal with a subtle failure mode and the discriminator that
that beats it, and stops for review: beats it, and stops for review:
```markdown ```markdown
## Goals ## Goals
@@ -69,8 +66,8 @@ that beats it, and stops for review:
- <empty until sign-off> - <empty until sign-off>
``` ```
You choose Ready. The agent works the subtasks, then fills `evidence` (each item an artifact plus a You choose Ready. The agent works the subtasks, fills `evidence` (each item an artifact plus a short
short read of it) and calls `CompleteGoal`: read of it), and calls `CompleteGoal`:
```markdown ```markdown
- evidence: - evidence:
@@ -78,7 +75,7 @@ short read of it) and calls `CompleteGoal`:
- > 48 files renamed; almost certain done, the silent-resolver failure mode is ruled out - > 48 files renamed; almost certain done, the silent-resolver failure mode is ruled out
``` ```
A fresh read-only subagent re-checks that evidence against the repo and the discriminator, then A fresh read-only subagent re-checks the evidence against the repo and the discriminator, then
returns its verdict and reasoning: returns its verdict and reasoning:
``` ```
@@ -91,8 +88,8 @@ VERDICT: accept
## The goals.md format ## The goals.md format
One project-local file, `<cwd>/.pi/goals.md` (gitignored, like pi-tasks), holds the title, a context One project-local file, `<cwd>/.pi/goals.md` (gitignored), holds the title, a context block, the
block, the goals, and a short append-only log. A fresh `/goals` draft replaces it. goals, and a short append-only log. A fresh `/goals` draft replaces it.
```markdown ```markdown
# ship the cache layer # ship the cache layer
@@ -123,29 +120,26 @@ Latency target came from the SLO review; keep the existing client API.
`[/]` active, `[x]` done, `[-]` cancelled). Goals are matched by their text, so the number is just `[/]` active, `[x]` done, `[-]` cancelled). Goals are matched by their text, so the number is just
for you to reference. for you to reference.
- The `discriminator` is the success test, written while planning: the positive observation that the - The `discriminator` is the success test, written while planning: the positive observation that the
goal actually succeeded and that none of the `subtle failure mode`s could fake. It has to show goal succeeded and that none of the `subtle failure mode`s could fake (a count moved, a test
something happened (a count moved, a test exercised the path, a metric beat noise), not just that a exercised the path, a metric beat noise), not just that a failure was avoided. `evidence` is the
failure was avoided. `evidence` is the proof, filled at sign-off: proof, filled at sign-off: each item pairs a durable artifact (a quoted and linked log, a table, a
each item pairs a durable artifact (a quoted and linked log, a table, a metric) with a short read of metric) with a short read of it. `verify`, when present, is the deterministic first stage.
it, not a bare claim. `verify`, when present, is the deterministic first stage of the sign-off. - Subtasks are any checkbox without a `goal:` prefix, under `- tasks:`. The agent ticks them, appends
- Subtasks are any checkbox without a `goal:` prefix, under `- tasks:` (`[/]` in progress, `[-]` to `## Log`, and sets a goal `[/]` when it starts it; only `CompleteGoal` writes `[x]`. Several
cancelled). The agent ticks them, appends to `## Log`, and sets a goal `[/]` when it starts it. Only goals can be active at once.
`CompleteGoal` writes `[x]`. Several goals can be active at once.
## Signing off a goal (`CompleteGoal`) ## Signing off a goal (`CompleteGoal`)
`CompleteGoal(goal)` (matched by the goal's text) is the only tool that marks a goal done; everything `CompleteGoal(goal)` (matched by the goal's text) is the only tool that marks a goal done; everything
else is the agent editing the file. It reads the goal's `evidence:` block from `.pi/goals.md`, so the else is the agent editing the file. It reads the goal's `evidence:` block from `.pi/goals.md`, then:
proof stays in the file where you can review it, then:
1. If the goal has a `verify:` command, it runs. A non-zero exit rejects right away, with no model 1. If the goal has a `verify:` command, it runs. A non-zero exit rejects right away, no model call.
call.
2. Otherwise a read-only `pi` subprocess (a fresh `--no-session` context, so it never sees the working 2. Otherwise a read-only `pi` subprocess (a fresh `--no-session` context, so it never sees the working
agent's transcript) inspects the `evidence:` against the repo, the `discriminator`, and the agent's transcript) inspects the `evidence:` against the repo, the `discriminator`, and the
`subtle failure mode`, and returns a verdict. It re-derives from the cited artifacts rather than `subtle failure mode`. It re-derives from the cited artifacts rather than trusting the claim, so
trusting the claim, so list real artifacts, not assertions. list real artifacts, not assertions.
3. On accept, the goal flips to `[x]` and a `## Log` line is written. On reject, the goal stays open 3. On accept, the goal flips to `[x]` and a `## Log` line is written. On reject, it stays open and the
and the agent is told what is missing. Either way the judge's reasoning comes back in the result. agent is told what is missing. Either way the judge's reasoning comes back in the result.
The judge defaults to your current model (a fresh context, same weights). Point it at another with The judge defaults to your current model (a fresh context, same weights). Point it at another with
`/goals judge <provider/model>` for an independent cross-family check. `/goals judge <provider/model>` for an independent cross-family check.
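The three sign-off steps above can be sketched as one function. This is a minimal illustration, not the extension's code: `GoalEntry`, `SignOffDeps`, `runVerify`, `judge`, and the status names are all hypothetical, and it assumes the judge runs whenever `verify` is absent or exits zero (the reading the tool description suggests).

```typescript
// Hypothetical sketch of the CompleteGoal flow; none of these names are the real API.
type Verdict = { accept: boolean; reasoning: string };

interface GoalEntry {
  desc: string; // the text after "goal:"
  status: "open" | "active" | "done" | "cancelled"; // illustrative status names
  verify?: string; // optional deterministic command
  evidence: string[]; // artifact + short read, one string per item
}

interface SignOffDeps {
  runVerify: (command: string) => number; // assumed to return the exit code
  judge: (goal: GoalEntry) => Verdict; // stands in for the read-only subagent
}

function completeGoal(goal: GoalEntry, log: string[], deps: SignOffDeps): Verdict {
  // Step 1: the deterministic stage. A non-zero exit rejects with no judge call.
  if (goal.verify !== undefined) {
    const code = deps.runVerify(goal.verify);
    if (code !== 0) {
      return { accept: false, reasoning: `verify exited with code ${code}` };
    }
  }

  // Step 2: the judge inspects the evidence (here just a callback).
  const verdict = deps.judge(goal);

  // Step 3: only an accept marks the goal done and appends a log line.
  if (verdict.accept) {
    goal.status = "done";
    log.push(`signed off: ${goal.desc}`);
  }

  // The judge's reasoning comes back on accept and on reject alike.
  return verdict;
}
```

Passing the verify runner and the judge in as callbacks is only a convenience for the sketch; it keeps the control flow readable without spawning a subprocess.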
+11 -10
@@ -38,7 +38,15 @@ import { basename, join, resolve } from "node:path";
import type { ExtensionAPI, ExtensionCommandContext, ExtensionContext } from "@earendil-works/pi-coding-agent"; import type { ExtensionAPI, ExtensionCommandContext, ExtensionContext } from "@earendil-works/pi-coding-agent";
import { Type } from "@sinclair/typebox"; import { Type } from "@sinclair/typebox";
import { counts, findGoal, type Goal, type PlanDoc, parse, recordSignOff, type SignOff } from "./plan-file.js"; import { counts, findGoal, type Goal, type PlanDoc, parse, recordSignOff, type SignOff } from "./plan-file.js";
import { evidenceJudgeSystem, evidenceJudgeUser, planDrafting, planInjection, reminder } from "./prompts.js"; import {
completeGoalDescription,
completeGoalParamDescription,
evidenceJudgeSystem,
evidenceJudgeUser,
planDrafting,
planInjection,
reminder,
} from "./prompts.js";
const STATE = "pi-goals-state"; const STATE = "pi-goals-state";
const PLAN_CONTEXT = "pi-goals-context"; // injected plan-mode guidance, stripped from history later const PLAN_CONTEXT = "pi-goals-context"; // injected plan-mode guidance, stripped from history later
@@ -263,16 +271,9 @@ export default function piPlanExtension(pi: ExtensionAPI): void {
pi.registerTool({ pi.registerTool({
name: "CompleteGoal", name: "CompleteGoal",
label: "Complete goal", label: "Complete goal",
description: description: completeGoalDescription,
"Sign off a goal once its discriminator is satisfied. First fill the goal's evidence: block in " +
"goals.md: a list where each item pairs a durable artifact with a short read of it (a quoted+linked " +
"log, a table plus how to read it, or a metric plus what it shows; quote the key lines and link the " +
"rest, not a pasted blob or a bare claim). Then call this with the goal's desc (the text after " +
"'goal:'). Runs the goal's verify command (if any) then a read-only subagent that inspects that " +
"evidence against the repo and the discriminator. On accept, the goal is marked done and logged; on " +
"reject, it stays open and you get what is missing. The subagent's reasoning is returned either way.",
parameters: Type.Object({ parameters: Type.Object({
goal: Type.String({ description: "The goal's desc: the exact text after 'goal:' in its line." }), goal: Type.String({ description: completeGoalParamDescription }),
}), }),
async execute(_id, params, signal, _onUpdate, ctx) { async execute(_id, params, signal, _onUpdate, ctx) {
const content = readPlan(ctx); const content = readPlan(ctx);
+30 -7
@@ -4,21 +4,22 @@
* Philosophy: the form guides a process; it does not police one. The agent can * Philosophy: the form guides a process; it does not police one. The agent can
* edit goals.md freely. These prompts + the goals.md structure make the right path * edit goals.md freely. These prompts + the goals.md structure make the right path
* the easy path. The only step that is genuinely rigorous is the evidence judge * the easy path. The only step that is genuinely rigorous is the evidence judge
* (6), and even that is reached by guiding the agent to call CompleteGoal, not by * (7), and even that is reached by guiding the agent to call CompleteGoal, not by
* trapping it. Bypasses stay visible in the git diff and the widget. * trapping it. Bypasses stay visible in the git diff and the widget.
* *
* Flow: * Flow (this file is ordered the way the agent meets each text, so it reads as one pass):
* SETUP (plan mode) 1. planDrafting — drafts goals (read-only phase) * SETUP (plan mode) 1. planDrafting — drafts goals (read-only phase)
* EXEC, each turn start 2. planInjection — "here is your plan, where you are" * EXEC, each turn start 2. planInjection — "here is your plan, where you are"
* EXEC, periodic 3. reminder — the typed nudge that drives upkeep + autonomy * EXEC, periodic 3. reminder — the typed nudge that drives upkeep + autonomy
* EXEC, loop continue 4. continuation — keep going toward the active goal * EXEC, loop continue 4. continuation — keep going toward the active goal
* EXEC, after each turn 5. loopJudge — continue / pause (cheap, foolable, ok) * EXEC, after each turn 5. loopJudge — continue / pause (cheap, foolable, ok)
* SIGN-OFF 6. evidenceJudge — read-only verify (rigorous; the one real check) * SIGN-OFF, agent-side 6. completeGoalTool — the CompleteGoal tool desc + param the agent reads
* SIGN-OFF, judge-side 7. evidenceJudge — read-only verify (rigorous; the one real check)
* *
* Read top to bottom to see the whole process. 5 and 6 are kept adjacent on * Read top to bottom to see the whole process. 5 and 7 embody the design contrast:
* purpose: the cheap-foolable vs must-not-be-fooled contrast is the design. * the cheap-foolable loop gate vs the must-not-be-fooled sign-off.
* *
* WIRED in index.ts: 1 planDrafting, 2 planInjection, 3 reminder, 6 evidenceJudge. * WIRED in index.ts: 1 planDrafting, 2 planInjection, 3 reminder, 6 completeGoalTool, 7 evidenceJudge.
* NOT YET WIRED: 4 continuation and 5 loopJudge define the autonomous re-prompt loop, which is * NOT YET WIRED: 4 continuation and 5 loopJudge define the autonomous re-prompt loop, which is
* intentionally not built in v1 (an until-done-style loop was judged too complex). They stay here so * intentionally not built in v1 (an until-done-style loop was judged too complex). They stay here so
* the full intended flow is reviewable; wire them if/when the loop is added. * the full intended flow is reviewable; wire them if/when the loop is added.
@@ -192,7 +193,29 @@ ${p.lastResponse}
} }
/* ───────────────────────────────────────────────────────────────────────── /* ─────────────────────────────────────────────────────────────────────────
* 6. evidenceJudge — SIGN-OFF, the one rigorous check * 6. completeGoalTool — SIGN-OFF, agent-side
*
* The description + param the agent reads on the one blessed tool, CompleteGoal.
* This is where the agent meets the sign-off: it fills evidence and calls the
* tool, which then runs verify + the judge (7). Kept here with the rest of the
* model-facing text so the whole process reads top to bottom.
* ──────────────────────────────────────────────────────────────────────── */
export const completeGoalDescription =
"Sign off a goal once its discriminator is satisfied. First fill the goal's evidence: block in " +
"goals.md: a list where each item pairs a durable artifact with a short read of it (a quoted+linked " +
"log, a table plus how to read it, or a metric plus what it shows; quote the key lines and link the " +
"rest, not a pasted blob or a bare claim). The read must show the success POSITIVELY happened (the " +
"result is present, the count moved the right way, the metric beat noise), not just that a failure " +
"was avoided; ruling out the failure modes is necessary but not sufficient. Then call this with the " +
"goal's desc (the text after 'goal:'). Runs the goal's verify command (if any) then a read-only " +
"subagent that inspects that evidence against the repo and the discriminator. On accept, the goal is " +
"marked done and logged; on reject, it stays open and you get what is missing. The subagent's " +
"reasoning is returned either way.";
export const completeGoalParamDescription = "The goal's desc: the exact text after 'goal:' in its line.";
/* ─────────────────────────────────────────────────────────────────────────
* 7. evidenceJudge — SIGN-OFF, judge-side; the one rigorous check
* *
* Runs inside CompleteGoal, on a read-only pi subprocess (fresh context via * Runs inside CompleteGoal, on a read-only pi subprocess (fresh context via
* --no-session, so it never sees the working agent's transcript; override to a * --no-session, so it never sees the working agent's transcript; override to a