mirror of
https://github.com/wassname/pi-plan.git
synced 2026-06-27 15:16:18 +08:00
pi-goals: move CompleteGoal desc into prompts.ts; trim README
The tool description and param doc are model-facing, so they belong in prompts.ts with the rest. Add them as step 6 (completeGoalTool) and renumber the evidence judge to 7; prompts.ts is now ordered the way the agent meets each text, so it reads as one pass. The moved desc also carries the positive-success framing: evidence must show the success happened, not just that a failure was avoided.

README trimmed (saying less, voice unchanged): tighter intro and comparison, less prose around the examples and sign-off steps. Humanizer lint clean.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
@@ -2,15 +2,14 @@
 
 Plan mode for agreeing on goals before any code gets written. Each goal names the subtle failure mode
 that could fake a "done" and the discriminator that tells real success from it, plus subtasks and the
-evidence that gets checked at sign-off. It all lives in one markdown file you can read and print. A
-widget keeps the goals in front of you through compaction, a reminder nudges the agent to keep the
-file current and work toward the goals on its own, and a goal is signed off only after a read-only
-subagent has checked its evidence.
+evidence checked at sign-off. It lives in one markdown file. A widget keeps the goals in front of you
+through compaction, a reminder nudges the agent to keep the file current, and a goal is signed off
+only after a read-only subagent checks its evidence.
 
-It guides rather than guards. Like [pi-milestones](https://github.com/Neuron-Mr-White/UniPi/tree/main/packages/milestone)
-and [burneikis/pi-plan](https://github.com/burneikis/pi-plan), it leans on a form and a process to
-steer the agent and trust its judgement. [pi-lgtm](https://github.com/wassname/pi-lgtm) was my earlier
-attempt and got too complex; this one stays small and maintainable.
+Like [pi-milestones](https://github.com/Neuron-Mr-White/UniPi/tree/main/packages/milestone) and
+[burneikis/pi-plan](https://github.com/burneikis/pi-plan), it guides rather than guards: a form and a
+process the agent follows. [pi-lgtm](https://github.com/wassname/pi-lgtm) was my earlier, more complex
+attempt.
 
 ## Install
 
@@ -46,14 +45,12 @@ model for the sign-off judge (the default is your current model).
 
 ## Example
 
-Start plan mode with an optional seed:
-
 ```
 /goals audit the papers dir metadata and clean up empty dirs
 ```
 
-The agent explores read-only, then drafts the goal with a subtle failure mode and the discriminator
-that beats it, and stops for review:
+The agent explores read-only, drafts the goal with a subtle failure mode and the discriminator that
+beats it, and stops for review:
 
 ```markdown
 ## Goals
@@ -69,8 +66,8 @@ that beats it, and stops for review:
 - <empty until sign-off>
 ```
 
-You choose Ready. The agent works the subtasks, then fills `evidence` (each item an artifact plus a
-short read of it) and calls `CompleteGoal`:
+You choose Ready. The agent works the subtasks, fills `evidence` (each item an artifact plus a short
+read of it), and calls `CompleteGoal`:
 
 ```markdown
 - evidence:
@@ -78,7 +75,7 @@ short read of it) and calls `CompleteGoal`:
 - > 48 files renamed; almost certain done, the silent-resolver failure mode is ruled out
 ```
 
-A fresh read-only subagent re-checks that evidence against the repo and the discriminator, then
+A fresh read-only subagent re-checks the evidence against the repo and the discriminator, then
 returns its verdict and reasoning:
 
 ```
@@ -91,8 +88,8 @@ VERDICT: accept
 
 ## The goals.md format
 
-One project-local file, `<cwd>/.pi/goals.md` (gitignored, like pi-tasks), holds the title, a context
-block, the goals, and a short append-only log. A fresh `/goals` draft replaces it.
+One project-local file, `<cwd>/.pi/goals.md` (gitignored), holds the title, a context block, the
+goals, and a short append-only log. A fresh `/goals` draft replaces it.
 
 ```markdown
 # ship the cache layer
@@ -123,29 +120,26 @@ Latency target came from the SLO review; keep the existing client API.
 `[/]` active, `[x]` done, `[-]` cancelled). Goals are matched by their text, so the number is just
 for you to reference.
 - The `discriminator` is the success test, written while planning: the positive observation that the
-goal actually succeeded and that none of the `subtle failure mode`s could fake. It has to show
-something happened (a count moved, a test exercised the path, a metric beat noise), not just that a
-failure was avoided. `evidence` is the proof, filled at sign-off:
-each item pairs a durable artifact (a quoted and linked log, a table, a metric) with a short read of
-it, not a bare claim. `verify`, when present, is the deterministic first stage of the sign-off.
-- Subtasks are any checkbox without a `goal:` prefix, under `- tasks:` (`[/]` in progress, `[-]`
-cancelled). The agent ticks them, appends to `## Log`, and sets a goal `[/]` when it starts it. Only
-`CompleteGoal` writes `[x]`. Several goals can be active at once.
+goal succeeded and that none of the `subtle failure mode`s could fake (a count moved, a test
+exercised the path, a metric beat noise), not just that a failure was avoided. `evidence` is the
+proof, filled at sign-off: each item pairs a durable artifact (a quoted and linked log, a table, a
+metric) with a short read of it. `verify`, when present, is the deterministic first stage.
+- Subtasks are any checkbox without a `goal:` prefix, under `- tasks:`. The agent ticks them, appends
+to `## Log`, and sets a goal `[/]` when it starts it; only `CompleteGoal` writes `[x]`. Several
+goals can be active at once.
 
 ## Signing off a goal (`CompleteGoal`)
 
 `CompleteGoal(goal)` (matched by the goal's text) is the only tool that marks a goal done; everything
-else is the agent editing the file. It reads the goal's `evidence:` block from `.pi/goals.md`, so the
-proof stays in the file where you can review it, then:
+else is the agent editing the file. It reads the goal's `evidence:` block from `.pi/goals.md`, then:
 
-1. If the goal has a `verify:` command, it runs. A non-zero exit rejects right away, with no model
-call.
+1. If the goal has a `verify:` command, it runs. A non-zero exit rejects right away, no model call.
 2. Otherwise a read-only `pi` subprocess (a fresh `--no-session` context, so it never sees the working
 agent's transcript) inspects the `evidence:` against the repo, the `discriminator`, and the
-`subtle failure mode`, and returns a verdict. It re-derives from the cited artifacts rather than
-trusting the claim, so list real artifacts, not assertions.
-3. On accept, the goal flips to `[x]` and a `## Log` line is written. On reject, the goal stays open
-and the agent is told what is missing. Either way the judge's reasoning comes back in the result.
+`subtle failure mode`. It re-derives from the cited artifacts rather than trusting the claim, so
+list real artifacts, not assertions.
+3. On accept, the goal flips to `[x]` and a `## Log` line is written. On reject, it stays open and the
+agent is told what is missing. Either way the judge's reasoning comes back in the result.
 
 The judge defaults to your current model (a fresh context, same weights). Point it at another with
 `/goals judge <provider/model>` for an independent cross-family check.
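As a rough illustration of the checkbox conventions in the README text above (`[/]` active, `[x]` done, `[-]` cancelled, and a subtask being any checkbox without a `goal:` prefix), the TypeScript sketch below classifies a single line. It is not the extension's `parse` from `plan-file.js`. `parseCheckbox`, `Item`, and `MARKS` are hypothetical names, and the line layout is an assumption: an optional `- ` or `1. ` list marker before the box, with `[ ]` taken to mean not started.

```typescript
// Hypothetical sketch: classify one goals.md checkbox line.
// Assumes an optional "- " or "1. " list marker before the box and that
// "[ ]" means not started; the real parser in plan-file.js may differ.
type Status = "open" | "active" | "done" | "cancelled";

interface Item {
  kind: "goal" | "subtask";
  status: Status;
  text: string;
}

const MARKS: Record<string, Status> = {
  " ": "open",
  "/": "active",
  x: "done",
  "-": "cancelled",
};

function parseCheckbox(line: string): Item | null {
  const m = /^\s*(?:-\s+|\d+\.\s+)?\[([ \/x-])\]\s+(.*)$/.exec(line);
  if (m === null) return null; // not a checkbox line
  const status = MARKS[m[1] ?? " "] ?? "open";
  const body = (m[2] ?? "").trim();
  // A "goal:" prefix marks a goal; any other checkbox is a subtask.
  if (body.startsWith("goal:")) {
    return { kind: "goal", status, text: body.slice("goal:".length).trim() };
  }
  return { kind: "subtask", status, text: body };
}

console.log(parseCheckbox("- [/] goal: ship the cache layer"));
console.log(parseCheckbox("  - [x] add LRU eviction"));
```

Because the README says goals are matched by their text, the sketch keeps the text after `goal:` as the lookup key and ignores any leading number.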
+11
-10
@@ -38,7 +38,15 @@ import { basename, join, resolve } from "node:path";
 import type { ExtensionAPI, ExtensionCommandContext, ExtensionContext } from "@earendil-works/pi-coding-agent";
 import { Type } from "@sinclair/typebox";
 import { counts, findGoal, type Goal, type PlanDoc, parse, recordSignOff, type SignOff } from "./plan-file.js";
-import { evidenceJudgeSystem, evidenceJudgeUser, planDrafting, planInjection, reminder } from "./prompts.js";
+import {
+  completeGoalDescription,
+  completeGoalParamDescription,
+  evidenceJudgeSystem,
+  evidenceJudgeUser,
+  planDrafting,
+  planInjection,
+  reminder,
+} from "./prompts.js";
 
 const STATE = "pi-goals-state";
 const PLAN_CONTEXT = "pi-goals-context"; // injected plan-mode guidance, stripped from history later
@@ -263,16 +271,9 @@ export default function piPlanExtension(pi: ExtensionAPI): void {
 pi.registerTool({
   name: "CompleteGoal",
   label: "Complete goal",
-  description:
-    "Sign off a goal once its discriminator is satisfied. First fill the goal's evidence: block in " +
-    "goals.md: a list where each item pairs a durable artifact with a short read of it (a quoted+linked " +
-    "log, a table plus how to read it, or a metric plus what it shows; quote the key lines and link the " +
-    "rest, not a pasted blob or a bare claim). Then call this with the goal's desc (the text after " +
-    "'goal:'). Runs the goal's verify command (if any) then a read-only subagent that inspects that " +
-    "evidence against the repo and the discriminator. On accept, the goal is marked done and logged; on " +
-    "reject, it stays open and you get what is missing. The subagent's reasoning is returned either way.",
+  description: completeGoalDescription,
   parameters: Type.Object({
-    goal: Type.String({ description: "The goal's desc: the exact text after 'goal:' in its line." }),
+    goal: Type.String({ description: completeGoalParamDescription }),
   }),
   async execute(_id, params, signal, _onUpdate, ctx) {
     const content = readPlan(ctx);
+30
-7
@@ -4,21 +4,22 @@
  * Philosophy: the form guides a process; it does not police one. The agent can
  * edit goals.md freely. These prompts + the goals.md structure make the right path
  * the easy path. The only step that is genuinely rigorous is the evidence judge
- * (6), and even that is reached by guiding the agent to call CompleteGoal, not by
+ * (7), and even that is reached by guiding the agent to call CompleteGoal, not by
  * trapping it. Bypasses stay visible in the git diff and the widget.
  *
- * Flow:
+ * Flow (this file is ordered the way the agent meets each text, so it reads as one pass):
  *   SETUP (plan mode)      1. planDrafting     — drafts goals (read-only phase)
  *   EXEC, each turn start  2. planInjection    — "here is your plan, where you are"
  *   EXEC, periodic         3. reminder         — the typed nudge that drives upkeep + autonomy
  *   EXEC, loop continue    4. continuation     — keep going toward the active goal
  *   EXEC, after each turn  5. loopJudge        — continue / pause (cheap, foolable, ok)
- *   SIGN-OFF               6. evidenceJudge    — read-only verify (rigorous; the one real check)
+ *   SIGN-OFF, agent-side   6. completeGoalTool — the CompleteGoal tool desc + param the agent reads
+ *   SIGN-OFF, judge-side   7. evidenceJudge    — read-only verify (rigorous; the one real check)
  *
- * Read top to bottom to see the whole process. 5 and 6 are kept adjacent on
- * purpose: the cheap-foolable vs must-not-be-fooled contrast is the design.
+ * Read top to bottom to see the whole process. 5 and 7 embody the design contrast:
+ * the cheap-foolable loop gate vs the must-not-be-fooled sign-off.
  *
- * WIRED in index.ts: 1 planDrafting, 2 planInjection, 3 reminder, 6 evidenceJudge.
+ * WIRED in index.ts: 1 planDrafting, 2 planInjection, 3 reminder, 6 completeGoalTool, 7 evidenceJudge.
  * NOT YET WIRED: 4 continuation and 5 loopJudge define the autonomous re-prompt loop, which is
  * intentionally not built in v1 (an until-done-style loop was judged too complex). They stay here so
  * the full intended flow is reviewable; wire them if/when the loop is added.
@@ -192,7 +193,29 @@ ${p.lastResponse}
 }
 
 /* ─────────────────────────────────────────────────────────────────────────
- * 6. evidenceJudge — SIGN-OFF, the one rigorous check
+ * 6. completeGoalTool — SIGN-OFF, agent-side
+ *
+ * The description + param the agent reads on the one blessed tool, CompleteGoal.
+ * This is where the agent meets the sign-off: it fills evidence and calls the
+ * tool, which then runs verify + the judge (7). Kept here with the rest of the
+ * model-facing text so the whole process reads top to bottom.
+ * ──────────────────────────────────────────────────────────────────────── */
+export const completeGoalDescription =
+  "Sign off a goal once its discriminator is satisfied. First fill the goal's evidence: block in " +
+  "goals.md: a list where each item pairs a durable artifact with a short read of it (a quoted+linked " +
+  "log, a table plus how to read it, or a metric plus what it shows; quote the key lines and link the " +
+  "rest, not a pasted blob or a bare claim). The read must show the success POSITIVELY happened (the " +
+  "result is present, the count moved the right way, the metric beat noise), not just that a failure " +
+  "was avoided; ruling out the failure modes is necessary but not sufficient. Then call this with the " +
+  "goal's desc (the text after 'goal:'). Runs the goal's verify command (if any) then a read-only " +
+  "subagent that inspects that evidence against the repo and the discriminator. On accept, the goal is " +
+  "marked done and logged; on reject, it stays open and you get what is missing. The subagent's " +
+  "reasoning is returned either way.";
+
+export const completeGoalParamDescription = "The goal's desc: the exact text after 'goal:' in its line.";
+
+/* ─────────────────────────────────────────────────────────────────────────
+ * 7. evidenceJudge — SIGN-OFF, judge-side; the one rigorous check
  *
  * Runs inside CompleteGoal, on a read-only pi subprocess (fresh context via
  * --no-session, so it never sees the working agent's transcript; override to a
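The sign-off order that `completeGoalDescription` describes (the `verify` command first, if any, then the read-only judge, with the reasoning returned either way) can be sketched as below. This is a minimal illustration, not the code in `index.ts`. `signOff`, `GoalLike`, `Verdict`, and the injected `runVerify` and `judge` callbacks are hypothetical stand-ins, and the real tool is async and spawns a `pi` subprocess. The sketch also assumes the judge runs whenever `verify` passes or is absent.

```typescript
// Hypothetical sketch of the two-stage sign-off; all names are illustrative only.
interface GoalLike {
  desc: string;
  verify?: string; // optional shell command, the deterministic first stage
  evidence: string[];
}

interface Verdict {
  accept: boolean;
  reasoning: string;
  stage: "verify" | "judge"; // which stage produced the verdict
}

type RunVerify = (command: string) => number; // returns the exit code
type Judge = (goal: GoalLike) => { accept: boolean; reasoning: string };

function signOff(goal: GoalLike, runVerify: RunVerify, judge: Judge): Verdict {
  if (goal.verify !== undefined) {
    const exitCode = runVerify(goal.verify);
    if (exitCode !== 0) {
      // Non-zero exit rejects right away; the judge (a model call) never runs.
      return { accept: false, reasoning: `verify exited with ${exitCode}`, stage: "verify" };
    }
  }
  // Verify passed or was absent: the judge inspects the evidence.
  const { accept, reasoning } = judge(goal);
  return { accept, reasoning, stage: "judge" };
}

// Usage with stubbed stages: a failing verify short-circuits the judge.
let judgeCalls = 0;
const stubJudge: Judge = (g) => {
  judgeCalls += 1;
  return { accept: g.evidence.length > 0, reasoning: "checked cited artifacts" };
};
const goal: GoalLike = { desc: "ship the cache layer", verify: "npm test", evidence: ["bench.log"] };
console.log(signOff(goal, () => 1, stubJudge)); // rejected at the verify stage
console.log(signOff(goal, () => 0, stubJudge)); // verdict comes from the judge
```

Passing the two stages in as callbacks keeps the ordering testable without a shell or a model, which is the only point of the sketch; the stub judge's "evidence is non-empty" rule stands in for the real re-derivation from cited artifacts.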