wassname 838c42d7bd pi-goals: discriminator/failure-mode format + visible sign-off judge
Replace done_when with a discriminator + subtle-failure-mode pair as the
heart of each goal. The discriminator is the POSITIVE success observation
that no failure mode could fake, not just failure-avoidance: a run can
dodge every trap and still produce nothing. Carried through planDrafting,
the sign-off judge, README, and the parser doc.

Format migration: flat numbered markdown goals (`1. [/] goal: ...`),
keyword-anchored parsing (indentation cosmetic), goals matched by text,
subtask states [ ]/[/]/[x]/[-] plus ~~strike~~. Evidence empty at
planning, filled at sign-off, multi-line supported.

CompleteGoal now returns the judge's reasoning under a
`--- sign-off judge ---` block (was just "Signed off"), so the verdict is
visible. Plan mode is read-only: edit/write (except goals.md) and
mutating bash are blocked by a tool hook.

17 parser tests, typecheck + biome clean.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-16 11:45:08 +08:00

pi-goals

Plan mode for agreeing on goals before any code gets written. Each goal names the subtle failure mode that could fake a "done" and the discriminator that tells real success from it, plus subtasks and the evidence that gets checked at sign-off. It all lives in one markdown file you can read and print. A widget keeps the goals in front of you through compaction, a reminder nudges the agent to keep the file current and work toward the goals on its own, and a goal is signed off only after a read-only subagent has checked its evidence.

It guides rather than guards. Like pi-milestones and burneikis/pi-plan, it leans on a form and a process to steer the agent, and it trusts the agent's judgement. pi-lgtm was my earlier attempt and got too complex; this one stays small and maintainable.

Install

pi install npm:@wassname2/pi-goals

Or run it without installing:

pi -e npm:@wassname2/pi-goals

Use

/goals CSV export for the report view

/goals enters plan mode and starts a conversation; the description is an optional seed, so plain /goals works too. From there:

  1. Plan. The agent explores read-only, asks about anything unclear, and writes the goals into .pi/goals.md.
  2. Review. You get a menu: Ready, Edit (ask the agent to revise), Open in $EDITOR, or Cancel. On Ready you choose whether to keep the current context or start fresh and compacted.
  3. Work. Each turn the active goal is injected so it survives compaction, and a reminder nudges the agent to keep goals.md current and keep going. When a goal's discriminator is satisfied the agent calls CompleteGoal, which runs verify and a read-only judge, then marks the goal done and logs it.

Other commands: /goals clear empties .pi/goals.md; /goals judge <model-ref> picks a specific model for the sign-off judge (the default is your current model).

Example

Start plan mode with an optional seed:

/goals audit the papers dir metadata and clean up empty dirs

The agent explores read-only, then drafts the goal with a subtle failure mode and the discriminator that beats it, and stops for review:

## Goals

1. [ ] goal: Audit steering/ metadata and remove empty dirs
  - subtle failure mode: report written but counts are zero (resolver errored silently)
  - discriminator: report shows the XXXX count before/after AND a non-zero rename count
  - tasks:
    1. [ ] dry-run the metadata resolve
    2. [ ] remove the empty _artifacts dirs
    3. [ ] write the report
  - evidence:
    - <empty until sign-off>

You choose Ready. The agent works the subtasks, then fills evidence (each item an artifact plus a short read of it) and calls CompleteGoal:

  - evidence:
    - > scripts/metadata_report.txt: XXXX 52 -> 4, 146 empty _artifacts removed
    - > 48 files renamed; almost certain done, the silent-resolver failure mode is ruled out

A fresh read-only subagent re-checks that evidence against the repo and the discriminator, then returns its verdict and reasoning:

Signed off "Audit steering/ metadata and remove empty dirs". Marked done in goals.md.

--- sign-off judge ---
metadata_report.txt present; counts 52 -> 4 confirmed; rename log shows 48 renamed (not zero).
VERDICT: accept

The goals.md format

One project-local file, <cwd>/.pi/goals.md (gitignored, like pi-tasks), holds the title, a context block, the goals, and a short append-only log. A fresh /goals draft replaces it.

# ship the cache layer

Latency target came from the SLO review; keep the existing client API.

## Goals

1. [/] goal: Implement cache layer
  - subtle failure mode: cache silently bypassed, latency ok by luck
  - discriminator: hit-rate > 0.8 in load-test.log (a bypass reads ~0)
  - verify: pytest tests/cache -q && python bench/p95.py --max-ms 50
  - tasks:
    1. [x] wire cache client
    2. [/] eviction policy
  - evidence:
    - > load-test.log: p95=41ms, hit-rate 0.93 (not bypassed)

# Future work / out of scope

- distributed cache

## Log
- 2026-06-15 14:02  cache client wired; eviction next
  • A goal is a numbered checkbox line whose text begins with goal:. The checkbox carries its state ([ ] open, [/] active, [x] done, [-] cancelled). Goals are matched by their text, so the number is only there for you to refer to.
  • The discriminator is the success test, written while planning: the positive observation that the goal actually succeeded and that none of the subtle failure modes could fake. It has to show something happened (a count moved, a test exercised the path, a metric beat noise), not just that a failure was avoided. evidence is the proof, filled at sign-off: each item pairs a durable artifact (a quoted and linked log, a table, a metric) with a short read of it, not a bare claim. verify, when present, is the deterministic first stage of the sign-off.
  • A subtask is any checkbox line under - tasks: that lacks the goal: prefix; it takes the same states ([ ] open, [/] in progress, [x] done, [-] cancelled). The agent ticks subtasks, appends to ## Log, and sets a goal to [/] when it starts work on it. Only CompleteGoal writes [x] on a goal. Several goals can be active at once.
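The line rules above can be sketched in a few lines of TypeScript. This is a minimal illustration rather than the parser that ships in pi-goals: the names parseGoals, setGoalState, and GOAL_LINE are hypothetical, and the regex is only one way to anchor on the goal: keyword while ignoring indentation and the number.

```typescript
// Hypothetical sketch; not the actual pi-goals parser.
type GoalState = "open" | "active" | "done" | "cancelled";

interface Goal {
  state: GoalState;
  text: string;
}

const STATE_BY_MARK: Record<string, GoalState> = {
  " ": "open",
  "/": "active",
  "x": "done",
  "-": "cancelled",
};

// Anchored on the `goal:` keyword. Leading whitespace and the number are ignored,
// so subtask lines (checkboxes without `goal:`) never match.
const GOAL_LINE = /^\s*\d+\.\s*\[([ \/x-])\]\s*goal:\s*(.+?)\s*$/;

function parseGoals(markdown: string): Goal[] {
  const goals: Goal[] = [];
  for (const line of markdown.split("\n")) {
    const match = GOAL_LINE.exec(line);
    if (match) {
      goals.push({ state: STATE_BY_MARK[match[1]], text: match[2] });
    }
  }
  return goals;
}

// Rewrites the checkbox of the goal whose text matches exactly.
// All other lines, including subtasks, are returned unchanged.
function setGoalState(
  markdown: string,
  goalText: string,
  mark: " " | "/" | "x" | "-",
): string {
  return markdown
    .split("\n")
    .map((line) => {
      const match = GOAL_LINE.exec(line);
      if (!match || match[2] !== goalText) return line;
      return line.replace(/\[[ \/x-]\]/, `[${mark}]`);
    })
    .join("\n");
}
```

Matching on text rather than number means a goal can be renumbered or reordered without breaking the lookup, at the cost that editing a goal's wording changes its identity.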

Signing off a goal (CompleteGoal)

CompleteGoal(goal) (matched by the goal's text) is the only tool that marks a goal done; everything else is the agent editing the file. It reads the goal's evidence: block from .pi/goals.md, so the proof stays in the file where you can review it, then:

  1. If the goal has a verify: command, it runs. A non-zero exit rejects right away, with no model call.
  2. If verify passes, or the goal has no verify: command, a read-only pi subprocess (a fresh --no-session context, so it never sees the working agent's transcript) checks the evidence: block against the repo, the discriminator, and the subtle failure mode, and returns a verdict. It re-derives from the cited artifacts rather than trusting the claim, so list real artifacts, not assertions.
  3. On accept, the goal flips to [x] and a ## Log line is written. On reject, the goal stays open and the agent is told what is missing. Either way the judge's reasoning comes back in the result.
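The ordering of those steps can be sketched as below. It is a simplified, synchronous illustration under stated assumptions, not the CompleteGoal implementation: the type names, the signOff function, and the injected runVerify and runJudge callbacks are all hypothetical, and the real tool presumably spawns processes asynchronously. It also assumes the judge runs once verify has passed, which is how the "first stage" wording reads.

```typescript
// Hypothetical sketch of the sign-off order; names and shapes are assumptions.
interface GoalRecord {
  text: string;
  verify?: string; // optional shell command from the `verify:` line
  evidence: string[];
}

interface JudgeResult {
  accept: boolean;
  reasoning: string;
}

interface SignOffDeps {
  runVerify: (command: string) => number; // returns the command's exit code
  runJudge: (goal: GoalRecord) => JudgeResult; // stands in for the read-only subprocess
}

interface SignOffResult {
  accepted: boolean;
  stage: "verify" | "judge";
  message: string;
}

function signOff(goal: GoalRecord, deps: SignOffDeps): SignOffResult {
  // Stage 1: deterministic check. A non-zero exit rejects without any model call.
  if (goal.verify !== undefined) {
    const exitCode = deps.runVerify(goal.verify);
    if (exitCode !== 0) {
      return {
        accepted: false,
        stage: "verify",
        message: `verify exited with code ${exitCode}`,
      };
    }
  }
  // Stage 2: the judge inspects the evidence and returns a verdict with reasoning.
  const verdict = deps.runJudge(goal);
  // Accept or reject, the reasoning is returned to the caller.
  return { accepted: verdict.accept, stage: "judge", message: verdict.reasoning };
}
```

Passing the two stages in as callbacks keeps the ordering testable without spawning a shell or a model.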

The judge defaults to your current model (a fresh context, same weights). Point it at another with /goals judge <provider/model> for an independent cross-family check.
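For illustration, here is one way a caller might split judge output like the example above into reasoning and verdict. This is a sketch under assumptions, not code from pi-goals: parseVerdict is a hypothetical name, the "VERDICT: reject" spelling is inferred by analogy with the "VERDICT: accept" line in the example, and treating a missing verdict line as a reject is a design choice made here so that malformed output fails closed.

```typescript
// Hypothetical sketch; the real judge output format may differ.
type Verdict = "accept" | "reject";

interface ParsedJudgeOutput {
  verdict: Verdict;
  reasoning: string;
}

function parseVerdict(judgeOutput: string): ParsedJudgeOutput {
  // Assumption: with no VERDICT line at all, fail closed and reject.
  let verdict: Verdict = "reject";
  const reasoning: string[] = [];
  for (const line of judgeOutput.trim().split("\n")) {
    const match = /^\s*VERDICT:\s*(accept|reject)\s*$/i.exec(line);
    if (match) {
      // If several VERDICT lines appear, the last one wins.
      verdict = match[1].toLowerCase() as Verdict;
    } else {
      reasoning.push(line);
    }
  }
  return { verdict, reasoning: reasoning.join("\n").trim() };
}
```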

Prompts

All model-facing text lives in src/prompts.ts, in flow order, so you can read the whole process top to bottom.

Develop

pi -e ./src/index.ts        # load locally
npm test                    # vitest: parser + sign-off record logic
npm run typecheck
npm run lint

Not (yet) included

  • No autonomous re-prompt loop. The reminder nudges the agent within a turn, but the turn still ends and hands back to you; there is no loop that keeps re-prompting the agent until the goals are done.
  • The plan and execution phases can't yet run on different, sticky models.

License

MIT

Description
pi extension: plan-mode goals with evidence in one goals.md, signed off by a read-only subagent check. Small successor to pi-lgtm.