pi-plan: plan-mode goals + evidence in one plan.md, subagent sign-off

Small, guide-not-gate plan/goal tracker for pi. The agent edits plan.md with its normal Edit tool; CompleteGoal is the one blessed path that runs verify + a read-only judge and records the result. Plan mode drafts goals (done_when + failure_modes + subtasks), a per-turn injection keeps the active goal alive through compaction, and a reminder drives upkeep + autonomy.

- src/plan-file.ts: pure parse + the two writes CompleteGoal needs + recordSignOff
- src/index.ts: plan mode, review menu, injection, reminder, widget, CompleteGoal, oracle spawn
- src/prompts.ts: all model-facing text in flow order
- test/: 15 unit tests (parser, disambiguation, sign-off record logic)

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
2026-06-27 16:46:16 +08:00 · 2026-06-15 18:15:03 +08:00
commit d97b532d7b
10 changed files with 1449 additions and 0 deletions
@@ -0,0 +1,114 @@
+# pi-plan
+
+A [pi](https://github.com/badlogic/pi-mono) extension for plan-driven, goal-tracked work in one
+`plan.md`. Set up goals (with evidence and failure modes) in plan mode, work them, and sign a goal
+off only when a read-only subagent has checked the evidence.
+
+Successor to [pi-lgtm](https://github.com/wassname/pi-lgtm), kept deliberately small. It is roughly
+[burneikis/pi-plan](https://github.com/burneikis/pi-plan) plus four additions: goals with evidence,
+a sign-off check, a widget, and a reminder.
+
+The form guides; it does not gate. The agent edits `plan.md` with its normal Edit tool. The one
+blessed tool is `CompleteGoal`, which runs the sign-off check and records the result. The reminder,
+the injected plan summary, and git/widget visibility carry the process. pi-plan trusts the agent's
+judgement rather than gating it.
+
+## Install
+
+```bash
+pi install npm:@wassname2/pi-plan
+```
+
+Or run without installing:
+
+```bash
+pi -e npm:@wassname2/pi-plan
+```
+
+## Use
+
+```
+/plan add CSV export to the report view
+```
+
+1. Plan. The agent explores read-only and writes goals into `plan.md` (see format below).
+2. Review. You get a menu: Ready, Edit (ask the agent to revise), Open in `$EDITOR`, or Cancel.
+   On Ready you choose whether to keep the current context or start fresh and compacted.
+3. Work. Each turn the active goal is injected (so it survives compaction) and a reminder nudges
+   the agent to keep `plan.md` current and work autonomously. When a goal's `done_when` is met the
+   agent calls `CompleteGoal`, which runs `verify` and a read-only judge and, on accept, marks it
+   done and logs it.
+
+Other commands: `/plan` (print the plan), `/plan clear` (empty `plan.md`, history kept in git),
+`/plan judge <model-ref>` (use a specific model for the sign-off judge; default is your current
+model).
+
+## plan.md format
+
+One file holds the objective, the goals, and a short append-only log.
+
+```markdown
+# Plan: ship the cache layer
+
+## Goal: Implement cache layer
+<!-- id: cache-layer-1 -->
+status: active
+done_when: p95 < 50ms on bench-X. If wrong: timeouts in load-test.log
+verify: pytest tests/cache -q && python bench/p95.py --max-ms 50
+failure_modes:
+  - cache silently bypassed (hit-rate ~0, latency ok by luck)
+  - bench too small to exercise eviction
+- [x] wire cache client
+- [ ] eviction policy
+
+## Log
+- 2026-06-15 14:02  cache client wired; eviction next
+```
+
+- A goal is a `## Goal:` header with an `<!-- id -->`, a `status:`
+  (`open` | `active` | `done` | `cancelled`), a falsifiable `done_when:` (what you expect, and the
+  symptom if it is NOT met), an optional `verify:` shell command, a `failure_modes:` pre-mortem
+  list, and `- [ ]` subtasks.
+- `done_when` names the evidence that distinguishes real success from a subtle failure. `verify`,
+  when present, is the deterministic first stage of the sign-off check.
+- The agent ticks subtasks, appends to `## Log`, and sets `status` as it works. Multiple goals may
+  be `active`.
+
+## The sign-off check (`CompleteGoal`)
+
+`CompleteGoal(goal_id, evidence, paths?)` is the one blessed completion path:
+
+1. If the goal has a `verify:` command, it is run. A non-zero exit rejects immediately, with no model
+   call.
+2. If `verify` passes, or the goal has none, a read-only `pi` subprocess (the judge) inspects the
+   evidence against the repo and the named failure modes and returns a verdict. It re-derives the
+   result from the artifacts you point it at rather than trusting the claim, so point
+   `evidence`/`paths` at durable artifacts (saved logs, committed diffs, files).
+3. On accept, the goal's `status` flips to `done` and a `## Log` line is written. On reject, the
+   goal is not marked done and the agent is told what is missing.
+
+The judge defaults to your current model, which is already authorized and known to work. Set a
+different one with `/plan judge <model-ref>` (a `provider/model` reference) for an independent cross-family check.
+
+## Prompts
+
+All model-facing text lives in [`src/prompts.ts`](src/prompts.ts), in flow order, so the process is
+easy to review end to end.
+
+## Develop
+
+```bash
+pi -e ./src/index.ts        # load locally
+npm test                    # vitest: parser + sign-off record logic
+npm run typecheck
+npm run lint
+```
+
+## Not (yet) included
+
+No autonomous re-prompt loop (an until-done-style loop judge). Autonomy comes from the reminder, not
+a harness. Plan-phase model stickiness is a documented next step.
+
+## License
+
+MIT
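
The three-step sign-off flow in the README's "The sign-off check" section can be sketched roughly as follows. This is an illustrative sketch, not the code in `src/index.ts` or `src/plan-file.ts`: the names `completeGoal`, `SignOffDeps`, `SignOffResult`, and `Verdict` are hypothetical, the verify runner and the judge are injected as plain synchronous functions instead of real subprocesses, and the wording of the log line is invented. It also assumes `## Log` is the last section of `plan.md`.

```typescript
// Sketch only: every name here is hypothetical, not the extension's real API.
interface Verdict {
  accepted: boolean;
  reason: string;
}

interface SignOffDeps {
  // Runs the goal's `verify:` shell command and returns its exit code.
  runVerify: (command: string) => number;
  // Stands in for the read-only judge subprocess.
  judge: (goalSection: string, evidence: string) => Verdict;
  // Timestamp for the log line, e.g. "2026-06-15 14:02".
  now: () => string;
}

interface SignOffResult {
  plan: string; // plan.md text; changed only on accept
  verdict: Verdict;
}

function completeGoal(
  plan: string,
  goalId: string,
  evidence: string,
  deps: SignOffDeps,
): SignOffResult {
  const lines = plan.split("\n");

  // Locate the goal section: from its `## ` header up to the next `## ` header.
  const idLine = lines.findIndex((l) => l.trim() === `<!-- id: ${goalId} -->`);
  if (idLine < 0) {
    return { plan, verdict: { accepted: false, reason: `unknown goal id: ${goalId}` } };
  }
  let start = idLine;
  while (start > 0 && !(lines[start] ?? "").startsWith("## ")) start--;
  let end = idLine + 1;
  while (end < lines.length && !(lines[end] ?? "").startsWith("## ")) end++;
  const section = lines.slice(start, end);

  // Stage 1: deterministic `verify:`. A non-zero exit rejects before any judge call.
  const verifyLine = section.find((l) => l.startsWith("verify:"));
  if (verifyLine !== undefined) {
    const command = verifyLine.slice("verify:".length).trim();
    const exitCode = deps.runVerify(command);
    if (exitCode !== 0) {
      return { plan, verdict: { accepted: false, reason: `verify exited with ${exitCode}` } };
    }
  }

  // Stage 2: the judge sees the goal text (done_when, failure_modes) and the evidence.
  const verdict = deps.judge(section.join("\n"), evidence);
  if (!verdict.accepted) return { plan, verdict };

  // Stage 3: record the accept by flipping status and appending a log line.
  const updated = lines.map((l, i) =>
    i >= start && i < end && l.startsWith("status:") ? "status: done" : l,
  );
  if (!updated.some((l) => l.trim() === "## Log")) updated.push("", "## Log");
  // Assumes `## Log` is the last section, so the entry is appended at the end of the file.
  while (updated.length > 0 && (updated[updated.length - 1] ?? "").trim() === "") updated.pop();
  updated.push(`- ${deps.now()}  ${goalId} signed off: ${verdict.reason}`, "");
  return { plan: updated.join("\n"), verdict };
}
```

Passing the verify runner and the judge in as functions keeps the record logic pure, so it can be unit-tested without spawning a shell or a model. A real implementation would presumably wrap a subprocess call behind each of them.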