`pi -p --no-session` clones the repo into the parent of cwd, leaving a stale
directory behind. The NEXT judge then finds that directory and rejects the
goal because of it. Needs a temp-dir fix or in-repo inspection.
The tool description and param doc are model-facing, so they belong in
prompts.ts with the rest. Add them as step 6 (completeGoalTool) and
renumber the evidence judge to 7; prompts.ts is now ordered the way the
agent meets each text, so it reads as one pass.
The moved description also carries the positive-success framing: evidence
must show the success happened, not just that a failure was avoided.
README trimmed (saying less, voice unchanged): tighter intro and
comparison, less prose around the examples and sign-off steps. Humanizer
lint clean.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Replace done_when with a discriminator + subtle-failure-mode pair as the
heart of each goal. The discriminator is the POSITIVE success observation
that no failure mode could fake, not just failure-avoidance: a run can
dodge every trap and still produce nothing. Carried through planDrafting,
the sign-off judge, README, and the parser doc.
Format migration: flat numbered markdown goals (`1. [/] goal: ...`),
keyword-anchored parsing (indentation cosmetic), goals matched by text,
subtask states [ ]/[/]/[x]/[-] plus ~~strike~~. Evidence empty at
planning, filled at sign-off, multi-line supported.
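
A rough sketch of what keyword-anchored parsing of one goal line could look
like. All names here (`parseGoalLine`, `GoalLine`, `MARKER_TO_STATUS`) are
hypothetical and not taken from the real parser; only the line shape
(`1. [/] goal: ...`) and the four markers come from this log, and the
marker-to-status mapping assumes the open/active/done/cancelled meaning used
for goal checkboxes.

```typescript
// Hypothetical names throughout; this is an illustration, not the repo's parser.
type GoalStatus = "open" | "active" | "done" | "cancelled";

// Assumed mapping of checkbox markers to goal states.
const MARKER_TO_STATUS: Record<string, GoalStatus> = {
  " ": "open",
  "/": "active",
  "x": "done",
  "-": "cancelled",
};

interface GoalLine {
  index: number;
  status: GoalStatus;
  text: string;
}

// Leading whitespace is ignored (indentation is cosmetic); the `goal:`
// keyword is what anchors the match.
const GOAL_RE = /^\s*(\d+)\.\s+\[([ \/x-])\]\s+goal:\s*(.*)$/;

function parseGoalLine(line: string): GoalLine | null {
  const m = GOAL_RE.exec(line);
  if (!m) return null;
  const status = MARKER_TO_STATUS[m[2] ?? ""];
  if (status === undefined) return null;
  return { index: Number(m[1]), status, text: (m[3] ?? "").trim() };
}
```

Lines that lack the `goal:` keyword (subtasks, evidence, free text) return
`null`, so a caller can scan a file line by line and collect only the goals.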
CompleteGoal now returns the judge's reasoning under a
`--- sign-off judge ---` block (was just "Signed off"), so the verdict is
visible. Plan mode is read-only: edit/write (except goals.md) and
mutating bash are blocked by a tool hook.
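
One way the read-only rule could be expressed as a predicate, purely as a
sketch. The `ToolCall` shape, the `blockedInPlanMode` name, and the
`MUTATING_BASH` pattern are all assumptions made for illustration; the real
hook signature and its definition of "mutating" are not shown in this log.

```typescript
// Hypothetical shapes: the actual tool-hook API is not described here.
type ToolCall =
  | { tool: "edit" | "write"; path: string }
  | { tool: "bash"; command: string }
  | { tool: "read"; path: string };

// Assumed heuristic for "mutating" shell commands: a few well-known commands
// at the start of a pipeline segment, or any output redirection. It will have
// false positives (for example `2>&1`) and is deliberately conservative.
const MUTATING_BASH =
  /(^|[;&|])\s*(rm|mv|cp|mkdir|touch|chmod|git\s+(commit|push|reset|checkout|merge|rebase))\b|>/;

// Returns a reason string when the call should be blocked, or null to allow it.
function blockedInPlanMode(call: ToolCall): string | null {
  switch (call.tool) {
    case "edit":
    case "write": {
      const file = call.path.split(/[\\/]/).pop();
      return file === "goals.md"
        ? null
        : `plan mode is read-only: ${call.tool} is limited to goals.md`;
    }
    case "bash":
      return MUTATING_BASH.test(call.command)
        ? "plan mode is read-only: mutating bash is blocked"
        : null;
    default:
      return null; // reads are always allowed
  }
}
```

Returning a reason instead of a boolean lets the hook hand the model an
explanation it can act on, rather than a bare refusal.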
17 parser tests, typecheck + biome clean.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Goal state moves from a `status:` line into a checkbox on the goal header
(single source of truth, renders natively): [ ] open, [/] active, [x] done,
[-] cancelled. Only CompleteGoal writes [x]; the agent sets [/] when starting.
The GoalStatus enum and all consumers (widget, injection, counts) are unchanged.
Evidence becomes a goal field, not an ephemeral tool argument: an `evidence:`
block the agent fills before sign-off, read by CompleteGoal from the file
(git-tracked, reviewable). The tool is now CompleteGoal(goal_id) only.
Also:
- format reorder: subtasks under the goal; failure_modes + evidence as
separated trailing blocks (no abutting dash-lists)
- widget: shows (done/total tasks); done goals now show as checked instead of being hidden
- drafting prompt: guard against a circular done_when (one that points at the
file's own checkbox/log, which the sign-off writes, so it can never pass)
- drafting template now includes the H1 and the <!-- id --> line CompleteGoal
needs to locate a goal
- strip ANSI/CSI control codes from the judge subprocess output
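
The last item, stripping control codes from the judge output, might look
roughly like this. A minimal sketch: `stripAnsi` and `CSI_RE` are hypothetical
names, and the pattern only covers CSI sequences (ESC `[`, then parameter
bytes, intermediate bytes, and one final byte). Other escape families such as
OSC are not handled.

```typescript
// CSI sequence layout: ESC [ , parameter bytes 0x30-0x3F, intermediate bytes
// 0x20-0x2F, then a single final byte in 0x40-0x7E.
const CSI_RE = /\x1b\[[0-?]*[ -\/]*[@-~]/g;

// Removes colour, cursor-movement, and erase codes so the judge's text can be
// stored and compared as plain text. Hypothetical helper, not the repo's code.
function stripAnsi(text: string): string {
  return text.replace(CSI_RE, "");
}
```

For example, `stripAnsi("\x1b[32mPASS\x1b[0m: ok")` yields `"PASS: ok"`.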
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
- The Ready->fresh-context crash was a stale pi.* call inside
withSession. Prior commit moved sendUserMessage to sessionCtx but
left pi.setSessionName inside withSession (also stale -> crash).
Drop it (cosmetic) and use only sessionCtx in the swap window.
- Print plan.md on execution start (both fresh and in-place) so the
user sees what's being worked on after a context switch. Plan text
captured before newSession since ctx goes stale.
- Widget: "(N todo)" -> "(N task[s])"
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
- startExecution: inside withSession, send via the ReplacedSessionContext
(sessionCtx.sendUserMessage) and set the session name there. The old
code used the global pi.* handle bound to the replaced session, which
is stale after newSession (runner.assertActive) -> crash on the
"fresh, compacted context" choice.
- prompts: replace em-dashes in model-facing strings with commas/
semicolons/periods (humanizer pass; comments left as-is)
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
The drafting prompt over-decomposed: one goal per item, long run-on
done_when (criterion + failure symptom in one line), and 3 mandatory
failure_modes. Plans came out verbose and hard to read.
- planDrafting: default to ONE goal; add another only for a genuinely
separate checkpoint; near-identical items become subtasks. Subtasks
only for 3+ step goals. Don't invent phases. (granularity heuristic
adapted from tintinweb/pi-tasks when-to/when-not guidance)
- done_when: one falsifiable check, no embedded "if wrong" clause (the
failure symptom belongs in failure_modes)
- failure_modes: 0-2 terse items, optional
- Sync the stale done_when wording in README and plan-file.ts comment
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
- Restore the exitPlanMode closing brace and the opening of the CompleteGoal
tool registration, both dropped by an earlier edit (parse error at 224)
- Edit-revise path now sends with deliverAs:"followUp" so it doesn't
throw "Agent is already processing" mid-stream
- Bare /plan now prompts for an objective and enters plan mode instead
of only showing the current plan
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
External review (deepseek-v4-pro, docs/reviews/review.md) found:
- Real bug: reviewLoop cast agent_end's ExtensionContext to ExtensionCommandContext
and called newSession on it, which would crash the "fresh context" path. Now the
/plan command handler's context (which has newSession) is saved and reused, with a
graceful in-place fallback (burneikis pattern).
- Dead code: continuation + loopJudge prompts are unused (the autonomous loop is
intentionally cut). Marked NOT-YET-WIRED in the prompts.ts flow header rather than
removed, so the full intended flow stays reviewable.
Review also confirmed: no comment bloat, no over-engineering.
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
Small, guide-not-gate plan/goal tracker for pi. The agent edits plan.md with
its normal Edit tool; CompleteGoal is the one blessed path that runs verify +
a read-only judge and records the result. Plan mode drafts goals (done_when +
failure_modes + subtasks), a per-turn injection keeps the active goal alive
through compaction, and a reminder drives upkeep + autonomy.
- src/plan-file.ts: pure parse + the two writes CompleteGoal needs + recordSignOff
- src/index.ts: plan mode, review menu, injection, reminder, widget, CompleteGoal, oracle spawn
- src/prompts.ts: all model-facing text in flow order
- test/: 15 unit tests (parser, disambiguation, sign-off record logic)
Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>