mirror of
https://github.com/wassname/pi-plan.git
synced 2026-06-27 17:01:38 +08:00
pi-goals: discriminator/failure-mode format + visible sign-off judge
Replace done_when with a discriminator + subtle-failure-mode pair as the heart of each goal. The discriminator is the POSITIVE success observation that no failure mode could fake, not just failure-avoidance: a run can dodge every trap and still produce nothing. Carried through planDrafting, the sign-off judge, README, and the parser doc.

Format migration: flat numbered markdown goals (`1. [/] goal: ...`), keyword-anchored parsing (indentation cosmetic), goals matched by text, subtask states [ ]/[/]/[x]/[-] plus ~~strike~~. Evidence empty at planning, filled at sign-off, multi-line supported.

CompleteGoal now returns the judge's reasoning under a `--- sign-off judge ---` block (was just "Signed off"), so the verdict is visible.

Plan mode is read-only: edit/write (except goals.md) and mutating bash are blocked by a tool hook.

17 parser tests, typecheck + biome clean.

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
@@ -1,5 +1,6 @@
 node_modules/
 dist/
 *.log
+.pi/
 docs/reviews/raw.jsonl
 docs/reviews/err.txt
@@ -1,17 +1,16 @@
|
|||||||
# pi-goals
|
# pi-goals
|
||||||
|
|
||||||
A [pi](https://github.com/badlogic/pi-mono) extension for plan-driven, goal-tracked work in one
|
Plan mode for agreeing on goals before any code gets written. Each goal names the subtle failure mode
|
||||||
`goals.md`. Set up goals (with evidence and failure modes) in plan mode, work them, and sign a goal
|
that could fake a "done" and the discriminator that tells real success from it, plus subtasks and the
|
||||||
off only when a read-only subagent has checked the evidence.
|
evidence that gets checked at sign-off. It all lives in one markdown file you can read and print. A
|
||||||
|
widget keeps the goals in front of you through compaction, a reminder nudges the agent to keep the
|
||||||
|
file current and work toward the goals on its own, and a goal is signed off only after a read-only
|
||||||
|
subagent has checked its evidence.
|
||||||
|
|
||||||
Successor to [pi-lgtm](https://github.com/wassname/pi-lgtm), kept deliberately small: about
|
It guides rather than guards. Like [pi-milestones](https://github.com/Neuron-Mr-White/UniPi/tree/main/packages/milestone)
|
||||||
[burneikis/pi-plan](https://github.com/burneikis/pi-plan) plus the additions, goals with evidence,
|
and [burneikis/pi-plan](https://github.com/burneikis/pi-plan), it leans on a form and a process to
|
||||||
a sign-off check, a widget, and a reminder.
|
steer the agent and trust its judgement. [pi-lgtm](https://github.com/wassname/pi-lgtm) was my earlier
|
||||||
|
attempt and got too complex; this one stays small and maintainable.
|
||||||
The form guides; it does not gate. The agent edits `goals.md` with its normal Edit tool. The one
|
|
||||||
blessed tool is `CompleteGoal`, which runs the sign-off check and records the result. The reminder,
|
|
||||||
the injected plan summary, and git/widget visibility carry the process. It trusts the agent's
|
|
||||||
judgement rather than guarding it.
|
|
||||||
|
|
||||||
## Install
|
## Install
|
||||||
|
|
||||||
@@ -19,7 +18,7 @@ judgement rather than guarding it.
 pi install npm:@wassname2/pi-goals
 ```

-Or run without installing:
+Or run it without installing:

 ```bash
 pi -e npm:@wassname2/pi-goals
@@ -28,78 +27,133 @@ pi -e npm:@wassname2/pi-goals
|
|||||||
## Use
|
## Use
|
||||||
|
|
||||||
```
|
```
|
||||||
/goals add CSV export to the report view
|
/goals CSV export for the report view
|
||||||
```
|
```
|
||||||
|
|
||||||
1. Plan. The agent explores read-only and writes goals into `goals.md` (see format below).
|
`/goals` enters plan mode and starts a conversation; the description is an optional seed, so plain
|
||||||
2. Review. You get a menu: Ready, Edit (ask the agent to revise), Open in `$EDITOR`, or Cancel.
|
`/goals` works too. From there:
|
||||||
On Ready you choose whether to keep the current context or start fresh and compacted.
|
|
||||||
3. Work. Each turn the active goal is injected (so it survives compaction) and a reminder nudges
|
|
||||||
the agent to keep `goals.md` current and work autonomously. When a goal's `done_when` is met the
|
|
||||||
agent calls `CompleteGoal`, which runs `verify` and a read-only judge and, on accept, marks it
|
|
||||||
done and logs it.
|
|
||||||
|
|
||||||
Other commands: `/goals` (print the goals), `/goals clear` (empty `goals.md`, history kept in git),
|
1. Plan. The agent explores read-only, asks about anything unclear, and writes the goals into
|
||||||
`/goals judge <model-ref>` (use a specific model for the sign-off judge; default is your current
|
`.pi/goals.md`.
|
||||||
model).
|
2. Review. You get a menu: Ready, Edit (ask the agent to revise), Open in `$EDITOR`, or Cancel. On
|
||||||
|
Ready you choose whether to keep the current context or start fresh and compacted.
|
||||||
|
3. Work. Each turn the active goal is injected so it survives compaction, and a reminder nudges the
|
||||||
|
agent to keep `goals.md` current and keep going. When a goal's discriminator is satisfied the agent
|
||||||
|
calls `CompleteGoal`, which runs `verify` and a read-only judge, then marks the goal done and logs it.
|
||||||
|
|
||||||
## goals.md format
|
Other commands: `/goals clear` empties `.pi/goals.md`; `/goals judge <model-ref>` picks a specific
|
||||||
|
model for the sign-off judge (the default is your current model).
|
||||||
|
|
||||||
One file holds the objective, the goals, and a short append-only log.
|
## Example
|
||||||
|
|
||||||
|
Start plan mode with an optional seed:
|
||||||
|
|
||||||
|
```
|
||||||
|
/goals audit the papers dir metadata and clean up empty dirs
|
||||||
|
```
|
||||||
|
|
||||||
|
The agent explores read-only, then drafts the goal with a subtle failure mode and the discriminator
|
||||||
|
that beats it, and stops for review:
|
||||||
|
|
||||||
```markdown
|
```markdown
|
||||||
# Goals: ship the cache layer
|
## Goals
|
||||||
|
|
||||||
## Goal: [/] Implement cache layer
|
1. [ ] goal: Audit steering/ metadata and remove empty dirs
|
||||||
<!-- id: cache-layer-1 -->
|
- subtle failure mode: report written but counts are zero (resolver errored silently)
|
||||||
done_when: p95 < 50ms on bench-X
|
- discriminator: report shows the XXXX count before/after AND a non-zero rename count
|
||||||
verify: pytest tests/cache -q && python bench/p95.py --max-ms 50
|
- tasks:
|
||||||
- [x] wire cache client
|
1. [ ] dry-run the metadata resolve
|
||||||
- [ ] eviction policy
|
2. [ ] remove the empty _artifacts dirs
|
||||||
|
3. [ ] write the report
|
||||||
|
- evidence:
|
||||||
|
- <empty until sign-off>
|
||||||
|
```
|
||||||
|
|
||||||
failure_modes:
|
You choose Ready. The agent works the subtasks, then fills `evidence` (each item an artifact plus a
|
||||||
- cache silently bypassed (hit-rate ~0, latency ok by luck)
|
short read of it) and calls `CompleteGoal`:
|
||||||
- bench too small to exercise eviction
|
|
||||||
evidence:
|
```markdown
|
||||||
- load-test.log p95=41ms; bench/p95.py exited 0
|
- evidence:
|
||||||
- cache hit-rate 0.93 in load-test.log (not bypassed)
|
- > scripts/metadata_report.txt: XXXX 52 -> 4, 146 empty _artifacts removed
|
||||||
|
- > 48 files renamed; almost certain done, the silent-resolver failure mode is ruled out
|
||||||
|
```
|
||||||
|
|
||||||
|
A fresh read-only subagent re-checks that evidence against the repo and the discriminator, then
|
||||||
|
returns its verdict and reasoning:
|
||||||
|
|
||||||
|
```
|
||||||
|
Signed off "Audit steering/ metadata and remove empty dirs". Marked done in goals.md.
|
||||||
|
|
||||||
|
--- sign-off judge ---
|
||||||
|
metadata_report.txt present; counts 52 -> 4 confirmed; rename log shows 48 renamed (not zero).
|
||||||
|
VERDICT: accept
|
||||||
|
```
|
||||||
|
|
||||||
|
## The goals.md format
|
||||||
|
|
||||||
|
One project-local file, `<cwd>/.pi/goals.md` (gitignored, like pi-tasks), holds the title, a context
|
||||||
|
block, the goals, and a short append-only log. A fresh `/goals` draft replaces it.
|
||||||
|
|
||||||
|
```markdown
|
||||||
|
# ship the cache layer
|
||||||
|
|
||||||
|
Latency target came from the SLO review; keep the existing client API.
|
||||||
|
|
||||||
|
## Goals
|
||||||
|
|
||||||
|
1. [/] goal: Implement cache layer
|
||||||
|
- subtle failure mode: cache silently bypassed, latency ok by luck
|
||||||
|
- discriminator: hit-rate > 0.8 in load-test.log (a bypass reads ~0)
|
||||||
|
- verify: pytest tests/cache -q && python bench/p95.py --max-ms 50
|
||||||
|
- tasks:
|
||||||
|
1. [x] wire cache client
|
||||||
|
2. [/] eviction policy
|
||||||
|
- evidence:
|
||||||
|
- > load-test.log: p95=41ms, hit-rate 0.93 (not bypassed)
|
||||||
|
|
||||||
|
# Future work / out of scope
|
||||||
|
|
||||||
|
- distributed cache
|
||||||
|
|
||||||
## Log
|
## Log
|
||||||
- 2026-06-15 14:02 cache client wired; eviction next
|
- 2026-06-15 14:02 cache client wired; eviction next
|
||||||
```
|
```
|
||||||
|
|
||||||
- A goal is a `## Goal:` header whose checkbox carries its state (`[ ]` open, `[/]` active, `[x]`
|
- A goal is a numbered checkbox line beginning `goal:`; the checkbox carries its state (`[ ]` open,
|
||||||
done, `[-]` cancelled), then an `<!-- id -->`, one falsifiable `done_when:`, an optional `verify:`
|
`[/]` active, `[x]` done, `[-]` cancelled). Goals are matched by their text, so the number is just
|
||||||
shell command, `- [ ]` subtasks, an optional short `failure_modes:` pre-mortem list, and an
|
for you to reference.
|
||||||
`evidence:` list.
|
- The `discriminator` is the success test, written while planning: the positive observation that the
|
||||||
- `done_when` is the test, written at planning. `evidence` is the proof, a `- ` list the agent fills
|
goal actually succeeded and that none of the `subtle failure mode`s could fake. It has to show
|
||||||
at completion pointing at durable artifacts; `CompleteGoal` reads it from the file. `failure_modes`
|
something happened (a count moved, a test exercised the path, a metric beat noise), not just that a
|
||||||
is the pre-mortem. `verify`, when present, is the deterministic first stage of the sign-off.
|
failure was avoided. `evidence` is the proof, filled at sign-off:
|
||||||
- The agent ticks subtasks, appends to `## Log`, and sets the header checkbox (`[/]` when it starts
|
each item pairs a durable artifact (a quoted and linked log, a table, a metric) with a short read of
|
||||||
a goal) as it works. Only `CompleteGoal` writes `[x]`. Multiple goals may be active.
|
it, not a bare claim. `verify`, when present, is the deterministic first stage of the sign-off.
|
||||||
|
- Subtasks are any checkbox without a `goal:` prefix, under `- tasks:` (`[/]` in progress, `[-]`
|
||||||
|
cancelled). The agent ticks them, appends to `## Log`, and sets a goal `[/]` when it starts it. Only
|
||||||
|
`CompleteGoal` writes `[x]`. Several goals can be active at once.
|
||||||
|
|
||||||
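A minimal sketch of how the goal and subtask lines described above could be recognized. The names `parseLine`, `ParsedLine`, and `CHECKBOX_LINE` are illustrative assumptions, not the parser that ships in `plan-file`; the sketch only shows the idea that the `goal:` keyword, not the indentation or the list number, decides what a checkbox line is.

```typescript
// Hypothetical sketch: classify one goals.md line. Not the extension's real parser.
type Status = "open" | "active" | "done" | "cancelled";

interface ParsedLine {
  kind: "goal" | "subtask";
  status: Status;
  text: string;
}

const STATUS_BY_MARK: Record<string, Status> = {
  " ": "open",
  "/": "active",
  x: "done",
  "-": "cancelled",
};

// Keyword-anchored: leading whitespace and the list marker ("1." or "-") are cosmetic.
const CHECKBOX_LINE = /^\s*(?:\d+\.|-)\s+\[([ \/x-])\]\s+(.*)$/;

function parseLine(line: string): ParsedLine | null {
  const m = CHECKBOX_LINE.exec(line);
  if (!m) return null; // not a checkbox line (e.g. "- evidence:")
  const status = STATUS_BY_MARK[m[1] ?? " "] ?? "open";
  const rest = (m[2] ?? "").trim();
  // A checkbox whose text starts with "goal:" is a goal; any other checkbox is a subtask.
  const goal = /^goal:\s*(.*)$/i.exec(rest);
  return goal
    ? { kind: "goal", status, text: (goal[1] ?? "").trim() }
    : { kind: "subtask", status, text: rest };
}
```

Because the returned `text` is the goal's identity in this sketch, renumbering or re-indenting a goal would not change which goal a later lookup finds, which is the property the format relies on when goals are matched by text.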
## The sign-off check (`CompleteGoal`)
|
## Signing off a goal (`CompleteGoal`)
|
||||||
|
|
||||||
`CompleteGoal(goal_id)` is the one blessed completion path. It reads the goal's `evidence:` block
|
`CompleteGoal(goal)` (matched by the goal's text) is the only tool that marks a goal done; everything
|
||||||
from goals.md (so the proof is git-tracked and human-reviewable before sign-off, not buried in a tool
|
else is the agent editing the file. It reads the goal's `evidence:` block from `.pi/goals.md`, so the
|
||||||
call):
|
proof stays in the file where you can review it, then:
|
||||||
|
|
||||||
1. If the goal has a `verify:` command, it is run. A non-zero exit rejects immediately, with no model
|
1. If the goal has a `verify:` command, it runs. A non-zero exit rejects right away, with no model
|
||||||
call.
|
call.
|
||||||
2. Otherwise a read-only `pi` subprocess (the judge) inspects the `evidence:` items against the repo
|
2. Otherwise a read-only `pi` subprocess (a fresh `--no-session` context, so it never sees the working
|
||||||
and the named failure modes and returns a verdict. It re-derives from the artifacts the evidence
|
agent's transcript) inspects the `evidence:` against the repo, the `discriminator`, and the
|
||||||
points at rather than trusting the claim, so the `evidence:` list should name durable artifacts
|
`subtle failure mode`, and returns a verdict. It re-derives from the cited artifacts rather than
|
||||||
(saved logs, committed diffs, files).
|
trusting the claim, so list real artifacts, not assertions.
|
||||||
3. On accept, the goal's header checkbox flips to `[x]` and a `## Log` line is written. On reject,
|
3. On accept, the goal flips to `[x]` and a `## Log` line is written. On reject, the goal stays open
|
||||||
the goal stays open and the agent is told what is missing.
|
and the agent is told what is missing. Either way the judge's reasoning comes back in the result.
|
||||||
|
|
||||||
The judge defaults to your current model (guaranteed authorized and capable). Set a different one
|
The judge defaults to your current model (a fresh context, same weights). Point it at another with
|
||||||
with `/goals judge <provider/model>` for an independent cross-family check.
|
`/goals judge <provider/model>` for an independent cross-family check.
|
||||||
|
|
||||||
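The order of the sign-off stages can be sketched as below. This is an illustration under assumptions, not the extension's code: `signOff`, `GoalLike`, `Verdict`, and the injected `runVerify` / `runJudge` functions are hypothetical names, and the real judge is a read-only `pi` subprocess rather than an in-process callback.

```typescript
// Hypothetical sketch of the sign-off order; all names here are illustrative assumptions.
interface GoalLike {
  text: string;
  verify?: string; // optional shell command from the goal's `verify:` line
  evidence: string[];
}

interface Verdict {
  accepted: boolean;
  reasoning: string;
}

interface Deps {
  runVerify: (command: string) => number; // returns the command's exit code
  runJudge: (goal: GoalLike) => Verdict; // stands in for the read-only subagent
}

function signOff(goal: GoalLike, deps: Deps): Verdict {
  // Stage 1: deterministic verify. A non-zero exit rejects without calling the judge.
  if (goal.verify) {
    const code = deps.runVerify(goal.verify);
    if (code !== 0) {
      return { accepted: false, reasoning: `verify exited ${code}: ${goal.verify}` };
    }
  }
  // Stage 2: the judge checks the evidence; its reasoning comes back on accept or reject.
  return deps.runJudge(goal);
}
```

Injecting the two stages keeps the ordering testable without spawning a process; a caller would flip the goal to `[x]` and append a `## Log` line only when `accepted` is true.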
## Prompts
|
## Prompts
|
||||||
|
|
||||||
All model-facing text lives in [`src/prompts.ts`](src/prompts.ts), in flow order, so the process is
|
All model-facing text lives in [`src/prompts.ts`](src/prompts.ts), in flow order, so you can read the
|
||||||
easy to review end to end.
|
whole process top to bottom.
|
||||||
|
|
||||||
## Develop
|
## Develop
|
||||||
|
|
||||||
@@ -112,8 +166,9 @@ npm run lint

 ## Not (yet) included

-No autonomous re-prompt loop (an until-done-style loop judge). Autonomy comes from the reminder, not
-a harness. Plan-phase model stickiness is a documented next step.
+- No autonomous re-prompt loop. The reminder nudges the agent within a turn, but the turn still ends
+and hands back to you; nothing auto-re-prompts until the goals are done.
+- The plan and execution phases can't yet run on different, sticky models.

 ## License

|
|||||||
+1
-1
@@ -1,7 +1,7 @@
 {
 "name": "@wassname2/pi-goals",
 "version": "0.0.1",
-"description": "One goals.md: set goals in plan mode, work them, sign off only when a read-only check passes. Successor to pi-lgtm.",
+"description": "One .pi/goals.md: set goals in plan mode, work them, sign off only when a read-only check passes. Successor to pi-lgtm.",
 "author": "wassname",
 "license": "MIT",
 "type": "module",
+130
-63
@@ -1,7 +1,9 @@
|
|||||||
/**
|
/**
|
||||||
* pi-goals — plan mode that sets up goals with evidence, tracked in one goals.md, signed off by a
|
* pi-goals — plan mode that sets up goals with evidence, tracked in one .pi/goals.md, signed off by a
|
||||||
* read-only subagent check. A successor to pi-lgtm, kept deliberately small (≈ burneikis/pi-plan
|
* read-only subagent check. A successor to pi-lgtm, kept deliberately small (≈ burneikis/pi-plan
|
||||||
* plus the additions: goals + failure_modes + subtasks, a sign-off check, a widget, a reminder).
|
* plus the additions: goals + a discriminator + a subtle failure mode + subtasks, a sign-off check,
|
||||||
|
* a widget, a reminder). A goal's success test is its discriminator: the observation that tells real
|
||||||
|
* success from the named failure mode.
|
||||||
*
|
*
|
||||||
* Philosophy (spec D3): the form guides, it does not gate. The agent edits goals.md with its normal
|
* Philosophy (spec D3): the form guides, it does not gate. The agent edits goals.md with its normal
|
||||||
* Edit tool. The one blessed tool is CompleteGoal, which runs the sign-off check and records it. The
|
* Edit tool. The one blessed tool is CompleteGoal, which runs the sign-off check and records it. The
|
||||||
@@ -9,19 +11,30 @@
|
|||||||
* judgement rather than guarding it.
|
* judgement rather than guarding it.
|
||||||
*
|
*
|
||||||
* Flow:
|
* Flow:
|
||||||
* /goals <objective> -> plan mode: agent explores, drafts goals into goals.md (planDrafting guides)
|
* /goals [objective] -> plan mode (conversational): objective is an optional seed; agent explores
|
||||||
|
* read-only, asks, then drafts goals into .pi/goals.md (planDrafting guides)
|
||||||
* agent_end -> review menu (Ready / Edit / $EDITOR / Cancel); Ready offers compaction
|
* agent_end -> review menu (Ready / Edit / $EDITOR / Cancel); Ready offers compaction
|
||||||
* execution -> each turn, inject the plan summary (survives compaction) + a reminder;
|
* execution -> each turn, inject the plan summary (survives compaction) + a reminder;
|
||||||
* agent works goals, ticks subtasks, appends ## Log, calls CompleteGoal
|
* agent works goals, ticks subtasks, appends ## Log, calls CompleteGoal
|
||||||
* CompleteGoal -> optional deterministic verify, then a read-only oracle judge -> accept
|
* CompleteGoal -> optional deterministic verify, then a read-only oracle judge -> accept
|
||||||
* flips status:done + logs; reject returns what's missing
|
* flips status:done + logs; reject returns what's missing
|
||||||
*
|
*
|
||||||
* All model-facing text lives in prompts.tsx, in flow order.
|
* The plan file lives at <cwd>/.pi/goals.md (project-local, gitignored, like pi-tasks), not in the
|
||||||
|
* repo. A fresh /goals draft just replaces it (the "overwrite" staleness rule).
|
||||||
|
*
|
||||||
|
* Plan mode is read-only: the tool_call hook blocks edit/write (except goals.md itself) and mutating
|
||||||
|
* bash while drafting, so code isn't written before the goals are agreed. Read-only bash exploration
|
||||||
|
* stays open (blocklist, not allowlist).
|
||||||
|
*
|
||||||
|
* Not built (FIXME): no plan-vs-exec model switch on accept (plan-model stickiness); noted at its
|
||||||
|
* call site below.
|
||||||
|
*
|
||||||
|
* All model-facing text lives in prompts.ts, in flow order.
|
||||||
*/
|
*/
|
||||||
|
|
||||||
import { spawn, spawnSync } from "node:child_process";
|
import { spawn, spawnSync } from "node:child_process";
|
||||||
import { existsSync, readFileSync, writeFileSync } from "node:fs";
|
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
|
||||||
import { basename, join } from "node:path";
|
import { basename, join, resolve } from "node:path";
|
||||||
import type { ExtensionAPI, ExtensionCommandContext, ExtensionContext } from "@earendil-works/pi-coding-agent";
|
import type { ExtensionAPI, ExtensionCommandContext, ExtensionContext } from "@earendil-works/pi-coding-agent";
|
||||||
import { Type } from "@sinclair/typebox";
|
import { Type } from "@sinclair/typebox";
|
||||||
import { counts, findGoal, type Goal, type PlanDoc, parse, recordSignOff, type SignOff } from "./plan-file.js";
|
import { counts, findGoal, type Goal, type PlanDoc, parse, recordSignOff, type SignOff } from "./plan-file.js";
|
||||||
@@ -32,6 +45,29 @@ const PLAN_CONTEXT = "pi-goals-context"; // injected plan-mode guidance, strippe
|
|||||||
const STATUS_KEY = "pi-goals";
|
const STATUS_KEY = "pi-goals";
|
||||||
const WIDGET_KEY = "pi-goals-widget";
|
const WIDGET_KEY = "pi-goals-widget";
|
||||||
const READ_ONLY_TOOLS = ["read", "grep", "find", "ls", "bash"];
|
const READ_ONLY_TOOLS = ["read", "grep", "find", "ls", "bash"];
|
||||||
|
// File mutators blocked while drafting goals (read-only plan mode, like narumiruna/pi-plan-mode), so
|
||||||
|
// code isn't written before goals are agreed. The one allowed write is goals.md itself (the
|
||||||
|
// deliverable). A read-only task (a pure search) can still be explored in plan mode by nature.
|
||||||
|
const PLAN_MODE_BLOCKED_TOOLS = ["edit", "write"];
|
||||||
|
// bash is dual-use, so block it only when the command looks mutating; read-only exploration (cat, rg,
|
||||||
|
// git log, running a script to inspect) stays open. Blocklist, not allowlist: keep exploration
|
||||||
|
// frictionless and just stop the obvious mutators. List adapted from narumiruna/pi-plan-mode; the
|
||||||
|
// redirect rule catches `> file` / `>> file` / `>| file` but not fd-dups like `2>&1` or `>&2`.
|
||||||
|
const MUTATING_BASH_PATTERNS: RegExp[] = [
|
||||||
|
/\b(rm|rmdir|mv|cp|mkdir|touch|chmod|chown|chgrp|ln|tee|truncate|dd)\b/i,
|
||||||
|
/>\s*[^&\s]/, // redirect to a file (write/append/clobber), excludes 2>&1 and >&2
|
||||||
|
/\bnpm\s+(install|uninstall|update|ci|link|publish|version)\b/i,
|
||||||
|
/\byarn\s+(add|remove|install|publish|upgrade)\b/i,
|
||||||
|
/\bpnpm\s+(add|remove|install|publish|update)\b/i,
|
||||||
|
/\bbun\s+(add|remove|install|update|publish)\b/i,
|
||||||
|
/\bpip\s+(install|uninstall)\b/i,
|
||||||
|
/\buv\s+(add|remove|sync|lock|pip\s+install)\b/i,
|
||||||
|
/\bgit\s+(add|commit|push|pull|merge|rebase|reset|checkout|switch|stash|cherry-pick|revert|tag|init|clone)\b/i,
|
||||||
|
/\b(sudo|su|kill|pkill|killall|reboot|shutdown)\b/i,
|
||||||
|
/\bsystemctl\s+(start|stop|restart|enable|disable)\b/i,
|
||||||
|
/\b(vim?|nano|emacs|code|subl)\b/i,
|
||||||
|
];
|
||||||
|
const PLAN_REL = ".pi/goals.md"; // project-local, gitignored (pi-tasks convention); shown in the widget
|
||||||
|
|
||||||
interface PlanState {
|
interface PlanState {
|
||||||
isPlanMode: boolean;
|
isPlanMode: boolean;
|
||||||
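A small sketch of how a blocklist like `MUTATING_BASH_PATTERNS` might be applied to a command string. The `isMutatingBash` helper and the trimmed `PATTERNS` list are assumptions for illustration; the actual tool_call hook is not shown in this part of the diff.

```typescript
// Trimmed copy of three patterns from the diff above, for illustration only.
const PATTERNS: RegExp[] = [
  /\b(rm|rmdir|mv|cp|mkdir|touch|chmod|chown|chgrp|ln|tee|truncate|dd)\b/i,
  />\s*[^&\s]/, // redirect to a file; `2>&1` and `>&2` do not match
  /\bgit\s+(add|commit|push|pull|merge|rebase|reset|checkout|switch|stash|cherry-pick|revert|tag|init|clone)\b/i,
];

// Hypothetical helper: a command counts as mutating when any pattern matches.
// None of the patterns use the `g` flag, so `test` keeps no state between calls.
function isMutatingBash(command: string): boolean {
  return PATTERNS.some((pattern) => pattern.test(command));
}

console.log(isMutatingBash("git log --oneline")); // read-only exploration stays open
console.log(isMutatingBash("echo hi > out.txt")); // file redirect is caught
```

Since the first pattern is a plain word match, a read-only command that merely mentions one of the words (for example `grep rm notes.txt`) would also be flagged in this sketch; that is the usual trade-off of a blocklist that errs toward blocking.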
@@ -47,8 +83,14 @@ export default function piPlanExtension(pi: ExtensionAPI): void {
|
|||||||
// newSession is only on the command-handler context; agent_end's ctx lacks it. Save it from /goals.
|
// newSession is only on the command-handler context; agent_end's ctx lacks it. Save it from /goals.
|
||||||
let savedCmdCtx: ExtensionCommandContext | null = null;
|
let savedCmdCtx: ExtensionCommandContext | null = null;
|
||||||
|
|
||||||
const planPath = (ctx: ExtensionContext) => join(ctx.cwd, "goals.md");
|
const planPath = (ctx: ExtensionContext) => join(ctx.cwd, ".pi", "goals.md");
|
||||||
const readPlan = (ctx: ExtensionContext): string => (existsSync(planPath(ctx)) ? readFileSync(planPath(ctx), "utf-8") : "");
|
const readPlan = (ctx: ExtensionContext): string => (existsSync(planPath(ctx)) ? readFileSync(planPath(ctx), "utf-8") : "");
|
||||||
|
// Our programmatic writes (clear, CompleteGoal). The agent creates/edits the file with its own Edit
|
||||||
|
// tool; this just makes sure .pi/ exists for our writes.
|
||||||
|
const writePlan = (ctx: ExtensionContext, content: string): void => {
|
||||||
|
mkdirSync(join(ctx.cwd, ".pi"), { recursive: true });
|
||||||
|
writeFileSync(planPath(ctx), content);
|
||||||
|
};
|
||||||
|
|
||||||
function persist(): void {
|
function persist(): void {
|
||||||
pi.appendEntry<PlanState>(STATE, state);
|
pi.appendEntry<PlanState>(STATE, state);
|
||||||
@@ -57,7 +99,7 @@ export default function piPlanExtension(pi: ExtensionAPI): void {
|
|||||||
function updateWidget(ctx: ExtensionContext): void {
|
function updateWidget(ctx: ExtensionContext): void {
|
||||||
if (state.isPlanMode) {
|
if (state.isPlanMode) {
|
||||||
ctx.ui.setStatus(STATUS_KEY, ctx.ui.theme.fg("warning", "planning"));
|
ctx.ui.setStatus(STATUS_KEY, ctx.ui.theme.fg("warning", "planning"));
|
||||||
ctx.ui.setWidget(WIDGET_KEY, ["pi-goals: drafting goals", "Write goals to goals.md, then review."]);
|
ctx.ui.setWidget(WIDGET_KEY, ["pi-goals: drafting goals", `Write goals to ${PLAN_REL}, then review.`]);
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
const doc = parse(readPlan(ctx));
|
const doc = parse(readPlan(ctx));
|
||||||
@@ -68,17 +110,17 @@ export default function piPlanExtension(pi: ExtensionAPI): void {
|
|||||||
}
|
}
|
||||||
const c = counts(doc);
|
const c = counts(doc);
|
||||||
ctx.ui.setStatus(STATUS_KEY, ctx.ui.theme.fg("accent", `◷ ${c.done}/${doc.goals.length} goals`));
|
ctx.ui.setStatus(STATUS_KEY, ctx.ui.theme.fg("accent", `◷ ${c.done}/${doc.goals.length} goals`));
|
||||||
ctx.ui.setWidget(WIDGET_KEY, goalWidgetLines(doc));
|
ctx.ui.setWidget(WIDGET_KEY, [...goalWidgetLines(doc), ctx.ui.theme.fg("muted", PLAN_REL)]);
|
||||||
}
|
}
|
||||||
|
|
||||||
function goalWidgetLines(doc: PlanDoc): string[] {
|
function goalWidgetLines(doc: PlanDoc): string[] {
|
||||||
const mark: Record<Goal["status"], string> = { done: "✔", active: "▸", open: "◻", cancelled: "✗" };
|
const mark: Record<Goal["status"], string> = { done: "✔", active: "▸", open: "◻", cancelled: "✗" };
|
||||||
const lines = [`Goals: ${doc.objective || "(untitled)"}`];
|
const lines = [`Goals: ${doc.title || "(untitled)"}`];
|
||||||
for (const g of doc.goals) {
|
for (const g of doc.goals) {
|
||||||
// Show every goal with its status glyph (✔ done, ▸ active, ◻ open, ✗ cancelled) so finished
|
// Show every goal with its status glyph (✔ done, ▸ active, ◻ open, ✗ cancelled) so finished
|
||||||
// goals read as checked off rather than vanishing. Plans are small, so this stays readable.
|
// goals read as checked off rather than vanishing. Plans are small, so this stays readable.
|
||||||
const total = g.subtasks.length;
|
const total = g.subtasks.length;
|
||||||
const done = g.subtasks.filter((s) => s.done).length;
|
const done = g.subtasks.filter((s) => s.status === "done").length;
|
||||||
lines.push(`${mark[g.status]} ${g.subject}${total ? ` (${done}/${total} tasks)` : ""}`);
|
lines.push(`${mark[g.status]} ${g.subject}${total ? ` (${done}/${total} tasks)` : ""}`);
|
||||||
}
|
}
|
||||||
return lines;
|
return lines;
|
||||||
@@ -99,24 +141,18 @@ export default function piPlanExtension(pi: ExtensionAPI): void {
|
|||||||
setJudge(arg.slice("judge".length).trim(), ctx);
|
setJudge(arg.slice("judge".length).trim(), ctx);
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
// Bare `/goals` enters plan mode by prompting for the objective (the common expectation).
|
// Conversational entry (like narumiruna/pi-plan-mode): /goals enters plan mode and starts a
|
||||||
// If the user cancels with no objective, fall back to showing the current plan.
|
// dialogue. The objective is an optional seed, not a required arg, so there's no awkward
|
||||||
let objective = arg;
|
// "type your objective" prompt; the agent explores read-only and asks before drafting. A
|
||||||
if (!objective) {
|
// fresh draft just replaces .pi/goals.md (the "overwrite" staleness rule).
|
||||||
objective = (ctx.hasUI ? await ctx.ui.input("Plan mode — what's the objective?", "Describe what you want to plan") : undefined)?.trim() ?? "";
|
const objective = arg || null;
|
||||||
if (!objective) {
|
|
||||||
showPlan(ctx);
|
|
||||||
return;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
|
|
||||||
state = { ...state, isPlanMode: true, objective };
|
state = { ...state, isPlanMode: true, objective };
|
||||||
persist();
|
persist();
|
||||||
updateWidget(ctx);
|
updateWidget(ctx);
|
||||||
pi.sendUserMessage(
|
const seed = objective
|
||||||
`Enter plan mode for this objective: ${objective}\n\nExplore read-only, then write the plan to ${planPath(ctx)}.`,
|
? `We're in plan mode. Objective: ${objective}\n\nExplore the repo read-only and ask me anything unclear. When the objective is nailed down, draft (or replace) the goals in ${planPath(ctx)}, then stop for review.`
|
||||||
{ deliverAs: "followUp" },
|
: `We're in plan mode. Tell me what you want to plan. Explore read-only and ask questions as needed; when the objective is clear, draft the goals in ${planPath(ctx)} and stop for review.`;
|
||||||
);
|
pi.sendUserMessage(seed, { deliverAs: "followUp" });
|
||||||
},
|
},
|
||||||
});
|
});
|
||||||
|
|
||||||
@@ -132,23 +168,14 @@ export default function piPlanExtension(pi: ExtensionAPI): void {
|
|||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
if (ctx.hasUI) {
|
if (ctx.hasUI) {
|
||||||
const ok = await ctx.ui.select("Clear goals.md? (it stays in git history)", ["Cancel", "Clear goals.md"]);
|
const ok = await ctx.ui.select(`Clear ${PLAN_REL}?`, ["Cancel", "Clear goals.md"]);
|
||||||
if (ok !== "Clear goals.md") return;
|
if (ok !== "Clear goals.md") return;
|
||||||
}
|
}
|
||||||
writeFileSync(planPath(ctx), "");
|
writePlan(ctx, "");
|
||||||
state = { ...state, isPlanMode: false, objective: null };
|
state = { ...state, isPlanMode: false, objective: null };
|
||||||
persist();
|
persist();
|
||||||
updateWidget(ctx);
|
updateWidget(ctx);
|
||||||
ctx.ui.notify("Cleared goals.md.", "info");
|
ctx.ui.notify(`Cleared ${PLAN_REL}.`, "info");
|
||||||
}
|
|
||||||
|
|
||||||
function showPlan(ctx: ExtensionContext): void {
|
|
||||||
const content = readPlan(ctx);
|
|
||||||
if (!content.trim()) {
|
|
||||||
ctx.ui.notify("No goals yet. Use /goals <objective> to start.", "info");
|
|
||||||
return;
|
|
||||||
}
|
|
||||||
ctx.ui.notify(content, "info");
|
|
||||||
}
|
}
|
||||||
|
|
||||||
// --- review loop (after the agent drafts the plan) --------------------------------------------
|
// --- review loop (after the agent drafts the plan) --------------------------------------------
|
||||||
@@ -190,6 +217,9 @@ export default function piPlanExtension(pi: ExtensionAPI): void {
|
|||||||
}
|
}
|
||||||
|
|
||||||
async function startExecution(ctx: ExtensionContext): Promise<void> {
|
async function startExecution(ctx: ExtensionContext): Promise<void> {
|
||||||
|
// FIXME(model-switch): the plan phase should be able to run on a sticky plan model and execution
|
||||||
|
// on a different one (see README "Not yet included"). newSession can't switch the model yet; wire
|
||||||
|
// this when pi exposes a model override on newSession.
|
||||||
// Offer a clean execution context (D13). newSession lives only on the saved command context.
|
// Offer a clean execution context (D13). newSession lives only on the saved command context.
|
||||||
let fresh = false;
|
let fresh = false;
|
||||||
if (ctx.hasUI && savedCmdCtx) {
|
if (ctx.hasUI && savedCmdCtx) {
|
||||||
@@ -203,7 +233,7 @@ export default function piPlanExtension(pi: ExtensionAPI): void {
|
|||||||
const planFile = planPath(ctx);
|
const planFile = planPath(ctx);
|
||||||
const planContent = readPlan(ctx); // captured now: ctx is stale after newSession below
|
const planContent = readPlan(ctx); // captured now: ctx is stale after newSession below
|
||||||
const parentSession = ctx.sessionManager.getSessionFile();
|
const parentSession = ctx.sessionManager.getSessionFile();
|
||||||
const startMsg = `Work the goals in ${planFile}. Pick an open goal, mark it active (set its header to [/]), work its subtasks, and when its done_when is met fill the goal's evidence: block then call CompleteGoal with the goal_id. Keep goals.md current as you go.`;
|
const startMsg = `Work the goals in ${planFile}. Pick an open goal, mark it active (set its checkbox to [/]), work its subtasks, and when its discriminator is satisfied fill the goal's evidence: block then call CompleteGoal with the goal's desc. Keep goals.md current as you go.`;
|
||||||
exitPlanMode(ctx);
|
exitPlanMode(ctx);
|
||||||
|
|
||||||
if (fresh && savedCmdCtx) {
|
if (fresh && savedCmdCtx) {
|
||||||
@@ -223,7 +253,7 @@ export default function piPlanExtension(pi: ExtensionAPI): void {
|
|||||||
}
|
}
|
||||||
return;
|
return;
|
||||||
}
|
}
|
||||||
if (doc.objective) pi.setSessionName(`Goals: ${doc.objective}`);
|
if (doc.title) pi.setSessionName(`Goals: ${doc.title}`);
|
||||||
ctx.ui.notify(planContent, "info");
|
ctx.ui.notify(planContent, "info");
|
||||||
pi.sendUserMessage(startMsg, { deliverAs: "followUp" });
|
pi.sendUserMessage(startMsg, { deliverAs: "followUp" });
|
||||||
}
|
}
|
||||||
@@ -234,29 +264,33 @@ export default function piPlanExtension(pi: ExtensionAPI): void {
|
|||||||
name: "CompleteGoal",
|
name: "CompleteGoal",
|
||||||
label: "Complete goal",
|
label: "Complete goal",
|
||||||
description:
|
description:
|
||||||
"Sign off a goal once its done_when is met. First fill the goal's evidence: block in goals.md " +
|
"Sign off a goal once its discriminator is satisfied. First fill the goal's evidence: block in " +
|
||||||
"(a '- ' list pointing at durable artifacts: saved logs, committed diffs, files, not claims), then " +
|
"goals.md: a list where each item pairs a durable artifact with a short read of it (a quoted+linked " +
|
||||||
"call this with the goal_id. Runs the goal's verify command (if any) then a read-only subagent that " +
|
"log, a table plus how to read it, or a metric plus what it shows; quote the key lines and link the " +
|
||||||
"inspects that evidence against the repo. On accept, the goal is marked done and logged; on reject, " +
|
"rest, not a pasted blob or a bare claim). Then call this with the goal's desc (the text after " +
|
||||||
"it stays open and you get what is missing.",
|
"'goal:'). Runs the goal's verify command (if any) then a read-only subagent that inspects that " +
|
||||||
|
"evidence against the repo and the discriminator. On accept, the goal is marked done and logged; on " +
|
||||||
|
"reject, it stays open and you get what is missing. The subagent's reasoning is returned either way.",
|
||||||
parameters: Type.Object({
|
parameters: Type.Object({
|
||||||
goal_id: Type.String({ description: "The goal's <!-- id --> from goals.md" }),
|
goal: Type.String({ description: "The goal's desc: the exact text after 'goal:' in its line." }),
|
||||||
}),
|
}),
|
||||||
async execute(_id, params, signal, _onUpdate, ctx) {
|
async execute(_id, params, signal, _onUpdate, ctx) {
|
||||||
const content = readPlan(ctx);
|
const content = readPlan(ctx);
|
||||||
const goal = findGoal(parse(content), params.goal_id);
|
const goal = findGoal(parse(content), params.goal);
|
||||||
if (!goal) return text(`No goal #${params.goal_id} in goals.md.`, true);
|
if (!goal) return text(`No goal "${params.goal}" in goals.md. Use the exact text after "goal:".`, true);
|
||||||
if (goal.evidence.length === 0) {
|
if (goal.evidence.length === 0) {
|
||||||
return text(`Goal #${goal.id} has no evidence: block. Add a "- " evidence list to the goal in goals.md (what shows done_when is met, and where to verify it), then call CompleteGoal.`, true);
|
return text(`Goal "${goal.subject}" has no evidence yet. Add an evidence: list to the goal in goals.md (artifacts + a short read showing the discriminator is satisfied), then call CompleteGoal.`, true);
|
||||||
}
|
}
|
||||||
|
|
||||||
// Decide the outcome (the I/O); recordSignOff applies it to the file (the pure write).
|
// Decide the outcome (the I/O); recordSignOff applies it to the file (the pure write).
|
||||||
// Evidence and the artifacts to inspect both come from the goal's evidence: block (single source of truth).
|
// Evidence and the artifacts to inspect both come from the goal's evidence: block (single source of truth).
|
||||||
const outcome = await decideSignOff(goal, goal.evidence.join("\n"), goal.evidence, state.judgeModel, ctx.cwd, signal);
|
const { outcome, reasoning } = await decideSignOff(goal, goal.evidence.join("\n"), goal.evidence, state.judgeModel, ctx.cwd, signal);
|
||||||
const res = recordSignOff(content, goal.id, stamp(), outcome);
|
const res = recordSignOff(content, goal.subject, stamp(), outcome);
|
||||||
if (res.content !== content) writeFileSync(planPath(ctx), res.content);
|
if (res.content !== content) writePlan(ctx, res.content);
|
||||||
updateWidget(ctx);
|
updateWidget(ctx);
|
||||||
return text(res.message, res.isError);
|
// Surface the sign-off judge's actual reasoning, not just the verdict, so it's visible (was a gap).
|
||||||
|
const detail = reasoning ? `\n\n--- sign-off judge ---\n${reasoning}` : "";
|
||||||
|
return text(res.message + detail, res.isError);
|
||||||
},
|
},
|
||||||
});
|
});
|
||||||
|
|
||||||
@@ -264,6 +298,7 @@ export default function piPlanExtension(pi: ExtensionAPI): void {
|
|||||||
|
|
||||||
pi.on("before_agent_start", async (_event, ctx) => {
|
pi.on("before_agent_start", async (_event, ctx) => {
|
||||||
if (state.isPlanMode) {
|
if (state.isPlanMode) {
|
||||||
|
// Read-only is enforced in the tool_call hook below (blocks edit/write while planning).
|
||||||
return { message: { customType: PLAN_CONTEXT, content: `${planDrafting}\n\nWrite the plan to ${planPath(ctx)}.`, display: false } };
|
return { message: { customType: PLAN_CONTEXT, content: `${planDrafting}\n\nWrite the plan to ${planPath(ctx)}.`, display: false } };
|
||||||
}
|
}
|
||||||
const doc = parse(readPlan(ctx));
|
const doc = parse(readPlan(ctx));
|
||||||
@@ -272,9 +307,13 @@ export default function piPlanExtension(pi: ExtensionAPI): void {
|
|||||||
const active = doc.goals.find((g) => g.status === "active") ?? doc.goals.find((g) => g.status === "open") ?? null;
|
const active = doc.goals.find((g) => g.status === "active") ?? doc.goals.find((g) => g.status === "open") ?? null;
|
||||||
const c = counts(doc);
|
const c = counts(doc);
|
||||||
let body = planInjection({
|
let body = planInjection({
|
||||||
objective: doc.objective,
|
title: doc.title,
|
||||||
activeGoal: active
|
activeGoal: active
|
||||||
? { subject: active.subject, done_when: active.done_when, openSubtasks: active.subtasks.filter((s) => !s.done).map((s) => s.text) }
|
? {
|
||||||
|
subject: active.subject,
|
||||||
|
discriminator: active.discriminator,
|
||||||
|
openSubtasks: active.subtasks.filter((s) => s.status !== "done" && s.status !== "cancelled").map((s) => s.text),
|
||||||
|
}
|
||||||
: null,
|
: null,
|
||||||
lastLogLine: doc.log.at(-1) ?? null,
|
lastLogLine: doc.log.at(-1) ?? null,
|
||||||
counts: { done: c.done, open: c.open + c.active },
|
counts: { done: c.done, open: c.open + c.active },
|
||||||
@@ -286,6 +325,25 @@ export default function piPlanExtension(pi: ExtensionAPI): void {
|
|||||||
return { message: { customType: PLAN_CONTEXT, content: body, display: false } };
|
return { message: { customType: PLAN_CONTEXT, content: body, display: false } };
|
||||||
});
|
});
|
||||||
|
|
||||||
|
// Enforce read-only planning: block file mutators while in plan mode so code isn't written before
|
||||||
|
// the goals are agreed. The agent falls back to read/grep/find/ls and read-only bash to explore.
|
||||||
|
pi.on("tool_call", async (event, ctx) => {
|
||||||
|
if (!state.isPlanMode) return;
|
||||||
|
// edit/write: blocked, except writing goals.md itself (the deliverable of plan mode).
|
||||||
|
if (PLAN_MODE_BLOCKED_TOOLS.includes(event.toolName)) {
|
||||||
|
const target = (event.input as { path?: string }).path;
|
||||||
|
if (target && resolve(ctx.cwd, target) === resolve(planPath(ctx))) return;
|
||||||
|
return { block: true, reason: `Plan mode is read-only: agree the goals in ${PLAN_REL} and choose Ready before writing code (${event.toolName} is blocked while planning; only ${PLAN_REL} may be written).` };
|
||||||
|
}
|
||||||
|
// bash: blocked only when the command looks mutating; read-only exploration stays open.
|
||||||
|
if (event.toolName === "bash") {
|
||||||
|
const command = (event.input as { command?: string }).command ?? "";
|
||||||
|
if (MUTATING_BASH_PATTERNS.some((re) => re.test(command))) {
|
||||||
|
return { block: true, reason: `Plan mode is read-only: this bash command looks like it mutates state, so it's blocked while planning. Explore read-only, agree the goals in ${PLAN_REL}, then choose Ready.\nCommand: ${command}` };
|
||||||
|
}
|
||||||
|
}
|
||||||
|
});
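The read-only gate in the hook above can be sketched in isolation as a pure function. This is a minimal sketch under assumptions, not the extension's code: `PLAN_MODE_BLOCKED_TOOLS` and `MUTATING_BASH_PATTERNS` are defined outside this hunk, so `BLOCKED_TOOLS` and `MUTATING_PATTERNS` below are hypothetical stand-ins, and the path comparison is a plain string match where the real hook resolves both paths against `ctx.cwd`.

```typescript
// Hypothetical stand-ins: the real lists are defined elsewhere in the extension.
const BLOCKED_TOOLS = ["edit", "write"];
const MUTATING_PATTERNS: RegExp[] = [/\brm\s/, /\bmv\s/, /\btouch\s/, /\bgit\s+(commit|push|reset)\b/];

type ToolCall = { toolName: string; input: { path?: string; command?: string } };
type Decision = { block: true; reason: string } | undefined;

/** Decide whether a tool call is allowed while planning. `undefined` means "let it through". */
function gate(call: ToolCall, planMode: boolean, planFile: string): Decision {
  if (!planMode) return undefined;

  // edit/write: blocked, except writes to the plan file itself.
  if (BLOCKED_TOOLS.includes(call.toolName)) {
    if (call.input.path === planFile) return undefined;
    return { block: true, reason: `${call.toolName} is blocked while planning` };
  }

  // bash: blocked only when the command matches a mutating pattern.
  if (call.toolName === "bash") {
    const command = call.input.command ?? "";
    if (MUTATING_PATTERNS.some((re) => re.test(command))) {
      return { block: true, reason: `mutating bash is blocked while planning: ${command}` };
    }
  }
  return undefined;
}
```

A pattern list like this is a heuristic. It guides the agent away from accidental writes but will not catch every mutating command, which matches the "guide, not guard" stance stated elsewhere in the diff.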
|
||||||
|
|
||||||
pi.on("agent_end", async (_event, ctx) => {
|
pi.on("agent_end", async (_event, ctx) => {
|
||||||
if (!state.isPlanMode || !ctx.hasUI) return;
|
if (!state.isPlanMode || !ctx.hasUI) return;
|
||||||
const doc = parse(readPlan(ctx));
|
const doc = parse(readPlan(ctx));
|
||||||
@@ -327,7 +385,8 @@ function stamp(): string {
|
|||||||
return new Date().toISOString().slice(0, 16).replace("T", " ");
|
return new Date().toISOString().slice(0, 16).replace("T", " ");
|
||||||
}
|
}
|
||||||
|
|
||||||
/** Decide a sign-off: deterministic verify first (cheap; skip the model call if it fails), then the judge. */
|
/** Decide a sign-off: deterministic verify first (cheap; skip the model call if it fails), then the judge.
|
||||||
|
* Returns the outcome plus the judge's (or verify's) reasoning so CompleteGoal can show WHY. */
|
||||||
async function decideSignOff(
|
async function decideSignOff(
|
||||||
goal: Goal,
|
goal: Goal,
|
||||||
evidence: string,
|
evidence: string,
|
||||||
@@ -335,16 +394,20 @@ async function decideSignOff(
|
|||||||
judgeModel: string | null,
|
judgeModel: string | null,
|
||||||
cwd: string,
|
cwd: string,
|
||||||
signal: AbortSignal | undefined,
|
signal: AbortSignal | undefined,
|
||||||
): Promise<SignOff> {
|
): Promise<{ outcome: SignOff; reasoning: string }> {
|
||||||
let verifyResult: { command: string; exitCode: number; outputTail: string } | null = null;
|
let verifyResult: { command: string; exitCode: number; outputTail: string } | null = null;
|
||||||
if (goal.verify) {
|
if (goal.verify) {
|
||||||
verifyResult = runVerify(goal.verify, cwd, signal);
|
verifyResult = runVerify(goal.verify, cwd, signal);
|
||||||
if (verifyResult.exitCode !== 0) {
|
if (verifyResult.exitCode !== 0) {
|
||||||
return { kind: "verify_failed", exitCode: verifyResult.exitCode, outputTail: verifyResult.outputTail };
|
return {
|
||||||
|
outcome: { kind: "verify_failed", exitCode: verifyResult.exitCode, outputTail: verifyResult.outputTail },
|
||||||
|
reasoning: `verify \`${goal.verify}\` exited ${verifyResult.exitCode}:\n${verifyResult.outputTail}`,
|
||||||
|
};
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
const verdict = await runJudge(goal, evidence, paths, verifyResult, judgeModel, cwd, signal);
|
const verdict = await runJudge(goal, evidence, paths, verifyResult, judgeModel, cwd, signal);
|
||||||
return verdict.accept ? { kind: "accepted" } : { kind: "rejected", missing: verdict.missing };
|
const outcome: SignOff = verdict.accept ? { kind: "accepted" } : { kind: "rejected", missing: verdict.missing };
|
||||||
|
return { outcome, reasoning: verdict.reasoning };
|
||||||
}
|
}
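The ordering in `decideSignOff` (cheap deterministic check first, model call only if it passes) can be shown with injected stubs. This is a simplified sketch under assumptions: `decide`, its `verify` and `judge` parameters, and the `Verdict` type are hypothetical names, and the judge is synchronous here for brevity, whereas the real one is awaited.

```typescript
type SignOff =
  | { kind: "verify_failed"; exitCode: number; outputTail: string }
  | { kind: "rejected"; missing: string }
  | { kind: "accepted" };

type Verdict = { accept: boolean; missing: string; reasoning: string };

/** Run the deterministic gate first; only consult the judge if it passes (or is absent). */
function decide(
  verify: (() => { exitCode: number; outputTail: string }) | null,
  judge: () => Verdict,
): { outcome: SignOff; reasoning: string } {
  if (verify) {
    const r = verify();
    if (r.exitCode !== 0) {
      // Short-circuit: the judge is never called when verify fails.
      return {
        outcome: { kind: "verify_failed", exitCode: r.exitCode, outputTail: r.outputTail },
        reasoning: `verify exited ${r.exitCode}:\n${r.outputTail}`,
      };
    }
  }
  const v = judge();
  const outcome: SignOff = v.accept ? { kind: "accepted" } : { kind: "rejected", missing: v.missing };
  return { outcome, reasoning: v.reasoning };
}
```

Returning the reasoning on every path, including the verify failure, is what lets the caller show why a sign-off was refused.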
|
||||||
|
|
||||||
/** Run the goal's verify command. It is agent-authored and trusted (single-user machine, guide-not-guard). */
|
/** Run the goal's verify command. It is agent-authored and trusted (single-user machine, guide-not-guard). */
|
||||||
@@ -372,13 +435,13 @@ async function runJudge(
|
|||||||
judgeModel: string | null,
|
judgeModel: string | null,
|
||||||
cwd: string,
|
cwd: string,
|
||||||
signal: AbortSignal | undefined,
|
signal: AbortSignal | undefined,
|
||||||
): Promise<{ accept: boolean; missing: string }> {
|
): Promise<{ accept: boolean; missing: string; reasoning: string }> {
|
||||||
const task = evidenceJudgeUser({
|
const task = evidenceJudgeUser({
|
||||||
subject: goal.subject,
|
subject: goal.subject,
|
||||||
done_when: goal.done_when,
|
discriminator: goal.discriminator,
|
||||||
|
failure_modes: goal.failure_modes,
|
||||||
verify: goal.verify ?? null,
|
verify: goal.verify ?? null,
|
||||||
verifyResult,
|
verifyResult,
|
||||||
failure_modes: goal.failure_modes,
|
|
||||||
evidence,
|
evidence,
|
||||||
paths,
|
paths,
|
||||||
});
|
});
|
||||||
@@ -403,5 +466,9 @@ async function runJudge(
|
|||||||
const accept = /accept/i.test(verdictLine);
|
const accept = /accept/i.test(verdictLine);
|
||||||
const missingMatch = clean.match(/missing\s*:\s*([\s\S]*)$/i);
|
const missingMatch = clean.match(/missing\s*:\s*([\s\S]*)$/i);
|
||||||
const missing = accept ? "" : (missingMatch?.[1].trim() || clean.trim().slice(-500) || "judge gave no reason");
|
const missing = accept ? "" : (missingMatch?.[1].trim() || clean.trim().slice(-500) || "judge gave no reason");
|
||||||
return { accept, missing };
|
// The judge's own words (inspection + verdict), so CompleteGoal can show them. The verdict is at the
|
||||||
|
// end, so keep the tail when it's long.
|
||||||
|
const trimmed = clean.trim();
|
||||||
|
const reasoning = trimmed.length > 1800 ? `...\n${trimmed.slice(-1800)}` : trimmed;
|
||||||
|
return { accept, missing, reasoning };
|
||||||
}
|
}
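The tail-keeping truncation in `runJudge` and the `--- sign-off judge ---` block that `CompleteGoal` appends can be tried standalone. This is a minimal sketch: `tailOf` and `withJudgeReasoning` are hypothetical helper names, while the 1800-character limit and the header text are taken from the diff.

```typescript
const JUDGE_HEADER = "--- sign-off judge ---";

/** Keep the end of a long judge transcript, since the verdict sits at the end. */
function tailOf(text: string, max = 1800): string {
  const trimmed = text.trim();
  return trimmed.length > max ? `...\n${trimmed.slice(-max)}` : trimmed;
}

/** Append the judge's reasoning under a visible header; omit the block when there is none. */
function withJudgeReasoning(message: string, reasoning: string): string {
  const detail = reasoning ? `\n\n${JUDGE_HEADER}\n${reasoning}` : "";
  return message + detail;
}
```

Slicing from the end rather than the start trades away the early inspection notes to make sure the final verdict lines survive.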
|
||||||
|
|||||||
+130
-94
@@ -2,82 +2,104 @@
|
|||||||
* plan-file.ts — read goals.md, and the two writes CompleteGoal needs. That is all.
|
* plan-file.ts — read goals.md, and the two writes CompleteGoal needs. That is all.
|
||||||
*
|
*
|
||||||
* Pure module, no pi deps, so it unit-tests without a runtime. The file is the canonical store and
|
* Pure module, no pi deps, so it unit-tests without a runtime. The file is the canonical store and
|
||||||
* the agent edits it with its normal Edit tool (create goals, tick subtasks, append log), guided by
|
* the agent edits it with its normal Edit tool (create goals, tick subtasks, fill evidence), guided
|
||||||
* the format in prompts.ts and the reminder -- the form guides, it does not gate (spec D3). So this
|
* by the format in prompts.ts and the reminder -- the form guides, it does not gate. The only
|
||||||
* module does NOT render or create goals; the format's single source of truth is the planDrafting
|
* programmatic writers are setGoalStatus + appendLog, used by CompleteGoal to record an accepted
|
||||||
* prompt. The only programmatic writers are setGoalStatus + appendLog, used by CompleteGoal to
|
* sign-off; both touch one line so the diff stays readable.
|
||||||
* record an accepted sign-off; both touch one line so the git diff stays readable.
|
|
||||||
*
|
*
|
||||||
* A goal's state lives in a checkbox on its header (single source of truth, renders natively):
|
* Format (markdown, checkbox-first, made to be skim-reviewed by a human):
|
||||||
* [ ] open [/] active (in progress) [x] done [-] cancelled
|
|
||||||
* Only CompleteGoal writes [x]; the agent sets [/] when it starts a goal.
|
|
||||||
*
|
*
|
||||||
* Format:
|
* # <plan title>
|
||||||
*
|
*
|
||||||
* # Goals: <objective>
|
* <context: the user's ask, preferences, decisions>
|
||||||
*
|
*
|
||||||
* ## Goal: [ ] <subject>
|
* ## Goals
|
||||||
* <!-- id: <slug> -->
|
|
||||||
* done_when: <one falsifiable check>
|
|
||||||
* verify: <shell command, optional>
|
|
||||||
* - [ ] <subtask>
|
|
||||||
*
|
*
|
||||||
* failure_modes:
|
* 1. [ ] goal: <desc> <- state in the checkbox: [ ] open [/] active [x] done [-] cancelled
|
||||||
* - <pre-mortem item>
|
* - discriminator: <positive observation that the goal succeeded, that no failure below could fake>
|
||||||
* evidence:
|
* - subtle failure mode: <a way this looks done but isn't>
|
||||||
* - <proof the done_when is met; filled at completion, read by CompleteGoal>
|
* - verify: <optional shell command that exits 0 only when the discriminator passes>
|
||||||
|
* - tasks:
|
||||||
|
* 1. [x] <subtask> <- a subtask is any checkbox WITHOUT a "goal:" prefix
|
||||||
|
* 2. [/] <subtask>
|
||||||
|
* 3. [-] <subtask> <- [-] or ~~[ ]~~ both read as cancelled
|
||||||
|
* - evidence: <- empty at planning; filled at sign-off, read by CompleteGoal
|
||||||
|
* - > <artifact path / link / metric, plus a short read of it>
|
||||||
|
* 2. [ ] goal: <desc>
|
||||||
|
*
|
||||||
|
* # Future work / out of scope
|
||||||
*
|
*
|
||||||
* ## Log
|
* ## Log
|
||||||
* - <verbatim append-only line>
|
* - <verbatim append-only line>
|
||||||
|
*
|
||||||
|
* A goal/subtask's state lives in its checkbox (single source of truth, renders natively). Goals are
|
||||||
|
* matched by their <desc> (the text after "goal:"); the list number is human-facing only. Only
|
||||||
|
* CompleteGoal writes a goal's [x]; the agent sets [/] when it starts one.
|
||||||
*/
|
*/
|
||||||
|
|
||||||
export type GoalStatus = "open" | "active" | "done" | "cancelled";
|
export type GoalStatus = "open" | "active" | "done" | "cancelled";
|
||||||
|
|
||||||
export interface Subtask {
|
export interface Subtask {
|
||||||
text: string;
|
text: string;
|
||||||
done: boolean;
|
status: GoalStatus;
|
||||||
}
|
}
|
||||||
|
|
||||||
export interface Goal {
|
export interface Goal {
|
||||||
id: string;
|
/** The text after "goal:" in the header line; the handle CompleteGoal matches on. */
|
||||||
subject: string;
|
subject: string;
|
||||||
status: GoalStatus;
|
status: GoalStatus;
|
||||||
done_when: string;
|
/** Positive observation(s) showing the goal succeeded, of a kind no failure mode could fake. The success test. Written at planning. */
|
||||||
verify?: string;
|
discriminator: string[];
|
||||||
/** Pre-mortem: ways a "done" could be wrong. Written at planning. */
|
/** Subtle ways a "done" could be wrong (look-like-success failures). Written at planning. */
|
||||||
failure_modes: string[];
|
failure_modes: string[];
|
||||||
/** Proof the done_when is met, pointing at durable artifacts. Written at completion; read by CompleteGoal. */
|
/** Optional command that exits 0 only when the discriminator passes (the cheap deterministic gate). */
|
||||||
|
verify?: string;
|
||||||
|
/** Proof the discriminator passed, pointing at durable artifacts. Written at completion; read by CompleteGoal. */
|
||||||
evidence: string[];
|
evidence: string[];
|
||||||
subtasks: Subtask[];
|
subtasks: Subtask[];
|
||||||
}
|
}
|
||||||
|
|
||||||
export interface PlanDoc {
|
export interface PlanDoc {
|
||||||
objective: string;
|
title: string;
|
||||||
goals: Goal[];
|
goals: Goal[];
|
||||||
/** Verbatim ## Log lines, including the leading "- ". */
|
/** Verbatim ## Log lines, including the leading "- ". */
|
||||||
log: string[];
|
log: string[];
|
||||||
}
|
}
|
||||||
|
|
||||||
// Goal header carries the state checkbox: `## Goal: [x] subject`. The checkbox is optional so a
|
const TITLE = /^#\s+(.+?)\s*$/; // the first single-# H1
|
||||||
// header written without one parses as open (group 1 undefined -> " ").
|
const GOALS_HEADER = /^##\s+Goals\s*$/i;
|
||||||
const GOAL_HEADER = /^##\s+Goal:\s*(?:\[([ xX/-])\]\s+)?(.*)$/;
|
|
||||||
const ANY_HEADER = /^#{1,6}\s/;
|
|
||||||
const LOG_HEADER = /^##\s+Log\s*$/i;
|
const LOG_HEADER = /^##\s+Log\s*$/i;
|
||||||
const ID_COMMENT = /^<!--\s*id:\s*(.+?)\s*-->$/;
|
const ANY_HEADER = /^#{1,6}\s/;
|
||||||
const CHECKBOX = /^- \[([ xX])\]\s+(.*)$/;
|
// A goal: a numbered or bulleted checkbox item whose text begins "goal:".
|
||||||
|
const GOAL_ITEM = /^\s*(?:\d+\.|[-*])\s*\[([ xX/-])\]\s*goal:\s*(.*)$/i;
|
||||||
|
// A section marker bullet under a goal (the trailing colon is optional, e.g. "- tasks").
|
||||||
|
const KEY_LINE = /^\s*[-*]\s*(discriminator|subtle failure modes?|failure_modes?|verify|tasks?|evidence)\s*:?\s*(.*)$/i;
|
||||||
|
// Any list item (numbered or bulleted); used for subtasks and for list items inside the sections.
|
||||||
|
const LIST_ITEM = /^\s*(?:\d+\.|[-*])\s+(.*)$/;
|
||||||
|
// A checkbox inside a list-item body (subtask). A leading/trailing ~~ marks it cancelled.
|
||||||
|
const CHECKBOX_BODY = /^(~~)?\s*\[([ xX/-])\]\s*(.*)$/;
|
||||||
|
|
||||||
const CHAR_TO_STATUS: Record<string, GoalStatus> = { " ": "open", "/": "active", x: "done", "-": "cancelled" };
|
const CHAR_TO_STATUS: Record<string, GoalStatus> = { " ": "open", "/": "active", x: "done", "-": "cancelled" };
|
||||||
const STATUS_TO_CHAR: Record<GoalStatus, string> = { open: " ", active: "/", done: "x", cancelled: "-" };
|
const STATUS_TO_CHAR: Record<GoalStatus, string> = { open: " ", active: "/", done: "x", cancelled: "-" };
|
||||||
|
|
||||||
|
function normalizeKey(raw: string): "discriminator" | "failure_modes" | "verify" | "tasks" | "evidence" {
|
||||||
|
const k = raw.toLowerCase();
|
||||||
|
if (k.startsWith("discriminator")) return "discriminator";
|
||||||
|
if (k.startsWith("verify")) return "verify";
|
||||||
|
if (k.startsWith("task")) return "tasks";
|
||||||
|
if (k.startsWith("evidence")) return "evidence";
|
||||||
|
return "failure_modes"; // "subtle failure mode(s)" / "failure_mode(s)"
|
||||||
|
}
|
||||||
|
|
||||||
export function parse(text: string): PlanDoc {
|
export function parse(text: string): PlanDoc {
|
||||||
const lines = text.split("\n");
|
const lines = text.split("\n");
|
||||||
let objective = "";
|
let title = "";
|
||||||
const goals: Goal[] = [];
|
const goals: Goal[] = [];
|
||||||
const log: string[] = [];
|
const log: string[] = [];
|
||||||
|
|
||||||
let cur: Goal | null = null;
|
let cur: Goal | null = null;
|
||||||
// While inside a `failure_modes:`/`evidence:` block, points at the list the "- " items append to.
|
let curList: string[] | null = null; // the discriminator/failure_modes/evidence list "- " items append to
|
||||||
let curList: string[] | null = null;
|
let inGoals = false;
|
||||||
let inLog = false;
|
let inLog = false;
|
||||||
|
|
||||||
const flush = () => {
|
const flush = () => {
|
||||||
@@ -87,30 +109,27 @@ export function parse(text: string): PlanDoc {
|
|||||||
};
|
};
|
||||||
|
|
||||||
for (const line of lines) {
|
for (const line of lines) {
|
||||||
const objMatch = /^#\s+Goals:\s*(.*)$/.exec(line);
|
const tM = TITLE.exec(line);
|
||||||
if (objMatch) {
|
if (tM && !title && !GOALS_HEADER.test(line) && !LOG_HEADER.test(line)) {
|
||||||
objective = objMatch[1].trim();
|
title = tM[1].trim();
|
||||||
continue;
|
continue;
|
||||||
}
|
}
|
||||||
|
if (GOALS_HEADER.test(line)) {
|
||||||
const goalMatch = GOAL_HEADER.exec(line);
|
|
||||||
if (goalMatch) {
|
|
||||||
flush();
|
flush();
|
||||||
|
inGoals = true;
|
||||||
inLog = false;
|
inLog = false;
|
||||||
const status = CHAR_TO_STATUS[(goalMatch[1] ?? " ").toLowerCase()] ?? "open";
|
|
||||||
cur = { id: "", subject: goalMatch[2].trim(), status, done_when: "", failure_modes: [], evidence: [], subtasks: [] };
|
|
||||||
continue;
|
continue;
|
||||||
}
|
}
|
||||||
|
|
||||||
if (LOG_HEADER.test(line)) {
|
if (LOG_HEADER.test(line)) {
|
||||||
flush();
|
flush();
|
||||||
|
inGoals = false;
|
||||||
inLog = true;
|
inLog = true;
|
||||||
continue;
|
continue;
|
||||||
}
|
}
|
||||||
|
// Any other header (e.g. "# Future work") ends the goals / log section.
|
||||||
// Any other header ends the current goal / log section.
|
|
||||||
if (ANY_HEADER.test(line)) {
|
if (ANY_HEADER.test(line)) {
|
||||||
flush();
|
flush();
|
||||||
|
inGoals = false;
|
||||||
inLog = false;
|
inLog = false;
|
||||||
continue;
|
continue;
|
||||||
}
|
}
|
||||||
@@ -119,50 +138,70 @@ export function parse(text: string): PlanDoc {
|
|||||||
if (/^\s*-\s+/.test(line)) log.push(line);
|
if (/^\s*-\s+/.test(line)) log.push(line);
|
||||||
continue;
|
continue;
|
||||||
}
|
}
|
||||||
|
if (!inGoals) continue; // title + context prose between the title and ## Goals
|
||||||
|
|
||||||
|
const goalM = GOAL_ITEM.exec(line);
|
||||||
|
if (goalM) {
|
||||||
|
flush();
|
||||||
|
cur = {
|
||||||
|
subject: goalM[2].trim(),
|
||||||
|
status: CHAR_TO_STATUS[goalM[1].toLowerCase()] ?? "open",
|
||||||
|
discriminator: [],
|
||||||
|
failure_modes: [],
|
||||||
|
evidence: [],
|
||||||
|
subtasks: [],
|
||||||
|
};
|
||||||
|
continue;
|
||||||
|
}
|
||||||
if (!cur) continue;
|
if (!cur) continue;
|
||||||
|
|
||||||
const idMatch = ID_COMMENT.exec(line.trim());
|
const keyM = KEY_LINE.exec(line);
|
||||||
if (idMatch) {
|
if (keyM) {
|
||||||
cur.id = idMatch[1];
|
const key = normalizeKey(keyM[1]);
|
||||||
continue;
|
const inlineVal = keyM[2].trim();
|
||||||
}
|
if (key === "verify") {
|
||||||
|
cur.verify = inlineVal || undefined;
|
||||||
// A checkbox (column 0) is a subtask; checked first so it is never read as a list item.
|
|
||||||
const checkbox = CHECKBOX.exec(line);
|
|
||||||
if (checkbox) {
|
|
||||||
curList = null;
|
curList = null;
|
||||||
cur.subtasks.push({ done: checkbox[1].toLowerCase() === "x", text: checkbox[2].trim() });
|
} else if (key === "tasks") {
|
||||||
|
curList = null; // subtasks are identified by being a checkbox; this marker is cosmetic
|
||||||
|
} else {
|
||||||
|
curList = cur[key]; // discriminator | failure_modes | evidence
|
||||||
|
if (inlineVal) curList.push(inlineVal);
|
||||||
|
}
|
||||||
continue;
|
continue;
|
||||||
}
|
}
|
||||||
|
|
||||||
const kv = /^(done_when|verify|failure_modes|evidence)\s*:\s*(.*)$/.exec(line);
|
const listM = LIST_ITEM.exec(line);
|
||||||
if (kv) {
|
if (listM) {
|
||||||
const [, key, value] = kv;
|
const body = listM[1];
|
||||||
if (key === "done_when") cur.done_when = value.trim();
|
const cb = CHECKBOX_BODY.exec(body);
|
||||||
else if (key === "verify") cur.verify = value.trim() || undefined;
|
if (cb) {
|
||||||
// failure_modes/evidence open a "- " block; done_when/verify close any open one.
|
// A checkbox without a "goal:" prefix is a subtask of the current goal.
|
||||||
curList = key === "failure_modes" ? cur.failure_modes : key === "evidence" ? cur.evidence : null;
|
const cancelled = cb[1] === "~~" || body.includes("~~");
|
||||||
|
const status = cancelled ? "cancelled" : (CHAR_TO_STATUS[cb[2].toLowerCase()] ?? "open");
|
||||||
|
cur.subtasks.push({ text: cb[3].replace(/~~/g, "").trim(), status });
|
||||||
|
curList = null;
|
||||||
|
continue;
|
||||||
|
}
|
||||||
|
// A plain "- " / "> " item belongs to the current section (discriminator/failure/evidence).
|
||||||
|
if (curList) curList.push(body.trim());
|
||||||
continue;
|
continue;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Indented "- " items under failure_modes:/evidence: (a column-0 checkbox already returned above).
|
// A non-empty, non-"- " line continues the current item, so multi-line evidence (a block quote
|
||||||
if (curList) {
|
// of a log, a table, an interpretation line) stays attached to its item. Blank lines are skipped.
|
||||||
const item = /^\s*-\s+(.*)$/.exec(line);
|
if (curList && line.trim() !== "" && curList.length > 0) {
|
||||||
if (item) {
|
curList[curList.length - 1] += `\n${line.trim()}`;
|
||||||
curList.push(item[1].trim());
|
|
||||||
continue;
|
|
||||||
}
|
|
||||||
if (line.trim() !== "") curList = null;
|
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
flush();
|
flush();
|
||||||
|
|
||||||
return { objective, goals, log };
|
return { title, goals, log };
|
||||||
}
|
}
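The flat numbered format and the match-by-text rule can be exercised on a small sample. This is a reduced sketch, not the full `parse`: it reuses the `GOAL_ITEM` pattern and the status map from the diff, while `listGoals`, `markDone`, and the sample plan are hypothetical and cover goal lines only (no sections, subtasks, or log).

```typescript
type Status = "open" | "active" | "done" | "cancelled";

const CHAR_TO_STATUS: Record<string, Status> = { " ": "open", "/": "active", x: "done", "-": "cancelled" };
// A goal: a numbered or bulleted checkbox item whose text begins "goal:".
const GOAL_ITEM = /^\s*(?:\d+\.|[-*])\s*\[([ xX/-])\]\s*goal:\s*(.*)$/i;

/** Collect goal lines only; subtasks are checkboxes without the "goal:" prefix and are skipped. */
function listGoals(text: string): { subject: string; status: Status }[] {
  const out: { subject: string; status: Status }[] = [];
  for (const line of text.split("\n")) {
    const m = GOAL_ITEM.exec(line);
    if (m) out.push({ subject: m[2].trim(), status: CHAR_TO_STATUS[m[1].toLowerCase()] ?? "open" });
  }
  return out;
}

/** Flip the checkbox of the goal whose text matches `subject`; the list number is ignored. */
function markDone(text: string, subject: string): string {
  const want = subject.trim();
  return text
    .split("\n")
    .map((line) => {
      const m = GOAL_ITEM.exec(line);
      return m && m[2].trim() === want ? line.replace(/\[[ xX/-]\]/, "[x]") : line;
    })
    .join("\n");
}

// Hypothetical sample plan, for illustration only.
const sample = [
  "# Ship the parser",
  "",
  "## Goals",
  "",
  "1. [/] goal: parser handles nested tasks",
  "   - discriminator: fixture round-trips unchanged",
  "   - tasks:",
  "     1. [x] write fixture",
  "2. [ ] goal: docs updated",
].join("\n");
```

Because goals are matched by their text, two goals with identical descriptions would be indistinguishable, so each `goal:` line should be unique within a plan.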
|
||||||
|
|
||||||
export function findGoal(doc: PlanDoc, id: string): Goal | undefined {
|
export function findGoal(doc: PlanDoc, subject: string): Goal | undefined {
|
||||||
return doc.goals.find((g) => g.id === id);
|
const want = subject.trim();
|
||||||
|
return doc.goals.find((g) => g.subject === want);
|
||||||
}
|
}
|
||||||
|
|
||||||
export function counts(doc: PlanDoc): { done: number; open: number; active: number } {
|
export function counts(doc: PlanDoc): { done: number; open: number; active: number } {
|
||||||
@@ -175,21 +214,18 @@ export function counts(doc: PlanDoc): { done: number; open: number; active: numb
|
|||||||
return c;
|
return c;
|
||||||
}
|
}
|
||||||
|
|
||||||
/** Flip a goal's header checkbox in place (the one write CompleteGoal needs). Normalizes a header that
|
/** Flip a goal's checkbox in place, matched by its subject (the one write CompleteGoal needs). */
|
||||||
* lacks a checkbox by inserting one. */
|
export function setGoalStatus(text: string, subject: string, status: GoalStatus): string {
|
||||||
export function setGoalStatus(text: string, id: string, status: GoalStatus): string {
|
|
||||||
const lines = text.split("\n");
|
const lines = text.split("\n");
|
||||||
const idIdx = lines.findIndex((l) => ID_COMMENT.exec(l.trim())?.[1] === id);
|
const want = subject.trim();
|
||||||
if (idIdx === -1) throw new Error(`Goal #${id} not found`);
|
for (let i = 0; i < lines.length; i++) {
|
||||||
// The header sits just above the id comment; scan upward for it.
|
const m = GOAL_ITEM.exec(lines[i]);
|
||||||
for (let i = idIdx; i >= 0; i--) {
|
if (m && m[2].trim() === want) {
|
||||||
const m = GOAL_HEADER.exec(lines[i]);
|
lines[i] = lines[i].replace(/\[[ xX/-]\]/, `[${STATUS_TO_CHAR[status]}]`);
|
||||||
if (m) {
|
|
||||||
lines[i] = `## Goal: [${STATUS_TO_CHAR[status]}] ${m[2].trim()}`;
|
|
||||||
return lines.join("\n");
|
return lines.join("\n");
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
throw new Error(`Goal #${id} has no ## Goal: header`);
|
throw new Error(`Goal "${subject}" not found`);
|
||||||
}
|
}
|
||||||
|
|
||||||
/**
|
/**
|
||||||
@@ -201,28 +237,28 @@ export type SignOff =
|
|||||||
| { kind: "rejected"; missing: string }
|
| { kind: "rejected"; missing: string }
|
||||||
| { kind: "accepted" };
|
| { kind: "accepted" };
|
||||||
|
|
||||||
/** Apply a sign-off outcome to goals.md text: accept flips the header checkbox to [x] + logs; reject only logs. Pure. */
|
/** Apply a sign-off outcome to goals.md text: accept flips the goal checkbox to [x] + logs; reject only logs. Pure. */
|
||||||
export function recordSignOff(
|
export function recordSignOff(
|
||||||
text: string,
|
text: string,
|
||||||
goalId: string,
|
subject: string,
|
||||||
when: string,
|
when: string,
|
||||||
outcome: SignOff,
|
outcome: SignOff,
|
||||||
): { content: string; message: string; isError: boolean } {
|
): { content: string; message: string; isError: boolean } {
|
||||||
const goal = findGoal(parse(text), goalId);
|
const goal = findGoal(parse(text), subject);
|
||||||
if (!goal) return { content: text, message: `No goal #${goalId} in goals.md.`, isError: true };
|
if (!goal) return { content: text, message: `No goal "${subject}" in goals.md.`, isError: true };
|
||||||
|
|
||||||
if (outcome.kind === "verify_failed") {
|
if (outcome.kind === "verify_failed") {
|
||||||
const content = appendLog(text, `${when} reject #${goalId}: verify exit ${outcome.exitCode}`);
|
const content = appendLog(text, `${when} reject "${subject}": verify exit ${outcome.exitCode}`);
|
||||||
return { content, message: `Sign-off rejected: verify failed (exit ${outcome.exitCode}).\n${outcome.outputTail}`, isError: true };
|
return { content, message: `Sign-off rejected: verify failed (exit ${outcome.exitCode}).\n${outcome.outputTail}`, isError: true };
|
||||||
}
|
}
|
||||||
if (outcome.kind === "rejected") {
|
if (outcome.kind === "rejected") {
|
||||||
const oneLine = outcome.missing.replace(/\s+/g, " ").trim().slice(0, 200);
|
const oneLine = outcome.missing.replace(/\s+/g, " ").trim().slice(0, 200);
|
||||||
const content = appendLog(text, `${when} reject #${goalId}: ${oneLine}`);
|
const content = appendLog(text, `${when} reject "${subject}": ${oneLine}`);
|
||||||
return { content, message: `Sign-off rejected. Missing:\n${outcome.missing}`, isError: true };
|
return { content, message: `Sign-off rejected. Missing:\n${outcome.missing}`, isError: true };
|
||||||
}
|
}
|
||||||
const flipped = setGoalStatus(text, goalId, "done");
|
const flipped = setGoalStatus(text, subject, "done");
|
||||||
const content = appendLog(flipped, `${when} signed off #${goalId}: ${goal.subject} (oracle accept)`);
|
const content = appendLog(flipped, `${when} signed off "${subject}" (judge accept)`);
|
||||||
return { content, message: `Signed off #${goalId}: ${goal.subject}. Marked done in goals.md.`, isError: false };
|
return { content, message: `Signed off "${subject}". Marked done in goals.md.`, isError: false };
|
||||||
}
|
}
|
||||||
|
|
||||||
/** Append one verbatim line to ## Log (creating the section if absent). The other CompleteGoal write. */
|
/** Append one verbatim line to ## Log (creating the section if absent). The other CompleteGoal write. */
|
||||||
|
|||||||
+107
-80
@@ -8,7 +8,7 @@
|
|||||||
* trapping it. Bypasses stay visible in the git diff and the widget.
|
* trapping it. Bypasses stay visible in the git diff and the widget.
|
||||||
*
|
*
|
||||||
* Flow:
|
* Flow:
|
||||||
* SETUP (plan mode) 1. planDrafting — strong/sticky model drafts goals
|
* SETUP (plan mode) 1. planDrafting — drafts goals (read-only phase)
|
||||||
* EXEC, each turn start 2. planInjection — "here is your plan, where you are"
|
* EXEC, each turn start 2. planInjection — "here is your plan, where you are"
|
||||||
* EXEC, periodic 3. reminder — the typed nudge that drives upkeep + autonomy
|
* EXEC, periodic 3. reminder — the typed nudge that drives upkeep + autonomy
|
||||||
* EXEC, loop continue 4. continuation — keep going toward the active goal
|
* EXEC, loop continue 4. continuation — keep going toward the active goal
|
||||||
@@ -22,61 +22,82 @@
|
|||||||
* NOT YET WIRED: 4 continuation and 5 loopJudge define the autonomous re-prompt loop, which is
|
* NOT YET WIRED: 4 continuation and 5 loopJudge define the autonomous re-prompt loop, which is
|
||||||
* intentionally not built in v1 (an until-done-style loop was judged too complex). They stay here so
|
* intentionally not built in v1 (an until-done-style loop was judged too complex). They stay here so
|
||||||
* the full intended flow is reviewable; wire them if/when the loop is added.
|
* the full intended flow is reviewable; wire them if/when the loop is added.
|
||||||
|
*
|
||||||
|
* The goal's test is the DISCRIMINATOR: the concrete observation that tells real success from the
|
||||||
|
* named subtle failure mode. It replaces a vague "done_when". Evidence is empty at planning and
|
||||||
|
* filled at sign-off (you don't always know the exact artifacts up front; the judge checks them then).
|
||||||
*/
|
*/
|
||||||
|
|
||||||
/* ─────────────────────────────────────────────────────────────────────────
|
/* ─────────────────────────────────────────────────────────────────────────
|
||||||
* 1. planDrafting — SETUP, plan mode
|
* 1. planDrafting — SETUP, plan mode
|
||||||
*
|
*
|
||||||
* System guidance for the plan-phase agent. Runs on the plan model (may differ
|
* System guidance for the plan-phase agent. This phase is read-only (edit/write
|
||||||
* from the execution model; the choice is sticky — see oracle.json-style config).
|
* and mutating bash are blocked by a tool hook): explore, then draft goals into
|
||||||
* This phase is read-only: explore, then draft goals into goals.md. No code yet.
|
* goals.md. The fields here are the whole "elicitation"; the human reviews this
|
||||||
* The field requirements here are the whole "elicitation" — get them agreed up
|
* output before any execution.
|
||||||
* front, because the human reviews this output before any execution.
|
|
||||||
* ──────────────────────────────────────────────────────────────────────── */
|
* ──────────────────────────────────────────────────────────────────────── */
|
||||||
export const planDrafting = `\
|
export const planDrafting = `\
|
||||||
You are in plan mode. Explore the repository read-only, then draft goals into goals.md.
|
You are in plan mode. The objective may arrive through conversation, not as one up-front command.
|
||||||
Do not write or run code in this phase. Produce a plan the human will review and approve.
|
Explore the repository read-only first, then ask: resolve discoverable facts by looking them up, and
|
||||||
|
only ask the human when the answer is a genuine intent or preference choice that exploration can't
|
||||||
|
settle. Don't write goals that branch on something you could just check. Do not write or run code in
|
||||||
|
this phase (edit and write are blocked, and so is mutating bash). If the ask is itself read-only
|
||||||
|
(e.g. research, a search, a report), explore enough to scope it, but leave the actual deliverable for
|
||||||
|
after the human approves the plan. When the objective is clear, draft goals into goals.md and stop
|
||||||
|
for review. Produce a plan the human will review and approve.
|
||||||
|
|
||||||
Right-size it, don't force structure that isn't there:
|
Right-size it, don't force structure that isn't there:
|
||||||
- Default to ONE goal. Add another only when it's a genuinely separate checkpoint you'd want
|
- Default to ONE goal. Add another only when it's a genuinely separate checkpoint you'd want signed
|
||||||
signed off on its own (its own done_when that can pass or fail independently). A long list of
|
off on its own (it can pass or fail independently). Most objectives are 1-2 goals.
|
||||||
near-identical goals should be one goal with subtasks. Most objectives are 1-2 goals.
|
- Subtasks are the steps inside a goal. Add them when a goal has 3+ distinct steps; skip them for a
|
||||||
- Subtasks are the steps inside a goal. Add them when a goal has 3+ distinct steps; skip them for
|
single-action goal. Don't pad with trivial steps.
|
||||||
a single-action goal. Don't pad with trivial steps.
|
- Don't invent goals to look thorough. When in doubt, merge.
|
||||||
- Don't invent phases to look thorough. When in doubt, merge.
|
|
||||||
|
|
||||||
Write the whole file in this shape:
|
Write the whole file in this shape (markdown checkboxes, made to be skim-reviewed):
|
||||||
|
|
||||||
# Goals: <the objective>
|
# <short plan title>
|
||||||
|
|
||||||
## Goal: [ ] <one short imperative line>
|
<context: restate the user's ask, their stated preferences, and any decisions you've agreed on>
|
||||||
<!-- id: <kebab-case-slug, unique> -->
|
|
||||||
done_when: <one falsifiable check; what is true on disk when this is done>
|
|
||||||
verify: <optional shell command that exits 0 only when done_when holds; omit if not testable>
|
|
||||||
- [ ] <subtask>
|
|
||||||
- [ ] <subtask>
|
|
||||||
|
|
||||||
failure_modes:
|
## Goals
|
||||||
- <a sneaky way this could look done but isn't; terse, optional>
|
|
||||||
evidence:
|
|
||||||
- <leave empty now; fill at sign-off with proof the done_when is met (durable artifacts)>
|
|
||||||
|
|
||||||
Keep it lean:
|
1. [ ] goal: <one short imperative line>
|
||||||
- The goal's state is the checkbox in its header: [ ] open, [/] active, [x] done, [-] cancelled.
|
- subtle failure mode: <a way this could look done but isn't>
|
||||||
Leave it [ ] at planning. Every goal needs its <!-- id --> line; CompleteGoal finds goals by it.
|
- discriminator: <the concrete observation that tells real success from that failure>
|
||||||
- The subtask checklist comes right under the goal; failure_modes and the (empty) evidence block
|
- verify: <optional shell command that exits 0 only when the discriminator passes; omit if not testable>
|
||||||
sit at the end, after a blank line. Don't let the dash-lists run together.
|
- tasks:
|
||||||
- evidence stays empty at planning. You fill it when the goal is actually done, just before calling
|
1. [ ] <subtask>
|
||||||
CompleteGoal, with a "- " list pointing at real artifacts (files, saved logs, committed diffs).
|
2. [ ] <subtask>
|
||||||
- done_when is ONE concrete, checkable condition, not a paragraph, no "if wrong" clause.
|
- evidence:
|
||||||
The symptom of failure goes in failure_modes, not here.
|
- <leave empty now; filled at sign-off>
|
||||||
- done_when names a real artifact: a file, a test result, a committed diff, a program's output.
|
2. [ ] goal: <...>
|
||||||
Never write it about goals.md's own checkbox or ## Log: CompleteGoal writes those when it accepts,
|
|
||||||
so a done_when about them is circular and the sign-off can never pass.
|
# Future work / out of scope
|
||||||
- failure_modes: 0-2 terse items, only the non-obvious ways a "done" could be wrong (a
|
|
||||||
pre-mortem). If you add a verify command, one mode can be "verify passes on a gamed file".
|
- <anything deliberately not in these goals>
|
||||||
- subtasks: a short checklist of the real steps; omit them if the goal is a single action.
|
|
||||||
- Prefer a verify command when success is a test/build/threshold. A green check beats prose.
|
## Log
|
||||||
|
|
||||||
|
Keep it lean and legible:
|
||||||
|
- A goal is a checkbox line beginning "goal:"; its state is the checkbox ([ ] open, [/] active, [x]
|
||||||
|
done, [-] cancelled). Leave goals [ ] at planning. The number is just for the human to reference.
|
||||||
|
- subtle failure mode + discriminator are the heart of this. List the ways a "done" could look
|
||||||
|
achieved but not be (empty/zero-count output, a silently-errored step, a gamed test, a flat/no-op
|
||||||
|
result that dodged every trap and still showed nothing; these are examples, find the ones that fit).
|
||||||
|
- The discriminator is the POSITIVE observation that the goal actually succeeded AND that none of
|
||||||
|
those failure modes could have produced. It must show success happened -- the count moved the right
|
||||||
|
way, the test really exercised the path, the metric beat noise -- not merely that a failure was
|
||||||
|
ruled out: avoiding every failure mode is necessary, not sufficient. Name the success signal first,
|
||||||
|
then check it isn't something a failure mode could fake. Keep it terse.
|
||||||
|
- The discriminator is the success test, written now, in place of a vague "done": make it a concrete,
|
||||||
|
checkable observation about a real artifact (a file, a test result, a committed diff, a metric), not
|
||||||
|
about goals.md's own checkbox.
|
||||||
|
- subtasks: any checkbox WITHOUT a "goal:" prefix, under "- tasks:". Use [/] for in progress and [-]
|
||||||
|
for cancelled/impossible.
|
||||||
|
- verify: prefer one when the discriminator is a test, build, threshold, or metric: a green check or
|
||||||
|
a printed number beats prose. Omit it otherwise.
|
||||||
|
- evidence stays empty at planning. You don't always know the exact artifacts up front, and that's
|
||||||
|
fine: you fill evidence at sign-off, and a fresh read-only judge checks it then.
|
||||||
|
|
||||||
When the goals are drafted, present them and stop for review. Do not begin execution.`;
|
When the goals are drafted, present them and stop for review. Do not begin execution.`;
|
||||||
|
|
||||||
@@ -85,25 +106,26 @@ When the goals are drafted, present them and stop for review. Do not begin execu
|
|||||||
*
|
*
|
||||||
* A late user-role message, NOT a system-prompt mutation (keeps the prefix cache
|
* A late user-role message, NOT a system-prompt mutation (keeps the prefix cache
|
||||||
* valid). Built from the parsed plan. MUST be byte-identical when nothing changed:
|
* valid). Built from the parsed plan. MUST be byte-identical when nothing changed:
|
||||||
* fixed field order, no volatile timestamps in the body. Pass only the active
|
* fixed field order, no volatile timestamps. Pass only the active goal + its open
|
||||||
* goal + its open subtasks + the last log line — not the whole file.
|
* subtasks + the last log line, not the whole file.
|
||||||
* ──────────────────────────────────────────────────────────────────────── */
|
* ──────────────────────────────────────────────────────────────────────── */
|
||||||
export function planInjection(p: {
|
export function planInjection(p: {
|
||||||
objective: string;
|
title: string;
|
||||||
activeGoal: { subject: string; done_when: string; openSubtasks: string[] } | null;
|
activeGoal: { subject: string; discriminator: string[]; openSubtasks: string[] } | null;
|
||||||
lastLogLine: string | null;
|
lastLogLine: string | null;
|
||||||
counts: { done: number; open: number };
|
counts: { done: number; open: number };
|
||||||
}): string {
|
}): string {
|
||||||
if (!p.activeGoal) {
|
if (!p.activeGoal) {
|
||||||
return `Goals (goals.md): ${p.objective}\nNo active goal. ${p.counts.open} open, ${p.counts.done} done. Pick the next goal (set its header to [/]) or run /goals.`;
|
return `Goals (goals.md): ${p.title}\nNo active goal. ${p.counts.open} open, ${p.counts.done} done. Pick the next goal (set its checkbox to [/]) or run /goals.`;
|
||||||
}
|
}
|
||||||
const subtasks = p.activeGoal.openSubtasks.length
|
const subtasks = p.activeGoal.openSubtasks.length
|
||||||
? p.activeGoal.openSubtasks.map((s) => ` - [ ] ${s}`).join("\n")
|
? p.activeGoal.openSubtasks.map((s) => ` - [ ] ${s}`).join("\n")
|
||||||
: " (no open subtasks)";
|
: " (no open subtasks)";
|
||||||
|
const disc = p.activeGoal.discriminator.length ? p.activeGoal.discriminator.join("; ") : "(none set)";
|
||||||
return `\
|
return `\
|
||||||
Goals (goals.md): ${p.objective}
|
Goals (goals.md): ${p.title}
|
||||||
Active goal: ${p.activeGoal.subject}
|
Active goal: ${p.activeGoal.subject}
|
||||||
done_when: ${p.activeGoal.done_when}
|
discriminator (the success test): ${disc}
|
||||||
Open subtasks:
|
Open subtasks:
|
||||||
${subtasks}
|
${subtasks}
|
||||||
Last log: ${p.lastLogLine ?? "(none yet)"}
|
Last log: ${p.lastLogLine ?? "(none yet)"}
|
||||||
@@ -114,20 +136,20 @@ Progress: ${p.counts.done} done, ${p.counts.open} open.`;
|
|||||||
* 3. reminder — EXEC, periodic system-reminder
|
* 3. reminder — EXEC, periodic system-reminder
|
||||||
*
|
*
|
||||||
* The typed nudge. This is both the housekeeping and the autonomy engine — it is
|
* The typed nudge. This is both the housekeeping and the autonomy engine — it is
|
||||||
* what makes the process get followed without a hard gate. Fires after N
|
* what makes the process get followed without a hard gate. Fires after a turn that
|
||||||
* file-modifying turns since the last goals.md update while a goal is active.
|
* left goals.md untouched while a goal is active. Keep the wording stable so it
|
||||||
* Keep the wording stable so it doesn't thrash the cache.
|
* doesn't thrash the cache.
|
||||||
* ──────────────────────────────────────────────────────────────────────── */
|
* ──────────────────────────────────────────────────────────────────────── */
|
||||||
export const reminder = `\
|
export const reminder = `\
|
||||||
<system-reminder>
|
<system-reminder>
|
||||||
Keep goals.md current as you work:
|
Keep goals.md current as you work:
|
||||||
- tasks: tick the subtasks you've finished; add any new ones you've discovered.
|
- tasks: tick the subtasks you've finished ([/] for in progress); add any you've discovered.
|
||||||
- log: append ONE short line to ## Log (append, don't rewrite earlier lines).
|
- log: append ONE short line to ## Log (append, don't rewrite earlier lines).
|
||||||
- goal: when the active goal's done_when is met, fill its evidence: block in goals.md (a "- " list
|
- goal: when the active goal's discriminator is satisfied, fill its evidence: block in goals.md (a
|
||||||
pointing at durable artifacts), then call CompleteGoal with the goal_id. Don't tick the goal's
|
list pointing at durable artifacts), then call CompleteGoal with the goal's desc. Don't tick the
|
||||||
header [x] by hand; CompleteGoal reads the evidence, runs the check, and writes [x].
|
goal [x] by hand; CompleteGoal reads the evidence, runs the check, and writes [x].
|
||||||
- otherwise: keep working toward the active goal. Don't stop to ask unless you're genuinely
|
- otherwise: keep working toward the active goal. Don't stop to ask unless you're genuinely blocked;
|
||||||
blocked; if blocked, say what's blocking and why.
|
if blocked, say what's blocking it.
|
||||||
</system-reminder>`;
|
</system-reminder>`;
|
||||||
|
|
||||||
/* ─────────────────────────────────────────────────────────────────────────
|
/* ─────────────────────────────────────────────────────────────────────────
|
||||||
@@ -137,9 +159,9 @@ Keep goals.md current as you work:
|
|||||||
* continue. Does not mutate the system prompt, so the cache holds.
|
* continue. Does not mutate the system prompt, so the cache holds.
|
||||||
* ──────────────────────────────────────────────────────────────────────── */
|
* ──────────────────────────────────────────────────────────────────────── */
|
||||||
export const continuation = `\
|
export const continuation = `\
|
||||||
Continue toward the active goal in goals.md. If it now meets its done_when, fill the goal's
|
Continue toward the active goal in goals.md. If its discriminator is now satisfied, fill the goal's
|
||||||
evidence: block (durable artifacts: saved logs, committed diffs, files, not just claims) and then
|
evidence: block (durable artifacts, e.g. saved logs, committed diffs, files, not just claims) and
|
||||||
call CompleteGoal with the goal_id. If you're blocked, state what's blocking it.`;
|
then call CompleteGoal with the goal's desc. If you're blocked, state what's blocking it.`;
|
||||||
|
|
||||||
/* ─────────────────────────────────────────────────────────────────────────
|
/* ─────────────────────────────────────────────────────────────────────────
|
||||||
* 5. loopJudge — EXEC, runs after each turn to decide continue / pause
|
* 5. loopJudge — EXEC, runs after each turn to decide continue / pause
|
||||||
@@ -154,12 +176,12 @@ You decide whether an autonomous coding agent should keep working or pause for t
|
|||||||
Be conservative: only pause when the work is plainly finished or plainly blocked. When in
|
Be conservative: only pause when the work is plainly finished or plainly blocked. When in
|
||||||
doubt, continue. You are not verifying correctness; a later read-only judge does that.
|
doubt, continue. You are not verifying correctness; a later read-only judge does that.
|
||||||
Reply with ONLY a JSON object, no other text: {"done": boolean, "reason": "<one sentence>"}.
|
Reply with ONLY a JSON object, no other text: {"done": boolean, "reason": "<one sentence>"}.
|
||||||
Set done=true only if the agent's last message shows the active goal's done_when is met, or
|
Set done=true only if the agent's last message shows the active goal's discriminator is satisfied,
|
||||||
the agent says it is blocked and needs the human.`;
|
or the agent says it is blocked and needs the human.`;
|
||||||
|
|
||||||
export function loopJudgeUser(p: { activeGoalDoneWhen: string; lastResponse: string }): string {
|
export function loopJudgeUser(p: { discriminator: string; lastResponse: string }): string {
|
||||||
return `\
|
return `\
|
||||||
Active goal done_when: ${p.activeGoalDoneWhen}
|
Active goal discriminator (the success test): ${p.discriminator}
|
||||||
|
|
||||||
Agent's last message:
|
Agent's last message:
|
||||||
"""
|
"""
|
||||||
@@ -172,22 +194,26 @@ ${p.lastResponse}
|
|||||||
/* ─────────────────────────────────────────────────────────────────────────
|
/* ─────────────────────────────────────────────────────────────────────────
|
||||||
* 6. evidenceJudge — SIGN-OFF, the one rigorous check
|
* 6. evidenceJudge — SIGN-OFF, the one rigorous check
|
||||||
*
|
*
|
||||||
* Runs inside CompleteGoal, on the read-only oracle subprocess (fresh context,
|
* Runs inside CompleteGoal, on a read-only pi subprocess (fresh context via
|
||||||
* strongest reasoning on the chosen provider; override to a different vendor for
|
* --no-session, so it never sees the working agent's transcript; override to a
|
||||||
* high-stakes goals). It re-derives from the repo rather than trusting the
|
* different vendor for an independent cross-family check). It re-derives from the
|
||||||
* agent's transcription, and it judges whether a verify command actually tests
|
* repo rather than trusting the agent's transcription, and judges whether the
|
||||||
* the criterion or could pass while a named failure mode holds (gaming).
|
* evidence satisfies the discriminator and rules out the named failure mode.
|
||||||
*
|
*
|
||||||
* The transport gives it read/grep/find/ls. The prompt below imposes the verdict
|
* The transport gives it read/grep/find/ls. The prompt below imposes the verdict
|
||||||
* contract — the oracle returns prose by default, so parse the VERDICT line.
|
* contract — the subprocess returns prose by default, so parse the VERDICT line.
|
||||||
* ──────────────────────────────────────────────────────────────────────── */
|
* ──────────────────────────────────────────────────────────────────────── */
|
||||||
export const evidenceJudgeSystem = `\
|
export const evidenceJudgeSystem = `\
|
||||||
You are a read-only reviewer signing off a coding goal. Do not trust claims; verify.
|
You are a read-only reviewer signing off a coding goal. Do not trust claims; verify.
|
||||||
Use read/grep/find/ls to inspect the repository and the cited artifacts yourself. Re-read the
|
Use read/grep/find/ls to inspect the repository and the cited artifacts yourself. Re-read the
|
||||||
files, logs, and diffs the evidence points to; if something it asserts isn't on disk, you can't
|
files, logs, and diffs the evidence points to; if something it asserts isn't on disk, you can't
|
||||||
confirm it. If a verify command was run, judge whether it genuinely tests the criterion or
|
confirm it. Judge whether the evidence shows the goal POSITIVELY succeeded -- the discriminator's
|
||||||
could pass while one of the listed failure modes still holds; a tautological or skipped test
|
success signal is actually present, not just that the failure modes were dodged. Avoiding every
|
||||||
is a reject. Check each failure mode is actually ruled out, not just unmentioned.
|
failure mode is necessary but not sufficient: a run can rule out each trap and still have produced
|
||||||
|
nothing, so reject "no problems found" that lacks the positive result. Then check the named subtle
|
||||||
|
failure modes are genuinely ruled out, not just unmentioned. If a verify command was run,
|
||||||
|
judge whether it really tests the discriminator or could pass while the failure mode still holds; a
|
||||||
|
tautological or skipped test is a reject.
|
||||||
|
|
||||||
Finish with exactly these two lines and nothing after:
|
Finish with exactly these two lines and nothing after:
|
||||||
VERDICT: accept | reject
|
VERDICT: accept | reject
|
||||||
@@ -195,10 +221,10 @@ missing: <empty if accept; otherwise a short list of what's needed before this c
|
|||||||
|
|
||||||
export function evidenceJudgeUser(p: {
|
export function evidenceJudgeUser(p: {
|
||||||
subject: string;
|
subject: string;
|
||||||
done_when: string;
|
discriminator: string[];
|
||||||
|
failure_modes: string[];
|
||||||
verify: string | null;
|
verify: string | null;
|
||||||
verifyResult: { command: string; exitCode: number; outputTail: string } | null;
|
verifyResult: { command: string; exitCode: number; outputTail: string } | null;
|
||||||
failure_modes: string[];
|
|
||||||
evidence: string;
|
evidence: string;
|
||||||
paths: string[];
|
paths: string[];
|
||||||
}): string {
|
}): string {
|
||||||
@@ -207,9 +233,10 @@ export function evidenceJudgeUser(p: {
|
|||||||
: "verify command: none (no deterministic check for this goal)";
|
: "verify command: none (no deterministic check for this goal)";
|
||||||
return `\
|
return `\
|
||||||
Goal: ${p.subject}
|
Goal: ${p.subject}
|
||||||
done_when: ${p.done_when}
|
discriminator (must be satisfied):
|
||||||
failure_modes:
|
${p.discriminator.map((d) => ` - ${d}`).join("\n") || " (none stated, note this)"}
|
||||||
${p.failure_modes.map((f) => ` - ${f}`).join("\n")}
|
subtle failure modes (must be ruled out):
|
||||||
|
${p.failure_modes.map((f) => ` - ${f}`).join("\n") || " (none stated)"}
|
||||||
|
|
||||||
${verifyBlock}
|
${verifyBlock}
|
||||||
|
|
||||||
@@ -219,5 +246,5 @@ ${p.evidence}
|
|||||||
Artifacts it points to (inspect these):
|
Artifacts it points to (inspect these):
|
||||||
${p.paths.map((x) => ` - ${x}`).join("\n") || " (none listed, note this)"}
|
${p.paths.map((x) => ` - ${x}`).join("\n") || " (none listed, note this)"}
|
||||||
|
|
||||||
Verify the goal against its done_when. Then give your VERDICT.`;
|
Verify the evidence satisfies the discriminator and rules out the failure modes. Then give your VERDICT.`;
|
||||||
}
|
}
|
||||||
|
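Note on the verdict contract above: the evidenceJudge comment says the subprocess returns prose, so the caller has to parse the closing `VERDICT:` line. The block below is a minimal sketch of one way to do that, assuming the judge ends with the two lines the prompt asks for (`VERDICT: accept | reject`, then `missing: ...`). The name `parseVerdict`, the `Verdict` type, and the `"accepted"` and `"unparseable"` kinds are hypothetical. Only the `"rejected"` kind and its `missing` field appear in the diff.

```typescript
// Hypothetical result shape; only kind "rejected" + missing is taken from the diff.
type Verdict =
  | { kind: "accepted" }
  | { kind: "rejected"; missing: string }
  | { kind: "unparseable" };

function parseVerdict(output: string): Verdict {
  const lines = output.trimEnd().split("\n");
  // Scan from the end: the contract puts VERDICT near the bottom, and earlier
  // prose may mention the word "VERDICT" while reasoning.
  for (let i = lines.length - 1; i >= 0; i--) {
    const m = /^\s*VERDICT:\s*(accept|reject)\s*$/i.exec(lines[i]);
    if (!m) continue;
    if (m[1].toLowerCase() === "accept") return { kind: "accepted" };
    // Everything after the VERDICT line is the "missing:" list.
    const missing = lines
      .slice(i + 1)
      .join("\n")
      .replace(/^\s*missing:\s*/i, "")
      .trim();
    return { kind: "rejected", missing: missing || "(judge gave no missing list)" };
  }
  // Assumption: output without a VERDICT line is reported separately, not as accept.
  return { kind: "unparseable" };
}
```

Treating output without a verdict as its own case, rather than as an accept, keeps a crashed or truncated judge run from signing a goal off.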
|||||||
+75
-90
@@ -1,27 +1,30 @@
|
|||||||
import { describe, expect, it } from "vitest";
|
import { describe, expect, it } from "vitest";
|
||||||
import { appendLog, counts, findGoal, parse, recordSignOff, setGoalStatus } from "../src/plan-file.js";
|
import { appendLog, counts, findGoal, parse, recordSignOff, setGoalStatus } from "../src/plan-file.js";
|
||||||
|
|
||||||
const SAMPLE = `# Goals: ship the cache layer
|
const SAMPLE = `# papers audit
|
||||||
|
|
||||||
## Goal: [/] Implement cache layer
|
Clean up steering/ metadata and kill empty dirs. Keep it read-only until I approve.
|
||||||
<!-- id: cache-layer-1 -->
|
|
||||||
done_when: p95 < 50ms on bench-X. If wrong: timeouts in load-test.log
|
|
||||||
verify: pytest tests/cache -q
|
|
||||||
failure_modes:
|
|
||||||
- cache silently bypassed (hit-rate ~0, latency ok by luck)
|
|
||||||
- bench too small to exercise eviction
|
|
||||||
- [x] wire cache client
|
|
||||||
- [ ] eviction policy
|
|
||||||
- [ ] load test
|
|
||||||
evidence:
|
|
||||||
- load-test.log shows p95=41ms
|
|
||||||
- hit-rate 0.93 in load-test.log
|
|
||||||
|
|
||||||
## Goal: [ ] Document the API
|
## Goals
|
||||||
<!-- id: document-the-api-1 -->
|
|
||||||
done_when: every public fn has a docstring; else sphinx warns
|
1. [/] goal: Implement cache layer
|
||||||
failure_modes:
|
- discriminator: hit-rate > 0.8 in load-test.log (a bypass reads ~0)
|
||||||
- docstrings exist but are stale
|
- subtle failure mode: cache silently bypassed, latency ok by luck
|
||||||
|
- verify: pytest tests/cache -q
|
||||||
|
- tasks:
|
||||||
|
1. [x] wire cache client
|
||||||
|
2. [/] eviction policy
|
||||||
|
3. ~~[ ]~~ distributed cache, out of scope
|
||||||
|
- evidence:
|
||||||
|
- > load-test.log: p95=41ms
|
||||||
|
- > hit-rate 0.93 (not bypassed)
|
||||||
|
2. [ ] goal: Document the API
|
||||||
|
- discriminator: every public fn has a docstring; sphinx warns on none
|
||||||
|
- subtle failure mode: docstrings exist but are stale
|
||||||
|
|
||||||
|
# Future work / out of scope
|
||||||
|
|
||||||
|
- distributed cache
|
||||||
|
|
||||||
## Log
|
## Log
|
||||||
- 2026-06-15 14:02 cache client wired; eviction next
|
- 2026-06-15 14:02 cache client wired; eviction next
|
||||||
@@ -49,92 +52,74 @@ function lineDelta(a: string, b: string): { added: number; removed: number } {
|
|||||||
describe("parse", () => {
|
describe("parse", () => {
|
||||||
const doc = parse(SAMPLE);
|
const doc = parse(SAMPLE);
|
||||||
|
|
||||||
it("reads the objective and both goals", () => {
|
it("reads the title and both goals (matched by subject)", () => {
|
||||||
expect(doc.objective).toBe("ship the cache layer");
|
expect(doc.title).toBe("papers audit");
|
||||||
expect(doc.goals.map((g) => g.id)).toEqual(["cache-layer-1", "document-the-api-1"]);
|
expect(doc.goals.map((g) => g.subject)).toEqual(["Implement cache layer", "Document the API"]);
|
||||||
});
|
});
|
||||||
|
|
||||||
it("reads goal fields, with status from the header checkbox", () => {
|
it("reads goal status from the checkbox", () => {
|
||||||
const g = findGoal(doc, "cache-layer-1");
|
expect(findGoal(doc, "Implement cache layer")?.status).toBe("active"); // [/]
|
||||||
expect(g?.subject).toBe("Implement cache layer");
|
expect(findGoal(doc, "Document the API")?.status).toBe("open"); // [ ]
|
||||||
expect(g?.status).toBe("active"); // from the [/] in the header
|
});
|
||||||
expect(g?.done_when).toBe("p95 < 50ms on bench-X. If wrong: timeouts in load-test.log");
|
|
||||||
|
it("reads discriminator, subtle failure mode, and verify as separate fields", () => {
|
||||||
|
const g = findGoal(doc, "Implement cache layer");
|
||||||
|
expect(g?.discriminator).toEqual(["hit-rate > 0.8 in load-test.log (a bypass reads ~0)"]);
|
||||||
|
expect(g?.failure_modes).toEqual(["cache silently bypassed, latency ok by luck"]);
|
||||||
expect(g?.verify).toBe("pytest tests/cache -q");
|
expect(g?.verify).toBe("pytest tests/cache -q");
|
||||||
expect(findGoal(doc, "document-the-api-1")?.status).toBe("open"); // from [ ]
|
|
||||||
});
|
});
|
||||||
|
|
||||||
it("separates failure_modes from subtasks", () => {
|
it("reads subtasks with their checkbox state, strikethrough as cancelled", () => {
|
||||||
const g = findGoal(doc, "cache-layer-1");
|
const g = findGoal(doc, "Implement cache layer");
|
||||||
expect(g?.failure_modes).toHaveLength(2);
|
|
||||||
expect(g?.failure_modes[0]).toContain("cache silently bypassed");
|
|
||||||
expect(g?.subtasks).toEqual([
|
expect(g?.subtasks).toEqual([
|
||||||
{ text: "wire cache client", done: true },
|
{ text: "wire cache client", status: "done" },
|
||||||
{ text: "eviction policy", done: false },
|
{ text: "eviction policy", status: "active" },
|
||||||
{ text: "load test", done: false },
|
{ text: "distributed cache, out of scope", status: "cancelled" },
|
||||||
]);
|
]);
|
||||||
});
|
});
|
||||||
|
|
||||||
it("reads the evidence block, separate from failure_modes and subtasks", () => {
|
it("reads the evidence block separate from the other lists", () => {
|
||||||
const g = findGoal(doc, "cache-layer-1");
|
const g = findGoal(doc, "Implement cache layer");
|
||||||
expect(g?.evidence).toEqual(["load-test.log shows p95=41ms", "hit-rate 0.93 in load-test.log"]);
|
expect(g?.evidence).toEqual(["> load-test.log: p95=41ms", "> hit-rate 0.93 (not bypassed)"]);
|
||||||
expect(g?.failure_modes).toHaveLength(2); // unchanged by the evidence block that follows the subtasks
|
expect(findGoal(doc, "Document the API")?.evidence).toEqual([]); // a goal with no evidence parses to []
|
||||||
const g2 = findGoal(doc, "document-the-api-1");
|
});
|
||||||
expect(g2?.evidence).toEqual([]); // a goal with no evidence block parses to []
|
|
||||||
|
it("keeps a multi-line evidence item together (quote + interpretation)", () => {
|
||||||
|
const doc2 = parse(
|
||||||
|
`# x\n\n## Goals\n\n1. [ ] goal: G\n - discriminator: report has non-zero counts\n - evidence:\n - > report.txt: counts 52 -> 4\n remaining 4 = index + 3 notes\n almost certain the discriminator passes\n - > second item, single line\n`,
|
||||||
|
);
|
||||||
|
expect(findGoal(doc2, "G")?.evidence).toEqual([
|
||||||
|
"> report.txt: counts 52 -> 4\nremaining 4 = index + 3 notes\nalmost certain the discriminator passes",
|
||||||
|
"> second item, single line",
|
||||||
|
]);
|
||||||
});
|
});
|
||||||
|
|
||||||
it("reads the log verbatim and counts by status", () => {
|
it("reads the log verbatim and counts by status", () => {
|
||||||
expect(doc.log).toEqual(["- 2026-06-15 14:02 cache client wired; eviction next"]);
|
expect(doc.log).toEqual(["- 2026-06-15 14:02 cache client wired; eviction next"]);
|
||||||
expect(counts(doc)).toEqual({ done: 0, open: 1, active: 1 });
|
expect(counts(doc)).toEqual({ done: 0, open: 1, active: 1 });
|
||||||
});
|
});
|
||||||
});
|
|
||||||
|
|
||||||
describe("failure_modes vs subtask disambiguation", () => {
|
it("ignores the Future work section, does not read it as goals or log", () => {
|
||||||
it("a column-0 checkbox right after failure_modes: is a SUBTASK", () => {
|
expect(doc.goals).toHaveLength(2);
|
||||||
const doc = parse(
|
expect(doc.log).toHaveLength(1);
|
||||||
`# Goals: x\n\n## Goal: [ ] G\n<!-- id: g-1 -->\ndone_when: z\nfailure_modes:\n- [ ] first subtask\n- [x] second subtask\n`,
|
|
||||||
);
|
|
||||||
const g = findGoal(doc, "g-1");
|
|
||||||
expect(g?.failure_modes).toEqual([]);
|
|
||||||
expect(g?.subtasks).toEqual([
|
|
||||||
{ text: "first subtask", done: false },
|
|
||||||
{ text: "second subtask", done: true },
|
|
||||||
]);
|
|
||||||
});
|
|
||||||
|
|
||||||
it("an indented checkbox-shaped item inside failure_modes is a FAILURE MODE", () => {
|
|
||||||
const doc = parse(
|
|
||||||
`# Goals: x\n\n## Goal: [ ] G\n<!-- id: g-2 -->\ndone_when: z\nfailure_modes:\n - [ ] prose that looks like a checkbox\n- [ ] real subtask\n`,
|
|
||||||
);
|
|
||||||
const g = findGoal(doc, "g-2");
|
|
||||||
expect(g?.failure_modes).toEqual(["[ ] prose that looks like a checkbox"]);
|
|
||||||
expect(g?.subtasks).toEqual([{ text: "real subtask", done: false }]);
|
|
||||||
});
|
|
||||||
|
|
||||||
it("a goal with no failure_modes keeps its subtasks", () => {
|
|
||||||
const doc = parse(`# Goals: x\n\n## Goal: [ ] G\n<!-- id: g-3 -->\ndone_when: z\n- [ ] only subtask\n`);
|
|
||||||
const g = findGoal(doc, "g-3");
|
|
||||||
expect(g?.failure_modes).toEqual([]);
|
|
||||||
expect(g?.subtasks).toEqual([{ text: "only subtask", done: false }]);
|
|
||||||
});
|
});
|
||||||
});
|
});
|
||||||
|
|
||||||
describe("the two CompleteGoal writes (minimal diff)", () => {
|
describe("the two CompleteGoal writes (minimal diff)", () => {
|
||||||
it("setGoalStatus replaces exactly one line, scoped to the right goal", () => {
|
it("setGoalStatus replaces exactly one line, scoped to the right goal", () => {
|
||||||
const next = setGoalStatus(SAMPLE, "cache-layer-1", "done");
|
const next = setGoalStatus(SAMPLE, "Implement cache layer", "done");
|
||||||
expect(lineDelta(SAMPLE, next)).toEqual({ added: 1, removed: 1 });
|
expect(lineDelta(SAMPLE, next)).toEqual({ added: 1, removed: 1 });
|
||||||
expect(findGoal(parse(next), "cache-layer-1")?.status).toBe("done");
|
expect(findGoal(parse(next), "Implement cache layer")?.status).toBe("done");
|
||||||
expect(findGoal(parse(next), "document-the-api-1")?.status).toBe("open"); // untouched
|
expect(findGoal(parse(next), "Document the API")?.status).toBe("open"); // untouched
|
||||||
});
|
});
|
||||||
|
|
||||||
it("setGoalStatus targets the second goal without touching the first", () => {
|
it("setGoalStatus keeps the number and goal: prefix, flips only the checkbox", () => {
|
||||||
const next = setGoalStatus(SAMPLE, "document-the-api-1", "active");
|
expect(setGoalStatus(SAMPLE, "Implement cache layer", "done")).toContain("1. [x] goal: Implement cache layer");
|
||||||
expect(findGoal(parse(next), "cache-layer-1")?.status).toBe("active");
|
expect(setGoalStatus(SAMPLE, "Document the API", "cancelled")).toContain("2. [-] goal: Document the API");
|
||||||
expect(findGoal(parse(next), "document-the-api-1")?.status).toBe("active");
|
|
||||||
});
|
});
|
||||||
|
|
||||||
it("setGoalStatus writes the checkbox char into the header line", () => {
|
it("setGoalStatus throws on an unknown subject", () => {
|
||||||
expect(setGoalStatus(SAMPLE, "cache-layer-1", "done")).toContain("## Goal: [x] Implement cache layer");
|
expect(() => setGoalStatus(SAMPLE, "no such goal", "done")).toThrow();
|
||||||
expect(setGoalStatus(SAMPLE, "document-the-api-1", "cancelled")).toContain("## Goal: [-] Document the API");
|
|
||||||
});
|
});
|
||||||
|
|
||||||
it("appendLog adds exactly one line under ## Log", () => {
|
it("appendLog adds exactly one line under ## Log", () => {
|
||||||
@@ -147,7 +132,7 @@ describe("the two CompleteGoal writes (minimal diff)", () => {
|
|||||||
});
|
});
|
||||||
|
|
||||||
it("appendLog creates the section when absent", () => {
|
it("appendLog creates the section when absent", () => {
|
||||||
const noLog = "# Goals: x\n\n## Goal: [ ] y\n<!-- id: y-1 -->\ndone_when: z\n";
|
const noLog = "# x\n\n## Goals\n\n1. [ ] goal: y\n - discriminator: z\n";
|
||||||
expect(parse(appendLog(noLog, "first entry")).log).toEqual(["- first entry"]);
|
expect(parse(appendLog(noLog, "first entry")).log).toEqual(["- first entry"]);
|
||||||
});
|
});
|
||||||
});
|
});
|
||||||
@@ -156,30 +141,30 @@ describe("recordSignOff (CompleteGoal's pure record logic)", () => {
|
|||||||
const WHEN = "2026-06-15 16:00";
|
const WHEN = "2026-06-15 16:00";
|
||||||
|
|
||||||
it("accept flips status:done and logs a sign-off line", () => {
|
it("accept flips status:done and logs a sign-off line", () => {
|
||||||
const r = recordSignOff(SAMPLE, "cache-layer-1", WHEN, { kind: "accepted" });
|
const r = recordSignOff(SAMPLE, "Implement cache layer", WHEN, { kind: "accepted" });
|
||||||
expect(r.isError).toBe(false);
|
expect(r.isError).toBe(false);
|
||||||
const doc = parse(r.content);
|
const doc = parse(r.content);
|
||||||
expect(findGoal(doc, "cache-layer-1")?.status).toBe("done");
|
expect(findGoal(doc, "Implement cache layer")?.status).toBe("done");
|
||||||
expect(doc.log.at(-1)).toBe(`- ${WHEN} signed off #cache-layer-1: Implement cache layer (oracle accept)`);
|
expect(doc.log.at(-1)).toBe(`- ${WHEN} signed off "Implement cache layer" (judge accept)`);
|
||||||
});
|
});
|
||||||
|
|
||||||
it("verify_failed only logs a reject line, status stays active", () => {
|
it("verify_failed only logs a reject line, status stays active", () => {
|
||||||
const r = recordSignOff(SAMPLE, "cache-layer-1", WHEN, { kind: "verify_failed", exitCode: 1, outputTail: "boom" });
|
const r = recordSignOff(SAMPLE, "Implement cache layer", WHEN, { kind: "verify_failed", exitCode: 1, outputTail: "boom" });
|
||||||
expect(r.isError).toBe(true);
|
expect(r.isError).toBe(true);
|
||||||
const doc = parse(r.content);
|
const doc = parse(r.content);
|
||||||
expect(findGoal(doc, "cache-layer-1")?.status).toBe("active"); // NOT marked done
|
expect(findGoal(doc, "Implement cache layer")?.status).toBe("active"); // NOT marked done
|
||||||
expect(doc.log.at(-1)).toBe(`- ${WHEN} reject #cache-layer-1: verify exit 1`);
|
expect(doc.log.at(-1)).toBe(`- ${WHEN} reject "Implement cache layer": verify exit 1`);
|
||||||
});
|
});
|
||||||
|
|
||||||
it("rejected logs the (one-lined) missing reason, status stays", () => {
|
it("rejected logs the (one-lined) missing reason, status stays", () => {
|
||||||
const r = recordSignOff(SAMPLE, "cache-layer-1", WHEN, { kind: "rejected", missing: "no\nsaved\nbench log" });
|
const r = recordSignOff(SAMPLE, "Implement cache layer", WHEN, { kind: "rejected", missing: "no\nsaved\nbench log" });
|
||||||
expect(r.isError).toBe(true);
|
expect(r.isError).toBe(true);
|
||||||
expect(findGoal(parse(r.content), "cache-layer-1")?.status).toBe("active");
|
expect(findGoal(parse(r.content), "Implement cache layer")?.status).toBe("active");
|
||||||
expect(parse(r.content).log.at(-1)).toBe(`- ${WHEN} reject #cache-layer-1: no saved bench log`);
|
expect(parse(r.content).log.at(-1)).toBe(`- ${WHEN} reject "Implement cache layer": no saved bench log`);
|
||||||
});
|
});
|
||||||
|
|
||||||
it("unknown goal returns an error and does not touch the file", () => {
|
it("unknown goal returns an error and does not touch the file", () => {
|
||||||
const r = recordSignOff(SAMPLE, "nope-1", WHEN, { kind: "accepted" });
|
const r = recordSignOff(SAMPLE, "nope", WHEN, { kind: "accepted" });
|
||||||
expect(r.isError).toBe(true);
|
expect(r.isError).toBe(true);
|
||||||
expect(r.content).toBe(SAMPLE);
|
expect(r.content).toBe(SAMPLE);
|
||||||
});
|
});
|
||||||
|
|||||||
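The `setGoalStatus` tests above pin down three behaviours for the new flat format: the goal is found by its text, only the checkbox character changes (the number and the `goal:` prefix stay), and an unknown subject throws. A minimal sketch of a function that would satisfy them follows. This is not the repository's implementation: the name `setGoalStatusSketch`, the `GOAL_LINE` regex, and the `BOX` table are hypothetical, and the status-to-checkbox mapping is inferred from the commit message and the test expectations.

```typescript
type Status = "open" | "active" | "done" | "cancelled";

// Assumed mapping, inferred from the tests: [ ] open, [/] active, [x] done, [-] cancelled.
const BOX: Record<Status, string> = { open: " ", active: "/", done: "x", cancelled: "-" };

// Hypothetical pattern for a goal line such as "1. [/] goal: Implement cache layer".
// Groups: 1 = number and "[", 2 = checkbox char, 3 = "] goal: ", 4 = goal text.
const GOAL_LINE = /^(\s*\d+\.\s+\[)([ \/x-])(\]\s+goal:\s*)(.*)$/;

function setGoalStatusSketch(markdown: string, subject: string, status: Status): string {
  const lines = markdown.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const m = GOAL_LINE.exec(lines[i] ?? "");
    // Goals are matched by text, not by an id comment.
    if (m && (m[4] ?? "").trim() === subject.trim()) {
      // Rewrite only the checkbox character; everything else on the line is kept.
      lines[i] = `${m[1]}${BOX[status]}${m[3]}${m[4]}`;
      return lines.join("\n");
    }
  }
  throw new Error(`no goal matching "${subject}"`);
}
```

Because only one array element is reassigned before the lines are joined again, the result differs from the input by exactly one line, which is what the `lineDelta` assertion (`{ added: 1, removed: 1 }`) checks in the real test suite.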