pi-plan: fix crash on Ready->fresh-context; drop em-dashes in prompts

- startExecution: inside withSession, send via the ReplacedSessionContext
  (sessionCtx.sendUserMessage) and set the session name there. The old
  code used the global pi.* handle bound to the replaced session, which
  is stale after newSession (runner.assertActive) -> crash on the
  "fresh, compacted context" choice.
- prompts: replace em-dashes in model-facing strings with commas/
  semicolons/periods (humanizer pass; comments left as-is)

Co-Authored-By: Claudypoo <288921227+claudypoo@users.noreply.github.com>
This commit is contained in:
wassname
2026-06-15 20:32:30 +08:00
parent 861b2ea157
commit 3134adf203
2 changed files with 20 additions and 22 deletions
+8 -10
View File
@@ -202,17 +202,17 @@ export default function piPlanExtension(pi: ExtensionAPI): void {
const doc = parse(readPlan(ctx));
const planFile = planPath(ctx);
const parentSession = ctx.sessionManager.getSessionFile();
const startMsg = `Work the plan in ${planFile}. Pick an open goal, set it active, work its subtasks, and when its done_when is met call CompleteGoal with the evidence. Keep plan.md current as you go.`;
exitPlanMode(ctx);
if (doc.objective) pi.setSessionName(`Plan: ${doc.objective}`);
if (fresh && savedCmdCtx) {
// After newSession, `ctx`/`pi` bound to the old session are stale — do post-swap work
// through the ReplacedSessionContext passed to withSession (see runner.assertActive).
const result = await savedCmdCtx.newSession({
parentSession,
withSession: async () => {
pi.sendUserMessage(
`Work the plan in ${planFile}. Pick an open goal, set it active, work its subtasks, and when its done_when is met call CompleteGoal with the evidence. Keep plan.md current as you go.`,
{ deliverAs: "followUp" },
);
withSession: async (sessionCtx) => {
if (doc.objective) pi.setSessionName(`Plan: ${doc.objective}`);
await sessionCtx.sendUserMessage(startMsg, { deliverAs: "followUp" });
},
});
if (result.cancelled) {
@@ -220,10 +220,8 @@ export default function piPlanExtension(pi: ExtensionAPI): void {
}
return;
}
pi.sendUserMessage(
`Work the plan in ${planFile}. Pick an open goal, set it active, work its subtasks, and when its done_when is met call CompleteGoal with the evidence. Keep plan.md current as you go.`,
{ deliverAs: "followUp" },
);
if (doc.objective) pi.setSessionName(`Plan: ${doc.objective}`);
pi.sendUserMessage(startMsg, { deliverAs: "followUp" });
}
// --- the one blessed tool: CompleteGoal -------------------------------------------------------
+12 -12
View File
@@ -37,7 +37,7 @@ export const planDrafting = `\
You are in plan mode. Explore the repository read-only, then draft a plan into plan.md.
Do not write or run code in this phase. Produce a plan the human will review and approve.
Right-size the plan don't force structure that isn't there:
Right-size the plan, don't force structure that isn't there:
- Default to ONE goal. Add another only when it's a genuinely separate checkpoint you'd want
signed off on its own (its own done_when that can pass or fail independently). A long list of
near-identical goals should be one goal with subtasks. Most objectives are 1-2 goals.
@@ -49,20 +49,20 @@ Write each goal in this shape:
## Goal: <one short imperative line>
status: open
done_when: <one falsifiable check what is true on disk when this is done>
done_when: <one falsifiable check; what is true on disk when this is done>
verify: <optional shell command that exits 0 only when done_when holds; omit if not testable>
failure_modes:
- <a sneaky way this could look done but isn't terse, optional>
- <a sneaky way this could look done but isn't; terse, optional>
- [ ] <subtask>
- [ ] <subtask>
Keep it lean:
- done_when is ONE concrete, checkable condition not a paragraph, no "if wrong" clause.
- done_when is ONE concrete, checkable condition, not a paragraph, no "if wrong" clause.
The symptom of failure goes in failure_modes, not here.
- failure_modes: 0-2 terse items, only the non-obvious ways a "done" could be wrong (a
pre-mortem). If you add a verify command, one mode can be "verify passes on a gamed file".
- subtasks: a short checklist of the real steps; omit them if the goal is a single action.
- Prefer a verify command when success is a test/build/threshold — a green check beats prose.
- Prefer a verify command when success is a test/build/threshold. A green check beats prose.
When the plan is drafted, present it and stop for review. Do not begin execution.`;
@@ -108,9 +108,9 @@ export const reminder = `\
<system-reminder>
Keep plan.md current as you work:
- tasks: tick the subtasks you've finished; add any new ones you've discovered.
- log: append ONE short line to ## Log (append don't rewrite earlier lines).
- log: append ONE short line to ## Log (append, don't rewrite earlier lines).
- goal: if the active goal's evidence is in, sign it off by calling CompleteGoal with that
evidence. Don't edit status to done by hand CompleteGoal runs the check and records it.
evidence. Don't edit status to done by hand; CompleteGoal runs the check and records it.
- otherwise: keep working toward the active goal. Don't stop to ask unless you're genuinely
blocked; if blocked, say what's blocking and why.
</system-reminder>`;
@@ -123,7 +123,7 @@ Keep plan.md current as you work:
* ──────────────────────────────────────────────────────────────────────── */
export const continuation = `\
Continue toward the active goal in plan.md. If it now meets its done_when, call CompleteGoal
with your evidence (point to durable artifacts saved logs, committed diffs, files not just
with your evidence (point to durable artifacts: saved logs, committed diffs, files, not just
claims). If you're blocked, state what's blocking it.`;
/* ─────────────────────────────────────────────────────────────────────────
@@ -137,7 +137,7 @@ claims). If you're blocked, state what's blocking it.`;
export const loopJudgeSystem = `\
You decide whether an autonomous coding agent should keep working or pause for the human.
Be conservative: only pause when the work is plainly finished or plainly blocked. When in
doubt, continue. You are not verifying correctness a later read-only judge does that.
doubt, continue. You are not verifying correctness; a later read-only judge does that.
Reply with ONLY a JSON object, no other text: {"done": boolean, "reason": "<one sentence>"}.
Set done=true only if the agent's last message shows the active goal's done_when is met, or
the agent says it is blocked and needs the human.`;
@@ -167,11 +167,11 @@ ${p.lastResponse}
* contract — the oracle returns prose by default, so parse the VERDICT line.
* ──────────────────────────────────────────────────────────────────────── */
export const evidenceJudgeSystem = `\
You are a read-only reviewer signing off a coding goal. Do not trust claims verify.
You are a read-only reviewer signing off a coding goal. Do not trust claims; verify.
Use read/grep/find/ls to inspect the repository and the cited artifacts yourself. Re-read the
files, logs, and diffs the evidence points to; if something it asserts isn't on disk, you can't
confirm it. If a verify command was run, judge whether it genuinely tests the criterion or
could pass while one of the listed failure modes still holds a tautological or skipped test
could pass while one of the listed failure modes still holds; a tautological or skipped test
is a reject. Check each failure mode is actually ruled out, not just unmentioned.
Finish with exactly these two lines and nothing after:
@@ -202,7 +202,7 @@ Agent's evidence:
${p.evidence}
Artifacts it points to (inspect these):
${p.paths.map((x) => ` - ${x}`).join("\n") || " (none listed note this)"}
${p.paths.map((x) => ` - ${x}`).join("\n") || " (none listed, note this)"}
Verify the goal against its done_when. Then give your VERDICT.`;
}