mirror of
https://github.com/wassname/pi-lgtm.git
synced 2026-06-27 17:01:35 +08:00
feat: iterate and auto-run robot reviews
@@ -6,7 +6,7 @@ A [pi](https://pi.dev) extension that adds structured human sign-off to task tra
|
|||||||
|
|
||||||
The core idea: agents cannot mark tasks complete themselves. They must call `lgtm_ask` with auditable evidence and explicit failure-mode analysis, then a human signs off via `/lgtm <id>`.
|
The core idea: agents cannot mark tasks complete themselves. They must call `lgtm_ask` with auditable evidence and explicit failure-mode analysis, then a human signs off via `/lgtm <id>`.
|
||||||
|
|
||||||
Tasks can also carry a separate fresh-perspective robot review from a subagent or other model family. That review is observational only and never completes the task.
|
Tasks can also carry a separate fresh-perspective robot review from a subagent or other model family. Robot reviews can iterate: if the latest review says the evidence is incomplete or unconvincing, human sign-off is held back until the agent strengthens the evidence and reruns review.
|
||||||
|
|
||||||
## Install
|
## Install
|
||||||
|
|
||||||
@@ -46,7 +46,7 @@ Stripped: `TaskExecute`, `TaskOutput`, `TaskStop`, `process-tracker.ts`, subagen
|
|||||||
Badges:
|
Badges:
|
||||||
|
|
||||||
- `🛠` tool evidence attached via `lgtm_ask`
|
- `🛠` tool evidence attached via `lgtm_ask`
|
||||||
- `🤖` robot review attached via `robot_review_ask`
|
- `🤖` one or more robot review iterations attached
|
||||||
- `👀` pending human sign-off via `/lgtm`
|
- `👀` pending human sign-off via `/lgtm`
|
||||||
|
|
||||||
## Tools
|
## Tools
|
||||||
@@ -89,6 +89,8 @@ After calling this, the task shows `👀` and is only completable via `/lgtm <id
|
|||||||
|
|
||||||
The tool result includes a non-blocking self-check prompt asking whether the evidence directly addresses the `done_criterion` and whether a skeptical reviewer would find it convincing.
|
The tool result includes a non-blocking self-check prompt asking whether the evidence directly addresses the `done_criterion` and whether a skeptical reviewer would find it convincing.
|
||||||
|
|
||||||
|
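`lgtm_ask` also accepts an optional `run_robot_review` boolean. If it is true, or if it is omitted and `PI_LGTM_AUTO_ROBOT_REVIEW=1` is set, the extension runs the configured robot reviewer immediately after storing evidence; an explicit `false` skips the automatic run. A failing robot review clears `pending_approval` until the evidence is strengthened and reviewed again. If the reviewer command itself fails, the task stays pending human sign-off.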
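The precedence between the tool parameter and the environment variable can be sketched as below. It mirrors the `??` expression used in the `index.ts` change of this commit; the function name `shouldRunRobotReview` and the `env` parameter are hypothetical and exist only to make the rule testable in isolation.

```typescript
// Hypothetical helper: decides whether lgtm_ask should trigger the automatic reviewer.
// `runRobotReview` is the optional tool parameter; `env` stands in for process.env.
function shouldRunRobotReview(
  runRobotReview: boolean | undefined,
  env: Record<string, string | undefined>,
): boolean {
  // `??` only falls back to the environment when the parameter is undefined,
  // so an explicit `false` wins over PI_LGTM_AUTO_ROBOT_REVIEW=1.
  return runRobotReview ?? env.PI_LGTM_AUTO_ROBOT_REVIEW === "1";
}

const autoEnv = { PI_LGTM_AUTO_ROBOT_REVIEW: "1" };
console.log(shouldRunRobotReview(undefined, autoEnv)); // env decides
console.log(shouldRunRobotReview(false, autoEnv)); // explicit opt-out
console.log(shouldRunRobotReview(true, {})); // explicit opt-in
```

One consequence of this design is that an agent can suppress the automatic review per call even when the environment enables it globally.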
|
||||||
|
|
||||||
### `robot_review_ask`
|
### `robot_review_ask`
|
||||||
|
|
||||||
Attach a fresh-perspective robot review to a task.
|
Attach a fresh-perspective robot review to a task.
|
||||||
@@ -102,8 +104,30 @@ Required fields:
|
|||||||
| `scope` | What the reviewer inspected |
|
| `scope` | What the reviewer inspected |
|
||||||
| `observations` | Concrete observations only. No advice, verdicts, or editorial |
|
| `observations` | Concrete observations only. No advice, verdicts, or editorial |
|
||||||
| `blind_spots` | What the reviewer did not inspect or could not verify |
|
| `blind_spots` | What the reviewer did not inspect or could not verify |
|
||||||
|
| `evidence_complete` | Whether the supplied evidence actually covers the done criterion |
|
||||||
|
| `evidence_convincing` | Whether the supplied evidence would convince a skeptical reviewer |
|
||||||
|
| `missing_evidence` | Concrete missing checks or artifacts needed before human sign-off |
|
||||||
|
|
||||||
Use this from a separate subagent or other model when possible. The review is additive: it shows up as `🤖`, is visible in task detail and `/lgtm`, and does not complete the task.
|
Use this from a separate subagent or other model when possible. Reviews append as iterations; the latest one is what gates human sign-off.
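As a rough illustration of "the latest one gates sign-off", the check can be sketched as follows. The contents of `robot-review.ts` are not shown in this diff, so the `ReviewIteration` shape and the `blockedByLatestReview` name are assumptions, not the extension's actual API.

```typescript
// Hypothetical, reduced record shape; the stored record carries more fields.
interface ReviewIteration {
  iteration: number;
  evidence_complete: boolean;
  evidence_convincing: boolean;
}

// Returns true when /lgtm should be refused because of the newest review.
function blockedByLatestReview(reviews: ReviewIteration[]): boolean {
  const latest = reviews[reviews.length - 1];
  // No robot review at all: nothing to gate on.
  if (latest === undefined) return false;
  // Earlier iterations are ignored; only the newest verdict counts.
  return !(latest.evidence_complete && latest.evidence_convincing);
}

const history: ReviewIteration[] = [
  { iteration: 1, evidence_complete: false, evidence_convincing: false },
  { iteration: 2, evidence_complete: true, evidence_convincing: true },
];
console.log(blockedByLatestReview(history));
```

Because only the last entry is consulted, a failed first iteration does not block sign-off once a later iteration passes.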
|
||||||
|
|
||||||
|
### `robot_review_run`
|
||||||
|
|
||||||
|
Run the configured automatic robot reviewer against the current task evidence.
|
||||||
|
|
||||||
|
Default reviewer command:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
acpx --approve-reads --non-interactive-permissions deny opencode exec
|
||||||
|
```
|
||||||
|
|
||||||
|
Override with:
|
||||||
|
|
||||||
|
```bash
|
||||||
|
PI_LGTM_ROBOT_REVIEW_CMD='acpx --approve-reads --non-interactive-permissions deny codex exec'
|
||||||
|
PI_LGTM_AUTO_ROBOT_REVIEW=1
|
||||||
|
```
|
||||||
|
|
||||||
|
This appends a new robot-review iteration. If the latest robot review sets `evidence_complete=false` or `evidence_convincing=false`, `/lgtm` is blocked until stronger evidence is submitted and reviewed again.
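The `index.ts` change in this commit expects the reviewer's stdout to contain one JSON object between the markers `ROBOT_REVIEW_JSON_START` and `ROBOT_REVIEW_JSON_END`. A minimal sketch of that extraction, reduced to the two gate fields, might look like this; `parseReviewerVerdict` and `ReviewVerdict` are hypothetical names used only for this example.

```typescript
interface ReviewVerdict {
  evidence_complete: boolean;
  evidence_convincing: boolean;
}

// Pulls the JSON payload out of free-form reviewer output and reads the gate fields.
function parseReviewerVerdict(stdout: string): ReviewVerdict {
  const match = stdout.match(/ROBOT_REVIEW_JSON_START\s*([\s\S]*?)\s*ROBOT_REVIEW_JSON_END/);
  if (!match) throw new Error("Reviewer output is missing the JSON markers.");
  const parsed = JSON.parse(match[1] ?? "") as Record<string, unknown>;
  return {
    // Only a literal `true` passes; missing or non-boolean values count as a failure.
    evidence_complete: parsed.evidence_complete === true,
    evidence_convincing: parsed.evidence_convincing === true,
  };
}

const sampleOutput = [
  "Some reviewer chatter before the payload.",
  "ROBOT_REVIEW_JSON_START",
  '{"evidence_complete":true,"evidence_convincing":"yes"}',
  "ROBOT_REVIEW_JSON_END",
].join("\n");
console.log(parseReviewerVerdict(sampleOutput));
```

Treating anything other than a literal `true` as a failure keeps the gate closed when a reviewer returns a malformed verdict.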
|
||||||
|
|
||||||
## Commands
|
## Commands
|
||||||
|
|
||||||
@@ -118,7 +142,11 @@ Interactive menu: view tasks, create task, clear completed/all.
|
|||||||
## Task lifecycle
|
## Task lifecycle
|
||||||
|
|
||||||
```
|
```
|
||||||
pending -> in_progress -> (lgtm_ask) -> pending_approval 👀 -> (/lgtm) -> completed
|
pending -> in_progress -> (lgtm_ask)
|
||||||
|
-> robot review iteration(s) 🤖
|
||||||
|
-> pending_approval 👀 if latest robot review passes or no robot review is required
|
||||||
|
-> strengthen evidence + rerun review if latest robot review fails
|
||||||
|
-> (/lgtm) -> completed
|
||||||
-> deleted
|
-> deleted
|
||||||
```
|
```
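The branching in the lifecycle above comes down to how a review verdict updates `pending_approval`. The sketch below restates the two update rules visible in this diff; the function and type names are hypothetical, and it assumes (as the README implies) that `lgtm_ask` itself marks the task pending before any review runs.

```typescript
type Verdict = { evidence_complete: boolean; evidence_convincing: boolean };

const passes = (v: Verdict): boolean => v.evidence_complete && v.evidence_convincing;

// lgtm_ask with an automatic review: the verdict decides readiness outright.
function pendingAfterLgtmAsk(verdict: Verdict): boolean {
  return passes(verdict);
}

// robot_review_ask / robot_review_run: a pass keeps the current flag, a fail clears it.
function pendingAfterReview(currentlyPending: boolean, verdict: Verdict): boolean {
  return passes(verdict) ? currentlyPending : false;
}

const pass: Verdict = { evidence_complete: true, evidence_convincing: true };
const fail: Verdict = { evidence_complete: true, evidence_convincing: false };

// A failing review clears the flag; a later passing review alone does not restore it.
const afterFail = pendingAfterReview(true, fail);
const afterRetry = pendingAfterReview(afterFail, pass);
console.log(afterFail, afterRetry);
```

Under these rules a passing re-review on its own leaves the flag cleared, which matches the "strengthen evidence + rerun review" branch: the agent resubmits evidence through `lgtm_ask` before the task is ready for `/lgtm` again.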
|
||||||
|
|
||||||
@@ -145,8 +173,9 @@ PI_TASKS_DEBUG=1 # trace to stderr
|
|||||||
|
|
||||||
```
|
```
|
||||||
src/
|
src/
|
||||||
├── index.ts # 6 tools + /tasks + /lgtm commands + widget + event handlers
|
├── index.ts # 7 tools + /tasks + /lgtm commands + widget + event handlers
|
||||||
├── review-badges.ts # Review badge helpers for tool/robot/human lanes
|
├── review-badges.ts # Review badge helpers for tool/robot/human lanes
|
||||||
|
├── robot-review.ts # Robot review iteration storage + compatibility helpers
|
||||||
├── types.ts # Task, TaskStatus types
|
├── types.ts # Task, TaskStatus types
|
||||||
├── task-store.ts # File-backed store with CRUD, locking, complete() method
|
├── task-store.ts # File-backed store with CRUD, locking, complete() method
|
||||||
├── auto-clear.ts # Turn-based auto-clearing of completed tasks
|
├── auto-clear.ts # Turn-based auto-clearing of completed tasks
|
||||||
|
|||||||
+244
-25
@@ -14,11 +14,18 @@
|
|||||||
* /lgtm <id> — Human signs off on a task (only way to complete)
|
* /lgtm <id> — Human signs off on a task (only way to complete)
|
||||||
*/
|
*/
|
||||||
|
|
||||||
|
import { spawn } from "node:child_process";
|
||||||
import { join, resolve } from "node:path";
|
import { join, resolve } from "node:path";
|
||||||
import type { ExtensionAPI, ExtensionCommandContext, ExtensionContext } from "@mariozechner/pi-coding-agent";
|
import type { ExtensionAPI, ExtensionCommandContext, ExtensionContext } from "@mariozechner/pi-coding-agent";
|
||||||
import { Type } from "@sinclair/typebox";
|
import { Type } from "@sinclair/typebox";
|
||||||
import { AutoClearManager } from "./auto-clear.js";
|
import { AutoClearManager } from "./auto-clear.js";
|
||||||
import { getReviewBadges, REVIEW_BADGES } from "./review-badges.js";
|
import { getReviewBadges, REVIEW_BADGES } from "./review-badges.js";
|
||||||
|
import {
|
||||||
|
appendRobotReviewMetadata,
|
||||||
|
getRobotReviews,
|
||||||
|
latestRobotReviewPasses,
|
||||||
|
type RobotReviewRecord,
|
||||||
|
} from "./robot-review.js";
|
||||||
import { TaskStore } from "./task-store.js";
|
import { TaskStore } from "./task-store.js";
|
||||||
import { loadTasksConfig } from "./tasks-config.js";
|
import { loadTasksConfig } from "./tasks-config.js";
|
||||||
import { TaskWidget, type UICtx } from "./ui/task-widget.js";
|
import { TaskWidget, type UICtx } from "./ui/task-widget.js";
|
||||||
@@ -27,10 +34,129 @@ function textResult(msg: string) {
|
|||||||
return { content: [{ type: "text" as const, text: msg }], details: undefined as any };
|
return { content: [{ type: "text" as const, text: msg }], details: undefined as any };
|
||||||
}
|
}
|
||||||
|
|
||||||
const TASK_TOOL_NAMES = new Set(["TaskCreate", "TaskList", "TaskGet", "TaskUpdate", "lgtm_ask", "robot_review_ask"]);
|
const TASK_TOOL_NAMES = new Set(["TaskCreate", "TaskList", "TaskGet", "TaskUpdate", "lgtm_ask", "robot_review_ask", "robot_review_run"]);
|
||||||
const REMINDER_INTERVAL = 4;
|
const REMINDER_INTERVAL = 4;
|
||||||
const AUTO_CLEAR_DELAY = 4;
|
const AUTO_CLEAR_DELAY = 4;
|
||||||
|
|
||||||
|
type CommandResult = { stdout: string; stderr: string; exitCode: number | null };
|
||||||
|
|
||||||
|
function shellQuote(text: string): string {
|
||||||
|
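return "'" + text.replace(/'/g, "'\\''") + "'"; // single quotes stop bash from expanding $, backticks, and backslashes in the prompt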
|
||||||
|
}
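Because the review prompt embeds task evidence and is passed to `bash -lc` as part of one command string, the quoting helper has to neutralize `$`, backticks, and backslashes as well as quotes. A minimal sketch of POSIX single-quote quoting follows; the name `posixQuote` is hypothetical and the snippet is an illustration, not necessarily what the extension ships.

```typescript
// Wraps text in single quotes so the shell treats every character literally.
// An embedded single quote is written as: close quote, escaped quote, reopen quote.
function posixQuote(text: string): string {
  return "'" + text.replace(/'/g, "'\\''") + "'";
}

// Evidence often contains markdown backticks and dollar signs.
const prompt = "it's `ls` and $HOME";
console.log(posixQuote(prompt));
```

Inside single quotes the shell performs no expansion at all, so the only character that needs special handling is the single quote itself.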
|
||||||
|
|
||||||
|
function runShellCommand(command: string, signal?: AbortSignal): Promise<CommandResult> {
|
||||||
|
return new Promise((resolve, reject) => {
|
||||||
|
const child = spawn("bash", ["-lc", command], { stdio: ["ignore", "pipe", "pipe"] });
|
||||||
|
const stdoutChunks: Buffer[] = [];
|
||||||
|
const stderrChunks: Buffer[] = [];
|
||||||
|
child.stdout.on("data", (data) => stdoutChunks.push(data));
|
||||||
|
child.stderr.on("data", (data) => stderrChunks.push(data));
|
||||||
|
child.on("error", reject);
|
||||||
|
const onAbort = () => child.kill();
|
||||||
|
signal?.addEventListener("abort", onAbort, { once: true });
|
||||||
|
child.on("close", (exitCode) => {
|
||||||
|
signal?.removeEventListener("abort", onAbort);
|
||||||
|
if (signal?.aborted) {
|
||||||
|
reject(new Error("aborted"));
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
resolve({
|
||||||
|
stdout: Buffer.concat(stdoutChunks).toString("utf-8"),
|
||||||
|
stderr: Buffer.concat(stderrChunks).toString("utf-8"),
|
||||||
|
exitCode,
|
||||||
|
});
|
||||||
|
});
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
function extractRobotReviewJson(output: string): Record<string, unknown> {
|
||||||
|
const match = output.match(/ROBOT_REVIEW_JSON_START\s*([\s\S]*?)\s*ROBOT_REVIEW_JSON_END/);
|
||||||
|
if (!match) throw new Error("Robot reviewer did not return the expected JSON markers.");
|
||||||
|
return JSON.parse(match[1]) as Record<string, unknown>;
|
||||||
|
}
|
||||||
|
|
||||||
|
function formatRobotReview(review: RobotReviewRecord): string {
|
||||||
|
const parts = [
|
||||||
|
`Robot review #${review.iteration} (${review.submitted_at})`,
|
||||||
|
`Reviewer: ${review.reviewer}${review.mode === "auto" ? " [auto]" : ""}`,
|
||||||
|
`Scope: ${review.scope}`,
|
||||||
|
`Evidence complete: ${review.evidence_complete ? "yes" : "no"}`,
|
||||||
|
`Evidence convincing: ${review.evidence_convincing ? "yes" : "no"}`,
|
||||||
|
`Observations:\n- ${review.observations.join("\n- ")}`,
|
||||||
|
];
|
||||||
|
if (review.missing_evidence.length > 0) parts.push(`Missing evidence:\n- ${review.missing_evidence.join("\n- ")}`);
|
||||||
|
if (review.blind_spots) parts.push(`Blind spots: ${review.blind_spots}`);
|
||||||
|
return parts.join("\n");
|
||||||
|
}
|
||||||
|
|
||||||
|
function buildRobotReviewPrompt(task: any): string {
|
||||||
|
const priorReviews = getRobotReviews(task);
|
||||||
|
const priorSection = priorReviews.length > 0
|
||||||
|
? `\nPrevious robot reviews:\n${priorReviews.map(formatRobotReview).join("\n\n")}\n`
|
||||||
|
: "\nPrevious robot reviews:\n(none)\n";
|
||||||
|
return [
|
||||||
|
"Review the task evidence with a fresh perspective.",
|
||||||
|
"Observations should stay concrete and source-grounded.",
|
||||||
|
"Set evidence_complete=false if the supplied evidence does not cover the claimed done criterion.",
|
||||||
|
"Set evidence_convincing=false if the evidence exists but would not convince a skeptical reviewer.",
|
||||||
|
"Return exactly one JSON object between the markers ROBOT_REVIEW_JSON_START and ROBOT_REVIEW_JSON_END.",
|
||||||
|
"JSON schema:",
|
||||||
|
'{"reviewer":"string","scope":"string","observations":["string"],"blind_spots":"string","evidence_complete":true,"evidence_convincing":true,"missing_evidence":["string"]}',
|
||||||
|
"",
|
||||||
|
`Task #${task.id}: ${task.subject}`,
|
||||||
|
`Done criterion: ${task.done_criterion}`,
|
||||||
|
`Description: ${task.description}`,
|
||||||
|
"",
|
||||||
|
"Evidence package:",
|
||||||
|
`Evidence: ${task.metadata?.lgtm_evidence ?? "(missing)"}`,
|
||||||
|
`Failure likely: ${task.metadata?.lgtm_failure_likely ?? "(missing)"}`,
|
||||||
|
`Failure sneaky: ${task.metadata?.lgtm_failure_sneaky ?? "(missing)"}`,
|
||||||
|
`Falsification test: ${task.metadata?.lgtm_falsification_test ?? "(missing)"}`,
|
||||||
|
`Verification hints: ${Array.isArray(task.metadata?.lgtm_verification_hints) ? task.metadata.lgtm_verification_hints.join(" | ") : "(missing)"}`,
|
||||||
|
`Remaining uncertainty: ${task.metadata?.lgtm_remaining_uncertainty ?? "(missing)"}`,
|
||||||
|
priorSection,
|
||||||
|
"Output format:",
|
||||||
|
"ROBOT_REVIEW_JSON_START",
|
||||||
|
'{"reviewer":"...","scope":"...","observations":["..."],"blind_spots":"...","evidence_complete":true,"evidence_convincing":true,"missing_evidence":["..."]}',
|
||||||
|
"ROBOT_REVIEW_JSON_END",
|
||||||
|
].join("\n");
|
||||||
|
}
|
||||||
|
|
||||||
|
async function runAutomaticRobotReview(
|
||||||
|
task: any,
|
||||||
|
signal?: AbortSignal,
|
||||||
|
): Promise<{ review: Omit<RobotReviewRecord, "iteration">; command: string }> {
|
||||||
|
const reviewerCommand = process.env.PI_LGTM_ROBOT_REVIEW_CMD?.trim()
|
||||||
|
|| "acpx --approve-reads --non-interactive-permissions deny opencode exec";
|
||||||
|
const prompt = buildRobotReviewPrompt(task);
|
||||||
|
const command = `${reviewerCommand} ${shellQuote(prompt)}`;
|
||||||
|
const result = await runShellCommand(command, signal);
|
||||||
|
if (result.exitCode !== 0) {
|
||||||
|
throw new Error(`Robot reviewer failed (${result.exitCode ?? "?"}): ${(result.stderr || result.stdout).trim()}`);
|
||||||
|
}
|
||||||
|
const parsed = extractRobotReviewJson(result.stdout);
|
||||||
|
const observations = Array.isArray(parsed.observations) ? parsed.observations.filter((item): item is string => typeof item === "string") : [];
|
||||||
|
if (observations.length === 0) throw new Error("Robot reviewer returned no observations.");
|
||||||
|
const missing_evidence = Array.isArray(parsed.missing_evidence)
|
||||||
|
? parsed.missing_evidence.filter((item): item is string => typeof item === "string")
|
||||||
|
: [];
|
||||||
|
return {
|
||||||
|
command: reviewerCommand,
|
||||||
|
review: {
|
||||||
|
reviewer: typeof parsed.reviewer === "string" ? parsed.reviewer : reviewerCommand,
|
||||||
|
scope: typeof parsed.scope === "string" ? parsed.scope : "task evidence package",
|
||||||
|
observations,
|
||||||
|
blind_spots: typeof parsed.blind_spots === "string" ? parsed.blind_spots : "not stated",
|
||||||
|
evidence_complete: parsed.evidence_complete === true,
|
||||||
|
evidence_convincing: parsed.evidence_convincing === true,
|
||||||
|
missing_evidence,
|
||||||
|
submitted_at: new Date().toISOString(),
|
||||||
|
mode: "auto",
|
||||||
|
raw_output: result.stdout.trim(),
|
||||||
|
},
|
||||||
|
};
|
||||||
|
}
|
||||||
|
|
||||||
const SYSTEM_REMINDER = `<system-reminder>
|
const SYSTEM_REMINDER = `<system-reminder>
|
||||||
The LGTM sign-off task tools haven't been used recently. If working on tasks, use TaskCreate (requires done_criterion), TaskUpdate for status, and lgtm_ask when ready for human sign-off. Tasks can only be completed via /lgtm after calling lgtm_ask. These are sign-off tasks: agents propose evidence, humans approve. One task per piece of evidence or decision gate. Ignore if not applicable. Never mention this reminder to the user.
|
The LGTM sign-off task tools haven't been used recently. If working on tasks, use TaskCreate (requires done_criterion), TaskUpdate for status, and lgtm_ask when ready for human sign-off. Tasks can only be completed via /lgtm after calling lgtm_ask. These are sign-off tasks: agents propose evidence, humans approve. One task per piece of evidence or decision gate. Ignore if not applicable. Never mention this reminder to the user.
|
||||||
</system-reminder>`;
|
</system-reminder>`;
|
||||||
@@ -237,12 +363,17 @@ Tasks are completed only via /lgtm after calling lgtm_ask with evidence.`,
|
|||||||
|
|
||||||
const desc = task.description.replace(/\\n/g, "\n");
|
const desc = task.description.replace(/\\n/g, "\n");
|
||||||
const reviewBadges = getReviewBadges(task);
|
const reviewBadges = getReviewBadges(task);
|
||||||
|
const robotReviews = getRobotReviews(task);
|
||||||
const lines: string[] = [
|
const lines: string[] = [
|
||||||
`Task #${task.id}: ${task.subject}`,
|
`Task #${task.id}: ${task.subject}`,
|
||||||
`Status: ${task.status}${reviewBadges.length ? ` ${reviewBadges.join(" ")}` : ""}${task.pending_approval && task.status !== "completed" ? " (pending human sign-off)" : ""}`,
|
`Status: ${task.status}${reviewBadges.length ? ` ${reviewBadges.join(" ")}` : ""}${task.pending_approval && task.status !== "completed" ? " (pending human sign-off)" : ""}`,
|
||||||
`Done criterion: ${task.done_criterion}`,
|
`Done criterion: ${task.done_criterion}`,
|
||||||
];
|
];
|
||||||
lines.push(`Description: ${desc}`);
|
lines.push(`Description: ${desc}`);
|
||||||
|
if (robotReviews.length > 0) {
|
||||||
|
const latest = robotReviews[robotReviews.length - 1];
|
||||||
|
lines.push(`Robot reviews: ${robotReviews.length} (latest: complete=${latest.evidence_complete ? "yes" : "no"}, convincing=${latest.evidence_convincing ? "yes" : "no"})`);
|
||||||
|
}
|
||||||
if (task.blockedBy.length > 0) {
|
if (task.blockedBy.length > 0) {
|
||||||
const openBlockers = task.blockedBy.filter(bid => {
|
const openBlockers = task.blockedBy.filter(bid => {
|
||||||
const blocker = store.get(bid);
|
const blocker = store.get(bid);
|
||||||
@@ -345,9 +476,10 @@ After this, task enters pending sign-off state — only completable via /lgtm <i
|
|||||||
falsification_test: Type.String({ description: "What you ran and what you got, presented so both you and the human can sanity-check it. State: what you ran (command/experiment/log check), the actual output or result, and why that result could not occur if a failure mode were real. Must be traceable: include file paths, log snippets, counts, or commit. The human should be able to verify without re-running anything." }),
|
falsification_test: Type.String({ description: "What you ran and what you got, presented so both you and the human can sanity-check it. State: what you ran (command/experiment/log check), the actual output or result, and why that result could not occur if a failure mode were real. Must be traceable: include file paths, log snippets, counts, or commit. The human should be able to verify without re-running anything." }),
|
||||||
verification_hints: Type.Array(Type.String(), { description: "Where to look and what to check. Descriptions of evidence locations, not bare file paths. E.g. 'lines 45-60 in src/loss.py show the gradient check' not 'src/loss.py'." }),
|
verification_hints: Type.Array(Type.String(), { description: "Where to look and what to check. Descriptions of evidence locations, not bare file paths. E.g. 'lines 45-60 in src/loss.py show the gradient check' not 'src/loss.py'." }),
|
||||||
remaining_uncertainty: Type.String({ description: "What's NOT tested, known limitations, edge cases deferred. If you can't articulate uncertainty, you haven't thought hard enough." }),
|
remaining_uncertainty: Type.String({ description: "What's NOT tested, known limitations, edge cases deferred. If you can't articulate uncertainty, you haven't thought hard enough." }),
|
||||||
|
run_robot_review: Type.Optional(Type.Boolean({ description: "If true, run the configured automatic robot reviewer immediately after storing evidence." })),
|
||||||
}),
|
}),
|
||||||
|
|
||||||
execute(_toolCallId, params, _signal, _onUpdate, _ctx) {
|
async execute(_toolCallId, params, signal, _onUpdate, _ctx) {
|
||||||
const task = store.get(params.taskId);
|
const task = store.get(params.taskId);
|
||||||
if (!task) return Promise.resolve(textResult(`Task #${params.taskId} not found`));
|
if (!task) return Promise.resolve(textResult(`Task #${params.taskId} not found`));
|
||||||
if (task.status === "completed") return Promise.resolve(textResult(`Task #${params.taskId} already completed`));
|
if (task.status === "completed") return Promise.resolve(textResult(`Task #${params.taskId} already completed`));
|
||||||
@@ -366,6 +498,36 @@ After this, task enters pending sign-off state — only completable via /lgtm <i
|
|||||||
lgtm_submitted_at: new Date().toISOString(),
|
lgtm_submitted_at: new Date().toISOString(),
|
||||||
},
|
},
|
||||||
});
|
});
|
||||||
|
let robotReviewNote = "";
|
||||||
|
const shouldRunRobotReview = params.run_robot_review ?? process.env.PI_LGTM_AUTO_ROBOT_REVIEW === "1";
|
||||||
|
if (shouldRunRobotReview) {
|
||||||
|
const refreshedTask = store.get(params.taskId);
|
||||||
|
if (!refreshedTask) return textResult(`Task #${params.taskId} not found after evidence update`);
|
||||||
|
try {
|
||||||
|
const { review, command } = await runAutomaticRobotReview(refreshedTask, signal);
|
||||||
|
store.update(params.taskId, {
|
||||||
|
pending_approval: review.evidence_complete && review.evidence_convincing,
|
||||||
|
metadata: appendRobotReviewMetadata(refreshedTask, review),
|
||||||
|
});
|
||||||
|
robotReviewNote =
|
||||||
|
`\n\n### Automatic robot review\n` +
|
||||||
|
`Reviewer command: ${command}\n` +
|
||||||
|
`Evidence complete: ${review.evidence_complete ? "yes" : "no"}\n` +
|
||||||
|
`Evidence convincing: ${review.evidence_convincing ? "yes" : "no"}\n` +
|
||||||
|
`${review.observations.map(o => `- ${o}`).join("\n")}`;
|
||||||
|
if (review.missing_evidence.length > 0) {
|
||||||
|
robotReviewNote += `\nMissing evidence:\n${review.missing_evidence.map(item => `- ${item}`).join("\n")}`;
|
||||||
|
}
|
||||||
|
if (!(review.evidence_complete && review.evidence_convincing)) {
|
||||||
|
robotReviewNote += `\nResult: human sign-off has been held back until the evidence is strengthened and reviewed again.`;
|
||||||
|
}
|
||||||
|
} catch (err: any) {
|
||||||
|
robotReviewNote =
|
||||||
|
`\n\n### Automatic robot review\n` +
|
||||||
|
`Reviewer failed: ${err.message}\n` +
|
||||||
|
`Task remains pending human sign-off; rerun with stronger evidence or call \`robot_review_run\` after fixing reviewer setup.`;
|
||||||
|
}
|
||||||
|
}
|
||||||
widget.update();
|
widget.update();
|
||||||
|
|
||||||
const hintsSection = params.verification_hints?.length
|
const hintsSection = params.verification_hints?.length
|
||||||
@@ -384,14 +546,15 @@ After this, task enters pending sign-off state — only completable via /lgtm <i
|
|||||||
`### Falsification test\n${params.falsification_test}` +
|
`### Falsification test\n${params.falsification_test}` +
|
||||||
hintsSection +
|
hintsSection +
|
||||||
uncertaintySection +
|
uncertaintySection +
|
||||||
|
robotReviewNote +
|
||||||
`\n\n---\n` +
|
`\n\n---\n` +
|
||||||
`Task #${task.id} is now pending human sign-off via \`/lgtm ${task.id}\`.\n\n` +
|
`Task #${task.id} is now ${shouldRunRobotReview && !store.get(task.id)?.pending_approval ? "not yet ready for human sign-off" : `pending human sign-off via \`/lgtm ${task.id}\``}.\n\n` +
|
||||||
`**Self-check (non-blocking):** Look at this as the human will see it. ` +
|
`**Self-check (non-blocking):** Look at this as the human will see it. ` +
|
||||||
`Does the evidence directly address the done_criterion "${task.done_criterion}"? ` +
|
`Does the evidence directly address the done_criterion "${task.done_criterion}"? ` +
|
||||||
`Would a skeptical reviewer find this convincing, or would they immediately ask ` +
|
`Would a skeptical reviewer find this convincing, or would they immediately ask ` +
|
||||||
`"but what about..."? If evidence feels thin, call lgtm_ask again with stronger evidence.`;
|
`"but what about..."? If evidence feels thin, call lgtm_ask again with stronger evidence.`;
|
||||||
|
|
||||||
return Promise.resolve(textResult(result));
|
return textResult(result);
|
||||||
},
|
},
|
||||||
});
|
});
|
||||||
|
|
||||||
@@ -401,7 +564,7 @@ After this, task enters pending sign-off state — only completable via /lgtm <i
|
|||||||
description: `Attach fresh-perspective robot review observations to a task.
|
description: `Attach fresh-perspective robot review observations to a task.
|
||||||
|
|
||||||
Use this from a separate subagent or model when possible, ideally from a different model family/class than the implementation agent.
|
Use this from a separate subagent or model when possible, ideally from a different model family/class than the implementation agent.
|
||||||
Observations only: report what you saw, not advice, verdicts, prioritization, or editorial.
|
Observations only: report what you saw, not advice or editorial. Structured gate fields record whether the evidence is complete and convincing enough to advance.
|
||||||
|
|
||||||
This does not complete the task. Human /lgtm remains the only completion path.`,
|
This does not complete the task. Human /lgtm remains the only completion path.`,
|
||||||
parameters: Type.Object({
|
parameters: Type.Object({
|
||||||
@@ -413,6 +576,9 @@ This does not complete the task. Human /lgtm remains the only completion path.`,
|
|||||||
description: "Observations only. Concrete things noticed in the artifacts. No recommendations, interpretation, or editorial.",
|
description: "Observations only. Concrete things noticed in the artifacts. No recommendations, interpretation, or editorial.",
|
||||||
}),
|
}),
|
||||||
blind_spots: Type.String({ description: "What the reviewer did not inspect or could not verify" }),
|
blind_spots: Type.String({ description: "What the reviewer did not inspect or could not verify" }),
|
||||||
|
evidence_complete: Type.Boolean({ description: "Whether the supplied evidence covers the claimed done criterion." }),
|
||||||
|
evidence_convincing: Type.Boolean({ description: "Whether the supplied evidence would convince a skeptical reviewer." }),
|
||||||
|
missing_evidence: Type.Optional(Type.Array(Type.String(), { description: "Concrete missing checks, artifacts, or observations needed before human sign-off." })),
|
||||||
}),
|
}),
|
||||||
|
|
||||||
execute(_toolCallId, params, _signal, _onUpdate, _ctx) {
|
execute(_toolCallId, params, _signal, _onUpdate, _ctx) {
|
||||||
@@ -421,21 +587,32 @@ This does not complete the task. Human /lgtm remains the only completion path.`,
|
|||||||
if (task.status === "completed") return Promise.resolve(textResult(`Task #${params.taskId} already completed`));
|
if (task.status === "completed") return Promise.resolve(textResult(`Task #${params.taskId} already completed`));
|
||||||
|
|
||||||
store.update(params.taskId, {
|
store.update(params.taskId, {
|
||||||
|
pending_approval: params.evidence_complete && params.evidence_convincing ? task.pending_approval : false,
|
||||||
metadata: {
|
metadata: {
|
||||||
robot_review_reviewer: params.reviewer,
|
...appendRobotReviewMetadata(task, {
|
||||||
robot_review_scope: params.scope,
|
reviewer: params.reviewer,
|
||||||
robot_review_observations: params.observations,
|
scope: params.scope,
|
||||||
robot_review_blind_spots: params.blind_spots,
|
observations: params.observations,
|
||||||
robot_review_submitted_at: new Date().toISOString(),
|
blind_spots: params.blind_spots,
|
||||||
|
evidence_complete: params.evidence_complete,
|
||||||
|
evidence_convincing: params.evidence_convincing,
|
||||||
|
missing_evidence: params.missing_evidence ?? [],
|
||||||
|
submitted_at: new Date().toISOString(),
|
||||||
|
mode: "manual",
|
||||||
|
}),
|
||||||
},
|
},
|
||||||
});
|
});
|
||||||
widget.update();
|
widget.update();
|
||||||
|
|
||||||
const result =
|
const result =
|
||||||
`## Robot review attached to task #${task.id}: ${task.subject}\n` +
|
`## Robot review attached to task #${task.id}: ${task.subject}\n` +
|
||||||
|
`Iteration: ${getRobotReviews(store.get(params.taskId)!).length}\n` +
|
||||||
`Reviewer: ${params.reviewer}\n` +
|
`Reviewer: ${params.reviewer}\n` +
|
||||||
`Scope: ${params.scope}\n\n` +
|
`Scope: ${params.scope}\n\n` +
|
||||||
|
`Evidence complete: ${params.evidence_complete ? "yes" : "no"}\n` +
|
||||||
|
`Evidence convincing: ${params.evidence_convincing ? "yes" : "no"}\n\n` +
|
||||||
`### Observations\n${params.observations.map(o => `- ${o}`).join("\n")}\n\n` +
|
`### Observations\n${params.observations.map(o => `- ${o}`).join("\n")}\n\n` +
|
||||||
|
`${(params.missing_evidence?.length ?? 0) > 0 ? `### Missing evidence\n${(params.missing_evidence ?? []).map(item => `- ${item}`).join("\n")}\n\n` : ""}` +
|
||||||
`### Blind spots\n${params.blind_spots}\n\n` +
|
`### Blind spots\n${params.blind_spots}\n\n` +
|
||||||
`${REVIEW_BADGES.robot} Robot review stored. Human sign-off still requires \`/lgtm ${task.id}\`.`;
|
`${REVIEW_BADGES.robot} Robot review stored. Human sign-off still requires \`/lgtm ${task.id}\`.`;
|
||||||
|
|
||||||
@@ -443,6 +620,46 @@ This does not complete the task. Human /lgtm remains the only completion path.`,
|
|||||||
},
|
},
|
||||||
});
|
});
|
||||||
|
|
||||||
|
pi.registerTool({
|
||||||
|
name: "robot_review_run",
|
||||||
|
label: "robot_review_run",
|
||||||
|
description: `Run the configured automatic robot reviewer against the current task evidence.
|
||||||
|
|
||||||
|
Uses PI_LGTM_ROBOT_REVIEW_CMD if set, otherwise defaults to:
|
||||||
|
\`acpx --approve-reads --non-interactive-permissions deny opencode exec\`
|
||||||
|
|
||||||
|
This appends a new robot-review iteration. If the reviewer marks evidence incomplete or unconvincing, pending human sign-off is cleared until stronger evidence is submitted and reviewed again.`,
|
||||||
|
parameters: Type.Object({
|
||||||
|
taskId: Type.String({ description: "Task ID to review" }),
|
||||||
|
}),
|
||||||
|
|
||||||
|
async execute(_toolCallId, params, signal, _onUpdate, _ctx) {
|
||||||
|
const task = store.get(params.taskId);
|
||||||
|
if (!task) return textResult(`Task #${params.taskId} not found`);
|
||||||
|
if (!task.metadata?.lgtm_evidence) {
|
||||||
|
return textResult(`Task #${params.taskId} has no stored evidence yet. Call lgtm_ask first.`);
|
||||||
|
}
|
||||||
|
|
||||||
|
const { review, command } = await runAutomaticRobotReview(task, signal);
|
||||||
|
store.update(params.taskId, {
|
||||||
|
pending_approval: review.evidence_complete && review.evidence_convincing ? task.pending_approval : false,
|
||||||
|
metadata: appendRobotReviewMetadata(task, review),
|
||||||
|
});
|
||||||
|
widget.update();
|
||||||
|
|
||||||
|
return textResult(
|
||||||
|
`## Automatic robot review for task #${task.id}: ${task.subject}\n` +
|
||||||
|
`Reviewer command: ${command}\n` +
|
||||||
|
`Iteration: ${getRobotReviews(store.get(params.taskId)!).length}\n` +
|
||||||
|
`Evidence complete: ${review.evidence_complete ? "yes" : "no"}\n` +
|
||||||
|
`Evidence convincing: ${review.evidence_convincing ? "yes" : "no"}\n\n` +
|
||||||
|
`### Observations\n${review.observations.map(o => `- ${o}`).join("\n")}\n\n` +
|
||||||
|
`${review.missing_evidence.length > 0 ? `### Missing evidence\n${review.missing_evidence.map(item => `- ${item}`).join("\n")}\n\n` : ""}` +
|
||||||
|
`### Blind spots\n${review.blind_spots}`,
|
||||||
|
);
|
||||||
|
},
|
||||||
|
});
|
||||||
|
|
||||||
// ──────────────────────────────────────────────────
|
// ──────────────────────────────────────────────────
|
||||||
// /tasks command
|
// /tasks command
|
||||||
// ──────────────────────────────────────────────────
|
// ──────────────────────────────────────────────────
|
||||||
@@ -531,12 +748,11 @@ This does not complete the task. Human /lgtm remains the only completion path.`,
   evidenceNote = parts.join("\n");
 }
 let robotNote = "";
-if (em.robot_review_observations?.length) {
-  const parts = [`\n\nRobot review (${em.robot_review_submitted_at ?? "?"})`];
-  if (em.robot_review_reviewer) parts.push(`Reviewer: ${em.robot_review_reviewer}`);
-  if (em.robot_review_scope) parts.push(`Scope: ${em.robot_review_scope}`);
-  parts.push(`Observations:\n- ${em.robot_review_observations.join("\n- ")}`);
-  if (em.robot_review_blind_spots) parts.push(`Blind spots: ${em.robot_review_blind_spots}`);
+const robotReviews = getRobotReviews(task);
+if (robotReviews.length > 0) {
+  const latest = robotReviews[robotReviews.length - 1];
+  const parts = [`\n\nRobot reviews: ${robotReviews.length}`];
+  parts.push(formatRobotReview(latest));
   robotNote = parts.join("\n");
 }
 const title = `#${task.id} [${task.status}] ${task.subject}\nDone: ${task.done_criterion}${pendingNote}\n${task.description}${evidenceNote}${robotNote}`;
@@ -585,6 +801,10 @@ This does not complete the task. Human /lgtm remains the only completion path.`,
   ctx.ui.notify(`Task #${taskId} not ready. Agent must call lgtm_ask first.`, "error");
   return;
 }
+if (getRobotReviews(task).length > 0 && !latestRobotReviewPasses(task)) {
+  ctx.ui.notify(`Task #${taskId} is blocked by the latest robot review. Strengthen evidence and rerun review first.`, "error");
+  return;
+}

 // Show stored evidence for review before sign-off
 const m = task.metadata;
@@ -598,15 +818,14 @@ This does not complete the task. Human /lgtm remains the only completion path.`,
   if (m.lgtm_verification_hints?.length) evidenceParts.push(`Hints: ${m.lgtm_verification_hints.join(", ")}`);
   evidenceParts.push(`Submitted: ${m.lgtm_submitted_at}`);
 }
-if (m.robot_review_observations?.length) {
-  const robotParts = [
-    `Robot review:\nReviewer: ${m.robot_review_reviewer ?? "?"}`,
-    `Scope: ${m.robot_review_scope ?? "?"}`,
-    `Observations:\n- ${m.robot_review_observations.join("\n- ")}`,
-  ];
-  if (m.robot_review_blind_spots) robotParts.push(`Blind spots: ${m.robot_review_blind_spots}`);
-  if (m.robot_review_submitted_at) robotParts.push(`Submitted: ${m.robot_review_submitted_at}`);
-  evidenceParts.push(robotParts.join("\n"));
+const robotReviews = getRobotReviews(task);
+if (robotReviews.length > 0) {
+  evidenceParts.push(
+    `Robot reviews (${robotReviews.length} total):\n${robotReviews.map(formatRobotReview).join("\n\n")}`,
+  );
+  if (!latestRobotReviewPasses(task)) {
+    evidenceParts.push("Latest robot review says the evidence is not yet complete/convincing.");
+  }
 }
 const evidenceSummary = evidenceParts.length > 0 ? evidenceParts.join("\n\n") : "(no stored evidence)";
 const confirm = await ctx.ui.select(
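The `/lgtm` gate added in the `-585,6 +801,10` hunk blocks sign-off only when at least one robot review exists and the latest one fails; a task with no robot reviews still goes through to the human. The following is a minimal sketch of that decision order, not the extension's actual handler. `MiniTask`, `MiniReview`, `latestPasses`, and `signOffDecision` are hypothetical names, and the sketch assumes the earlier "not ready" check tests `pending_approval`, which the diff does not show.

```typescript
// Hypothetical, simplified shapes for illustration only; the real Task and
// RobotReviewRecord types in this commit carry more fields.
interface MiniReview {
  evidence_complete: boolean;
  evidence_convincing: boolean;
}

interface MiniTask {
  pending_approval: boolean;
  reviews: MiniReview[];
}

// Same rule as latestRobotReviewPasses in the diff: no reviews means false,
// which is why the gate also checks that at least one review exists.
function latestPasses(task: MiniTask): boolean {
  const latest = task.reviews[task.reviews.length - 1];
  return latest ? latest.evidence_complete && latest.evidence_convincing : false;
}

type SignOffDecision = "not_ready" | "blocked_by_robot_review" | "ok";

function signOffDecision(task: MiniTask): SignOffDecision {
  // Assumption: the "not ready" branch is keyed on pending_approval.
  if (!task.pending_approval) return "not_ready";
  // Only the latest iteration counts; earlier failures do not block.
  if (task.reviews.length > 0 && !latestPasses(task)) return "blocked_by_robot_review";
  return "ok";
}

const recovered: MiniTask = {
  pending_approval: true,
  reviews: [
    { evidence_complete: false, evidence_convincing: false },
    { evidence_complete: true, evidence_convincing: true },
  ],
};
console.log(signOffDecision(recovered)); // prints "ok"
```

Keeping the `reviews.length > 0` guard separate from `latestPasses` means robot review stays optional: it can hold sign-off back, but its absence never does.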
@@ -1,3 +1,4 @@
+import { getRobotReviews } from "./robot-review.js";
 import type { Task } from "./types.js";

 export const REVIEW_BADGES = {
@@ -9,7 +10,7 @@ export const REVIEW_BADGES = {
 export function getReviewBadges(task: Task): string[] {
   const badges: string[] = [];
   if (task.metadata?.lgtm_evidence) badges.push(REVIEW_BADGES.tool);
-  if (task.metadata?.robot_review_observations?.length) badges.push(REVIEW_BADGES.robot);
+  if (getRobotReviews(task).length > 0) badges.push(REVIEW_BADGES.robot);
   if (task.pending_approval && task.status !== "completed") badges.push(REVIEW_BADGES.human);
   return badges;
 }
@@ -0,0 +1,107 @@
+import type { Task } from "./types.js";
+
+export type RobotReviewMode = "manual" | "auto";
+
+export interface RobotReviewRecord {
+  iteration: number;
+  reviewer: string;
+  scope: string;
+  observations: string[];
+  blind_spots: string;
+  evidence_complete: boolean;
+  evidence_convincing: boolean;
+  missing_evidence: string[];
+  submitted_at: string;
+  mode: RobotReviewMode;
+  raw_output?: string;
+}
+
+function toStringArray(value: unknown): string[] {
+  return Array.isArray(value) ? value.filter((item): item is string => typeof item === "string") : [];
+}
+
+function normalizeReview(value: unknown, index: number): RobotReviewRecord | undefined {
+  if (!value || typeof value !== "object") return undefined;
+  const review = value as Record<string, unknown>;
+  const reviewer = typeof review.reviewer === "string" ? review.reviewer : "unknown";
+  const scope = typeof review.scope === "string" ? review.scope : "unknown";
+  const observations = toStringArray(review.observations);
+  if (observations.length === 0) return undefined;
+  return {
+    iteration: typeof review.iteration === "number" ? review.iteration : index + 1,
+    reviewer,
+    scope,
+    observations,
+    blind_spots: typeof review.blind_spots === "string" ? review.blind_spots : "not recorded",
+    evidence_complete: typeof review.evidence_complete === "boolean" ? review.evidence_complete : true,
+    evidence_convincing: typeof review.evidence_convincing === "boolean" ? review.evidence_convincing : true,
+    missing_evidence: toStringArray(review.missing_evidence),
+    submitted_at: typeof review.submitted_at === "string" ? review.submitted_at : new Date(0).toISOString(),
+    mode: review.mode === "auto" ? "auto" : "manual",
+    raw_output: typeof review.raw_output === "string" ? review.raw_output : undefined,
+  };
+}
+
+function getLegacyRobotReview(task: Task): RobotReviewRecord | undefined {
+  const observations = toStringArray(task.metadata?.robot_review_observations);
+  if (observations.length === 0) return undefined;
+  return {
+    iteration: 1,
+    reviewer: typeof task.metadata?.robot_review_reviewer === "string" ? task.metadata.robot_review_reviewer : "unknown",
+    scope: typeof task.metadata?.robot_review_scope === "string" ? task.metadata.robot_review_scope : "unknown",
+    observations,
+    blind_spots: typeof task.metadata?.robot_review_blind_spots === "string" ? task.metadata.robot_review_blind_spots : "not recorded",
+    evidence_complete: typeof task.metadata?.robot_review_evidence_complete === "boolean" ? task.metadata.robot_review_evidence_complete : true,
+    evidence_convincing: typeof task.metadata?.robot_review_evidence_convincing === "boolean" ? task.metadata.robot_review_evidence_convincing : true,
+    missing_evidence: toStringArray(task.metadata?.robot_review_missing_evidence),
+    submitted_at: typeof task.metadata?.robot_review_submitted_at === "string" ? task.metadata.robot_review_submitted_at : new Date(0).toISOString(),
+    mode: task.metadata?.robot_review_mode === "auto" ? "auto" : "manual",
+    raw_output: typeof task.metadata?.robot_review_raw_output === "string" ? task.metadata.robot_review_raw_output : undefined,
+  };
+}
+
+export function getRobotReviews(task: Task): RobotReviewRecord[] {
+  const reviews = Array.isArray(task.metadata?.robot_reviews)
+    ? task.metadata.robot_reviews
+      .map((review: unknown, index: number) => normalizeReview(review, index))
+      .filter((review): review is RobotReviewRecord => review !== undefined)
+    : [];
+  if (reviews.length > 0) {
+    return reviews.map((review, index) => ({ ...review, iteration: index + 1 }));
+  }
+  const legacy = getLegacyRobotReview(task);
+  return legacy ? [legacy] : [];
+}
+
+export function getLatestRobotReview(task: Task): RobotReviewRecord | undefined {
+  const reviews = getRobotReviews(task);
+  return reviews.length > 0 ? reviews[reviews.length - 1] : undefined;
+}
+
+export function appendRobotReviewMetadata(task: Task, review: Omit<RobotReviewRecord, "iteration">): Record<string, unknown> {
+  const robot_reviews = [...getRobotReviews(task), { ...review, iteration: 0 }].map((entry, index) => ({
+    ...entry,
+    iteration: index + 1,
+  }));
+  const latest = robot_reviews[robot_reviews.length - 1];
+  return {
+    robot_reviews,
+    robot_review_reviewer: latest.reviewer,
+    robot_review_scope: latest.scope,
+    robot_review_observations: latest.observations,
+    robot_review_blind_spots: latest.blind_spots,
+    robot_review_evidence_complete: latest.evidence_complete,
+    robot_review_evidence_convincing: latest.evidence_convincing,
+    robot_review_missing_evidence: latest.missing_evidence,
+    robot_review_submitted_at: latest.submitted_at,
+    robot_review_mode: latest.mode,
+    robot_review_raw_output: latest.raw_output ?? null,
+    robot_review_requires_followup: !(latest.evidence_complete && latest.evidence_convincing),
+    robot_review_iteration_count: robot_reviews.length,
+  };
+}
+
+export function latestRobotReviewPasses(task: Task): boolean {
+  const latest = getLatestRobotReview(task);
+  return latest ? latest.evidence_complete && latest.evidence_convincing : false;
+}
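The new helper file above reads reviews from a `robot_reviews` array, falls back to the legacy single-review fields when that array is empty, and renumbers iterations on every read and append. Below is a minimal, self-contained sketch of that read-then-append flow under simplified assumptions. `MiniReview`, `MiniMetadata`, `readReviews`, and `appendReview` are hypothetical stand-ins for the real types and helpers, and the append step computes the new iteration directly instead of re-mapping the whole list, which should give the same numbering.

```typescript
// Hypothetical, trimmed-down record; the real RobotReviewRecord has more fields.
interface MiniReview {
  iteration: number;
  observations: string[];
  evidence_complete: boolean;
  evidence_convincing: boolean;
}

interface MiniMetadata {
  robot_reviews?: MiniReview[];
  // Legacy single-review field, kept so older tasks still show a review.
  robot_review_observations?: string[];
}

function readReviews(meta: MiniMetadata): MiniReview[] {
  const stored = meta.robot_reviews ?? [];
  if (stored.length > 0) {
    // Iterations are derived from position, not trusted from storage.
    return stored.map((review, index) => ({ ...review, iteration: index + 1 }));
  }
  const legacy = meta.robot_review_observations ?? [];
  if (legacy.length === 0) return [];
  // Legacy records carry no verdict flags; the diff defaults both to true.
  return [{ iteration: 1, observations: legacy, evidence_complete: true, evidence_convincing: true }];
}

function appendReview(meta: MiniMetadata, review: Omit<MiniReview, "iteration">): MiniMetadata {
  const existing = readReviews(meta);
  const latest: MiniReview = { ...review, iteration: existing.length + 1 };
  return {
    robot_reviews: [...existing, latest],
    // Mirror the latest review into the legacy field, as the diff does.
    robot_review_observations: latest.observations,
  };
}

const legacyMeta: MiniMetadata = { robot_review_observations: ["Observed no command output"] };
const updated = appendReview(legacyMeta, {
  observations: ["Observed benchmark output"],
  evidence_complete: true,
  evidence_convincing: true,
});
console.log(readReviews(updated).map(review => review.iteration)); // prints [ 1, 2 ]
```

One consequence worth noting: appending to a task that only has legacy metadata carries the legacy review forward as iteration 1, so no history is dropped during migration.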
@@ -30,7 +30,18 @@ describe("getReviewBadges", () => {
   pending_approval: true,
   metadata: {
     lgtm_evidence: "npm test",
-    robot_review_observations: ["Observed one unchecked edge case"],
+    robot_reviews: [{
+      iteration: 1,
+      reviewer: "opencode",
+      scope: "task evidence",
+      observations: ["Observed one unchecked edge case"],
+      blind_spots: "Did not inspect prod traffic",
+      evidence_complete: false,
+      evidence_convincing: false,
+      missing_evidence: ["Prod traffic sample"],
+      submitted_at: "2026-04-17T00:00:00.000Z",
+      mode: "manual",
+    }],
   },
 });

@@ -0,0 +1,75 @@
+import { describe, expect, it } from "vitest";
+import { appendRobotReviewMetadata, getLatestRobotReview, getRobotReviews } from "../src/robot-review.js";
+import type { Task } from "../src/types.js";
+
+function makeTask(overrides: Partial<Task> = {}): Task {
+  return {
+    id: "1",
+    subject: "Test",
+    description: "Desc",
+    done_criterion: "done",
+    pending_approval: false,
+    status: "pending",
+    progress_label: undefined,
+    metadata: {},
+    blocks: [],
+    blockedBy: [],
+    createdAt: 0,
+    updatedAt: 0,
+    ...overrides,
+  };
+}
+
+describe("robot review helpers", () => {
+  it("reads legacy single-review metadata", () => {
+    const task = makeTask({
+      metadata: {
+        robot_review_reviewer: "opencode",
+        robot_review_scope: "task evidence",
+        robot_review_observations: ["Observed no command output for the core claim"],
+        robot_review_blind_spots: "Did not rerun tests",
+        robot_review_submitted_at: "2026-04-17T00:00:00.000Z",
+      },
+    });
+
+    const reviews = getRobotReviews(task);
+    expect(reviews).toHaveLength(1);
+    expect(reviews[0].reviewer).toBe("opencode");
+    expect(reviews[0].iteration).toBe(1);
+  });
+
+  it("appends robot reviews as iterations", () => {
+    const task = makeTask();
+    const metadata1 = appendRobotReviewMetadata(task, {
+      reviewer: "opencode",
+      scope: "task evidence",
+      observations: ["Observed missing benchmark output"],
+      blind_spots: "Did not inspect prod config",
+      evidence_complete: false,
+      evidence_convincing: false,
+      missing_evidence: ["Benchmark output for the claimed speedup"],
+      submitted_at: "2026-04-17T00:00:00.000Z",
+      mode: "auto",
+    });
+    const task1 = makeTask({ metadata: metadata1 });
+    const metadata2 = appendRobotReviewMetadata(task1, {
+      reviewer: "opencode",
+      scope: "updated task evidence",
+      observations: ["Observed benchmark output and test transcript"],
+      blind_spots: "Did not inspect long-run stability",
+      evidence_complete: true,
+      evidence_convincing: true,
+      missing_evidence: [],
+      submitted_at: "2026-04-17T01:00:00.000Z",
+      mode: "auto",
+    });
+
+    const task2 = makeTask({ metadata: metadata2 });
+    const reviews = getRobotReviews(task2);
+    expect(reviews).toHaveLength(2);
+    expect(reviews[0].iteration).toBe(1);
+    expect(reviews[1].iteration).toBe(2);
+    expect(getLatestRobotReview(task2)?.evidence_convincing).toBe(true);
+    expect(task2.metadata.robot_review_iteration_count).toBe(2);
+  });
+});