fix lgtm review gating and evidence display

This commit is contained in:
wassname
2026-06-07 18:55:05 +08:00
parent dbe887f0c2
commit 023a2c2263
11 changed files with 394 additions and 92 deletions
+4 -4
View File
@@ -11,7 +11,7 @@ Tasks can also carry a separate fresh-perspective robot review from a subagent o
## Install
```bash
pi install npm:@wassname/pi-lgtm
pi install npm:@wassname2/pi-lgtm
```
Or for development:
@@ -65,7 +65,7 @@ Lists all tasks. `👀` indicates pending sign-off.
### `TaskGet`
Full task details including `done_criterion` and approval state.
Full task details including `done_criterion`, approval state, and a one-line gate status such as `ready for human sign-off via /lgtm 5` or `blocked: automatic robot review failed: ...`.
### `TaskUpdate`
@@ -109,7 +109,7 @@ Required fields:
| `evidence_convincing` | Whether the supplied evidence would convince a skeptical reviewer |
| `missing_evidence` | Concrete missing checks or artifacts needed before human sign-off |
Use this from a separate subagent or other model when possible. Reviews append as iterations; the latest one is what gates human sign-off.
Use this from a separate subagent or other model when possible. Reviews append as iterations; the latest one is what gates human sign-off. If stored LGTM evidence already exists, an accepted manual review reopens the human sign-off gate.
### `robot_review_run`
@@ -133,7 +133,7 @@ This appends a new robot-review iteration. The reviewer returns an explicit `acc
### `/lgtm <id>`
Human-only sign-off. Shows stored evidence, failure modes, and remaining uncertainty for review, then asks for confirmation. Without `<id>`, shows a list of pending-approval tasks.
Human-only sign-off. Shows stored evidence, falsification output, failure modes, review status, and remaining uncertainty in structured sections for review, then asks for confirmation. Without `<id>`, shows a list of pending-approval tasks.
### `/tasks`
@@ -0,0 +1,57 @@
# LGTM friction fixes
## Goal
Fix the subset of reported LGTM-tool friction that seems clearly worth it now.
Specifically: make auto-review parse failures less opaque, let an accepted manual robot review reopen the human gate, show a compact current gate status, and make stored evidence easier to read during `/lgtm` review.
## Scope
In: parser/error handling for automatic robot review, manual review gate semantics, gate-status rendering, `/lgtm` evidence formatting, focused tests, README updates.
Out: evidence path attachments, supersede operations, schema redesign for commands/log artifacts, new completion-mode field, full state-machine overhaul.
## Requirements
- R1: Automatic robot review failures are diagnosable. Done means: parse errors include raw-output context and tolerate common wrapper noise around the JSON object. VERIFY: `npm test -- test/robot-review-runner.test.ts` shows parser fallback tests passing. If silent failure remained, the new malformed-output tests would still throw opaque errors.
- R2: Accepted manual robot review can reopen human sign-off after a failed auto review. Done means: storing an accepted manual review can set `pending_approval=true` when LGTM evidence exists. VERIFY: focused test covers this transition. If broken, the task would stay blocked after accepted manual review.
- R3: Task details expose current gate status in one short line. Done means: a helper summarizes states like ready, blocked on rejected review, or blocked on reviewer failure. VERIFY: focused tests assert representative summaries. If broken, TaskGet output would still require reading raw metadata to infer gate state.
- R4: `/lgtm` review output is easier to scan. Done means: evidence and falsification output are rendered in explicit markdown sections/code fences instead of one flat blob. VERIFY: source shows sectioned formatter used by `/lgtm`. If broken, the old unstructured evidence block remains.
## Tasks
- [x] T1 (R1): Harden robot-review JSON extraction and error messages.
- steps: add parser helpers; tolerate fenced/extra prose around a JSON object inside markers; include output snippet on failure
- verify: `npm test -- test/robot-review-runner.test.ts`
- success: parser tests pass, including noisy-marker cases
- likely_fail: fallback never runs because markers are missing or parsing still uses raw `JSON.parse`
- sneaky_fail: fallback grabs the wrong braces and accepts junk; targeted tests catch nested/noisy cases
- UAT: "when robot review returns markers plus noise, I observe it still parses or at least shows the raw failure context"
- [x] T2 (R2,R3,R4): Fix gate semantics and human-facing formatting.
- steps: add gate-status helper; use it in TaskGet and lgtm_ask result; let accepted manual review set pending approval when evidence exists; format `/lgtm` evidence sections
- verify: `npm test -- test/robot-review.test.ts test/task-store.test.ts`
- success: manual-review gating tests pass and task summaries expose gate status
- likely_fail: accepted manual review still preserves old false gate state
- sneaky_fail: gate summary says ready while latest review actually failed; summary tests catch mismatches
- UAT: "when I inspect a task after review, I observe one clear gate-status line and readable evidence sections"
- [x] T3 (R1-R4): Update README for new semantics.
- steps: mention accepted manual review reopening the gate, clearer TaskGet state, and structured `/lgtm` evidence display
- verify: `rg -n "manual review|gate status|/lgtm" README.md`
- success: README mentions the changed behavior
- likely_fail: docs still imply only auto review can reopen approval
- sneaky_fail: docs mention review generally but omit the new gate semantics
- UAT: "when I read README, I can infer the new gate behavior without reading code"
## Context
- Keep fail-fast simplicity. Do not add a large state-machine rewrite.
- Preserve the human `/lgtm` override path.
- Manual review should not complete tasks, only reopen the human gate when it accepts existing evidence.
## Log
- 2026-06-07: User explicitly said not to fix everything, only the subset that makes sense. Chosen fixes are 1, 2, 10, plus `/lgtm` evidence readability because it is low-cost and directly reported.
- 2026-06-07: Implemented best-effort JSON extraction from noisy marker blocks, persisted automatic-review failure context in task metadata, reopened `pending_approval` when any accepted review validates stored evidence, added `Gate status:` summaries, and reformatted human review output into explicit sections/code fences.
- 2026-06-07: Verification passed with `npm test`, `npm run typecheck`, and `npm run lint`.
## TODO
- Evidence-path attachments may be worth adding later if pasted blobs remain too awkward.
- Superseded-evidence support probably wants explicit iteration objects, not another metadata flag.
## Errors
| Task | Error | Resolution |
|------|-------|------------|
+1 -1
View File
@@ -1,5 +1,5 @@
{
"name": "@wassname/pi-lgtm",
"name": "@wassname2/pi-lgtm",
"version": "0.4.2",
"description": "A pi extension providing goal tracking with structural sign-off and LGTM workflow.",
"author": "wassname",
+202 -44
View File
@@ -27,16 +27,18 @@ import { join, resolve } from "node:path";
import type { ExtensionAPI, ExtensionCommandContext, ExtensionContext } from "@mariozechner/pi-coding-agent";
import { Type } from "@sinclair/typebox";
import { AutoClearManager } from "./auto-clear.js";
import { type DisplayStatus, getDisplayStatus, getReviewBadges, getStateTag } from "./review-badges.js";
import { type DisplayStatus, getDisplayStatus, getGateStatus, getReviewBadges, getStateTag } from "./review-badges.js";
import {
appendRobotReviewMetadata,
getLatestRobotReview,
getRobotReviews,
latestRobotReviewPasses,
type RobotReviewRecord,
shouldOpenHumanSignoffGate,
} from "./robot-review.js";
import { TaskStore } from "./task-store.js";
import { loadTasksConfig } from "./tasks-config.js";
import type { Task } from "./types.js";
import { TaskWidget, type UICtx } from "./ui/task-widget.js";
function textResult(msg: string) {
@@ -170,10 +172,130 @@ export async function runRobotReviewCommand(
});
}
function extractRobotReviewJson(output: string): Record<string, unknown> {
function summarizeRawOutput(output: string, maxChars = 400): string {
const singleLine = output.replace(/\s+/g, " ").trim();
if (singleLine.length <= maxChars) return singleLine;
return `${singleLine.slice(0, maxChars)}...`;
}
function stripMarkdownCodeFence(text: string): string {
const trimmed = text.trim();
const fence = trimmed.match(/^```(?:json)?\s*([\s\S]*?)\s*```$/i);
return fence ? fence[1].trim() : trimmed;
}
function extractBalancedJsonObject(text: string): string | undefined {
let start = -1;
let depth = 0;
let inString = false;
let escaped = false;
for (let index = 0; index < text.length; index++) {
const char = text[index];
if (escaped) {
escaped = false;
continue;
}
if (char === "\\") {
escaped = true;
continue;
}
if (char === '"') {
inString = !inString;
continue;
}
if (inString) continue;
if (char === "{") {
if (depth === 0) start = index;
depth++;
continue;
}
if (char === "}") {
if (depth === 0) continue;
depth--;
if (depth === 0 && start >= 0) return text.slice(start, index + 1);
}
}
return undefined;
}
function getAutomaticReviewFailureMetadata(message: string, rawOutput?: string): Record<string, unknown> {
return {
robot_review_last_error: message,
robot_review_last_error_output: rawOutput ?? null,
robot_review_last_error_at: new Date().toISOString(),
};
}
function clearAutomaticReviewFailureMetadata(): Record<string, unknown> {
return {
robot_review_last_error: null,
robot_review_last_error_output: null,
robot_review_last_error_at: null,
};
}
function formatReviewTextBlock(title: string, body: string): string {
return `### ${title}\n\n\`\`\`text\n${body}\n\`\`\``;
}
function formatEvidencePackage(task: Task): string[] {
const metadata = task.metadata ?? {};
const sections: string[] = [];
if (typeof metadata.lgtm_evidence === "string") {
sections.push(formatReviewTextBlock("Evidence", metadata.lgtm_evidence));
if (typeof metadata.lgtm_failure_likely === "string") sections.push(`### Failure (likely)\n${metadata.lgtm_failure_likely}`);
if (typeof metadata.lgtm_failure_sneaky === "string") sections.push(`### Failure (sneaky)\n${metadata.lgtm_failure_sneaky}`);
if (typeof metadata.lgtm_falsification_test === "string") {
sections.push(formatReviewTextBlock("Falsification test", metadata.lgtm_falsification_test));
}
if (Array.isArray(metadata.lgtm_verification_hints) && metadata.lgtm_verification_hints.length > 0) {
sections.push(`### Verification hints\n${metadata.lgtm_verification_hints.map((hint: string) => `- ${hint}`).join("\n")}`);
}
if (typeof metadata.lgtm_remaining_uncertainty === "string") {
sections.push(`### Remaining uncertainty\n${metadata.lgtm_remaining_uncertainty}`);
}
if (typeof metadata.lgtm_submitted_at === "string") sections.push(`Submitted: ${metadata.lgtm_submitted_at}`);
}
if (typeof metadata.robot_review_last_error === "string") {
sections.push(`### Automatic robot review failure\n${metadata.robot_review_last_error}`);
if (typeof metadata.robot_review_last_error_output === "string" && metadata.robot_review_last_error_output.trim()) {
sections.push(formatReviewTextBlock("Reviewer raw output", metadata.robot_review_last_error_output));
}
}
return sections;
}
function getNonReviewMetadata(task: Task): Record<string, unknown> {
return Object.fromEntries(
Object.entries(task.metadata ?? {}).filter(([key]) => !key.startsWith("lgtm_") && !key.startsWith("robot_review_")),
);
}
export function extractRobotReviewJson(output: string): Record<string, unknown> {
const match = output.match(/ROBOT_REVIEW_JSON_START\s*([\s\S]*?)\s*ROBOT_REVIEW_JSON_END/);
if (!match) throw new Error("Robot reviewer did not return the expected JSON markers.");
return JSON.parse(match[1]) as Record<string, unknown>;
const source = match ? match[1] : output;
const candidates = [
source.trim(),
stripMarkdownCodeFence(source),
extractBalancedJsonObject(source) ?? "",
extractBalancedJsonObject(stripMarkdownCodeFence(source)) ?? "",
].filter(Boolean);
let lastError: unknown;
for (const candidate of [...new Set(candidates)]) {
try {
return JSON.parse(candidate) as Record<string, unknown>;
} catch (error) {
lastError = error;
}
}
const prefix = match
? "Robot reviewer returned invalid JSON"
: "Robot reviewer did not return the expected JSON markers or a parseable JSON object";
const detail = lastError instanceof Error ? `: ${lastError.message}` : "";
throw new Error(`${prefix}${detail}. Raw output: ${summarizeRawOutput(output)}`);
}
function formatRobotReview(review: RobotReviewRecord): string {
@@ -265,11 +387,24 @@ async function runAutomaticRobotReview(
const commandLabel = `${invocation.command} ${invocation.args.slice(0, reviewerModel ? 6 : 4).join(" ")}`;
const result = await runRobotReviewCommand(invocation, signal, timeoutMs);
if (result.exitCode !== 0) {
throw new Error(`Robot reviewer failed (${result.exitCode ?? "?"}): ${(result.stderr || result.stdout).trim()}`);
const error = new Error(`Robot reviewer failed (${result.exitCode ?? "?"}): ${(result.stderr || result.stdout).trim()}`) as Error & { rawOutput?: string };
error.rawOutput = (result.stderr || result.stdout).trim();
throw error;
}
let parsed: Record<string, unknown>;
try {
parsed = extractRobotReviewJson(result.stdout);
} catch (error) {
const wrapped = new Error(error instanceof Error ? error.message : String(error)) as Error & { rawOutput?: string };
wrapped.rawOutput = result.stdout.trim();
throw wrapped;
}
const parsed = extractRobotReviewJson(result.stdout);
const observations = Array.isArray(parsed.observations) ? parsed.observations.filter((item): item is string => typeof item === "string") : [];
if (observations.length === 0) throw new Error("Robot reviewer returned no observations.");
if (observations.length === 0) {
const error = new Error("Robot reviewer returned no observations.") as Error & { rawOutput?: string };
error.rawOutput = result.stdout.trim();
throw error;
}
const rawMissing: string[] = Array.isArray(parsed.missing_evidence)
? parsed.missing_evidence.filter((item): item is string => typeof item === "string")
: [];
@@ -548,13 +683,16 @@ export default function (pi: ExtensionAPI) {
const lines: string[] = [
`Task #${task.id}: ${task.subject}`,
`Status: ${task.status} ${getReviewBadges(task)}${task.pending_approval && task.status !== "completed" ? " (pending human sign-off)" : ""}`,
`Gate status: ${getGateStatus(task)}`,
`Done criterion: ${task.done_criterion}`,
`Description: ${desc}`,
];
lines.push(`Description: ${desc}`);
if (robotReviews.length > 0) {
const latest = robotReviews[robotReviews.length - 1];
lines.push(`Robot reviews: ${robotReviews.length} (latest: complete=${latest.evidence_complete ? "yes" : "no"}, convincing=${latest.evidence_convincing ? "yes" : "no"})`);
lines.push(`Robot reviews: ${robotReviews.length} (latest: accepted=${latest.accepted ? "yes" : "no"}, complete=${latest.evidence_complete ? "yes" : "no"}, convincing=${latest.evidence_convincing ? "yes" : "no"})`);
}
const evidenceSections = formatEvidencePackage(task);
if (evidenceSections.length > 0) lines.push(...evidenceSections);
if (task.blockedBy.length > 0) {
const openBlockers = task.blockedBy.filter(bid => {
const blocker = store.get(bid);
@@ -563,10 +701,10 @@ export default function (pi: ExtensionAPI) {
if (openBlockers.length > 0) lines.push(`Blocked by: ${openBlockers.map(id => "#" + id).join(", ")}`);
}
if (task.blocks.length > 0) lines.push(`Blocks: ${task.blocks.map(id => "#" + id).join(", ")}`);
const metaKeys = Object.keys(task.metadata);
if (metaKeys.length > 0) lines.push(`Metadata: ${JSON.stringify(task.metadata)}`);
const metadata = getNonReviewMetadata(task);
if (Object.keys(metadata).length > 0) lines.push(`Metadata: ${JSON.stringify(metadata)}`);
return Promise.resolve(textResult(lines.join("\n")));
return Promise.resolve(textResult(lines.join("\n\n")));
},
});
@@ -684,6 +822,7 @@ Do NOT summarize or interpret. Paste literal command output, exact log lines, ma
lgtm_verification_hints: params.verification_hints,
lgtm_remaining_uncertainty: params.remaining_uncertainty,
lgtm_submitted_at: new Date().toISOString(),
...clearAutomaticReviewFailureMetadata(),
},
});
let robotReviewNote = "";
@@ -692,8 +831,11 @@ Do NOT summarize or interpret. Paste literal command output, exact log lines, ma
try {
const { review, command } = await runAutomaticRobotReview(refreshedTask, signal, currentProvider);
store.update(params.taskId, {
pending_approval: review.accepted,
metadata: appendRobotReviewMetadata(refreshedTask, review),
pending_approval: shouldOpenHumanSignoffGate(refreshedTask, review.accepted),
metadata: {
...appendRobotReviewMetadata(refreshedTask, review),
...clearAutomaticReviewFailureMetadata(),
},
});
robotReviewNote =
`\n\n### Automatic robot review\n` +
@@ -712,33 +854,28 @@ Do NOT summarize or interpret. Paste literal command output, exact log lines, ma
robotReviewNote += `\nResult: human sign-off has been held back until the evidence is strengthened and reviewed again.`;
}
} catch (err: any) {
store.update(params.taskId, { pending_approval: false });
store.update(params.taskId, {
pending_approval: false,
metadata: getAutomaticReviewFailureMetadata(err.message, err.rawOutput),
});
robotReviewNote =
`\n\n### Automatic robot review\n` +
`Reviewer failed: ${err.message}\n` +
`Human sign-off is blocked until the reviewer stage succeeds.`;
`Human sign-off is blocked until the reviewer stage succeeds.` +
(typeof err.rawOutput === "string" && err.rawOutput.trim()
? `\n\n${formatReviewTextBlock("Reviewer raw output", err.rawOutput.trim())}`
: "");
}
widget.update();
const hintsSection = params.verification_hints?.length
? `\n### Verification hints\n${params.verification_hints.map(h => `- ${h}`).join("\n")}`
: "";
const uncertaintySection = params.remaining_uncertainty
? `\n### Remaining uncertainty\n${params.remaining_uncertainty}`
: "";
const updatedTask = store.get(task.id) ?? task;
const result =
`## Task #${task.id}: ${task.subject}\n` +
`Done criterion: ${task.done_criterion}\n\n` +
`### Evidence\n${params.evidence}\n\n` +
`### Failure (likely)\n${params.failure_likely}\n\n` +
`### Failure (sneaky)\n${params.failure_sneaky}\n\n` +
`### Falsification test\n${params.falsification_test}` +
hintsSection +
uncertaintySection +
`${formatEvidencePackage(updatedTask).join("\n\n")}` +
robotReviewNote +
`\n\n---\n` +
`Task #${task.id} is now ${store.get(task.id)?.pending_approval ? `pending human sign-off via \`/lgtm ${task.id}\`` : "not yet ready for human sign-off"}.\n\n` +
`Gate status: ${getGateStatus(updatedTask)}\n\n` +
`**Self-check (non-blocking):** Look at this as the human will see it. ` +
`Does the evidence directly address the done_criterion "${task.done_criterion}"? ` +
`Would a skeptical reviewer find this convincing, or would they immediately ask ` +
@@ -778,21 +915,23 @@ This does not complete the task. Human /lgtm remains the only completion path.`,
if (!task) return Promise.resolve(textResult(`Task #${params.taskId} not found`));
if (task.status === "completed") return Promise.resolve(textResult(`Task #${params.taskId} already completed`));
const accepted = params.accepted ?? (params.evidence_complete && params.evidence_convincing);
store.update(params.taskId, {
pending_approval: params.evidence_complete && params.evidence_convincing ? task.pending_approval : false,
pending_approval: shouldOpenHumanSignoffGate(task, accepted),
metadata: {
...appendRobotReviewMetadata(task, {
reviewer: params.reviewer,
scope: params.scope,
observations: params.observations,
blind_spots: params.blind_spots,
accepted: params.accepted ?? (params.evidence_complete && params.evidence_convincing),
accepted,
evidence_complete: params.evidence_complete,
evidence_convincing: params.evidence_convincing,
missing_evidence: params.missing_evidence ?? [],
submitted_at: new Date().toISOString(),
mode: "manual",
}),
...clearAutomaticReviewFailureMetadata(),
},
});
widget.update();
@@ -802,12 +941,13 @@ This does not complete the task. Human /lgtm remains the only completion path.`,
`Iteration: ${getRobotReviews(store.get(params.taskId)!).length}\n` +
`Reviewer: ${params.reviewer}\n` +
`Scope: ${params.scope}\n\n` +
`Accepted: ${(params.accepted ?? (params.evidence_complete && params.evidence_convincing)) ? "yes" : "no"}\n` +
`Accepted: ${accepted ? "yes" : "no"}\n` +
`Evidence complete: ${params.evidence_complete ? "yes" : "no"}\n` +
`Evidence convincing: ${params.evidence_convincing ? "yes" : "no"}\n\n` +
`### Observations\n${params.observations.map(o => `- ${o}`).join("\n")}\n\n` +
`${(params.missing_evidence?.length ?? 0) > 0 ? `### Missing evidence\n${(params.missing_evidence ?? []).map(item => `- ${item}`).join("\n")}\n\n` : ""}` +
`### Blind spots\n${params.blind_spots}\n\n` +
`Gate status: ${getGateStatus(store.get(params.taskId) ?? task)}\n\n` +
`🤖 Robot review stored. Human sign-off still requires \`/lgtm ${task.id}\`.`;
return Promise.resolve(textResult(result));
@@ -833,10 +973,14 @@ This appends a new robot-review iteration. If the reviewer marks evidence incomp
return textResult(`Task #${params.taskId} has no stored evidence yet. Call lgtm_ask first.`);
}
try {
const { review, command } = await runAutomaticRobotReview(task, signal, currentProvider);
store.update(params.taskId, {
pending_approval: review.accepted ? task.pending_approval : false,
metadata: appendRobotReviewMetadata(task, review),
pending_approval: shouldOpenHumanSignoffGate(task, review.accepted),
metadata: {
...appendRobotReviewMetadata(task, review),
...clearAutomaticReviewFailureMetadata(),
},
});
widget.update();
@@ -852,8 +996,24 @@ This appends a new robot-review iteration. If the reviewer marks evidence incomp
: "") +
`### Observations\n${review.observations.map(o => `- ${o}`).join("\n")}\n\n` +
`${review.missing_evidence.length > 0 ? `### Missing evidence\n${review.missing_evidence.map(item => `- ${item}`).join("\n")}\n\n` : ""}` +
`### Blind spots\n${review.blind_spots}`,
`### Blind spots\n${review.blind_spots}\n\n` +
`Gate status: ${getGateStatus(store.get(params.taskId) ?? task)}`,
);
} catch (err: any) {
store.update(params.taskId, {
pending_approval: false,
metadata: getAutomaticReviewFailureMetadata(err.message, err.rawOutput),
});
widget.update();
return textResult(
`## Automatic robot review for task #${task.id}: ${task.subject}\n` +
`Reviewer failed: ${err.message}\n\n` +
`Gate status: ${getGateStatus(store.get(params.taskId) ?? task)}` +
(typeof err.rawOutput === "string" && err.rawOutput.trim()
? `\n\n${formatReviewTextBlock("Reviewer raw output", err.rawOutput.trim())}`
: ""),
);
}
},
});
@@ -999,17 +1159,12 @@ This appends a new robot-review iteration. If the reviewer marks evidence incomp
const robotReviews = getRobotReviews(task);
const noEvidence = !task.pending_approval && !m.lgtm_evidence;
const robotRejected = robotReviews.length > 0 && !latestRobotReviewPasses(task);
const reviewerFailed = typeof m.robot_review_last_error === "string";
// Print evidence to the conversation so the user can review it there
const evidenceParts: string[] = [];
const evidenceParts: string[] = [`Gate status: ${getGateStatus(task)}`];
if (m.lgtm_evidence) {
evidenceParts.push(`**Evidence:**\n${m.lgtm_evidence}`);
evidenceParts.push(`Failure (likely): ${m.lgtm_failure_likely}`);
evidenceParts.push(`Failure (sneaky): ${m.lgtm_failure_sneaky}`);
if (m.lgtm_falsification_test) evidenceParts.push(`Falsification test: ${m.lgtm_falsification_test}`);
if (m.lgtm_remaining_uncertainty) evidenceParts.push(`Remaining uncertainty: ${m.lgtm_remaining_uncertainty}`);
if (m.lgtm_verification_hints?.length) evidenceParts.push(`Hints: ${m.lgtm_verification_hints.join(", ")}`);
evidenceParts.push(`Submitted: ${m.lgtm_submitted_at}`);
evidenceParts.push(...formatEvidencePackage(task));
} else {
evidenceParts.push(`(No agent-submitted evidence — agent never called lgtm_ask. Human override.)`);
}
@@ -1030,6 +1185,9 @@ This appends a new robot-review iteration. If the reviewer marks evidence incomp
if (noEvidence) {
title = `⚠ Task #${taskId} has no agent-submitted evidence.\nSign off anyway?\nDone: ${task.done_criterion}`;
signLabel = "✓ Override — sign off without evidence";
} else if (reviewerFailed) {
title = `⚠ Task #${taskId} automatic robot review failed.\nSign off anyway?\nDone: ${task.done_criterion}`;
signLabel = "✓ Override — sign off despite reviewer failure";
} else if (robotRejected) {
title = `⚠ Task #${taskId} robot review rejected the evidence.\nSign off anyway?\nDone: ${task.done_criterion}`;
signLabel = "✓ Override — sign off despite rejected review";
+21 -1
View File
@@ -1,4 +1,4 @@
import { getRobotReviews } from "./robot-review.js";
import { getLatestRobotReview, getRobotReviews } from "./robot-review.js";
import type { Task } from "./types.js";
const STAGES = ["🛠", "🤖", "👀"] as const;
@@ -14,6 +14,13 @@ export function getReviewBadges(task: Task): string {
return `[${slots.join("")}]`;
}
export const REVIEW_BADGES = {
evidence: STAGES[0],
robot: STAGES[1],
human: STAGES[2],
pipeline: STAGES,
};
export type DisplayStatus = "awaiting_signoff" | "in_progress" | "pending" | "completed";
/** Derived display bucket. `awaiting_signoff` is pending_approval && !completed. */
@@ -25,6 +32,19 @@ export function getDisplayStatus(task: Task): DisplayStatus {
export type StateTag = "READY" | "ACTIVE" | "PENDING" | "DONE";
export function getGateStatus(task: Task): string {
if (task.status === "completed") return "human signed off";
if (!task.metadata?.lgtm_evidence) return "no lgtm evidence submitted";
if (task.pending_approval) return `ready for human sign-off via /lgtm ${task.id}`;
if (typeof task.metadata?.robot_review_last_error === "string") {
return `blocked: automatic robot review failed: ${task.metadata.robot_review_last_error}`;
}
const latest = getLatestRobotReview(task);
if (latest && !latest.accepted) return "blocked: latest robot review rejected the evidence";
if (latest?.accepted) return "accepted robot review present, but the human gate is still closed";
return "blocked: evidence submitted, robot review still required";
}
/** Short uppercase tag for the human ("can I /lgtm this?" at a glance). */
export function getStateTag(task: Task): StateTag {
const s = getDisplayStatus(task);
+4
View File
@@ -101,6 +101,10 @@ export function getLatestRobotReview(task: Task): RobotReviewRecord | undefined
return reviews.length > 0 ? reviews[reviews.length - 1] : undefined;
}
export function shouldOpenHumanSignoffGate(task: Task, reviewAccepted: boolean): boolean {
return reviewAccepted && typeof task.metadata?.lgtm_evidence === "string" && task.metadata.lgtm_evidence.length > 0;
}
export function appendRobotReviewMetadata(task: Task, review: Omit<RobotReviewRecord, "iteration">): Record<string, unknown> {
const robot_reviews = [...getRobotReviews(task), { ...review, iteration: 0 }].map((entry, index) => ({
...entry,
+40 -1
View File
@@ -1,5 +1,5 @@
import { describe, expect, it } from "vitest";
import { getDisplayStatus, getReviewBadges } from "../src/review-badges.js";
import { getDisplayStatus, getGateStatus, getReviewBadges } from "../src/review-badges.js";
import type { Task } from "../src/types.js";
function makeTask(overrides: Partial<Task> = {}): Task {
@@ -60,6 +60,45 @@ describe("getReviewBadges", () => {
});
});
describe("getGateStatus", () => {
it("reports ready when human sign-off is open", () => {
expect(getGateStatus(makeTask({
pending_approval: true,
metadata: { lgtm_evidence: "ok" },
}))).toBe("ready for human sign-off via /lgtm 1");
});
it("reports reviewer failure separately from rejected evidence", () => {
expect(getGateStatus(makeTask({
metadata: {
lgtm_evidence: "ok",
robot_review_last_error: "Unexpected token 'a'",
},
}))).toContain("automatic robot review failed");
});
it("reports rejected robot review when latest review does not accept", () => {
expect(getGateStatus(makeTask({
metadata: {
lgtm_evidence: "ok",
robot_reviews: [{
iteration: 1,
reviewer: "opencode",
scope: "task evidence",
observations: ["Observed missing output"],
blind_spots: "none",
accepted: false,
evidence_complete: false,
evidence_convincing: false,
missing_evidence: ["literal output"],
submitted_at: "2026-04-17T00:00:00.000Z",
mode: "manual",
}],
},
}))).toBe("blocked: latest robot review rejected the evidence");
});
});
describe("getDisplayStatus", () => {
it("returns pending for fresh tasks", () => {
expect(getDisplayStatus(makeTask())).toBe("pending");
+17
View File
@@ -2,6 +2,7 @@ import { describe, expect, it } from "vitest";
import {
DEFAULT_ROBOT_REVIEW_TIMEOUT_MS,
extractFinalAssistantTextFromPiJsonl,
extractRobotReviewJson,
getPiInvocation,
getRobotReviewTimeoutMs,
runRobotReviewCommand,
@@ -27,6 +28,22 @@ describe("robot review runner helpers", () => {
expect(extractFinalAssistantTextFromPiJsonl(output)).toContain("ROBOT_REVIEW_JSON_START");
});
it("parses noisy JSON wrapped in review markers", () => {
const output = [
"ROBOT_REVIEW_JSON_START",
"and here is the JSON you asked for:",
"```json",
'{"accepted":true,"observations":["ok"]}',
"```",
"ROBOT_REVIEW_JSON_END",
].join("\n");
expect(extractRobotReviewJson(output)).toEqual({ accepted: true, observations: ["ok"] });
});
it("includes raw output context on parse failure", () => {
expect(() => extractRobotReviewJson("ROBOT_REVIEW_JSON_START and nope ROBOT_REVIEW_JSON_END")).toThrow(/Raw output:/);
});
it("uses configured timeout or falls back to default", () => {
expect(getRobotReviewTimeoutMs({ PI_LGTM_ROBOT_REVIEW_TIMEOUT_MS: "2500" } as NodeJS.ProcessEnv)).toBe(2500);
expect(getRobotReviewTimeoutMs({ PI_LGTM_ROBOT_REVIEW_TIMEOUT_MS: "bad" } as NodeJS.ProcessEnv)).toBe(DEFAULT_ROBOT_REVIEW_TIMEOUT_MS);
+7 -1
View File
@@ -1,5 +1,5 @@
import { describe, expect, it } from "vitest";
import { appendRobotReviewMetadata, getLatestRobotReview, getRobotReviews } from "../src/robot-review.js";
import { appendRobotReviewMetadata, getLatestRobotReview, getRobotReviews, shouldOpenHumanSignoffGate } from "../src/robot-review.js";
import type { Task } from "../src/types.js";
function makeTask(overrides: Partial<Task> = {}): Task {
@@ -21,6 +21,12 @@ function makeTask(overrides: Partial<Task> = {}): Task {
}
describe("robot review helpers", () => {
it("reopens the human gate when accepted review exists for stored evidence", () => {
expect(shouldOpenHumanSignoffGate(makeTask({ metadata: { lgtm_evidence: "literal output" } }), true)).toBe(true);
expect(shouldOpenHumanSignoffGate(makeTask({ metadata: { lgtm_evidence: "literal output" } }), false)).toBe(false);
expect(shouldOpenHumanSignoffGate(makeTask(), true)).toBe(false);
});
it("reads legacy single-review metadata", () => {
const task = makeTask({
metadata: {
+25 -26
View File
@@ -34,7 +34,7 @@ describe("TaskStore (in-memory)", () => {
it("creates tasks with optional fields", () => {
const t = store.create("Task", "Desc", "done criterion", "Running task", { key: "value" });
expect(t.activeForm).toBe("Running task");
expect(t.progress_label).toBe("Running task");
expect(t.metadata).toEqual({ key: "value" });
});
@@ -72,16 +72,16 @@ describe("TaskStore (in-memory)", () => {
const { changedFields } = store.update("1", {
subject: "Updated subject",
description: "Updated desc",
owner: "agent-1",
metadata: { owner: "agent-1" },
});
expect(changedFields).toContain("subject");
expect(changedFields).toContain("description");
expect(changedFields).toContain("owner");
expect(changedFields).toContain("metadata");
const task = store.get("1")!;
expect(task.subject).toBe("Updated subject");
expect(task.owner).toBe("agent-1");
expect(task.metadata.owner).toBe("agent-1");
});
it("deletes a task with status: deleted", () => {
@@ -114,7 +114,7 @@ describe("TaskStore (in-memory)", () => {
store.create("Blocker", "Desc", "done");
store.create("Blocked", "Desc", "done");
store.update("1", { addBlocks: ["2"] });
store.update("1", { add_blocks: ["2"] });
const t1 = store.get("1")!;
const t2 = store.get("2")!;
@@ -126,7 +126,7 @@ describe("TaskStore (in-memory)", () => {
store.create("Blocker", "Desc", "done");
store.create("Blocked", "Desc", "done");
store.update("2", { addBlockedBy: ["1"] });
store.update("2", { add_blocked_by: ["1"] });
const t1 = store.get("1")!;
const t2 = store.get("2")!;
@@ -138,8 +138,8 @@ describe("TaskStore (in-memory)", () => {
store.create("A", "Desc", "done");
store.create("B", "Desc", "done");
store.update("1", { addBlocks: ["2"] });
store.update("1", { addBlocks: ["2"] }); // duplicate
store.update("1", { add_blocks: ["2"] });
store.update("1", { add_blocks: ["2"] }); // duplicate
const t1 = store.get("1")!;
expect(t1.blocks.filter(id => id === "2")).toHaveLength(1);
@@ -148,7 +148,7 @@ describe("TaskStore (in-memory)", () => {
it("cleans up dependency edges on deletion", () => {
store.create("A", "Desc", "done");
store.create("B", "Desc", "done");
store.update("1", { addBlocks: ["2"] });
store.update("1", { add_blocks: ["2"] });
store.update("1", { status: "deleted" });
@@ -230,8 +230,8 @@ describe("TaskStore (in-memory)", () => {
it("allows circular dependencies with warning", () => {
store.create("A", "Desc", "done");
store.create("B", "Desc", "done");
store.update("1", { addBlocks: ["2"] });
const { warnings } = store.update("2", { addBlocks: ["1"] });
store.update("1", { add_blocks: ["2"] });
const { warnings } = store.update("2", { add_blocks: ["1"] });
expect(store.get("1")!.blocks).toContain("2");
expect(store.get("2")!.blocks).toContain("1");
@@ -240,14 +240,14 @@ describe("TaskStore (in-memory)", () => {
it("allows self-dependency with warning", () => {
store.create("Self", "Desc", "done");
const { warnings } = store.update("1", { addBlocks: ["1"] });
const { warnings } = store.update("1", { add_blocks: ["1"] });
expect(store.get("1")!.blocks).toContain("1");
expect(warnings).toContain("#1 blocks itself");
});
it("stores dangling edge IDs with warning", () => {
store.create("Real", "Desc", "done");
const { warnings } = store.update("1", { addBlocks: ["9999"] });
const { warnings } = store.update("1", { add_blocks: ["9999"] });
expect(store.get("1")!.blocks).toContain("9999");
expect(warnings).toContain("#9999 does not exist");
});
@@ -255,7 +255,7 @@ describe("TaskStore (in-memory)", () => {
it("returns no warnings for valid dependencies", () => {
store.create("A", "Desc", "done");
store.create("B", "Desc", "done");
const { warnings } = store.update("1", { addBlocks: ["2"] });
const { warnings } = store.update("1", { add_blocks: ["2"] });
expect(warnings).toEqual([]);
});
@@ -264,11 +264,11 @@ describe("TaskStore (in-memory)", () => {
expect(t.subject).toBe(" ");
});
it("updates activeForm field", () => {
it("updates progress_label field", () => {
store.create("Test", "Desc", "done");
const { changedFields } = store.update("1", { activeForm: "Running tests" });
expect(changedFields).toContain("activeForm");
expect(store.get("1")!.activeForm).toBe("Running tests");
const { changedFields } = store.update("1", { progress_label: "Running tests" });
expect(changedFields).toContain("progress_label");
expect(store.get("1")!.progress_label).toBe("Running tests");
});
it("updates description field", () => {
@@ -295,9 +295,8 @@ describe("TaskStore (in-memory)", () => {
it("clearCompleted cleans up dependency edges", () => {
store.create("Blocker", "Desc", "done");
store.create("Blocked", "Desc", "done");
store.update("1", { addBlocks: ["2"] });
createAndApprove(store, "dummy"); // need task 1 to have pending_approval
// Actually set pending_approval on task 1
store.update("1", { add_blocks: ["2"] });
// Set pending_approval on task 1 so complete() works via /lgtm path
store.update("1", { pending_approval: true });
store.complete("1");
@@ -312,7 +311,7 @@ describe("TaskStore (in-memory)", () => {
store.create("B1", "Desc", "done");
store.create("B2", "Desc", "done");
store.update("1", { addBlocks: ["2", "3"] });
store.update("1", { add_blocks: ["2", "3"] });
expect(store.get("1")!.blocks).toEqual(["2", "3"]);
expect(store.get("2")!.blockedBy).toContain("1");
@@ -321,14 +320,14 @@ describe("TaskStore (in-memory)", () => {
it("addBlockedBy warns on self-dependency", () => {
store.create("Self", "Desc", "done");
const { warnings } = store.update("1", { addBlockedBy: ["1"] });
const { warnings } = store.update("1", { add_blocked_by: ["1"] });
expect(store.get("1")!.blockedBy).toContain("1");
expect(warnings).toContain("#1 blocks itself");
});
it("addBlockedBy warns on dangling ref", () => {
store.create("Real", "Desc", "done");
const { warnings } = store.update("1", { addBlockedBy: ["9999"] });
const { warnings } = store.update("1", { add_blocked_by: ["9999"] });
expect(store.get("1")!.blockedBy).toContain("9999");
expect(warnings).toContain("#9999 does not exist");
});
@@ -336,8 +335,8 @@ describe("TaskStore (in-memory)", () => {
it("addBlockedBy warns on cycle", () => {
store.create("A", "Desc", "done");
store.create("B", "Desc", "done");
store.update("1", { addBlocks: ["2"] });
const { warnings } = store.update("1", { addBlockedBy: ["2"] });
store.update("1", { add_blocks: ["2"] });
const { warnings } = store.update("1", { add_blocked_by: ["2"] });
expect(warnings).toContain("cycle: #1 and #2 block each other");
});
+10 -8
View File
@@ -73,7 +73,7 @@ describe("TaskWidget", () => {
widget.update();
const lines = renderWidget(ui.state);
expect(lines).toHaveLength(2); // header + 1 task
expect(lines).toHaveLength(3); // header + 1 task + done_criterion
expect(lines[0]).toContain("1 tasks");
expect(lines[0]).toContain("1 open");
expect(lines[1]).toContain("◻");
@@ -129,18 +129,20 @@ describe("TaskWidget", () => {
it("shows blocked-by info for pending tasks", () => {
store.create("Blocker", "Desc", "done");
store.create("Blocked", "Desc", "done");
store.update("2", { addBlockedBy: ["1"] });
store.update("2", { add_blocked_by: ["1"] });
widget.update();
const lines = renderWidget(ui.state);
const blockedLine = lines.find(l => l.includes("Blocked"));
// blocked-by suffix is only added via dim theme helper, which in mock is identity
// So we should see the raw text. Check for the relevant subject line having blocked-by info
expect(blockedLine).toContain("blocked by #1");
});
it("hides completed blockers in blocked-by suffix", () => {
store.create("Blocker", "Desc", "done");
store.create("Blocked", "Desc", "done");
store.update("2", { addBlockedBy: ["1"] });
store.update("2", { add_blocked_by: ["1"] });
store.update("1", { pending_approval: true });
store.complete("1");
widget.update();
@@ -178,14 +180,14 @@ describe("TaskWidget", () => {
it("limits visible tasks to MAX_VISIBLE_TASKS", () => {
for (let i = 0; i < 15; i++) {
store.create(`Task ${i + 1}`, "Desc");
store.create(`Task ${i + 1}`, "Desc", "done");
}
widget.update();
const lines = renderWidget(ui.state);
// header + 10 tasks + "… and 5 more"
// header + 5 visible tasks (each has 2 lines: task + done_criterion) + "...and 10 more"
expect(lines).toHaveLength(12);
expect(lines[11]).toContain("5 more");
expect(lines[11]).toContain("10 more");
});
it("tracks token usage for active tasks", () => {
@@ -244,7 +246,7 @@ describe("TaskWidget", () => {
const lines = renderWidget(ui.state);
expect(lines[1]).toContain("Processing A…");
expect(lines[2]).toContain("Processing B…");
expect(lines[3]).toContain("Processing B…");
});
it("distributes token usage across all active tasks", () => {
@@ -260,7 +262,7 @@ describe("TaskWidget", () => {
const lines = renderWidget(ui.state);
// Both tasks should have the same token counts
expect(lines[1]).toContain("↑ 100");
expect(lines[2]).toContain("↑ 100");
expect(lines[3]).toContain("↑ 100");
});
it("dispose clears widget and timer", () => {