diff --git a/README.md b/README.md index 232845b..e98665e 100644 --- a/README.md +++ b/README.md @@ -11,7 +11,7 @@ Tasks can also carry a separate fresh-perspective robot review from a subagent o ## Install ```bash -pi install npm:@wassname/pi-lgtm +pi install npm:@wassname2/pi-lgtm ``` Or for development: @@ -65,7 +65,7 @@ Lists all tasks. `👀` indicates pending sign-off. ### `TaskGet` -Full task details including `done_criterion` and approval state. +Full task details including `done_criterion`, approval state, and a one-line gate status such as `ready for human sign-off via /lgtm 5` or `blocked: automatic robot review failed: ...`. ### `TaskUpdate` @@ -109,7 +109,7 @@ Required fields: | `evidence_convincing` | Whether the supplied evidence would convince a skeptical reviewer | | `missing_evidence` | Concrete missing checks or artifacts needed before human sign-off | -Use this from a separate subagent or other model when possible. Reviews append as iterations; the latest one is what gates human sign-off. +Use this from a separate subagent or other model when possible. Reviews append as iterations; the latest one is what gates human sign-off. If stored LGTM evidence already exists, an accepted manual review reopens the human sign-off gate. ### `robot_review_run` @@ -133,7 +133,7 @@ This appends a new robot-review iteration. The reviewer returns an explicit `acc ### `/lgtm ` -Human-only sign-off. Shows stored evidence, failure modes, and remaining uncertainty for review, then asks for confirmation. Without ``, shows a list of pending-approval tasks. +Human-only sign-off. Shows stored evidence, falsification output, failure modes, review status, and remaining uncertainty in structured sections for review, then asks for confirmation. Without ``, shows a list of pending-approval tasks. ### `/tasks` diff --git a/docs/spec/2026-06-07_lgtm-friction-fixes.md b/docs/spec/2026-06-07_lgtm-friction-fixes.md new file mode 100644 index 0000000..eb797d4 --- /dev/null +++ b/docs/spec/2026-06-07_lgtm-friction-fixes.md @@ -0,0 +1,57 @@ +# LGTM friction fixes + +## Goal +Fix the subset of reported LGTM-tool friction that seems clearly worth it now. + +Specifically: make auto-review parse failures less opaque, let an accepted manual robot review reopen the human gate, show a compact current gate status, and make stored evidence easier to read during `/lgtm` review. + +## Scope +In: parser/error handling for automatic robot review, manual review gate semantics, gate-status rendering, `/lgtm` evidence formatting, focused tests, README updates. +Out: evidence path attachments, supersede operations, schema redesign for commands/log artifacts, new completion-mode field, full state-machine overhaul. + +## Requirements +- R1: Automatic robot review failures are diagnosable. Done means: parse errors include raw-output context and tolerate common wrapper noise around the JSON object. VERIFY: `npm test -- test/robot-review-runner.test.ts` shows parser fallback tests passing. If silent failure remained, the new malformed-output tests would still throw opaque errors. +- R2: Accepted manual robot review can reopen human sign-off after a failed auto review. Done means: storing an accepted manual review can set `pending_approval=true` when LGTM evidence exists. VERIFY: focused test covers this transition. If broken, the task would stay blocked after accepted manual review. +- R3: Task details expose current gate status in one short line. Done means: a helper summarizes states like ready, blocked on rejected review, or blocked on reviewer failure. VERIFY: focused tests assert representative summaries. If broken, TaskGet output would still require reading raw metadata to infer gate state. +- R4: `/lgtm` review output is easier to scan. Done means: evidence and falsification output are rendered in explicit markdown sections/code fences instead of one flat blob. VERIFY: source shows sectioned formatter used by `/lgtm`. If broken, the old unstructured evidence block remains. + +## Tasks +- [x] T1 (R1): Harden robot-review JSON extraction and error messages. + - steps: add parser helpers; tolerate fenced/extra prose around a JSON object inside markers; include output snippet on failure + - verify: `npm test -- test/robot-review-runner.test.ts` + - success: parser tests pass, including noisy-marker cases + - likely_fail: fallback never runs because markers are missing or parsing still uses raw `JSON.parse` + - sneaky_fail: fallback grabs the wrong braces and accepts junk; targeted tests catch nested/noisy cases + - UAT: "when robot review returns markers plus noise, I observe it still parses or at least shows the raw failure context" +- [x] T2 (R2,R3,R4): Fix gate semantics and human-facing formatting. + - steps: add gate-status helper; use it in TaskGet and lgtm_ask result; let accepted manual review set pending approval when evidence exists; format `/lgtm` evidence sections + - verify: `npm test -- test/robot-review.test.ts test/task-store.test.ts` + - success: manual-review gating tests pass and task summaries expose gate status + - likely_fail: accepted manual review still preserves old false gate state + - sneaky_fail: gate summary says ready while latest review actually failed; summary tests catch mismatches + - UAT: "when I inspect a task after review, I observe one clear gate-status line and readable evidence sections" +- [x] T3 (R1-R4): Update README for new semantics. + - steps: mention accepted manual review reopening the gate, clearer TaskGet state, and structured `/lgtm` evidence display + - verify: `rg -n "manual review|gate status|/lgtm" README.md` + - success: README mentions the changed behavior + - likely_fail: docs still imply only auto review can reopen approval + - sneaky_fail: docs mention review generally but omit the new gate semantics + - UAT: "when I read README, I can infer the new gate behavior without reading code" + +## Context +- Keep fail-fast simplicity. Do not add a large state-machine rewrite. +- Preserve the human `/lgtm` override path. +- Manual review should not complete tasks, only reopen the human gate when it accepts existing evidence. + +## Log +- 2026-06-07: User explicitly said not to fix everything, only the subset that makes sense. Chosen fixes are 1, 2, 10, plus `/lgtm` evidence readability because it is low-cost and directly reported. +- 2026-06-07: Implemented best-effort JSON extraction from noisy marker blocks, persisted automatic-review failure context in task metadata, reopened `pending_approval` when any accepted review validates stored evidence, added `Gate status:` summaries, and reformatted human review output into explicit sections/code fences. +- 2026-06-07: Verification passed with `npm test`, `npm run typecheck`, and `npm run lint`. + +## TODO +- Evidence-path attachments may be worth adding later if pasted blobs remain too awkward. +- Superseded-evidence support probably wants explicit iteration objects, not another metadata flag. + +## Errors +| Task | Error | Resolution | +|------|-------|------------| diff --git a/package.json b/package.json index ec0e748..26c7940 100644 --- a/package.json +++ b/package.json @@ -1,5 +1,5 @@ { - "name": "@wassname/pi-lgtm", + "name": "@wassname2/pi-lgtm", "version": "0.4.2", "description": "A pi extension providing goal tracking with structural sign-off and LGTM workflow.", "author": "wassname", diff --git a/src/index.ts b/src/index.ts index 7363dce..278ead7 100644 --- a/src/index.ts +++ b/src/index.ts @@ -27,16 +27,18 @@ import { join, resolve } from "node:path"; import type { ExtensionAPI, ExtensionCommandContext, ExtensionContext } from "@mariozechner/pi-coding-agent"; import { Type } from "@sinclair/typebox"; import { AutoClearManager } from "./auto-clear.js"; -import { type DisplayStatus, getDisplayStatus, getReviewBadges, getStateTag } from "./review-badges.js"; +import { type DisplayStatus, getDisplayStatus, getGateStatus, getReviewBadges, getStateTag } from "./review-badges.js"; import { appendRobotReviewMetadata, getLatestRobotReview, getRobotReviews, latestRobotReviewPasses, type RobotReviewRecord, + shouldOpenHumanSignoffGate, } from "./robot-review.js"; import { TaskStore } from "./task-store.js"; import { loadTasksConfig } from "./tasks-config.js"; +import type { Task } from "./types.js"; import { TaskWidget, type UICtx } from "./ui/task-widget.js"; function textResult(msg: string) { @@ -170,10 +172,130 @@ export async function runRobotReviewCommand( }); } -function extractRobotReviewJson(output: string): Record { +function summarizeRawOutput(output: string, maxChars = 400): string { + const singleLine = output.replace(/\s+/g, " ").trim(); + if (singleLine.length <= maxChars) return singleLine; + return `${singleLine.slice(0, maxChars)}...`; +} + +function stripMarkdownCodeFence(text: string): string { + const trimmed = text.trim(); + const fence = trimmed.match(/^```(?:json)?\s*([\s\S]*?)\s*```$/i); + return fence ? fence[1].trim() : trimmed; +} + +function extractBalancedJsonObject(text: string): string | undefined { + let start = -1; + let depth = 0; + let inString = false; + let escaped = false; + + for (let index = 0; index < text.length; index++) { + const char = text[index]; + if (escaped) { + escaped = false; + continue; + } + if (char === "\\") { + escaped = true; + continue; + } + if (char === '"') { + inString = !inString; + continue; + } + if (inString) continue; + if (char === "{") { + if (depth === 0) start = index; + depth++; + continue; + } + if (char === "}") { + if (depth === 0) continue; + depth--; + if (depth === 0 && start >= 0) return text.slice(start, index + 1); + } + } + return undefined; +} + +function getAutomaticReviewFailureMetadata(message: string, rawOutput?: string): Record { + return { + robot_review_last_error: message, + robot_review_last_error_output: rawOutput ?? null, + robot_review_last_error_at: new Date().toISOString(), + }; +} + +function clearAutomaticReviewFailureMetadata(): Record { + return { + robot_review_last_error: null, + robot_review_last_error_output: null, + robot_review_last_error_at: null, + }; +} + +function formatReviewTextBlock(title: string, body: string): string { + return `### ${title}\n\n\`\`\`text\n${body}\n\`\`\``; +} + +function formatEvidencePackage(task: Task): string[] { + const metadata = task.metadata ?? {}; + const sections: string[] = []; + if (typeof metadata.lgtm_evidence === "string") { + sections.push(formatReviewTextBlock("Evidence", metadata.lgtm_evidence)); + if (typeof metadata.lgtm_failure_likely === "string") sections.push(`### Failure (likely)\n${metadata.lgtm_failure_likely}`); + if (typeof metadata.lgtm_failure_sneaky === "string") sections.push(`### Failure (sneaky)\n${metadata.lgtm_failure_sneaky}`); + if (typeof metadata.lgtm_falsification_test === "string") { + sections.push(formatReviewTextBlock("Falsification test", metadata.lgtm_falsification_test)); + } + if (Array.isArray(metadata.lgtm_verification_hints) && metadata.lgtm_verification_hints.length > 0) { + sections.push(`### Verification hints\n${metadata.lgtm_verification_hints.map((hint: string) => `- ${hint}`).join("\n")}`); + } + if (typeof metadata.lgtm_remaining_uncertainty === "string") { + sections.push(`### Remaining uncertainty\n${metadata.lgtm_remaining_uncertainty}`); + } + if (typeof metadata.lgtm_submitted_at === "string") sections.push(`Submitted: ${metadata.lgtm_submitted_at}`); + } + if (typeof metadata.robot_review_last_error === "string") { + sections.push(`### Automatic robot review failure\n${metadata.robot_review_last_error}`); + if (typeof metadata.robot_review_last_error_output === "string" && metadata.robot_review_last_error_output.trim()) { + sections.push(formatReviewTextBlock("Reviewer raw output", metadata.robot_review_last_error_output)); + } + } + return sections; +} + +function getNonReviewMetadata(task: Task): Record { + return Object.fromEntries( + Object.entries(task.metadata ?? {}).filter(([key]) => !key.startsWith("lgtm_") && !key.startsWith("robot_review_")), + ); +} + +export function extractRobotReviewJson(output: string): Record { const match = output.match(/ROBOT_REVIEW_JSON_START\s*([\s\S]*?)\s*ROBOT_REVIEW_JSON_END/); - if (!match) throw new Error("Robot reviewer did not return the expected JSON markers."); - return JSON.parse(match[1]) as Record; + const source = match ? match[1] : output; + const candidates = [ + source.trim(), + stripMarkdownCodeFence(source), + extractBalancedJsonObject(source) ?? "", + extractBalancedJsonObject(stripMarkdownCodeFence(source)) ?? "", + ].filter(Boolean); + + let lastError: unknown; + for (const candidate of [...new Set(candidates)]) { + try { + return JSON.parse(candidate) as Record; + } catch (error) { + lastError = error; + } + } + + const prefix = match + ? "Robot reviewer returned invalid JSON" + : "Robot reviewer did not return the expected JSON markers or a parseable JSON object"; + const detail = lastError instanceof Error ? `: ${lastError.message}` : ""; + throw new Error(`${prefix}${detail}. Raw output: ${summarizeRawOutput(output)}`); } function formatRobotReview(review: RobotReviewRecord): string { @@ -265,11 +387,24 @@ async function runAutomaticRobotReview( const commandLabel = `${invocation.command} ${invocation.args.slice(0, reviewerModel ? 6 : 4).join(" ")}`; const result = await runRobotReviewCommand(invocation, signal, timeoutMs); if (result.exitCode !== 0) { - throw new Error(`Robot reviewer failed (${result.exitCode ?? "?"}): ${(result.stderr || result.stdout).trim()}`); + const error = new Error(`Robot reviewer failed (${result.exitCode ?? "?"}): ${(result.stderr || result.stdout).trim()}`) as Error & { rawOutput?: string }; + error.rawOutput = (result.stderr || result.stdout).trim(); + throw error; + } + let parsed: Record; + try { + parsed = extractRobotReviewJson(result.stdout); + } catch (error) { + const wrapped = new Error(error instanceof Error ? error.message : String(error)) as Error & { rawOutput?: string }; + wrapped.rawOutput = result.stdout.trim(); + throw wrapped; } - const parsed = extractRobotReviewJson(result.stdout); const observations = Array.isArray(parsed.observations) ? parsed.observations.filter((item): item is string => typeof item === "string") : []; - if (observations.length === 0) throw new Error("Robot reviewer returned no observations."); + if (observations.length === 0) { + const error = new Error("Robot reviewer returned no observations.") as Error & { rawOutput?: string }; + error.rawOutput = result.stdout.trim(); + throw error; + } const rawMissing: string[] = Array.isArray(parsed.missing_evidence) ? parsed.missing_evidence.filter((item): item is string => typeof item === "string") : []; @@ -548,13 +683,16 @@ export default function (pi: ExtensionAPI) { const lines: string[] = [ `Task #${task.id}: ${task.subject}`, `Status: ${task.status} ${getReviewBadges(task)}${task.pending_approval && task.status !== "completed" ? " (pending human sign-off)" : ""}`, + `Gate status: ${getGateStatus(task)}`, `Done criterion: ${task.done_criterion}`, + `Description: ${desc}`, ]; - lines.push(`Description: ${desc}`); if (robotReviews.length > 0) { const latest = robotReviews[robotReviews.length - 1]; - lines.push(`Robot reviews: ${robotReviews.length} (latest: complete=${latest.evidence_complete ? "yes" : "no"}, convincing=${latest.evidence_convincing ? "yes" : "no"})`); + lines.push(`Robot reviews: ${robotReviews.length} (latest: accepted=${latest.accepted ? "yes" : "no"}, complete=${latest.evidence_complete ? "yes" : "no"}, convincing=${latest.evidence_convincing ? "yes" : "no"})`); } + const evidenceSections = formatEvidencePackage(task); + if (evidenceSections.length > 0) lines.push(...evidenceSections); if (task.blockedBy.length > 0) { const openBlockers = task.blockedBy.filter(bid => { const blocker = store.get(bid); @@ -563,10 +701,10 @@ export default function (pi: ExtensionAPI) { if (openBlockers.length > 0) lines.push(`Blocked by: ${openBlockers.map(id => "#" + id).join(", ")}`); } if (task.blocks.length > 0) lines.push(`Blocks: ${task.blocks.map(id => "#" + id).join(", ")}`); - const metaKeys = Object.keys(task.metadata); - if (metaKeys.length > 0) lines.push(`Metadata: ${JSON.stringify(task.metadata)}`); + const metadata = getNonReviewMetadata(task); + if (Object.keys(metadata).length > 0) lines.push(`Metadata: ${JSON.stringify(metadata)}`); - return Promise.resolve(textResult(lines.join("\n"))); + return Promise.resolve(textResult(lines.join("\n\n"))); }, }); @@ -684,6 +822,7 @@ Do NOT summarize or interpret. Paste literal command output, exact log lines, ma lgtm_verification_hints: params.verification_hints, lgtm_remaining_uncertainty: params.remaining_uncertainty, lgtm_submitted_at: new Date().toISOString(), + ...clearAutomaticReviewFailureMetadata(), }, }); let robotReviewNote = ""; @@ -692,8 +831,11 @@ Do NOT summarize or interpret. Paste literal command output, exact log lines, ma try { const { review, command } = await runAutomaticRobotReview(refreshedTask, signal, currentProvider); store.update(params.taskId, { - pending_approval: review.accepted, - metadata: appendRobotReviewMetadata(refreshedTask, review), + pending_approval: shouldOpenHumanSignoffGate(refreshedTask, review.accepted), + metadata: { + ...appendRobotReviewMetadata(refreshedTask, review), + ...clearAutomaticReviewFailureMetadata(), + }, }); robotReviewNote = `\n\n### Automatic robot review\n` + @@ -712,33 +854,28 @@ Do NOT summarize or interpret. Paste literal command output, exact log lines, ma robotReviewNote += `\nResult: human sign-off has been held back until the evidence is strengthened and reviewed again.`; } } catch (err: any) { - store.update(params.taskId, { pending_approval: false }); + store.update(params.taskId, { + pending_approval: false, + metadata: getAutomaticReviewFailureMetadata(err.message, err.rawOutput), + }); robotReviewNote = `\n\n### Automatic robot review\n` + `Reviewer failed: ${err.message}\n` + - `Human sign-off is blocked until the reviewer stage succeeds.`; + `Human sign-off is blocked until the reviewer stage succeeds.` + + (typeof err.rawOutput === "string" && err.rawOutput.trim() + ? `\n\n${formatReviewTextBlock("Reviewer raw output", err.rawOutput.trim())}` + : ""); } widget.update(); - const hintsSection = params.verification_hints?.length - ? `\n### Verification hints\n${params.verification_hints.map(h => `- ${h}`).join("\n")}` - : ""; - const uncertaintySection = params.remaining_uncertainty - ? `\n### Remaining uncertainty\n${params.remaining_uncertainty}` - : ""; - + const updatedTask = store.get(task.id) ?? task; const result = `## Task #${task.id}: ${task.subject}\n` + `Done criterion: ${task.done_criterion}\n\n` + - `### Evidence\n${params.evidence}\n\n` + - `### Failure (likely)\n${params.failure_likely}\n\n` + - `### Failure (sneaky)\n${params.failure_sneaky}\n\n` + - `### Falsification test\n${params.falsification_test}` + - hintsSection + - uncertaintySection + + `${formatEvidencePackage(updatedTask).join("\n\n")}` + robotReviewNote + `\n\n---\n` + - `Task #${task.id} is now ${store.get(task.id)?.pending_approval ? `pending human sign-off via \`/lgtm ${task.id}\`` : "not yet ready for human sign-off"}.\n\n` + + `Gate status: ${getGateStatus(updatedTask)}\n\n` + `**Self-check (non-blocking):** Look at this as the human will see it. ` + `Does the evidence directly address the done_criterion "${task.done_criterion}"? ` + `Would a skeptical reviewer find this convincing, or would they immediately ask ` + @@ -778,21 +915,23 @@ This does not complete the task. Human /lgtm remains the only completion path.`, if (!task) return Promise.resolve(textResult(`Task #${params.taskId} not found`)); if (task.status === "completed") return Promise.resolve(textResult(`Task #${params.taskId} already completed`)); + const accepted = params.accepted ?? (params.evidence_complete && params.evidence_convincing); store.update(params.taskId, { - pending_approval: params.evidence_complete && params.evidence_convincing ? task.pending_approval : false, + pending_approval: shouldOpenHumanSignoffGate(task, accepted), metadata: { ...appendRobotReviewMetadata(task, { reviewer: params.reviewer, scope: params.scope, observations: params.observations, blind_spots: params.blind_spots, - accepted: params.accepted ?? (params.evidence_complete && params.evidence_convincing), + accepted, evidence_complete: params.evidence_complete, evidence_convincing: params.evidence_convincing, missing_evidence: params.missing_evidence ?? [], submitted_at: new Date().toISOString(), mode: "manual", }), + ...clearAutomaticReviewFailureMetadata(), }, }); widget.update(); @@ -802,12 +941,13 @@ This does not complete the task. Human /lgtm remains the only completion path.`, `Iteration: ${getRobotReviews(store.get(params.taskId)!).length}\n` + `Reviewer: ${params.reviewer}\n` + `Scope: ${params.scope}\n\n` + - `Accepted: ${(params.accepted ?? (params.evidence_complete && params.evidence_convincing)) ? "yes" : "no"}\n` + + `Accepted: ${accepted ? "yes" : "no"}\n` + `Evidence complete: ${params.evidence_complete ? "yes" : "no"}\n` + `Evidence convincing: ${params.evidence_convincing ? "yes" : "no"}\n\n` + `### Observations\n${params.observations.map(o => `- ${o}`).join("\n")}\n\n` + `${(params.missing_evidence?.length ?? 0) > 0 ? `### Missing evidence\n${(params.missing_evidence ?? []).map(item => `- ${item}`).join("\n")}\n\n` : ""}` + `### Blind spots\n${params.blind_spots}\n\n` + + `Gate status: ${getGateStatus(store.get(params.taskId) ?? task)}\n\n` + `🤖 Robot review stored. Human sign-off still requires \`/lgtm ${task.id}\`.`; return Promise.resolve(textResult(result)); @@ -833,14 +973,18 @@ This appends a new robot-review iteration. If the reviewer marks evidence incomp return textResult(`Task #${params.taskId} has no stored evidence yet. Call lgtm_ask first.`); } - const { review, command } = await runAutomaticRobotReview(task, signal, currentProvider); - store.update(params.taskId, { - pending_approval: review.accepted ? task.pending_approval : false, - metadata: appendRobotReviewMetadata(task, review), - }); - widget.update(); + try { + const { review, command } = await runAutomaticRobotReview(task, signal, currentProvider); + store.update(params.taskId, { + pending_approval: shouldOpenHumanSignoffGate(task, review.accepted), + metadata: { + ...appendRobotReviewMetadata(task, review), + ...clearAutomaticReviewFailureMetadata(), + }, + }); + widget.update(); - return textResult( + return textResult( `## Automatic robot review for task #${task.id}: ${task.subject}\n` + `Reviewer command: ${command}\n` + `Iteration: ${getRobotReviews(store.get(params.taskId)!).length}\n` + @@ -852,8 +996,24 @@ This appends a new robot-review iteration. If the reviewer marks evidence incomp : "") + `### Observations\n${review.observations.map(o => `- ${o}`).join("\n")}\n\n` + `${review.missing_evidence.length > 0 ? `### Missing evidence\n${review.missing_evidence.map(item => `- ${item}`).join("\n")}\n\n` : ""}` + - `### Blind spots\n${review.blind_spots}`, - ); + `### Blind spots\n${review.blind_spots}\n\n` + + `Gate status: ${getGateStatus(store.get(params.taskId) ?? task)}`, + ); + } catch (err: any) { + store.update(params.taskId, { + pending_approval: false, + metadata: getAutomaticReviewFailureMetadata(err.message, err.rawOutput), + }); + widget.update(); + return textResult( + `## Automatic robot review for task #${task.id}: ${task.subject}\n` + + `Reviewer failed: ${err.message}\n\n` + + `Gate status: ${getGateStatus(store.get(params.taskId) ?? task)}` + + (typeof err.rawOutput === "string" && err.rawOutput.trim() + ? `\n\n${formatReviewTextBlock("Reviewer raw output", err.rawOutput.trim())}` + : ""), + ); + } }, }); @@ -999,17 +1159,12 @@ This appends a new robot-review iteration. If the reviewer marks evidence incomp const robotReviews = getRobotReviews(task); const noEvidence = !task.pending_approval && !m.lgtm_evidence; const robotRejected = robotReviews.length > 0 && !latestRobotReviewPasses(task); + const reviewerFailed = typeof m.robot_review_last_error === "string"; // Print evidence to the conversation so the user can review it there - const evidenceParts: string[] = []; + const evidenceParts: string[] = [`Gate status: ${getGateStatus(task)}`]; if (m.lgtm_evidence) { - evidenceParts.push(`**Evidence:**\n${m.lgtm_evidence}`); - evidenceParts.push(`Failure (likely): ${m.lgtm_failure_likely}`); - evidenceParts.push(`Failure (sneaky): ${m.lgtm_failure_sneaky}`); - if (m.lgtm_falsification_test) evidenceParts.push(`Falsification test: ${m.lgtm_falsification_test}`); - if (m.lgtm_remaining_uncertainty) evidenceParts.push(`Remaining uncertainty: ${m.lgtm_remaining_uncertainty}`); - if (m.lgtm_verification_hints?.length) evidenceParts.push(`Hints: ${m.lgtm_verification_hints.join(", ")}`); - evidenceParts.push(`Submitted: ${m.lgtm_submitted_at}`); + evidenceParts.push(...formatEvidencePackage(task)); } else { evidenceParts.push(`(No agent-submitted evidence — agent never called lgtm_ask. Human override.)`); } @@ -1030,6 +1185,9 @@ This appends a new robot-review iteration. If the reviewer marks evidence incomp if (noEvidence) { title = `⚠ Task #${taskId} has no agent-submitted evidence.\nSign off anyway?\nDone: ${task.done_criterion}`; signLabel = "✓ Override — sign off without evidence"; + } else if (reviewerFailed) { + title = `⚠ Task #${taskId} automatic robot review failed.\nSign off anyway?\nDone: ${task.done_criterion}`; + signLabel = "✓ Override — sign off despite reviewer failure"; } else if (robotRejected) { title = `⚠ Task #${taskId} robot review rejected the evidence.\nSign off anyway?\nDone: ${task.done_criterion}`; signLabel = "✓ Override — sign off despite rejected review"; diff --git a/src/review-badges.ts b/src/review-badges.ts index 55b2460..295278c 100644 --- a/src/review-badges.ts +++ b/src/review-badges.ts @@ -1,4 +1,4 @@ -import { getRobotReviews } from "./robot-review.js"; +import { getLatestRobotReview, getRobotReviews } from "./robot-review.js"; import type { Task } from "./types.js"; const STAGES = ["🛠", "🤖", "👀"] as const; @@ -14,6 +14,13 @@ export function getReviewBadges(task: Task): string { return `[${slots.join("")}]`; } +export const REVIEW_BADGES = { + evidence: STAGES[0], + robot: STAGES[1], + human: STAGES[2], + pipeline: STAGES, +}; + export type DisplayStatus = "awaiting_signoff" | "in_progress" | "pending" | "completed"; /** Derived display bucket. `awaiting_signoff` is pending_approval && !completed. */ @@ -25,6 +32,19 @@ export function getDisplayStatus(task: Task): DisplayStatus { export type StateTag = "READY" | "ACTIVE" | "PENDING" | "DONE"; +export function getGateStatus(task: Task): string { + if (task.status === "completed") return "human signed off"; + if (!task.metadata?.lgtm_evidence) return "no lgtm evidence submitted"; + if (task.pending_approval) return `ready for human sign-off via /lgtm ${task.id}`; + if (typeof task.metadata?.robot_review_last_error === "string") { + return `blocked: automatic robot review failed: ${task.metadata.robot_review_last_error}`; + } + const latest = getLatestRobotReview(task); + if (latest && !latest.accepted) return "blocked: latest robot review rejected the evidence"; + if (latest?.accepted) return "accepted robot review present, but the human gate is still closed"; + return "blocked: evidence submitted, robot review still required"; +} + /** Short uppercase tag for the human ("can I /lgtm this?" at a glance). */ export function getStateTag(task: Task): StateTag { const s = getDisplayStatus(task); diff --git a/src/robot-review.ts b/src/robot-review.ts index b998153..d0d754c 100644 --- a/src/robot-review.ts +++ b/src/robot-review.ts @@ -101,6 +101,10 @@ export function getLatestRobotReview(task: Task): RobotReviewRecord | undefined return reviews.length > 0 ? reviews[reviews.length - 1] : undefined; } +export function shouldOpenHumanSignoffGate(task: Task, reviewAccepted: boolean): boolean { + return reviewAccepted && typeof task.metadata?.lgtm_evidence === "string" && task.metadata.lgtm_evidence.length > 0; +} + export function appendRobotReviewMetadata(task: Task, review: Omit): Record { const robot_reviews = [...getRobotReviews(task), { ...review, iteration: 0 }].map((entry, index) => ({ ...entry, diff --git a/test/review-badges.test.ts b/test/review-badges.test.ts index 176b6cd..02a4235 100644 --- a/test/review-badges.test.ts +++ b/test/review-badges.test.ts @@ -1,5 +1,5 @@ import { describe, expect, it } from "vitest"; -import { getDisplayStatus, getReviewBadges } from "../src/review-badges.js"; +import { getDisplayStatus, getGateStatus, getReviewBadges } from "../src/review-badges.js"; import type { Task } from "../src/types.js"; function makeTask(overrides: Partial = {}): Task { @@ -60,6 +60,45 @@ describe("getReviewBadges", () => { }); }); +describe("getGateStatus", () => { + it("reports ready when human sign-off is open", () => { + expect(getGateStatus(makeTask({ + pending_approval: true, + metadata: { lgtm_evidence: "ok" }, + }))).toBe("ready for human sign-off via /lgtm 1"); + }); + + it("reports reviewer failure separately from rejected evidence", () => { + expect(getGateStatus(makeTask({ + metadata: { + lgtm_evidence: "ok", + robot_review_last_error: "Unexpected token 'a'", + }, + }))).toContain("automatic robot review failed"); + }); + + it("reports rejected robot review when latest review does not accept", () => { + expect(getGateStatus(makeTask({ + metadata: { + lgtm_evidence: "ok", + robot_reviews: [{ + iteration: 1, + reviewer: "opencode", + scope: "task evidence", + observations: ["Observed missing output"], + blind_spots: "none", + accepted: false, + evidence_complete: false, + evidence_convincing: false, + missing_evidence: ["literal output"], + submitted_at: "2026-04-17T00:00:00.000Z", + mode: "manual", + }], + }, + }))).toBe("blocked: latest robot review rejected the evidence"); + }); +}); + describe("getDisplayStatus", () => { it("returns pending for fresh tasks", () => { expect(getDisplayStatus(makeTask())).toBe("pending"); diff --git a/test/robot-review-runner.test.ts b/test/robot-review-runner.test.ts index 2f0ffcc..f0b493e 100644 --- a/test/robot-review-runner.test.ts +++ b/test/robot-review-runner.test.ts @@ -2,6 +2,7 @@ import { describe, expect, it } from "vitest"; import { DEFAULT_ROBOT_REVIEW_TIMEOUT_MS, extractFinalAssistantTextFromPiJsonl, + extractRobotReviewJson, getPiInvocation, getRobotReviewTimeoutMs, runRobotReviewCommand, @@ -27,6 +28,22 @@ describe("robot review runner helpers", () => { expect(extractFinalAssistantTextFromPiJsonl(output)).toContain("ROBOT_REVIEW_JSON_START"); }); + it("parses noisy JSON wrapped in review markers", () => { + const output = [ + "ROBOT_REVIEW_JSON_START", + "and here is the JSON you asked for:", + "```json", + '{"accepted":true,"observations":["ok"]}', + "```", + "ROBOT_REVIEW_JSON_END", + ].join("\n"); + expect(extractRobotReviewJson(output)).toEqual({ accepted: true, observations: ["ok"] }); + }); + + it("includes raw output context on parse failure", () => { + expect(() => extractRobotReviewJson("ROBOT_REVIEW_JSON_START and nope ROBOT_REVIEW_JSON_END")).toThrow(/Raw output:/); + }); + it("uses configured timeout or falls back to default", () => { expect(getRobotReviewTimeoutMs({ PI_LGTM_ROBOT_REVIEW_TIMEOUT_MS: "2500" } as NodeJS.ProcessEnv)).toBe(2500); expect(getRobotReviewTimeoutMs({ PI_LGTM_ROBOT_REVIEW_TIMEOUT_MS: "bad" } as NodeJS.ProcessEnv)).toBe(DEFAULT_ROBOT_REVIEW_TIMEOUT_MS); diff --git a/test/robot-review.test.ts b/test/robot-review.test.ts index 66e5a15..28c944b 100644 --- a/test/robot-review.test.ts +++ b/test/robot-review.test.ts @@ -1,5 +1,5 @@ import { describe, expect, it } from "vitest"; -import { appendRobotReviewMetadata, getLatestRobotReview, getRobotReviews } from "../src/robot-review.js"; +import { appendRobotReviewMetadata, getLatestRobotReview, getRobotReviews, shouldOpenHumanSignoffGate } from "../src/robot-review.js"; import type { Task } from "../src/types.js"; function makeTask(overrides: Partial = {}): Task { @@ -21,6 +21,12 @@ function makeTask(overrides: Partial = {}): Task { } describe("robot review helpers", () => { + it("reopens the human gate when accepted review exists for stored evidence", () => { + expect(shouldOpenHumanSignoffGate(makeTask({ metadata: { lgtm_evidence: "literal output" } }), true)).toBe(true); + expect(shouldOpenHumanSignoffGate(makeTask({ metadata: { lgtm_evidence: "literal output" } }), false)).toBe(false); + expect(shouldOpenHumanSignoffGate(makeTask(), true)).toBe(false); + }); + it("reads legacy single-review metadata", () => { const task = makeTask({ metadata: { diff --git a/test/task-store.test.ts b/test/task-store.test.ts index 9852ae5..96219a2 100644 --- a/test/task-store.test.ts +++ b/test/task-store.test.ts @@ -34,7 +34,7 @@ describe("TaskStore (in-memory)", () => { it("creates tasks with optional fields", () => { const t = store.create("Task", "Desc", "done criterion", "Running task", { key: "value" }); - expect(t.activeForm).toBe("Running task"); + expect(t.progress_label).toBe("Running task"); expect(t.metadata).toEqual({ key: "value" }); }); @@ -72,16 +72,16 @@ describe("TaskStore (in-memory)", () => { const { changedFields } = store.update("1", { subject: "Updated subject", description: "Updated desc", - owner: "agent-1", + metadata: { owner: "agent-1" }, }); expect(changedFields).toContain("subject"); expect(changedFields).toContain("description"); - expect(changedFields).toContain("owner"); + expect(changedFields).toContain("metadata"); const task = store.get("1")!; expect(task.subject).toBe("Updated subject"); - expect(task.owner).toBe("agent-1"); + expect(task.metadata.owner).toBe("agent-1"); }); it("deletes a task with status: deleted", () => { @@ -114,7 +114,7 @@ describe("TaskStore (in-memory)", () => { store.create("Blocker", "Desc", "done"); store.create("Blocked", "Desc", "done"); - store.update("1", { addBlocks: ["2"] }); + store.update("1", { add_blocks: ["2"] }); const t1 = store.get("1")!; const t2 = store.get("2")!; @@ -126,7 +126,7 @@ describe("TaskStore (in-memory)", () => { store.create("Blocker", "Desc", "done"); store.create("Blocked", "Desc", "done"); - store.update("2", { addBlockedBy: ["1"] }); + store.update("2", { add_blocked_by: ["1"] }); const t1 = store.get("1")!; const t2 = store.get("2")!; @@ -138,8 +138,8 @@ describe("TaskStore (in-memory)", () => { store.create("A", "Desc", "done"); store.create("B", "Desc", "done"); - store.update("1", { addBlocks: ["2"] }); - store.update("1", { addBlocks: ["2"] }); // duplicate + store.update("1", { add_blocks: ["2"] }); + store.update("1", { add_blocks: ["2"] }); // duplicate const t1 = store.get("1")!; expect(t1.blocks.filter(id => id === "2")).toHaveLength(1); @@ -148,7 +148,7 @@ describe("TaskStore (in-memory)", () => { it("cleans up dependency edges on deletion", () => { store.create("A", "Desc", "done"); store.create("B", "Desc", "done"); - store.update("1", { addBlocks: ["2"] }); + store.update("1", { add_blocks: ["2"] }); store.update("1", { status: "deleted" }); @@ -230,8 +230,8 @@ describe("TaskStore (in-memory)", () => { it("allows circular dependencies with warning", () => { store.create("A", "Desc", "done"); store.create("B", "Desc", "done"); - store.update("1", { addBlocks: ["2"] }); - const { warnings } = store.update("2", { addBlocks: ["1"] }); + store.update("1", { add_blocks: ["2"] }); + const { warnings } = store.update("2", { add_blocks: ["1"] }); expect(store.get("1")!.blocks).toContain("2"); expect(store.get("2")!.blocks).toContain("1"); @@ -240,14 +240,14 @@ describe("TaskStore (in-memory)", () => { it("allows self-dependency with warning", () => { store.create("Self", "Desc", "done"); - const { warnings } = store.update("1", { addBlocks: ["1"] }); + const { warnings } = store.update("1", { add_blocks: ["1"] }); expect(store.get("1")!.blocks).toContain("1"); expect(warnings).toContain("#1 blocks itself"); }); it("stores dangling edge IDs with warning", () => { store.create("Real", "Desc", "done"); - const { warnings } = store.update("1", { addBlocks: ["9999"] }); + const { warnings } = store.update("1", { add_blocks: ["9999"] }); expect(store.get("1")!.blocks).toContain("9999"); expect(warnings).toContain("#9999 does not exist"); }); @@ -255,7 +255,7 @@ describe("TaskStore (in-memory)", () => { it("returns no warnings for valid dependencies", () => { store.create("A", "Desc", "done"); store.create("B", "Desc", "done"); - const { warnings } = store.update("1", { addBlocks: ["2"] }); + const { warnings } = store.update("1", { add_blocks: ["2"] }); expect(warnings).toEqual([]); }); @@ -264,11 +264,11 @@ describe("TaskStore (in-memory)", () => { expect(t.subject).toBe(" "); }); - it("updates activeForm field", () => { + it("updates progress_label field", () => { store.create("Test", "Desc", "done"); - const { changedFields } = store.update("1", { activeForm: "Running tests" }); - expect(changedFields).toContain("activeForm"); - expect(store.get("1")!.activeForm).toBe("Running tests"); + const { changedFields } = store.update("1", { progress_label: "Running tests" }); + expect(changedFields).toContain("progress_label"); + expect(store.get("1")!.progress_label).toBe("Running tests"); }); it("updates description field", () => { @@ -295,9 +295,8 @@ describe("TaskStore (in-memory)", () => { it("clearCompleted cleans up dependency edges", () => { store.create("Blocker", "Desc", "done"); store.create("Blocked", "Desc", "done"); - store.update("1", { addBlocks: ["2"] }); - createAndApprove(store, "dummy"); // need task 1 to have pending_approval - // Actually set pending_approval on task 1 + store.update("1", { add_blocks: ["2"] }); + // Set pending_approval on task 1 so complete() works via /lgtm path store.update("1", { pending_approval: true }); store.complete("1"); @@ -312,7 +311,7 @@ describe("TaskStore (in-memory)", () => { store.create("B1", "Desc", "done"); store.create("B2", "Desc", "done"); - store.update("1", { addBlocks: ["2", "3"] }); + store.update("1", { add_blocks: ["2", "3"] }); expect(store.get("1")!.blocks).toEqual(["2", "3"]); expect(store.get("2")!.blockedBy).toContain("1"); @@ -321,14 +320,14 @@ describe("TaskStore (in-memory)", () => { it("addBlockedBy warns on self-dependency", () => { store.create("Self", "Desc", "done"); - const { warnings } = store.update("1", { addBlockedBy: ["1"] }); + const { warnings } = store.update("1", { add_blocked_by: ["1"] }); expect(store.get("1")!.blockedBy).toContain("1"); expect(warnings).toContain("#1 blocks itself"); }); it("addBlockedBy warns on dangling ref", () => { store.create("Real", "Desc", "done"); - const { warnings } = store.update("1", { addBlockedBy: ["9999"] }); + const { warnings } = store.update("1", { add_blocked_by: ["9999"] }); expect(store.get("1")!.blockedBy).toContain("9999"); expect(warnings).toContain("#9999 does not exist"); }); @@ -336,8 +335,8 @@ describe("TaskStore (in-memory)", () => { it("addBlockedBy warns on cycle", () => { store.create("A", "Desc", "done"); store.create("B", "Desc", "done"); - store.update("1", { addBlocks: ["2"] }); - const { warnings } = store.update("1", { addBlockedBy: ["2"] }); + store.update("1", { add_blocks: ["2"] }); + const { warnings } = store.update("1", { add_blocked_by: ["2"] }); expect(warnings).toContain("cycle: #1 and #2 block each other"); }); diff --git a/test/task-widget.test.ts b/test/task-widget.test.ts index 8479b5d..35d9291 100644 --- a/test/task-widget.test.ts +++ b/test/task-widget.test.ts @@ -73,7 +73,7 @@ describe("TaskWidget", () => { widget.update(); const lines = renderWidget(ui.state); - expect(lines).toHaveLength(2); // header + 1 task + expect(lines).toHaveLength(3); // header + 1 task + done_criterion expect(lines[0]).toContain("1 tasks"); expect(lines[0]).toContain("1 open"); expect(lines[1]).toContain("◻"); @@ -129,18 +129,20 @@ describe("TaskWidget", () => { it("shows blocked-by info for pending tasks", () => { store.create("Blocker", "Desc", "done"); store.create("Blocked", "Desc", "done"); - store.update("2", { addBlockedBy: ["1"] }); + store.update("2", { add_blocked_by: ["1"] }); widget.update(); const lines = renderWidget(ui.state); const blockedLine = lines.find(l => l.includes("Blocked")); + // blocked-by suffix is only added via dim theme helper, which in mock is identity + // So we should see the raw text. Check for the relevant subject line having blocked-by info expect(blockedLine).toContain("blocked by #1"); }); it("hides completed blockers in blocked-by suffix", () => { store.create("Blocker", "Desc", "done"); store.create("Blocked", "Desc", "done"); - store.update("2", { addBlockedBy: ["1"] }); + store.update("2", { add_blocked_by: ["1"] }); store.update("1", { pending_approval: true }); store.complete("1"); widget.update(); @@ -178,14 +180,14 @@ describe("TaskWidget", () => { it("limits visible tasks to MAX_VISIBLE_TASKS", () => { for (let i = 0; i < 15; i++) { - store.create(`Task ${i + 1}`, "Desc"); + store.create(`Task ${i + 1}`, "Desc", "done"); } widget.update(); const lines = renderWidget(ui.state); - // header + 10 tasks + "… and 5 more" + // header + 5 visible tasks (each has 2 lines: task + done_criterion) + "...and 10 more" expect(lines).toHaveLength(12); - expect(lines[11]).toContain("5 more"); + expect(lines[11]).toContain("10 more"); }); it("tracks token usage for active tasks", () => { @@ -244,7 +246,7 @@ describe("TaskWidget", () => { const lines = renderWidget(ui.state); expect(lines[1]).toContain("Processing A…"); - expect(lines[2]).toContain("Processing B…"); + expect(lines[3]).toContain("Processing B…"); }); it("distributes token usage across all active tasks", () => { @@ -260,7 +262,7 @@ describe("TaskWidget", () => { const lines = renderWidget(ui.state); // Both tasks should have the same token counts expect(lines[1]).toContain("↑ 100"); - expect(lines[2]).toContain("↑ 100"); + expect(lines[3]).toContain("↑ 100"); }); it("dispose clears widget and timer", () => {