Robot Review Lane

Goal

Add a separate review artifact for fresh-perspective subagent review without weakening the existing human /lgtm sign-off gate.

Scope

In: task schema, task tool UX, widget/task badges, README/tests, Pi-native robot-review harness hardening. Out: provider routing or third-party subagent package integration.

Requirements

R1: Tasks can store a robot review separately from human sign-off evidence. Done means: a task can contain both lgtm_ask evidence and robot observations without conflict.
R2: Robot review is observational only. Done means: the tool schema and help text ask for observations, not recommendations or editorial.
R3: UI exposes review lanes distinctly. Done means: task list/widget/details show tool/robot/human review badges.
R4: Human /lgtm remains the only completion path. Done means: no robot review path can complete tasks.
R5: Automatic Pi robot review must be operationally robust. Done means: the child reviewer has a bounded timeout, uses a deterministic Pi invocation, and reports infra failures clearly enough to diagnose without hanging the main tool call.
R6: The subprocess harness is covered by focused tests. Done means: there are tests for invocation selection, timeout/abort behavior, and assistant-output parsing.

Tasks

T1 (R1, R2, R3, R4): Add robot-review storage and tool.
- steps: update task typing helpers, register robot_review_ask, thread robot review metadata through task views
- verify: npm test -- --runInBand
- UAT: "when a task has lgtm evidence and robot observations, I observe both badges and /lgtm still controls completion"
T2 (R3): Update README and examples.
- steps: document badges and robot review workflow
- verify: rg -n "robot_review_ask|🤖|🛠" README.md
- UAT: "when I read the README, I observe a distinct robot review lane"
T3 (R5): Harden the Pi-native robot-review subprocess runner.
- steps: add timeout handling, replace fragile self-reinvocation logic with deterministic command resolution, improve failure messages
- verify: npx vitest run test/robot-review-runner.test.ts
- UAT: "when the child reviewer hangs or pi is not resolvable, lgtm_ask returns a bounded failure instead of hanging forever"
T4 (R6): Add focused harness tests.
- steps: extract/mock subprocess runner boundaries and cover timeout, parse, and command resolution behavior
- verify: npx vitest run test/robot-review-runner.test.ts test/robot-review.test.ts
- UAT: "when I run the focused tests, I observe the subprocess path itself is covered"

Context

Existing schema uses pending_approval as the human sign-off gate.
Current UI already appends 👀 for pending human sign-off; extend rather than replace the completion rule.

Log

The least disruptive model is additive metadata plus badge rendering, not replacing the task lifecycle.
The repo's full Vitest suite already has drift unrelated to this feature, so focused verification is needed to isolate new behavior.
A Pi-native reviewer stage matches the official subagent example better than ACP/external CLIs, but it makes harness reliability part of the approval path and therefore needs explicit timeout and invocation hardening.
A deterministic pi command plus an explicit timeout is simpler and more portable than trying to reconstruct the current host entrypoint from process.argv.

TODO

Optional future work: add an orchestrated cross-model reviewer via external-review or ACP.

Errors

Task	Error	Resolution
T1	`npm test` failed with 17 pre-existing assertions unrelated to robot review	Verified with `npm run lint`, `npm run typecheck`, and focused `npx vitest run test/review-badges.test.ts` instead
T3	`process.argv[1]`-based self-reinvocation was fragile in extension-hosted contexts	Replaced with `PI_LGTM_PI_BIN` override or plain `pi`, then added focused runner tests

3.9 KiB Raw Blame History