Files
pi-lgtm/docs/spec/2026-04-17_robot-review-lane.md
T
2026-04-17 12:02:35 +08:00

3.9 KiB

Robot Review Lane

Goal

Add a separate review artifact for fresh-perspective subagent review without weakening the existing human /lgtm sign-off gate.

Scope

In: task schema, task tool UX, widget/task badges, README/tests, Pi-native robot-review harness hardening. Out: provider routing or third-party subagent package integration.

Requirements

  • R1: Tasks can store a robot review separately from human sign-off evidence. Done means: a task can contain both lgtm_ask evidence and robot observations without conflict.
  • R2: Robot review is observational only. Done means: the tool schema and help text ask for observations, not recommendations or editorial.
  • R3: UI exposes review lanes distinctly. Done means: task list/widget/details show tool/robot/human review badges.
  • R4: Human /lgtm remains the only completion path. Done means: no robot review path can complete tasks.
  • R5: Automatic Pi robot review must be operationally robust. Done means: the child reviewer has a bounded timeout, uses a deterministic Pi invocation, and reports infra failures clearly enough to diagnose without hanging the main tool call.
  • R6: The subprocess harness is covered by focused tests. Done means: there are tests for invocation selection, timeout/abort behavior, and assistant-output parsing.

Tasks

  • T1 (R1, R2, R3, R4): Add robot-review storage and tool.
    • steps: update task typing helpers, register robot_review_ask, thread robot review metadata through task views
    • verify: npm test -- --runInBand
    • UAT: "when a task has lgtm evidence and robot observations, I observe both badges and /lgtm still controls completion"
  • T2 (R3): Update README and examples.
    • steps: document badges and robot review workflow
    • verify: rg -n "robot_review_ask|🤖|🛠" README.md
    • UAT: "when I read the README, I observe a distinct robot review lane"
  • T3 (R5): Harden the Pi-native robot-review subprocess runner.
    • steps: add timeout handling, replace fragile self-reinvocation logic with deterministic command resolution, improve failure messages
    • verify: npx vitest run test/robot-review-runner.test.ts
    • UAT: "when the child reviewer hangs or pi is not resolvable, lgtm_ask returns a bounded failure instead of hanging forever"
  • T4 (R6): Add focused harness tests.
    • steps: extract/mock subprocess runner boundaries and cover timeout, parse, and command resolution behavior
    • verify: npx vitest run test/robot-review-runner.test.ts test/robot-review.test.ts
    • UAT: "when I run the focused tests, I observe the subprocess path itself is covered"

Context

  • Existing schema uses pending_approval as the human sign-off gate.
  • Current UI already appends 👀 for pending human sign-off; extend rather than replace the completion rule.

Log

  • The least disruptive model is additive metadata plus badge rendering, not replacing the task lifecycle.
  • The repo's full Vitest suite already has drift unrelated to this feature, so focused verification is needed to isolate new behavior.
  • A Pi-native reviewer stage matches the official subagent example better than ACP/external CLIs, but it makes harness reliability part of the approval path and therefore needs explicit timeout and invocation hardening.
  • A deterministic pi command plus an explicit timeout is simpler and more portable than trying to reconstruct the current host entrypoint from process.argv.

TODO

  • Optional future work: add an orchestrated cross-model reviewer via external-review or ACP.

Errors

Task Error Resolution
T1 npm test failed with 17 pre-existing assertions unrelated to robot review Verified with npm run lint, npm run typecheck, and focused npx vitest run test/review-badges.test.ts instead
T3 process.argv[1]-based self-reinvocation was fragile in extension-hosted contexts Replaced with PI_LGTM_PI_BIN override or plain pi, then added focused runner tests