fix: harden pi robot review harness

2026-06-27 16:46:17 +08:00 · 2026-04-17 12:02:35 +08:00
parent d52596aad1
commit d908f6f617
3 changed files with 205 additions and 62 deletions
@@ -0,0 +1,53 @@
+# Robot Review Lane
+
+## Goal
+Add a separate review artifact for fresh-perspective subagent review without weakening the existing human `/lgtm` sign-off gate.
+
+## Scope
+In: task schema, task tool UX, widget/task badges, README/tests, Pi-native robot-review harness hardening.
+Out: provider routing or third-party subagent package integration.
+
+## Requirements
+- R1: Tasks can store a robot review separately from human sign-off evidence. Done means: a task can contain both `lgtm_ask` evidence and robot observations without conflict.
+- R2: Robot review is observational only. Done means: the tool schema and help text ask for observations, not recommendations or editorial.
+- R3: UI exposes review lanes distinctly. Done means: task list/widget/details show tool/robot/human review badges.
+- R4: Human `/lgtm` remains the only completion path. Done means: no robot review path can complete tasks.
+- R5: Automatic Pi robot review must be operationally robust. Done means: the child reviewer has a bounded timeout, uses a deterministic Pi invocation, and reports infra failures clearly enough to diagnose without hanging the main tool call.
+- R6: The subprocess harness is covered by focused tests. Done means: there are tests for invocation selection, timeout/abort behavior, and assistant-output parsing.
+
+## Tasks
+- [x] T1 (R1, R2, R3, R4): Add robot-review storage and tool.
+  - steps: update task typing helpers, register `robot_review_ask`, thread robot review metadata through task views
+  - verify: `npm test -- --runInBand`
+  - UAT: "when a task has lgtm evidence and robot observations, I observe both badges and `/lgtm` still controls completion"
+- [x] T2 (R3): Update README and examples.
+  - steps: document badges and robot review workflow
+  - verify: `rg -n "robot_review_ask|🤖|🛠" README.md`
+  - UAT: "when I read the README, I observe a distinct robot review lane"
+- [x] T3 (R5): Harden the Pi-native robot-review subprocess runner.
+  - steps: add timeout handling, replace fragile self-reinvocation logic with deterministic command resolution, improve failure messages
+  - verify: `npx vitest run test/robot-review-runner.test.ts`
+  - UAT: "when the child reviewer hangs or pi is not resolvable, lgtm_ask returns a bounded failure instead of hanging forever"
+- [x] T4 (R6): Add focused harness tests.
+  - steps: extract/mock subprocess runner boundaries and cover timeout, parse, and command resolution behavior
+  - verify: `npx vitest run test/robot-review-runner.test.ts test/robot-review.test.ts`
+  - UAT: "when I run the focused tests, I observe the subprocess path itself is covered"
+
+## Context
+- Existing schema uses `pending_approval` as the human sign-off gate.
+- Current UI already appends `👀` for pending human sign-off; extend rather than replace the completion rule.
+
+## Log
+- The least disruptive model is additive metadata plus badge rendering, not replacing the task lifecycle.
+- The repo's full Vitest suite already has drift unrelated to this feature, so focused verification is needed to isolate new behavior.
+- A Pi-native reviewer stage matches the official subagent example better than ACP/external CLIs, but it makes harness reliability part of the approval path and therefore needs explicit timeout and invocation hardening.
+- A deterministic `pi` command plus an explicit timeout is simpler and more portable than trying to reconstruct the current host entrypoint from `process.argv`.
+
+## TODO
+- Optional future work: add an orchestrated cross-model reviewer via `external-review` or ACP.
+
+## Errors
+| Task | Error | Resolution |
+|------|-------|------------|
+| T1 | `npm test` failed with 17 pre-existing assertions unrelated to robot review | Verified with `npm run lint`, `npm run typecheck`, and focused `npx vitest run test/review-badges.test.ts` instead |
+| T3 | `process.argv[1]`-based self-reinvocation was fragile in extension-hosted contexts | Replaced with `PI_LGTM_PI_BIN` override or plain `pi`, then added focused runner tests |