fix: harden pi robot review harness

wassname
2026-04-17 12:02:35 +08:00
parent d52596aad1
commit d908f6f617
3 changed files with 205 additions and 62 deletions
+53
@@ -0,0 +1,53 @@
# Robot Review Lane
## Goal
Add a separate review artifact for fresh-perspective subagent review without weakening the existing human `/lgtm` sign-off gate.
## Scope
In: task schema, task tool UX, widget/task badges, README/tests, Pi-native robot-review harness hardening.
Out: provider routing or third-party subagent package integration.
## Requirements
- R1: Tasks can store a robot review separately from human sign-off evidence. Done means: a task can contain both `lgtm_ask` evidence and robot observations without conflict.
- R2: Robot review is observational only. Done means: the tool schema and help text ask for observations, not recommendations or editorializing.
- R3: UI exposes review lanes distinctly. Done means: task list/widget/details show tool/robot/human review badges.
- R4: Human `/lgtm` remains the only completion path. Done means: no robot review path can complete tasks.
- R5: Automatic Pi robot review must be operationally robust. Done means: the child reviewer has a bounded timeout, uses a deterministic Pi invocation, and reports infra failures clearly enough to diagnose without hanging the main tool call.
- R6: The subprocess harness is covered by focused tests. Done means: there are tests for invocation selection, timeout/abort behavior, and assistant-output parsing.
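
R1 and R4 can be illustrated with a minimal sketch. Only `pending_approval` comes from the existing schema; the other status values, the field names, and the helpers `attachRobotReview` and `humanLgtm` are hypothetical and do not describe the repo's actual types.

```typescript
// Hypothetical shapes for illustration; only "pending_approval" is taken from the real schema.
type TaskStatus = "open" | "pending_approval" | "done";

interface RobotReview {
  reviewer: string;        // assumed label for the reviewing subagent
  observations: string[];  // observations only, no recommendations (R2)
}

interface Task {
  id: string;
  status: TaskStatus;
  lgtmEvidence?: string;     // assumed field for evidence recorded via lgtm_ask
  robotReview?: RobotReview; // additive metadata, stored separately (R1)
}

// Attaching a robot review copies the task and never touches its status (R4).
function attachRobotReview(task: Task, review: RobotReview): Task {
  return { ...task, robotReview: review };
}

// Only the human /lgtm path completes a task, and only from pending_approval.
function humanLgtm(task: Task): Task {
  if (task.status !== "pending_approval") {
    throw new Error(`task ${task.id} is not awaiting human sign-off`);
  }
  return { ...task, status: "done" };
}
```

Keeping the robot review as an optional field next to the sign-off evidence means both can coexist on one task, and the only function that writes `done` is the human one.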
## Tasks
- [x] T1 (R1, R2, R3, R4): Add robot-review storage and tool.
  - steps: update task typing helpers, register `robot_review_ask`, thread robot review metadata through task views
  - verify: `npm test -- --runInBand`
  - UAT: "when a task has lgtm evidence and robot observations, I observe both badges and `/lgtm` still controls completion"
- [x] T2 (R3): Update README and examples.
  - steps: document badges and robot review workflow
  - verify: `rg -n "robot_review_ask|🤖|🛠" README.md`
  - UAT: "when I read the README, I observe a distinct robot review lane"
- [x] T3 (R5): Harden the Pi-native robot-review subprocess runner.
  - steps: add timeout handling, replace fragile self-reinvocation logic with deterministic command resolution, improve failure messages
  - verify: `npx vitest run test/robot-review-runner.test.ts`
  - UAT: "when the child reviewer hangs or `pi` is not resolvable, `lgtm_ask` returns a bounded failure instead of hanging forever"
- [x] T4 (R6): Add focused harness tests.
  - steps: extract/mock subprocess runner boundaries and cover timeout, parse, and command resolution behavior
  - verify: `npx vitest run test/robot-review-runner.test.ts test/robot-review.test.ts`
  - UAT: "when I run the focused tests, I observe the subprocess path itself is covered"
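
The bounded-failure behavior in T3 can be sketched roughly as follows. This is one possible shape, not the repo's runner: `runWithTimeout`, `RunResult`, and the failure reason names are hypothetical, and the actual `pi` arguments are deliberately left out because they are not given here.

```typescript
import { spawn } from "node:child_process";

// Hypothetical result type: every failure mode is reported as a value instead of hanging.
type RunResult =
  | { ok: true; stdout: string }
  | { ok: false; reason: "timeout" | "spawn_error" | "nonzero_exit"; detail: string };

function runWithTimeout(command: string, args: string[], timeoutMs: number): Promise<RunResult> {
  return new Promise((resolve) => {
    let stdout = "";
    let stderr = "";
    let settled = false;

    const child = spawn(command, args, { stdio: ["ignore", "pipe", "pipe"] });

    // Kill the child and report a timeout if it has not finished in time.
    const timer = setTimeout(() => {
      child.kill("SIGKILL");
      finish({ ok: false, reason: "timeout", detail: `no result after ${timeoutMs} ms` });
    }, timeoutMs);

    // Resolve at most once, whichever of timeout, error, or close arrives first.
    function finish(result: RunResult): void {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      resolve(result);
    }

    child.stdout.on("data", (chunk: Buffer) => { stdout += chunk.toString("utf8"); });
    child.stderr.on("data", (chunk: Buffer) => { stderr += chunk.toString("utf8"); });

    // Emitted when the command cannot be started, e.g. the binary is not on PATH.
    child.on("error", (err) => finish({ ok: false, reason: "spawn_error", detail: err.message }));

    child.on("close", (code) => {
      if (code === 0) finish({ ok: true, stdout });
      else finish({ ok: false, reason: "nonzero_exit", detail: `exit ${code}: ${stderr.trim()}` });
    });
  });
}
```

A caller could invoke it as `runWithTimeout("pi", reviewerArgs, timeoutMs)`, where `reviewerArgs` and `timeoutMs` are placeholders, and turn each `reason` into a distinct diagnostic message. Because the promise never rejects, the main tool call always receives a result.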
## Context
- Existing schema uses `pending_approval` as the human sign-off gate.
- Current UI already appends `👀` for pending human sign-off; extend rather than replace the completion rule.
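
Extending the badge rendering rather than the completion rule might look like the sketch below. The `👀` badge for pending human sign-off is the existing behavior; mapping `🛠` to tool evidence and `🤖` to robot review is an assumption based on the README check, and `ReviewState` and `reviewBadges` are hypothetical names.

```typescript
// Hypothetical view model; the real task views may carry this information differently.
interface ReviewState {
  hasToolEvidence: boolean;     // assumed: tool-produced evidence exists
  hasRobotReview: boolean;      // assumed: robot observations are stored
  pendingHumanSignoff: boolean; // task status is pending_approval
}

// Badges are purely presentational; none of them affects whether a task can complete.
function reviewBadges(state: ReviewState): string {
  const badges: string[] = [];
  if (state.hasToolEvidence) badges.push("🛠");
  if (state.hasRobotReview) badges.push("🤖");
  if (state.pendingHumanSignoff) badges.push("👀");
  return badges.join(" ");
}
```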
## Log
- The least disruptive model is additive metadata plus badge rendering, not replacing the task lifecycle.
- The repo's full Vitest suite already has failing assertions unrelated to this feature, so focused verification is needed to isolate new behavior.
- A Pi-native reviewer stage matches the official subagent example better than ACP/external CLIs, but it makes harness reliability part of the approval path and therefore needs explicit timeout and invocation hardening.
- A deterministic `pi` command plus an explicit timeout is simpler and more portable than trying to reconstruct the current host entrypoint from `process.argv`.
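
The deterministic command resolution described above can be as small as the sketch below. The `PI_LGTM_PI_BIN` override and the plain `pi` fallback come from this plan; the function name `resolvePiCommand` and the whitespace handling are assumptions.

```typescript
// Resolve the reviewer command without inspecting process.argv.
// An explicit override wins; otherwise fall back to `pi` on PATH.
function resolvePiCommand(env: Record<string, string | undefined>): string {
  const override = env.PI_LGTM_PI_BIN?.trim();
  // Treat an unset or blank override as absent (assumed behavior).
  return override ? override : "pi";
}
```

Passing the environment in as an argument, for example `resolvePiCommand(process.env)`, keeps the function pure and lets a focused test cover command selection without spawning anything.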
## TODO
- Optional future work: add an orchestrated cross-model reviewer via `external-review` or ACP.
## Errors
| Task | Error | Resolution |
|------|-------|------------|
| T1 | `npm test` failed with 17 pre-existing assertion failures unrelated to robot review | Verified with `npm run lint`, `npm run typecheck`, and focused `npx vitest run test/review-badges.test.ts` instead |
| T3 | `process.argv[1]`-based self-reinvocation was fragile in extension-hosted contexts | Replaced with `PI_LGTM_PI_BIN` override or plain `pi`, then added focused runner tests |