@wassname/pi-lgtm
Help your agent track goals and aim for human sign off.
A pi extension that adds structured human sign-off to task tracking. Fork of @tintinweb/pi-tasks with a minimal LGTM layer.
The core idea: agents cannot mark tasks complete themselves. They must call lgtm_ask with auditable evidence and explicit failure-mode analysis, then a human signs off via /lgtm <id>.
Tasks can also carry a separate fresh-perspective robot review from a subagent or other model family. Robot reviews can iterate: if the latest review says the evidence is incomplete or unconvincing, human sign-off is held back until the agent strengthens the evidence and reruns review.
Install
pi install npm:@wassname2/pi-lgtm
Or for development:
pi -e ./src/index.ts
What is different from pi-tasks
| pi-tasks | pi-lgtm |
|---|---|
Agent calls TaskUpdate { status: "completed" } |
Blocked -- throws error |
| No evidence required | lgtm_ask requires evidence, 2 failure modes, falsification test |
| Tasks complete immediately | Agent sets pending_approval, human runs /lgtm <id> |
| No done criterion | done_criterion required on create: falsifiable observation |
Stripped: TaskExecute, TaskOutput, TaskStop, process-tracker.ts, subagent RPC, settings menu.
Widget
● 3 tasks (1 done, 1 in progress, 1 open)
✔ #1 Design schema
✳ #2 Implementing cache layer… (2m 49s · ↑ 4.1k ↓ 1.2k)
◻ #3 Load test 🛠 🤖 👀
Badges:
🛠tool evidence attached vialgtm_ask🤖one or more robot review iterations attached👀pending human sign-off via/lgtm
Tools
TaskCreate
subject, description, done_criterion (required), progress_label (optional)
done_criterion must be a falsifiable observation: what you expect to see AND what you would see if it is wrong. Example: "All 92 tests pass. If wrong: type errors in build or failures in task-store.test.ts."
TaskList
Lists all tasks. 👀 indicates pending sign-off.
TaskGet
Full task details including done_criterion, approval state, completion mode, review state, a one-line gate status such as ready for human sign-off via /lgtm 5 or blocked: automatic robot review failed: ..., and evidence-iteration history.
TaskUpdate
Update status (pending | in_progress | deleted), subject, description, done_criterion, dependencies. Cannot set completed -- use /lgtm.
lgtm_ask
The epistemic gate. Required fields:
| Field | Description |
|---|---|
taskId |
Task to submit |
evidence |
Exact command run + output, commit hash, config/seeds, file paths. "I ran X and got Y" not "I wrote X". |
failure_likely |
Most likely way this is wrong despite evidence |
failure_sneaky |
Perverse/silent failure that looks like success superficially |
falsification_test |
What you ran and what you got, so both you and the human can sanity-check it. Why that result could not occur if a failure mode were real. |
verification_hints |
Where to look and what to check. Descriptions of evidence locations. |
remaining_uncertainty |
What is NOT tested, deferred edge cases, known limitations |
commands |
Optional structured command records: { cmd, exit_code, stdout_path?, stderr_path? } |
evidence_paths / falsification_paths |
Optional local artifact paths. Stored as absolute path + sha256 + byte size |
supersede_reason |
Optional reason when this replaces older evidence on the same task |
After calling this, the task shows 👀 and is only completable via /lgtm <id>. Evidence is stored on the task so the human can review it hours later without scrolling back. Re-submitting evidence archives the prior package into superseded history instead of silently overwriting it.
The tool result includes a non-blocking self-check prompt asking whether the evidence directly addresses the done_criterion and whether a skeptical reviewer would find it convincing.
lgtm_ask always runs the robot-review stage immediately after storing evidence. A failing or errored robot review clears pending_approval until the evidence is strengthened and reviewed again.
lgtm_supersede
Explicitly retire the current evidence package without completing the task.
Use this when the claim changed or the prior evidence is stale. The tool archives the current evidence, current robot reviews, and reviewer-failure context into history with your reason, then closes the human gate until new evidence is submitted.
robot_review_ask
Attach a fresh-perspective robot review to a task.
Required fields:
| Field | Description |
|---|---|
taskId |
Task to annotate |
reviewer |
Model/provider/family/class used for the review |
scope |
What the reviewer inspected |
observations |
Concrete observations only. No advice, verdicts, or editorial |
blind_spots |
What the reviewer did not inspect or could not verify |
accepted |
Overall accept/reject decision for whether the task is ready to advance |
evidence_complete |
Whether the supplied evidence actually covers the done criterion |
evidence_convincing |
Whether the supplied evidence would convince a skeptical reviewer |
missing_evidence |
Concrete missing checks or artifacts needed before human sign-off |
Use this from a separate subagent or other model when possible. Reviews append as iterations; the latest one is what gates human sign-off. If stored LGTM evidence already exists, an accepted manual review reopens the human sign-off gate.
robot_review_run
Run the configured automatic robot reviewer against the current task evidence.
Default reviewer stage:
pi --mode json -p --no-session
Override with:
PI_LGTM_ROBOT_REVIEW_MODEL='openai/gpt-5'
This appends a new robot-review iteration. The reviewer returns an explicit accepted boolean as well as detailed observations, blind spots, and missing evidence. If the latest robot review rejects the evidence, /lgtm is blocked until stronger evidence is submitted and reviewed again.
Commands
/lgtm <id>
Human-only sign-off. Shows stored evidence, falsification output, failure modes, review status, and remaining uncertainty in structured sections for review, then asks for confirmation. Without <id>, shows a list of pending-approval tasks.
/tasks
Interactive menu: view tasks, create task, clear completed/all.
Task lifecycle
pending -> in_progress -> (lgtm_ask)
-> current evidence iteration N
-> robot review iteration(s) on current evidence 🤖
-> pending_approval 👀 if latest robot review passes
-> reviewer_failed_to_run | reviewer_rejected
-> lgtm_supersede or newer lgtm_ask -> superseded history + fresh current evidence
-> (/lgtm) -> completed
-> deleted
Storage
Controlled by taskScope in .pi/tasks-config.json:
| Mode | File | Behaviour |
|---|---|---|
memory |
none | In-memory, lost on session end |
session (default) |
.pi/tasks/tasks-<sessionId>.json |
Per-session, survives resume |
project |
.pi/tasks/tasks.json |
Shared across all sessions |
Override via env:
PI_TASKS=off # in-memory (CI)
PI_TASKS=sprint-1 # named shared list at ~/.pi/tasks/sprint-1.json
PI_TASKS=/abs/path # explicit path
PI_TASKS_DEBUG=1 # trace to stderr
Architecture
src/
├── index.ts # 8 tools + /tasks + /lgtm commands + widget + event handlers
├── review-badges.ts # Review badge helpers for tool/robot/human lanes
├── robot-review.ts # Robot review iteration storage + compatibility helpers
├── types.ts # Task, TaskStatus types
├── task-store.ts # File-backed store with CRUD, locking, complete() method
├── auto-clear.ts # Turn-based auto-clearing of completed tasks
├── tasks-config.ts # Config persistence -> .pi/tasks-config.json
└── ui/
└── task-widget.ts # Widget with status icons, spinner, 👀 indicator
Development
npm install
npm run typecheck
npm test # 92 tests
npm run build
License
MIT -- based on tintinweb/pi-tasks (MIT)

