mirror of https://github.com/wassname/pi-lgtm.git synced 2026-06-27 15:31:29 +08:00


wassname (Michael J Clark) 91cbe59e59 Update README with project goals and task list

Added original ask and project goals to README.

2026-06-14 08:01:16 +08:00


@wassname/pi-lgtm

Original ask:

I would like a task list where

the top-level tasks are goals and proof.

A form that makes the agent think about "what are the most likely, subtle failure modes, and what cheap, easy-to-review evidence can distinguish between them and goal success?"

A subagent provides an independent sanity check of the evidence before completing.

wassname

Help your agent track goals and aim for human sign-off.

A pi extension that adds structured human sign-off to task tracking. Fork of @tintinweb/pi-tasks with a minimal LGTM layer.

The core idea: agents cannot mark tasks complete themselves. They must call lgtm_ask with auditable evidence and explicit failure-mode analysis, then a human signs off via /lgtm <id>.

Tasks can also carry a separate fresh-perspective robot review from a subagent or other model family. Robot reviews can iterate: if the latest review says the evidence is incomplete or unconvincing, human sign-off is held back until the agent strengthens the evidence and reruns review.

Install

pi install npm:@wassname2/pi-lgtm

Or for development:

pi -e ./src/index.ts

What is different from pi-tasks

pi-tasks	pi-lgtm
Agent calls `TaskUpdate { status: "completed" }`	Blocked -- throws error
No evidence required	`lgtm_ask` requires evidence, 2 failure modes, falsification test
Tasks complete immediately	Agent sets `pending_approval`, human runs `/lgtm <id>`
No done criterion	`done_criterion` required on create: falsifiable observation
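The first row of the table can be pictured as a guard on `TaskUpdate`. This is a minimal TypeScript sketch: the agent-settable statuses come from the `TaskUpdate` description in this README, but `validateTaskUpdate`, `TaskUpdateInput`, and the error wording are hypothetical, not pi-lgtm's source.

```typescript
// Hypothetical guard: the function name and messages are illustrative.
// Agents may set these statuses; "completed" is reserved for the human /lgtm command.
const AGENT_SETTABLE = new Set<string>(["pending", "in_progress", "deleted"]);

interface TaskUpdateInput {
  taskId: string;
  status?: string;
}

function validateTaskUpdate(input: TaskUpdateInput): void {
  if (input.status === undefined) return; // edits that do not touch status pass through
  if (input.status === "completed") {
    throw new Error(
      `Task ${input.taskId}: agents cannot set "completed". Call lgtm_ask, then a human runs /lgtm ${input.taskId}.`,
    );
  }
  if (!AGENT_SETTABLE.has(input.status)) {
    throw new Error(`Task ${input.taskId}: unknown status "${input.status}"`);
  }
}

validateTaskUpdate({ taskId: "2", status: "in_progress" }); // allowed
```

Throwing, rather than silently ignoring the status, matches the "Blocked -- throws error" row: the agent sees why completion failed and is pointed at `lgtm_ask`.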

Removed from pi-tasks: TaskExecute, TaskOutput, TaskStop, process-tracker.ts, subagent RPC, settings menu.

● 3 tasks (1 done, 1 in progress, 1 open)
  ✔ #1 Design schema
  ✳ #2 Implementing cache layer… (2m 49s · ↑ 4.1k ↓ 1.2k)
  ◻ #3 Load test 🛠 🤖 👀

Badges:

🛠 tool evidence attached via lgtm_ask
🤖 one or more robot review iterations attached
👀 pending human sign-off via /lgtm
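The badge rules above amount to three independent flags. This sketch derives the suffix from a hypothetical `BadgeState` summary; the field names are assumptions, not the extension's `types.ts`.

```typescript
// Hypothetical task summary; field names are illustrative, not pi-lgtm's types.ts.
interface BadgeState {
  hasToolEvidence: boolean; // evidence stored via lgtm_ask
  robotReviewCount: number; // robot review iterations attached
  pendingApproval: boolean; // waiting for /lgtm
}

// Builds the badge suffix in the order shown in the widget example.
function badges(t: BadgeState): string {
  const out: string[] = [];
  if (t.hasToolEvidence) out.push("🛠");
  if (t.robotReviewCount > 0) out.push("🤖");
  if (t.pendingApproval) out.push("👀");
  return out.join(" ");
}

console.log(`◻ #3 Load test ${badges({ hasToolEvidence: true, robotReviewCount: 1, pendingApproval: true })}`);
```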

Tools

`TaskCreate`

subject, description, done_criterion (required), progress_label (optional)

done_criterion must be a falsifiable observation: what you expect to see AND what you would see if it is wrong. Example: "All 92 tests pass. If wrong: type errors in build or failures in task-store.test.ts."
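Here is a sketch of a `TaskCreate` argument object that uses the example criterion above. The `TaskCreateInput` interface is an illustration of the listed fields, and `looksFalsifiable` is a hypothetical lint heuristic that pi-lgtm does not necessarily implement.

```typescript
// Illustrative argument shape for TaskCreate; not the extension's exported type.
interface TaskCreateInput {
  subject: string;
  description: string;
  done_criterion: string; // required: expected observation plus what "wrong" would look like
  progress_label?: string; // optional
}

const createArgs: TaskCreateInput = {
  subject: "Implement cache layer",
  description: "Add a read cache in front of the task store.",
  done_criterion:
    "All 92 tests pass. If wrong: type errors in build or failures in task-store.test.ts.",
  progress_label: "Implementing cache layer",
};

// Hypothetical heuristic, not part of pi-lgtm: a criterion that never says
// what failure would look like is probably not falsifiable.
function looksFalsifiable(criterion: string): boolean {
  return /\bif wrong\b/i.test(criterion);
}

console.log(looksFalsifiable(createArgs.done_criterion)); // true
```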

`TaskList`

Lists all tasks. 👀 indicates pending sign-off.

`TaskGet`

Full task details, including done_criterion, approval state, completion mode, review state, evidence-iteration history, and a one-line gate status such as `ready for human sign-off via /lgtm 5` or `blocked: automatic robot review failed: ...`.
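The one-line gate status can be sketched as a small formatter. Only the two quoted message formats come from this README. `GateInput`, its fields, and the wording for a task with no evidence yet are assumptions.

```typescript
// Hypothetical inputs; only the two quoted message formats come from the README.
interface GateInput {
  id: string;
  pendingApproval: boolean;
  rejectionSummary?: string; // set when the latest robot review rejected the evidence
}

function gateStatus(t: GateInput): string {
  if (t.rejectionSummary !== undefined) {
    return `blocked: automatic robot review failed: ${t.rejectionSummary}`;
  }
  if (t.pendingApproval) return `ready for human sign-off via /lgtm ${t.id}`;
  // Wording for this case is invented for the sketch.
  return "open: no evidence submitted via lgtm_ask yet";
}

console.log(gateStatus({ id: "5", pendingApproval: true }));
```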

`TaskUpdate`

Update status (pending | in_progress | deleted), subject, description, done_criterion, dependencies. Cannot set completed -- use /lgtm.

`lgtm_ask`

The epistemic gate. Required fields:

Field	Description
`taskId`	Task to submit
`evidence`	Exact command run + output, commit hash, config/seeds, file paths. "I ran X and got Y" not "I wrote X".
`failure_likely`	Most likely way this is wrong despite evidence
`failure_sneaky`	Perverse/silent failure that looks like success superficially
`falsification_test`	What you ran and what you got, so both you and the human can sanity-check it, plus why that result could not occur if a failure mode were real.
`verification_hints`	Where to look and what to check. Writing them still forces the agent to think, but weak hints are advisory rather than a hard block when the verbatim evidence already proves the claim; the core evidence must still pass on its own.
`remaining_uncertainty`	What is NOT tested, deferred edge cases, known limitations
`commands`	Optional structured command records: `{ cmd, exit_code, stdout_path?, stderr_path? }`
`evidence_paths` / `falsification_paths`	Optional local artifact paths. Stored as absolute path + sha256 + byte size
`supersede_reason`	Optional reason when this replaces older evidence on the same task
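The `evidence_paths` / `falsification_paths` row says each artifact is stored as an absolute path, a sha256 digest, and a byte size. This sketch produces such a record with Node's built-in `crypto`, `fs`, and `path` modules. The `ArtifactRecord` name and its field names are assumptions, and the usage lines write a throwaway file to a temp directory.

```typescript
import { createHash } from "node:crypto";
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join, resolve } from "node:path";

// Hypothetical record shape: absolute path, content hash, and size, per the table row.
interface ArtifactRecord {
  path: string;
  sha256: string;
  bytes: number;
}

function recordArtifact(p: string): ArtifactRecord {
  const abs = resolve(p); // store an absolute path so the record survives a cwd change
  const data = readFileSync(abs);
  return {
    path: abs,
    sha256: createHash("sha256").update(data).digest("hex"),
    bytes: data.byteLength,
  };
}

// Usage: hash a small artifact, as lgtm_ask might for an evidence_paths entry.
const dir = mkdtempSync(join(tmpdir(), "lgtm-"));
const file = join(dir, "stdout.txt");
writeFileSync(file, "hello");
const rec = recordArtifact(file);
console.log(rec.sha256, rec.bytes);
```

Storing the hash lets a reviewer confirm hours later that the file on disk is still the one the agent cited.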

After calling this, the task shows 👀 (unless the robot review rejects the evidence) and is only completable via /lgtm <id>. Evidence is stored on the task so the human can review it hours later without scrolling back. Re-submitting evidence archives the prior package into superseded history instead of silently overwriting it.

The tool result includes a non-blocking self-check prompt asking whether the evidence directly addresses the done_criterion and whether a skeptical reviewer would find it convincing.

lgtm_ask always runs the robot-review stage immediately after storing evidence. A robot review that rejects the evidence clears pending_approval until the evidence is strengthened and reviewed again. Weak verification hints are advisory if the core verbatim evidence already proves the done criterion. A reviewer crash, auth failure, timeout, or malformed output is recorded as a warning and leaves human sign-off open.

`lgtm_supersede`

Explicitly retire the current evidence package without completing the task.

Use this when the claim changed or the prior evidence is stale. The tool archives the current evidence, current robot reviews, and reviewer-failure context into history with your reason, then closes the human gate until new evidence is submitted.
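The archive step can be sketched as "move the current package into history with a reason, then close the gate." The data model and `supersede` helper are hypothetical. Presumably a newer `lgtm_ask` performs the same archive step before it stores the replacement package.

```typescript
// Hypothetical data model; names are illustrative, not pi-lgtm's types.ts.
interface EvidencePackage {
  iteration: number;
  evidence: string;
  robotReviews: string[]; // review iterations attached to this package
  reviewerFailures: string[]; // recorded reviewer crashes, timeouts, etc.
}

interface SupersededEntry extends EvidencePackage {
  reason: string;
  supersededAt: string; // ISO-8601 timestamp
}

interface LgtmTask {
  id: string;
  current?: EvidencePackage;
  history: SupersededEntry[];
  pendingApproval: boolean;
}

// Archive the current package with a reason instead of overwriting it, then close the human gate.
function supersede(task: LgtmTask, reason: string, now: Date = new Date()): void {
  if (task.current === undefined) {
    throw new Error(`Task ${task.id} has no current evidence to supersede`);
  }
  task.history.push({ ...task.current, reason, supersededAt: now.toISOString() });
  task.current = undefined;
  task.pendingApproval = false;
}

const task: LgtmTask = {
  id: "3",
  current: { iteration: 1, evidence: "npm test: 92 passed", robotReviews: ["accepted"], reviewerFailures: [] },
  history: [],
  pendingApproval: true,
};
supersede(task, "claim narrowed to the read path", new Date("2026-01-01T00:00:00Z"));
```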

`robot_review_ask`

Attach a fresh-perspective robot review to a task.

Required fields:

Field	Description
`taskId`	Task to annotate
`reviewer`	Model/provider/family/class used for the review
`scope`	What the reviewer inspected
`observations`	Concrete observations only. No advice, verdicts, or editorializing
`blind_spots`	What the reviewer did not inspect or could not verify
`accepted`	Overall accept/reject decision for whether the task is ready to advance
`evidence_complete`	Whether the supplied evidence actually covers the done criterion
`evidence_convincing`	Whether the supplied evidence would convince a skeptical reviewer
`missing_evidence`	Concrete missing checks or artifacts needed before human sign-off

Use this from a separate subagent or other model when possible. Reviews append as iterations; the latest one is what gates human sign-off. If stored LGTM evidence already exists, an accepted manual review reopens the human sign-off gate.
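The append-only iteration rule can be sketched as follows. Field names mirror the `robot_review_ask` table. The `ReviewedTask` shape, the `attachReview` helper, and treating `missing_evidence` as a string array are assumptions.

```typescript
// Field names mirror the robot_review_ask table; the container shape is assumed.
interface RobotReview {
  reviewer: string;
  scope: string;
  observations: string;
  blind_spots: string;
  accepted: boolean;
  evidence_complete: boolean;
  evidence_convincing: boolean;
  missing_evidence: string[]; // assumed to be a list of concrete gaps
}

interface ReviewedTask {
  hasStoredEvidence: boolean; // true once lgtm_ask has stored an evidence package
  reviews: RobotReview[]; // append-only: one entry per review iteration
  pendingApproval: boolean;
}

// Appends a review iteration; the newest one decides whether the human gate is open.
function attachReview(task: ReviewedTask, review: RobotReview): void {
  task.reviews.push(review);
  if (!review.accepted) {
    task.pendingApproval = false; // hold back /lgtm until stronger evidence is reviewed
  } else if (task.hasStoredEvidence) {
    task.pendingApproval = true; // an accepted review reopens the gate
  }
}

const reviewed: ReviewedTask = { hasStoredEvidence: true, reviews: [], pendingApproval: true };
const base = {
  reviewer: "other-family model",
  scope: "stored evidence and test log",
  observations: "Log lists 92 passing tests.",
  blind_spots: "Did not rerun the tests.",
  evidence_complete: true,
  evidence_convincing: true,
  missing_evidence: [] as string[],
};
attachReview(reviewed, { ...base, accepted: false, evidence_convincing: false, missing_evidence: ["load-test output"] });
attachReview(reviewed, { ...base, accepted: true }); // second iteration reopens the gate
```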

`robot_review_run`

Run the automatic robot reviewer against the current task evidence using the current session model.

Default reviewer stage:

pi --mode json -p --no-session --no-tools --no-extensions --model <current-session-model>

This appends a new robot-review iteration. The reviewer returns an explicit accepted boolean as well as detailed observations, blind spots, and missing evidence. If the latest robot review rejects the evidence, /lgtm is blocked until stronger evidence is submitted and reviewed again. If the reviewer process fails to run or returns malformed output, the failure is recorded but human sign-off stays open.
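Because the reviewer is another model, its reply has to be parsed defensively so that malformed output becomes a recorded warning rather than a verdict. This sketch assumes the reply is a JSON object with an `accepted` boolean and an optional `missing_evidence` string array. That schema is a guess, not pi-lgtm's documented format.

```typescript
// Assumed reply schema: { accepted: boolean, missing_evidence?: string[] }.
type ParsedVerdict =
  | { ok: true; accepted: boolean; missingEvidence: string[] }
  | { ok: false; warning: string }; // recorded as a warning; human sign-off stays open

function parseVerdict(raw: string): ParsedVerdict {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch (err) {
    return { ok: false, warning: `malformed reviewer output: ${(err as Error).message}` };
  }
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return { ok: false, warning: "reviewer output is not a JSON object" };
  }
  const fields = value as Record<string, unknown>;
  const accepted = fields["accepted"];
  if (typeof accepted !== "boolean") {
    return { ok: false, warning: "reviewer output has no boolean `accepted` field" };
  }
  const rawMissing = fields["missing_evidence"];
  const missingEvidence = Array.isArray(rawMissing)
    ? rawMissing.filter((m): m is string => typeof m === "string") // drop non-string entries
    : [];
  return { ok: true, accepted, missingEvidence };
}

console.log(parseVerdict('{"accepted": false, "missing_evidence": ["load-test output"]}'));
```

Requiring an explicit boolean, instead of inferring acceptance from prose, keeps a vague "looks fine" reply from counting as a pass.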

Commands

`/lgtm <id>`

Human-only sign-off. Shows stored evidence, falsification output, failure modes, review status, and remaining uncertainty in structured sections for review, then asks for confirmation. Without <id>, shows a list of pending-approval tasks.
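This sketch covers the control flow of `/lgtm`: list pending tasks when no id is given, and complete a task only when it is pending approval and the human confirms. The types and `lgtmCommand` helper are hypothetical. The confirmation is modelled as a synchronous callback, although a real UI prompt is likely asynchronous.

```typescript
// Hypothetical types and helper; the real command also renders the stored evidence sections.
interface SignOffTask {
  id: string;
  subject: string;
  status: "pending" | "in_progress" | "completed" | "deleted";
  pendingApproval: boolean;
}

function lgtmCommand(
  tasks: SignOffTask[],
  id: string | undefined,
  confirm: (t: SignOffTask) => boolean, // the human's answer after reviewing the evidence
): string {
  if (id === undefined) {
    const pending = tasks.filter((t) => t.pendingApproval);
    return pending.length > 0
      ? pending.map((t) => `#${t.id} ${t.subject} 👀`).join("\n")
      : "No tasks pending approval.";
  }
  const task = tasks.find((t) => t.id === id);
  if (task === undefined) return `No task #${id}.`;
  if (!task.pendingApproval) return `Task #${id} is not pending approval.`;
  if (!confirm(task)) return `Sign-off for #${id} cancelled.`;
  task.status = "completed";
  task.pendingApproval = false;
  return `Task #${id} completed.`;
}
```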

`/tasks`

Interactive menu: view tasks, create task, clear completed/all.

Task lifecycle

pending -> in_progress -> (lgtm_ask)
                       -> current evidence iteration N
                       -> robot review iteration(s) on current evidence 🤖
                       -> pending_approval 👀   if latest robot review passes, or reviewer infra fails
                       -> reviewer_rejected
                       -> lgtm_supersede or newer lgtm_ask -> superseded history + fresh current evidence
                       -> (/lgtm) -> completed
                       -> deleted
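The diagram can be read as a small state machine whose key invariant is that only a human can reach `completed`, and only from `pending_approval`. The edge list below is one reading of the diagram, and the state names, `Actor` type, and `canTransition` helper are assumptions rather than pi-lgtm's `TaskStatus` type.

```typescript
// One reading of the lifecycle diagram; not pi-lgtm's actual TaskStatus type.
type LifecycleState =
  | "pending"
  | "in_progress"
  | "pending_approval"
  | "reviewer_rejected"
  | "completed"
  | "deleted";

type Actor = "agent" | "human";

const EDGES: Record<LifecycleState, readonly LifecycleState[]> = {
  pending: ["in_progress", "deleted"],
  // lgtm_ask stores evidence, then robot review lands in one of the two review states
  in_progress: ["pending", "pending_approval", "reviewer_rejected", "deleted"],
  // /lgtm completes; lgtm_supersede or a rejecting review closes the gate
  pending_approval: ["completed", "in_progress", "reviewer_rejected", "deleted"],
  // new evidence or an accepted manual review can reopen the gate
  reviewer_rejected: ["pending_approval", "in_progress", "deleted"],
  completed: [],
  deleted: [],
};

function canTransition(from: LifecycleState, to: LifecycleState, actor: Actor): boolean {
  if (to === "completed" && actor !== "human") return false; // only /lgtm completes
  return EDGES[from].includes(to);
}
```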

Storage

Controlled by taskScope in .pi/tasks-config.json:

Mode	File	Behaviour
`memory`	none	In-memory, lost on session end
`session` (default)	`.pi/tasks/tasks-<sessionId>.json`	Per-session, survives resume
`project`	`.pi/tasks/tasks.json`	Shared across all sessions

Override via env:

PI_TASKS=off          # in-memory (CI)
PI_TASKS=sprint-1     # named shared list at ~/.pi/tasks/sprint-1.json
PI_TASKS=/abs/path    # explicit path
PI_TASKS_DEBUG=1      # trace to stderr
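The storage rules above can be sketched as one resolver. This sketch makes three assumptions that the README does not spell out. First, `PI_TASKS` takes precedence over `taskScope` (the README calls it an override). Second, session and project paths are relative to the project directory. Third, any non-absolute value other than `off` is treated as a list name. `resolveTaskFile` and `StorageInputs` are hypothetical names.

```typescript
import { homedir } from "node:os";
import { isAbsolute, join } from "node:path";

type TaskScope = "memory" | "session" | "project";

interface StorageInputs {
  env: Record<string, string | undefined>;
  taskScope?: TaskScope; // value of taskScope in .pi/tasks-config.json, if set
  projectDir: string;
  sessionId: string;
  home?: string; // defaults to the OS home directory
}

// Returns the backing JSON file, or null for in-memory storage.
function resolveTaskFile(o: StorageInputs): string | null {
  const override = o.env["PI_TASKS"];
  if (override !== undefined && override !== "") {
    if (override === "off") return null; // in-memory, e.g. CI
    if (isAbsolute(override)) return override; // explicit path
    return join(o.home ?? homedir(), ".pi", "tasks", `${override}.json`); // named shared list
  }
  const scope: TaskScope = o.taskScope ?? "session"; // session is the default mode
  switch (scope) {
    case "memory":
      return null;
    case "session":
      return join(o.projectDir, ".pi", "tasks", `tasks-${o.sessionId}.json`);
    case "project":
      return join(o.projectDir, ".pi", "tasks", "tasks.json");
  }
}

console.log(resolveTaskFile({ env: { PI_TASKS: "sprint-1" }, projectDir: "/repo", sessionId: "abc", home: "/home/me" }));
```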

Architecture

src/
├── index.ts        # 8 tools + /tasks + /lgtm commands + widget + event handlers
├── review-badges.ts # Review badge helpers for tool/robot/human lanes
├── robot-review.ts # Robot review iteration storage + compatibility helpers
├── types.ts        # Task, TaskStatus types
├── task-store.ts   # File-backed store with CRUD, locking, complete() method
├── auto-clear.ts   # Turn-based auto-clearing of completed tasks
├── tasks-config.ts # Config persistence -> .pi/tasks-config.json
└── ui/
    └── task-widget.ts  # Widget with status icons, spinner, 👀 indicator

Development

npm install
npm run typecheck
npm test            # 92 tests
npm run build

License

MIT -- based on tintinweb/pi-tasks (MIT)