Commit Graph

46 Commits

Author SHA1 Message Date
wassname 9423d299a3 Simplify automatic robot review 2026-06-13 23:07:40 +08:00
wassname 8b7159cc6d clean up lgtm metadata helpers 2026-06-07 19:06:26 +08:00
wassname 25116e399a add lgtm evidence history and artifact metadata 2026-06-07 19:03:01 +08:00
wassname 023a2c2263 fix lgtm review gating and evidence display 2026-06-07 18:55:05 +08:00
wassname dbe887f0c2 feat: human override for /lgtm + visible [STATE] tag
- /lgtm <id> and /lgtm * no longer hard-error when the agent skipped
  lgtm_ask; the human is the final gate, so they get a confirm dialog
  with explicit override copy instead of an error.
- /lgtm * now spans every open task (READY + ACTIVE + PENDING) and
  shows a grouped preview before signing off.
- Each task row (widget + TaskList) is prefixed with a coloured
  [READY]/[ACTIVE]/[PENDING]/[DONE] tag so signoff-readiness is
  legible at a glance instead of decoded from emoji pipeline + colour.
2026-05-02 08:06:45 +08:00
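A minimal sketch of the [STATE] prefix and the grouped `/lgtm *` preview described in the commit above. The `Row` shape and the `tagRow` and `previewLgtmAll` names are hypothetical, invented for illustration; colouring is omitted, and the real extension may derive these states differently.

```typescript
// Hypothetical row shape; only the four state names come from the commit message.
type State = "READY" | "ACTIVE" | "PENDING" | "DONE";

interface Row {
  id: number;
  subject: string;
  state: State;
}

// Prefix a task row with its state tag (plain text, no colour).
function tagRow(row: Row): string {
  return `[${row.state}] #${row.id} ${row.subject}`;
}

// Group every open task (READY, ACTIVE, PENDING) for a sign-off preview.
// DONE tasks are skipped, as are groups with no tasks.
function previewLgtmAll(rows: Row[]): string {
  const openStates: State[] = ["READY", "ACTIVE", "PENDING"];
  return openStates
    .map((state) => ({ state, items: rows.filter((r) => r.state === state) }))
    .filter((group) => group.items.length > 0)
    .map((group) =>
      [
        `${group.state} (${group.items.length})`,
        ...group.items.map((r) => `  ${tagRow(r)}`),
      ].join("\n"),
    )
    .join("\n");
}
```

Showing the grouped preview before the confirm dialog keeps the human as the final gate while making it clear which tasks lack agent-supplied evidence.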
wassname 5b800653a3 feat: two-tier task model — trivial tasks self-complete, lgtm gates significant claims
Reviewer feedback: the LGTM extension's epistemic core is good but UX is too
ceremonial — every task forced through lgtm_ask + /lgtm even bookkeeping like
"monitor pueue 30". Two-tier split:

- Tasks: agent-managed. TaskUpdate(status=completed) now allowed when no lgtm
  evidence is stored. Trivial subtasks lead up to verification without ceremony.
- LGTMs: significant claims. lgtm_ask still triggers robot review; once evidence
  is stored, completion is locked to /lgtm so the gate can't be bypassed.

Other UX:
- TaskList output grouped: Active / Awaiting sign-off / Pending / Completed.
- New getDisplayStatus(task) derives awaiting_signoff from pending_approval.
- Widget header shows N awaiting sign-off count.
- /lgtm accepts multiple ids: /lgtm 1 2 3 (also #1, commas).
- lgtm_ask field descriptions encourage one short sentence per field — keep
  thinking discipline, drop verbosity.
- SYSTEM_REMINDER nudges progress updates and cleanup of completed/irrelevant
  tasks, not just lgtm_ask.

Also includes pending rubric extension on RobotReviewRecord.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-25 18:18:48 +08:00
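The two-tier rule and the multi-id parsing from the commit above can be sketched roughly as follows. Only `getDisplayStatus`, `pending_approval`, and `awaiting_signoff` are named in the commit message; the `Task` shape, the `lgtmEvidence` field, and the `canAgentComplete` and `parseLgtmIds` helpers are assumptions for illustration, not the extension's actual code.

```typescript
// Assumed status values; only pending_approval and completed appear in the log.
type TaskStatus = "pending" | "in_progress" | "pending_approval" | "completed";
type DisplayStatus = "active" | "awaiting_signoff" | "pending" | "completed";

interface Task {
  id: number;
  status: TaskStatus;
  // Assumed layout: set only after lgtm_ask has stored evidence.
  metadata?: { lgtmEvidence?: string };
}

// Tier 1: a task with no stored lgtm evidence may be completed by the agent.
// Tier 2: once evidence exists, completion is reserved for /lgtm.
function canAgentComplete(task: Task): boolean {
  return task.metadata?.lgtmEvidence === undefined;
}

// Derive the grouping used in TaskList output from the stored status.
function getDisplayStatus(task: Task): DisplayStatus {
  switch (task.status) {
    case "pending_approval":
      return "awaiting_signoff";
    case "in_progress":
      return "active";
    case "completed":
      return "completed";
    default:
      return "pending";
  }
}

// Accept "1 2 3", "#1", and comma-separated ids; ignore anything non-numeric.
function parseLgtmIds(args: string): number[] {
  return args
    .split(/[\s,]+/)
    .map((token) => token.replace(/^#/, ""))
    .filter((token) => /^\d+$/.test(token))
    .map(Number);
}
```

Keying the lock on stored evidence rather than on a per-task flag means a trivial task becomes gated the moment the agent calls `lgtm_ask` on it, which matches the commit's stated goal that the gate cannot be bypassed.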
wassname d908f6f617 fix: harden pi robot review harness 2026-04-17 12:02:35 +08:00
wassname d52596aad1 feat: run robot review via pi harness 2026-04-17 11:10:13 +08:00
wassname 0ba0173615 feat: gate lgtm on robot review acceptance 2026-04-17 08:31:07 +08:00
wassname 3c7d2f5140 feat: iterate and auto-run robot reviews 2026-04-17 08:20:30 +08:00
wassname 423b34fc55 feat: add robot review lane 2026-04-17 08:14:15 +08:00
wassname 2773971e32 reduce MAX_VISIBLE_TASKS from 10 to 5 (compact 2-line layout) 2026-04-17 07:11:46 +08:00
wassname 52eba791ca screenshot 2026-04-17 06:47:19 +08:00
wassname 8ba835df47 lgtm?'s -> lgtm's in widget header
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 06:20:29 +08:00
wassname ee57eb2eae Use test: prefix for done_criterion in widget
Cleaner than lgtm? (redundant with header). Reads naturally as pass/fail.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 06:16:47 +08:00
wassname d2f4506183 Rebrand widget: lgtm?'s count, lgtm? criterion prefix
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 06:15:49 +08:00
wassname dc86b18a44 Add lgtm: prefix to done_criterion in widget
Gives context that this is what the human is signing off on.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 06:14:30 +08:00
wassname 8e78a3ec92 Show done_criterion in widget for non-completed tasks
Human needs to see the criterion, not just the subject.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 06:14:05 +08:00
wassname e86c19b11a fix: falsification_test is agent-reported results, not a human procedure
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 05:59:44 +08:00
wassname f29d3a45be fix: FM2 asks for sneaky/perverse failure, not undetectable one
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 05:57:42 +08:00
wassname 206179fb23 fix: broaden falsification_test from command-only to any concrete check
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 05:56:29 +08:00
wassname 8f2d8887be refactor: replace evidence_vs_failures with falsification_test
- falsification_test: concrete runnable command + expected output if
  claim true vs false + why test can't accidentally pass under failure
- failure_mode_2 now explicitly asks for subtle/silent/null-hypothesis
  failure, not just "second most likely"
- nudges toward: null hypothesis, silent fails, env mismatch

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 05:55:43 +08:00
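As a rough illustration of the `lgtm_ask` payload after this refactor, the sketch below models the fields mentioned across these commits and checks that none is left blank. `failure_mode_1` is an assumed name (the log only names `failure_mode_2` and refers to "2 failure modes"), and `missingFields` is a hypothetical helper, not part of the extension.

```typescript
// Field names follow the commit messages where given; failure_mode_1 is assumed.
interface LgtmAsk {
  evidence: string;
  failure_mode_1: string;
  failure_mode_2: string; // subtle, silent, or null-hypothesis failure
  falsification_test: string; // concrete check and what it would show if the claim were false
  remaining_uncertainty: string;
}

const REQUIRED_FIELDS: (keyof LgtmAsk)[] = [
  "evidence",
  "failure_mode_1",
  "failure_mode_2",
  "falsification_test",
  "remaining_uncertainty",
];

// Return the fields that are absent or contain only whitespace.
function missingFields(ask: Partial<LgtmAsk>): (keyof LgtmAsk)[] {
  return REQUIRED_FIELDS.filter((key) => (ask[key] ?? "").trim() === "");
}
```

A payload passes this check only when every field carries some text, which is a much weaker guarantee than the falsifiability the commit asks for; judging the quality of the answers is left to the reviewer.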
wassname 8ea225d119 feat: pi-lgtm -- LGTM sign-off layer on pi-tasks
- Strip: TaskExecute, TaskOutput, TaskStop, process-tracker, subagent RPC, settings menu
- Add done_criterion (required, falsifiable) to TaskCreate
- Block status=completed in TaskUpdate -- must use /lgtm
- Add lgtm_ask tool: evidence + 2 failure modes + evidence_vs_failures + remaining_uncertainty
- Add /lgtm command: human-only sign-off with stored evidence review
- Persist all lgtm_ask fields in task.metadata for async review
- Widget shows 👀 for pending_approval tasks
- Update README, package.json author

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-17 05:41:18 +08:00
tintinweb 46cca7a734 widget add taskid - fixes #3 2026-03-24 15:14:07 +01:00
tintinweb 72af9e1468 Merge pull request #5 from tintinweb/feat/remove-autocomplete
feat: allow configuring completed task behavior (keep, remove on task complete, remove on list complete) 4 turns lag.
2026-03-22 22:15:07 +01:00
tintinweb e21350c794 fix /new /resume 2026-03-22 22:09:14 +01:00
tintinweb 30c78e4eb8 add build workflow 2026-03-22 21:57:14 +01:00
tintinweb 171cb63a4d remove autocomplete
honor /new
2026-03-22 21:51:49 +01:00
tintinweb 3db97c2c4c linter fixes 2026-03-22 20:29:00 +01:00
tintinweb e1726d7c2a v0.4.0 2026-03-22 20:22:18 +01:00
tintinweb 312f8323f2 Merge pull request #4 from tintinweb/feat/stop-subagents
feat: subagents-api (eventbus) subagent-stop; switch to new API envs; fix bugs
2026-03-22 20:09:16 +01:00
tintinweb 2de0695968 fix TaskExecute UX: debug logging, agent ID resolution, TaskGet consistency
- Add PI_TASKS_DEBUG=1 env flag to trace RPC communication to stderr
- TaskOutput/TaskStop now accept agent IDs (resolve via agentTaskMap)
- TaskGet filters completed blockers (consistent with TaskList)
- TaskGet shows non-empty metadata
- Soften TaskExecute description to not deter agents from using it
- TaskExecute success message guides agents to use TaskOutput
- Add promptGuidelines to prevent duplicate agent spawns
- Update changelog
2026-03-22 18:37:09 +01:00
tintinweb f1a1ba0944 standardize RPC envelope, add rpcCall helper, version mismatch detection 2026-03-22 16:37:26 +01:00
tintinweb 97130e1e35 add biome
add stop subagents
2026-03-22 14:51:07 +01:00
tintinweb ccddf93590 v0.3.3 2026-03-17 18:46:39 +01:00
tintinweb c6769fdab1 v0.3.2 2026-03-17 16:52:07 +01:00
tintinweb 578b1c9a6b fix nudge/sysprompt cc like
description in tools
fix widget refresh flickering
2026-03-16 21:54:02 +01:00
tintinweb b200c7e5b0 v0.3.0 2026-03-14 22:50:23 +01:00
tintinweb 1e7d0d7b57 v0.3.0 - @tintinweb/pi-tasks ♥️ @tintinweb/pi-subagents 2026-03-14 22:28:10 +01:00
tintinweb 569ef337bc v0.2.0 2026-03-13 21:25:50 +01:00
tintinweb c7f1d0e4bb Merge pull request #1 from tintinweb/eventbus_task_to_subagent_conversion
Eventbus 🥳 + task to subagent conversion
2026-03-13 21:20:12 +01:00
tintinweb 73971210e6 v0.2.0 2026-03-12 02:46:52 +01:00
tintinweb 1dd02c8380 eventbus + task -> subagent conversion 2026-03-12 02:38:21 +01:00
tintinweb 59262a7537 screenshot 2026-03-12 01:37:32 +01:00
tintinweb 1880ec204b Update README with user attachments link
Added a link to user attachments in the README.
2026-03-12 01:35:23 +01:00
tintinweb 86d2b64779 v0.1.0 2026-03-12 00:47:54 +01:00