Dos Verify Done Claims

Before accepting an agent's 'done / shipped / fixed' claim, verify it against ground truth (git ancestry + the commit's own diff) using the DOS kernel's `dos verify` and `dos commit-audit` — never the agent's own narration.

Published by @sickn33 and contributors

Verify done-claims against ground truth, not the agent's word

Overview

When an AI agent says "done", "shipped", or "fixed", that is a claim, not a fact, and a claim the agent checks by re-reading its own work is consistency, not grounding. This skill replaces that self-report with a verdict from a witness the agent did not author: it shells out to the DOS kernel (`dos verify`, `dos commit-audit`) to confirm the claimed effect from git ancestry and the commit's actual diff. DOS is deterministic: no API key, no LLM. As used here, the verdict is git-only and offline. The one exception is `dos verify` in a workspace that wires a CI oracle; passing `--no-ci` suppresses that path (see Security & Safety Notes).

This skill adapts the DOS reference "witness-claim" pattern (anthony-chaudhary/dos-kernel) into a host-agnostic screenplay.

When to Use This Skill

  • Use when an agent reports a task/phase/feature as complete and you want that "done" confirmed from evidence before building on it.
  • Use right after a commit, to confirm the commit's message matches its diff (catch a `fix:` commit that only touched a README, or a "tests pass" commit that deleted the assertions).
  • Use when folding many sub-agents' results — verify each claimed effect instead of trusting the return string.
  • Do not use it to judge whether code is correct; that is what the test suite is for. This skill checks only whether the claimed thing actually shipped.

How It Works

Step 1: Install the kernel (once)

pip install dos-kernel        # provides the `dos` CLI; deterministic, no key

Step 2: Audit the latest commit's claim vs its diff

A commit subject is forgeable (whoever wrote the message chose its wording); the list of files the commit touched is not (git recorded it). `dos commit-audit` grades the subject against the actual diff:

dos commit-audit --workspace . HEAD --json

commit-audit --json prints a JSON array of audited commits (one element even for a single HEAD), so read verdict from the first element — e.g. dos commit-audit --workspace . HEAD --json | jq -r '.[0].verdict'. (Without --json the same verdict prints as a one-line text row: · OK …, ⚑ UNWITNESSED …, or · abstain ….) The verdicts are: OK (the diff backs the claim's kind), CLAIM_UNWITNESSED (the subject's claim is not evidenced by the diff — treat the "done" as unproven), or ABSTAIN. This judges the kind of change, never correctness — run the tests for that.
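
The verdict handling described above can be sketched as a small shell function. This is a minimal sketch, not part of the DOS CLI: the function name `gate_verdict` and the decision labels it prints are hypothetical, and treating ABSTAIN as "unproven" is an assumption made here because an abstention does not corroborate the claim.

```shell
# Map a commit-audit verdict string to a gate decision.
# gate_verdict and its output labels are hypothetical names for this sketch.
gate_verdict() {
  case "$1" in
    OK)                echo "proceed-to-tests" ;;  # diff backs the claim's kind
    CLAIM_UNWITNESSED) echo "reject" ;;            # claim not evidenced by the diff
    ABSTAIN)           echo "unproven" ;;          # assumption: abstain is not corroboration
    *)                 echo "unknown" ;;           # empty or unexpected input
  esac
}

# Usage (needs the dos CLI and jq on PATH):
#   verdict=$(dos commit-audit --workspace . HEAD --json | jq -r '.[0].verdict')
#   gate_verdict "$verdict"
```

Keeping the mapping separate from the `dos` call means the policy can be reviewed and tested without a repository at hand.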

Step 3: Verify a named phase actually shipped

If the agent claims a specific plan/phase landed, confirm it from git history rather than the transcript:

dos verify --workspace . PLAN PHASE --json --no-ci

--no-ci keeps the verdict git-only (see the Security note below). With --json you get the shipped and source fields. (The default text form prints SHIPPED PLAN PHASE (via grep) or NOT_SHIPPED PLAN PHASE (via none) — the same verdict, and the process exit code is non-zero when not shipped.)

Grade a `shipped: true` result by its `source`, because the git fallback labels its own evidence by forgeability, and forgeable evidence is exactly what this skill exists to distrust:

  • registry or grep-artifact: non-forgeable (a registry row, or an artifact/diff rung). This closes the claim.
  • grep-subject (or bare grep): forgeable. A commit subject or body carried the phase token, which an agent can write without doing the work (even on an empty commit). Treat this as shipped-per-the-subject, not confirmed; corroborate it (run `dos commit-audit` on that commit, as in Step 2) before you close.
  • none: no positive evidence. Accept this as "not shipped", not as a tool failure.
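
The source grading in the list above might look like this in shell. It is a sketch under stated assumptions: `grade_source` and its output labels are hypothetical, and the usage comment assumes `shipped` and `source` are top-level keys of the JSON object that `dos verify --json` prints, which the text implies but does not spell out.

```shell
# grade_source SHIPPED SOURCE
# Prints: close | corroborate | not-shipped | unknown
# Function name and labels are hypothetical, chosen for this sketch.
grade_source() {
  if [ "$1" != "true" ]; then
    echo "not-shipped"            # shipped: false, e.g. source: none
    return 0
  fi
  case "$2" in
    registry|grep-artifact) echo "close" ;;        # non-forgeable evidence
    grep-subject|grep)      echo "corroborate" ;;  # forgeable subject/body match
    *)                      echo "unknown" ;;      # unexpected source value
  esac
}

# Usage (needs the dos CLI and jq; assumes top-level .shipped and .source keys).
# "|| true" keeps a non-zero exit on "not shipped" from aborting a set -e script:
#   out=$(dos verify --workspace . PLAN PHASE --json --no-ci) || true
#   grade_source "$(printf '%s' "$out" | jq -r '.shipped')" \
#                "$(printf '%s' "$out" | jq -r '.source')"
```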

Step 4: Fold only confirmed effects

Accept the agent's "done" only when Steps 2 and 3 corroborate it. If the verdict is CLAIM_UNWITNESSED or shipped: false, the work is not done, regardless of how confidently the agent narrated it; send it back.
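
One way to combine the two checks into a single fold decision is sketched below. The function `fold_decision` and its labels are hypothetical. The sketch assumes the audit verdict passed in belongs to the same commit that carried the phase token, and it treats a forgeable source plus an OK audit as sufficient corroboration, which is one reading of the rule above rather than a documented DOS behavior.

```shell
# fold_decision AUDIT_VERDICT SHIPPED SOURCE
# Prints: accept | hold | send-back
# Hypothetical helper; the policy encoded here is this sketch's reading of Step 4.
fold_decision() {
  # Either failing signal means the work is not done.
  if [ "$1" = "CLAIM_UNWITNESSED" ] || [ "$2" != "true" ]; then
    echo "send-back"
    return 0
  fi
  case "$3" in
    registry|grep-artifact)
      echo "accept" ;;                       # non-forgeable source closes the claim
    grep-subject|grep)
      # Forgeable source: accept only if the commit audit corroborates it.
      if [ "$1" = "OK" ]; then echo "accept"; else echo "hold"; fi ;;
    *)
      echo "hold" ;;                         # unexpected source: do not fold yet
  esac
}

# Usage, with values already extracted from the two JSON outputs:
#   fold_decision "$verdict" "$shipped" "$source_rung"
```

The "hold" outcome is deliberately distinct from "send-back": it marks a claim that is neither refuted nor confirmed and needs a human or a further check.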

Examples

Example 1: gate an agent's "I fixed the bug" claim

# The agent committed and said it's fixed. Check the diff backs the claim.
# commit-audit --json returns an array, so read the first element's verdict:
dos commit-audit --workspace . HEAD --json | jq -r '.[0].verdict'
# OK                -> the change is of the claimed kind; now run the tests
# CLAIM_UNWITNESSED -> the commit doesn't do what it says; reject

Example 2: confirm a feature phase shipped before closing a ticket

dos verify --workspace . AUTH AUTH2 --json --no-ci
# shipped: true, source: registry|grep-artifact -> non-forgeable; safe to close
# shipped: true, source: grep-subject|grep       -> forgeable subject/body match;
#   shipped-per-the-subject only -> corroborate with commit-audit before closing
# shipped: false, source: none -> no evidence; keep the ticket open

Best Practices

  • ✅ Run dos commit-audit HEAD immediately after every agent commit.
  • ✅ Treat source: none / CLAIM_UNWITNESSED as "not done", not as a tool error.
  • ✅ Close a claim on a non-forgeable source (registry, grep-artifact). Treat grep-subject / bare grep as forgeable (an agent can write the subject text) — corroborate before closing.
  • ✅ Keep the test suite as the separate correctness gate — this skill checks shipping, not correctness.
  • ❌ Don't accept a "done" because the agent's prose was confident.
  • ❌ Don't use this to replace code review or testing.

Limitations

  • This skill does not replace environment-specific validation, testing, or expert review.
  • It checks whether a claimed change shipped / matches its diff — not whether the code is correct.
  • dos verify reads git history; in a repo with no commits there is nothing to witness (it will honestly report source: none).
  • Stop and ask for clarification if required inputs (a git repo, the dos CLI) are missing.
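
The required-inputs check from the last item can be sketched as a preflight helper. The function names `have_cmd` and `missing_inputs` are hypothetical; the sketch relies only on standard `command -v` and `git rev-parse`, and it does not cover edge cases such as bare repositories.

```shell
# True if the named command resolves on PATH.
have_cmd() { command -v "$1" >/dev/null 2>&1; }

# Print one line per missing input for the workspace in $1 (default: .).
# Prints nothing when both the dos CLI and a git repository are available.
missing_inputs() {
  dir=${1:-.}
  have_cmd dos || echo "dos CLI (pip install dos-kernel)"
  # git must be installed and must resolve a work tree at the given path.
  if ! have_cmd git || ! git -C "$dir" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    echo "git repo at $dir"
  fi
  return 0
}

# Usage: stop and ask instead of running the dos verbs when anything is missing.
#   missing=$(missing_inputs .)
#   [ -z "$missing" ] || { printf 'Missing:\n%s\n' "$missing"; exit 1; }
```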

Security & Safety Notes

  • This skill runs shell commands: `pip install dos-kernel` and the read-only `dos` verbs (`dos commit-audit`, `dos verify`). These verbs never mutate the repo or push. `dos commit-audit` only reads git history and the working tree (no network). `dos verify` is also git-only unless the workspace has wired a CI oracle (`[verify] non_git_oracle` in its dos.toml), in which case it may shell out to a network check (e.g. `gh api`) for the verdict. Pass `--no-ci` (as the examples above do) to force the git-only path and guarantee no network access.
  • pip install dos-kernel installs from PyPI. The distribution name is dos-kernel (the bare dos on PyPI is an unrelated package — do not install it). Pin a version in locked environments.
  • Run in the repository you intend to adjudicate; the --workspace . argument scopes every verdict to that repo.

Common Pitfalls

  • Problem: dos verify returns source: none and it looks like a failure. Solution: That is the honest "no evidence" verdict — it means the phase has no ship commit, so the claim is unproven. Re-stamp the real commit or keep the task open.
  • Problem: Installing the wrong package. Solution: The PyPI name is dos-kernel, not dos.

Related Skills

  • The upstream DOS reference screenplays (dos-witness-claim, dos-goal-gate) in anthony-chaudhary/dos-kernel cover the multi-agent fan-out and self-stopping-agent variants of this same witness discipline.
