Public bench (LIVE)

What's on the bench.

Artifacts: 5,040
Industries: 26
Reads / week: 18

All artifacts

AGENT

Cs Handoff Author

Conversation-handoff author. Compacts the current session into a markdown handoff for a fresh agent. Tailors content to next-session focus. Refuses to duplicate content from PRDs/plans/ADRs/issues/commits — references them by path or URL instead. Recommends specific skills for the next session.

software-engineering+2

SKILL

Cs Handoff

/cs:handoff <next-session-focus> — Compact the current conversation into a handoff document for a fresh agent. Tailored to next-session focus (deploy/review/debug/design/test). Replaces PRD/ADR/issue/commit content with references. Recommends specific skills for the next session.

software-engineering+2

AGENT

Cs Grill With Docs

Docs-anchored plan interrogator. Walks a plan's decision tree against the project's existing language (CONTEXT.md) and recorded decisions (docs/adr/). Pre-flights the glossary + ADR linters before asking the first question. Refuses to grill in a vacuum when documented language exists. Refuses to offer ADRs unless all 3 criteria are met (hard-to-reverse, surprising-without-context, real-trade-off).

software-engineering+1

SKILL

Cs Grill With Docs

/cs:grill-with-docs <path-to-plan> — Start a docs-anchored grilling session. Pre-flights CONTEXT.md + docs/adr/ linters, then interrogates the plan one decision at a time, updating glossary + writing ADRs inline as they crystallise.

software-engineering+2

AGENT

Cs Grill Master

Relentless plan-and-design interrogator. Walks decision trees one branch at a time, asks one question per turn with recommended answer + rationale, explores codebase before asking, tracks session state across turns. Refuses to bundle questions. Refuses to ask questions the codebase can answer.

software-engineering+2

SKILL

Cs Grill Me

/cs:grill-me <path-to-plan> — Start a relentless interrogation of a plan or design. Walks decision tree one branch at a time. One question per turn with recommended answer. Explores codebase before asking.

product-management+2

AGENT

Cs Claude Coach

Use proactively after any user message in a Claude.ai or Claude Code session where the user is learning to prompt better or has explicitly activated coaching. Default persona for the claude-coach skill. Watches each turn for missed power-user opportunities and surfaces at most one ⚡ tip when a tip would genuinely 10x the next attempt.

ai-prompt-engineering+2

SKILL

Cs Claude Coach

Activate the claude-coach skill — personal Claude power-user coaching for the rest of the conversation.

ai-prompt-engineering+2

AGENT

Cs Caveman Mode

Caveman-mode operator. Persistent ultra-compressed communication mode. Drops articles, filler, pleasantries, and hedging while preserving all technical substance. Auto-clarity exception for security warnings, irreversible actions, multi-step sequences, and clarification requests. Activated by user phrases ("caveman mode", "talk like caveman", "use caveman", "less tokens", "be brief") or /cs:caveman command.

software-engineering+2

SKILL

Cs Caveman

/cs:caveman — Activate persistent caveman-mode. Ultra-compressed responses with technical substance preserved. Auto-clarity exception for warnings + destructive ops. Stays active until 'stop caveman' / 'normal mode'.

software-engineering+2

AGENT

Experiment Runner

You are an autonomous experimenter. Your job is to optimize a target file by a measurable metric, one change at a time.

ai-prompt-engineering+2

AGENT

Hub Coordinator

Coordinator for AgentHub multi-agent collaboration sessions. Dispatches N parallel subagents in isolated git worktrees via the Agent tool, monitors progress via the message board, evaluates results by metric command or LLM judge, and merges the winning branch. Acts as the main Claude Code session role for `/hub:*` commands.

software-engineering+2

AGENT

Skill Extractor

Transforms a proven pattern or debugging solution into a standalone, portable skill package. Generates `SKILL.md` with proper frontmatter, reference docs, and examples that work in any project (no hardcoded paths or project-specific values). Spawned by `/si:extract` when a recurring solution should become reusable.

software-engineering+2

AGENT

Test Debugger

Diagnoses flaky or failing Playwright tests using a systematic taxonomy. Invoked by /pw:fix when a test needs deep analysis: running tests, reading traces, and identifying root causes.

software-engineering+2

AGENT

Migration Planner

Analyzes Cypress or Selenium test suites and creates a file-by-file migration plan. Invoked by /pw:migrate before conversion starts.

software-engineering+1

SKILL

Snowflake Development

Use when writing Snowflake SQL, building data pipelines with Dynamic Tables or Streams/Tasks, using Cortex AI functions, creating Cortex Agents, writing Snowpark Python, configuring dbt for Snowflake, or troubleshooting Snowflake errors.

data-science-ml+2

SKILL

Tech Stack Evaluator

Technology stack evaluation and comparison with TCO analysis, security assessment, and ecosystem health scoring. Use when comparing frameworks, evaluating technology stacks, calculating total cost of ownership, assessing migration paths, or analyzing ecosystem viability.

software-engineering+2

SKILL

TDD Guide

Test-driven development skill for writing unit tests, generating test fixtures and mocks, analyzing coverage gaps, and guiding red-green-refactor workflows across Jest, Pytest, JUnit, Vitest, and Mocha. Use when the user asks to write tests, improve test coverage, practice TDD, generate mocks or stubs, or mentions testing frameworks like Jest, pytest, or JUnit.

software-engineering+1

SKILL

Stripe Integration Expert

Production-grade Stripe integrations: subscriptions with trials and proration, one-time payments, usage-based billing, checkout sessions, idempotent webhook handlers, customer portal, and invoicing. Covers Next.js, Express, and Django patterns. Use when integrating Stripe for the first time, debugging webhook reliability issues, migrating from a different payment provider, or adding usage-based billing to an existing subscription product.

software-engineering+2

SKILL

Senior Security

Use when the user asks for STRIDE threat modeling, DREAD risk scoring, data-flow-diagram threat analysis, or a quick secret scan — or when a security request needs routing to the right specialist skill (pen-testing, incident response, cloud posture, red team, AI security, threat hunting, secure code review). This skill owns threat modeling; everything else routes to a sibling.

cybersecurity-soc+1

SKILL

Senior Secops

Senior SecOps engineer skill for application security, vulnerability management, compliance verification, and secure development practices. Runs SAST/DAST scans, generates CVE remediation plans, checks dependency vulnerabilities, creates security policies, enforces secure coding patterns, and automates compliance checks against SOC2, PCI-DSS, HIPAA, and GDPR. Use when conducting a security review or audit, responding to a CVE or security incident, hardening infrastructure, implementing authentication or secrets management, running penetration test prep, checking OWASP Top 10 exposure, or enforcing security controls in CI/CD pipelines.

cybersecurity-soc+2

SKILL

Senior QA

Generates unit tests, integration tests, and E2E tests for React/Next.js applications. Scans components to create Jest + React Testing Library test stubs, analyzes Istanbul/LCOV coverage reports to surface gaps, scaffolds Playwright test files from Next.js routes, mocks API calls with MSW, creates test fixtures, and configures test runners. Use when the user asks to "generate tests", "write unit tests", "analyze test coverage", "scaffold E2E tests", "set up Playwright", "configure Jest", "implement testing patterns", or "improve test quality".

software-engineering+1

SKILL

Senior Prompt Engineer

Use when the user asks to optimize prompts, design prompt templates, evaluate LLM outputs with an eval set, measure RAG retrieval quality, validate agent/tool configurations, analyze token usage, or design structured-output contracts. Covers eval-driven prompt iteration, RAG metrics (relevance, faithfulness, coverage), agent workflow validation, and token/cost budgeting — all model-agnostic, with three stdlib Python tools.

ai-prompt-engineering+2

SKILL

Senior ML Engineer

ML engineering skill for productionizing models, building MLOps pipelines, and integrating LLMs. Covers model deployment, feature stores, drift monitoring, RAG systems, and cost optimization. Use when the user asks about deploying ML models to production, setting up MLOps infrastructure (MLflow, Kubeflow, Kubernetes, Docker), monitoring model performance or drift, building RAG pipelines, or integrating LLM APIs with retry logic and cost controls. Focused on production and operational concerns rather than model research or initial training.

data-science-ml+2
