Public bench

What's on the bench.

5,040 artifacts · 26 industries · 1 read per week

All artifacts

SKILL

Property Based Testing

Provides guidance for property-based testing across multiple languages and smart contracts. Use when writing tests, reviewing code with serialization/validation/parsing patterns, designing features, or when property-based testing would provide stronger coverage than example-based tests.

software-engineering

SKILL

Mutation Testing

Configures mewt or muton mutation testing campaigns: scopes targets, tunes timeouts, and optimizes long-running runs. Use when the user mentions mewt, muton, or mutation testing, or wants to configure or optimize a mutation testing campaign.

software-engineering+1

SKILL

Modern Python

Configures Python projects with modern tooling (uv, ruff, ty). Use when creating projects, writing standalone scripts, or migrating from pip/Poetry/mypy/black.

software-engineering+1

SKILL

Insecure Defaults

Detects fail-open insecure defaults (hardcoded secrets, weak auth, permissive security) that allow apps to run insecurely in production. Use when auditing security, reviewing config management, or analyzing environment variable handling.

cybersecurity-soc+1

SKILL

Git Cleanup

Safely analyzes and cleans up local git branches and worktrees by categorizing them as merged, squash-merged, superseded, or active work.

software-engineering

SKILL

gh CLI

Enforces authenticated gh CLI workflows over unauthenticated curl/WebFetch patterns. Use when working with GitHub URLs, API access, pull requests, or issues.

software-engineering

AGENT

PoC Builder

Creates proof-of-concept exploits (pseudocode, executable code, and unit tests) demonstrating a verified vulnerability, plus negative PoCs showing the exploit's preconditions. Spawned by fp-check during Phase 4 verification.

cybersecurity-soc

AGENT

Exploitability Verifier

Verifies whether a suspected vulnerability is actually exploitable by proving attacker control, mathematical bounds, and race condition feasibility. Spawned by fp-check during Phase 2 verification.

cybersecurity-soc+1

AGENT

Data Flow Analyzer

Analyzes data flow from source to vulnerability sink, mapping trust boundaries, API contracts, environment protections, and cross-references. Spawned by fp-check during Phase 1 verification.

cybersecurity-soc+1

SKILL

FP Check

Systematically verifies suspected security bugs to eliminate false positives. Produces TRUE POSITIVE or FALSE POSITIVE verdicts with documented evidence for each bug.

cybersecurity-soc+1

SKILL

Firebase APK Scanner

Scans Android APKs for Firebase security misconfigurations including open databases, storage buckets, authentication issues, and exposed cloud functions. Use when analyzing APK files for Firebase vulnerabilities, performing mobile app security audits, or testing Firebase endpoint security. For authorized security research only.

cybersecurity-soc+1

SKILL

Entry Point Analyzer

Analyzes smart contract codebases to identify state-changing entry points for security auditing. Detects externally callable functions that modify state, categorizes them by access level (public, admin, role-restricted, contract-only), and generates structured audit reports. Excludes view/pure/read-only functions. Use when auditing smart contracts (Solidity, Vyper, Solana/Rust, Move, TON, CosmWasm) or when asked to find entry points, audit flows, external functions, access control patterns, or privileged operations.

cybersecurity-soc+1

SKILL

DWARF Expert

Provides expertise for analyzing DWARF debug files and understanding the DWARF debug format/standard (v3-v5). Triggers when understanding DWARF information, interacting with DWARF files, answering DWARF-related questions, or working with code that parses DWARF data.

software-engineering+1

AGENT

Dimension Discoverer

Discovers dimensional vocabulary for codebases by analyzing naming conventions and protocol patterns.

software-engineering+1

SKILL

Dimensional Analysis

Annotates codebases with dimensional analysis comments documenting units, dimensions, and decimal scaling. Use when someone asks to annotate units in a codebase, perform a dimensional analysis, or find vulnerabilities in a DeFi protocol, offchain code, or other blockchain-related codebase with arithmetic. Prevents dimensional mismatches and catches formula bugs early.

cybersecurity-soc+1

AGENT

Adversarial Modeler

Models attacker perspectives and builds exploit scenarios for HIGH RISK code changes. Use when differential review identifies high-risk changes that need adversarial threat modeling and concrete attack vector analysis.

cybersecurity-soc+1

SKILL

Differential Review

Performs security-focused differential review of code changes (PRs, commits, diffs). Adapts analysis depth to codebase size, uses git history for context, calculates blast radius, checks test coverage, and generates comprehensive markdown reports. Automatically detects and prevents security regressions.

cybersecurity-soc+1

SKILL

Devcontainer Setup

Creates devcontainers with Claude Code, language-specific tooling (Python/Node/Rust/Go), and persistent volumes. Use when adding devcontainer support to a project, setting up isolated development environments, or configuring sandboxed Claude Code workspaces.

software-engineering+1

SKILL

Debug Buttercup

Debugs the Buttercup CRS (Cyber Reasoning System) running on Kubernetes. Use when diagnosing pod crashes, restart loops, Redis failures, resource pressure, disk saturation, DinD issues, or any service misbehavior in the crs namespace. Covers triage, log analysis, queue inspection, and common failure patterns for: redis, fuzzer-bot, coverage-bot, seed-gen, patcher, build-bot, scheduler, task-server, task-downloader, program-model, litellm, dind, tracer-bot, merger-bot, competition-api, pov-reproducer, scratch-cleaner, registry-cache, image-preloader, ui.

cybersecurity-soc+1

SKILL

Constant Time Analysis

Detects timing side-channel vulnerabilities in cryptographic code. Use when implementing or reviewing crypto code, encountering division on secrets, secret-dependent branches, or constant-time programming questions in C, C++, Go, Rust, Swift, Java, Kotlin, C#, PHP, JavaScript, TypeScript, Python, or Ruby.

cybersecurity-soc+1

SKILL

Chrome MCP Troubleshooting

Diagnoses and fixes connectivity issues with the Claude in Chrome MCP extension. Use when mcp__claude-in-chrome__* tools fail, return "Browser extension is not connected", or behave erratically.

software-engineering

AGENT

C Review Worker

Runs one assigned c-review cluster task and writes finding files to the run's output directory. Spawned by the c-review skill orchestrator only.

cybersecurity-soc+1

AGENT

C Review FP Judge

Second-stage judge in the c-review pipeline. Runs after dedup-judge on merged primaries only. Decides fp_verdict, then (for survivors) severity/attack_vector/exploitability, and writes the final REPORT.md + REPORT.sarif. Spawned by the c-review skill orchestrator only.

cybersecurity-soc+1

AGENT

C Review Dedup Judge

Deduplication judge for the c-review pipeline. Merges duplicate findings deterministically by exact location, then narrowly reviews same-function same-class candidates. Spawned by the c-review skill orchestrator only.

software-engineering+1