AI Product Canvas

Structure AI and ML product decisions with the rigour of any product decision. Use when building AI-powered features, evaluating LLM integrations, designing AI products, or assessing AI readiness. Produces a complete AI product canvas covering problem definition, model approach, data requirements, evaluation framework, UX design, responsible AI checklist, and launch monitoring plan.

Published by @Mohit Aggarwal

AI Product Canvas Skill

Define AI products with the same rigour as any product decision — but with additional layers for data, model, evaluation, and responsible AI. This canvas prevents the most common AI product failure: building a technically impressive feature that doesn't solve a real problem.

AI Product Anti-Patterns to Check First

Before building, flag if any of these apply:

  • ❌ "We should add AI to [existing feature]" — with no user problem defined
  • ❌ Accuracy target undefined before build begins
  • ❌ No plan for what happens when the model is wrong
  • ❌ User-facing AI output with no human review or fallback
  • ❌ Training data not audited for bias or quality
  • ❌ No evaluation metric — "we'll know it when we see it"

AI Product Canvas Output Format

AI Product Canvas — [Feature Name] — [Date]

PM Owner: [Name]
ML/AI Lead: [Name]
Status: Discovery / Design / Build / Evaluation / Live


1. Problem Definition

User problem being solved:

[What specific situation is the user in? What job are they trying to get done?]

Why AI?

[What makes this problem require AI vs a deterministic solution? If the answer is "because we can," stop here.]

Success for the user looks like:

[What outcome does the user experience when the AI feature is working well?]


2. AI Approach

Task type:

  • Classification
  • Generation (text, image, code)
  • Summarisation / extraction
  • Recommendation
  • Search / retrieval
  • Prediction / forecasting
  • Conversation / agent

Model approach:

  • LLM API (GPT-4, Claude, Gemini, etc.) — specify: [Model name + version]
  • Fine-tuned model on own data
  • Custom model trained from scratch
  • RAG (retrieval-augmented generation)
  • Embedding + vector search

Rationale for chosen approach: [Why this, not alternatives]


3. Data Requirements

Data Type | Source | Volume | Quality Status | Bias Risk
[Training data] | [Where it comes from] | [Volume] | [Audit status] | H/M/L
[Evaluation data] | [Where it comes from] | [Volume] | [Audit status] | H/M/L

Data gaps: [What's missing and plan to get it]
Privacy considerations: [Any PII in training or inference data]
Data ownership: [Do we own this data? Can we use it for training?]


4. Evaluation Framework

Primary metric: [The number that defines success: accuracy, F1, BLEU, user rating, or task completion rate]
Minimum acceptable threshold: [Below X, the feature does not ship]
Human evaluation plan: [How will humans review model outputs? Sampling rate? Review panel?]

Evaluation Type | Method | Cadence | Owner
Offline (pre-launch) | [Test set, benchmark] | Pre-launch | ML Lead
Online (post-launch) | [A/B test, user feedback] | Weekly | PM + ML
Adversarial | [Red-team, edge cases] | Pre-launch | Safety reviewer
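
The "below X, the feature does not ship" rule is easier to enforce when it is written as an explicit check. The following is a minimal sketch, assuming a binary classification task with F1 as the primary metric; the names `GateResult` and `f1_ship_gate` and the 0.75 threshold are illustrative, not part of the canvas.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class GateResult:
    precision: float
    recall: float
    f1: float
    ships: bool  # True only if F1 meets the minimum acceptable threshold


def f1_ship_gate(y_true: Sequence[int], y_pred: Sequence[int], min_f1: float) -> GateResult:
    """Compute binary precision, recall and F1, then compare F1 to the threshold."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return GateResult(precision, recall, f1, ships=f1 >= min_f1)


# Hypothetical offline test set: 8 labelled examples and the model's predictions.
result = f1_ship_gate([1, 1, 1, 0, 0, 1, 0, 1], [1, 0, 1, 0, 1, 1, 0, 1], min_f1=0.75)
print(f"F1={result.f1:.2f}, ships={result.ships}")  # F1=0.80, ships=True
```

For generation or summarisation tasks the metric would differ (for example a human rating), but the same gate shape applies: one number, one threshold, fixed before build begins.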

5. User Experience Design

How is AI output presented?

  • Direct output shown to user (high trust required)
  • AI-assisted with user confirmation
  • Suggestion user can accept/reject
  • Background action with audit log

Confidence and uncertainty handling:

  • What happens when confidence is low? [Show alternative, ask for clarification, fallback to manual]
  • How is uncertainty communicated to the user? [UI pattern]

Fallback plan:

  • If the model fails or returns an error: [Specific fallback behaviour]
  • If accuracy degrades below threshold: [Kill switch or graceful degradation plan]
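
The confidence handling and fallback rules above can be sketched as a single routing function. This is only an illustration under stated assumptions: the model exposes a confidence score between 0 and 1, a missing score means the call failed, and the cut-offs 0.90 and 0.60 are placeholders to be replaced by values from your own evaluation. `Presentation` and `choose_presentation` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class Presentation(Enum):
    DIRECT = "show output directly"
    CONFIRM = "show as a suggestion the user must confirm"
    MANUAL = "fall back to the manual flow"


def choose_presentation(
    confidence: Optional[float],
    model_healthy: bool = True,
    direct_min: float = 0.90,   # assumed cut-off for showing output directly
    confirm_min: float = 0.60,  # assumed cut-off for showing a suggestion
) -> Presentation:
    """Map a model confidence score to a UX treatment.

    confidence=None stands for a failed or errored inference.
    model_healthy=False stands for the kill switch being on, for example
    because monitored accuracy dropped below the agreed threshold.
    """
    if not model_healthy or confidence is None:
        return Presentation.MANUAL
    if confidence >= direct_min:
        return Presentation.DIRECT
    if confidence >= confirm_min:
        return Presentation.CONFIRM
    return Presentation.MANUAL


for score in (0.95, 0.70, 0.30, None):
    print(score, "->", choose_presentation(score).value)
```

Keeping the kill switch as an input to the same function means the degraded path is exercised by the same tests as the normal path.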

6. Responsible AI Checklist
  • Bias audit completed on training data
  • Demographic fairness evaluated (does performance differ by user group?)
  • Hallucination / confabulation risk assessed and mitigated
  • User can see and correct AI output
  • Opt-out mechanism exists (can user disable the AI feature?)
  • Output provenance visible when relevant (does user know AI generated this?)
  • PII not used in ways user didn't consent to
  • Regulatory review completed (GDPR, EU AI Act, sector-specific rules)
  • Model cards / documentation completed
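
For the demographic fairness item, one simple check is to compute the primary metric per user group and look at the spread. The sketch below assumes accuracy as the metric and a labelled evaluation set where each record carries a group attribute; the function names and the sample data are hypothetical, and what counts as an acceptable gap is a product decision the canvas does not prescribe.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def accuracy_by_group(records: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """records are (group, true_label, predicted_label) triples."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(per_group: Dict[str, float]) -> float:
    """Difference between the best and worst performing group."""
    if not per_group:
        raise ValueError("per_group must not be empty")
    return max(per_group.values()) - min(per_group.values())


# Made-up evaluation records for two illustrative groups.
sample = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 1),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 1), ("group_b", 0, 1),
]
per_group = accuracy_by_group(sample)
print(per_group)                # {'group_a': 0.75, 'group_b': 0.5}
print(accuracy_gap(per_group))  # 0.25
```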

7. Launch & Monitoring Plan

Rollout: [% of users, with staged expansion criteria]

Monitoring metrics:

  • Model performance: [Metric + alert threshold]
  • User engagement with AI output: [Acceptance rate, override rate, feedback score]
  • Error rate: [% of failed inferences]
  • Latency: [P95 target]

Model refresh cadence: [How often is the model retrained or updated?]
Drift detection: [How will you know when model performance degrades in production?]
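
The monitoring metrics above become actionable once each has an alert threshold. The following sketch shows one possible shape, assuming latencies, failure counts, and acceptance counts are collected per monitoring window; `AlertThresholds`, `percentile_nearest_rank`, `check_alerts`, and all numbers are illustrative assumptions rather than recommended values.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class AlertThresholds:
    max_error_rate: float       # fraction of failed inferences
    max_p95_latency_ms: float   # P95 latency target
    min_acceptance_rate: float  # fraction of shown outputs that users accept


def percentile_nearest_rank(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the value at rank ceil(pct/100 * n) in sorted order."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling, avoids float error
    return ordered[rank - 1]


def check_alerts(
    latencies_ms: Sequence[float],
    failed: int,
    total: int,
    accepted: int,
    shown: int,
    limits: AlertThresholds,
) -> List[str]:
    """Return one message per breached threshold for a monitoring window."""
    if total <= 0 or shown <= 0:
        raise ValueError("total and shown must be positive")
    alerts: List[str] = []
    error_rate = failed / total
    p95 = percentile_nearest_rank(latencies_ms, 95)
    acceptance = accepted / shown
    if error_rate > limits.max_error_rate:
        alerts.append(f"error rate {error_rate:.1%} exceeds {limits.max_error_rate:.1%}")
    if p95 > limits.max_p95_latency_ms:
        alerts.append(f"P95 latency {p95:.0f} ms exceeds {limits.max_p95_latency_ms:.0f} ms")
    if acceptance < limits.min_acceptance_rate:
        alerts.append(f"acceptance rate {acceptance:.1%} below {limits.min_acceptance_rate:.1%}")
    return alerts


limits = AlertThresholds(max_error_rate=0.02, max_p95_latency_ms=1500, min_acceptance_rate=0.5)
window = [100 * i for i in range(1, 21)]  # made-up latencies: 100 ms to 2000 ms
for message in check_alerts(window, failed=3, total=100, accepted=40, shown=100, limits=limits):
    print(message)
```

Tracking acceptance rate alongside model metrics matters because user behaviour can shift before any offline metric is recomputed.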


Guidelines

  • Never skip the "Why AI?" section — it's the most important question in AI product development
  • The fallback UX is not optional — what happens when AI fails defines your product's trustworthiness
  • Responsible AI checklist must be completed before launch, not after
  • Include latency in success metrics — a 5-second AI response is often worse than no AI at all
  • Recommend starting with a human-in-the-loop design and automating only when accuracy is proven

Required Inputs

Ask the user for these if not provided:

  • Feature or product description (what the AI is intended to do)
  • User problem (what problem the AI is solving for users)
  • Available data (what training/inference data exists)
  • ML/AI lead (who owns the technical implementation)

Anti-Patterns

  • Do not skip the "Why AI?" question — if the answer is "we want to use AI," stop and reframe around the user problem first
  • Do not launch with an undefined accuracy threshold — "good enough" is not a threshold; set a number before build begins
  • Do not design the UX to hide AI-generated output as if it were system truth — users need to know when AI is involved so they can override it
  • Do not defer the Responsible AI checklist to post-launch — bias and privacy issues are far harder to fix in production than in design
  • Do not treat model latency as a post-launch optimisation — a 6-second AI response that replaces a 1-second rule-based response is a regression, not a feature

Quality Checks

  • "Why AI?" is answered clearly (not "because we can")
  • Minimum acceptable accuracy threshold is defined before build begins
  • Fallback UX is specified for model failures or low-confidence outputs
  • Responsible AI checklist is completed (not deferred to post-launch)
  • Monitoring plan includes both model performance and user engagement metrics
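
If the canvas is kept in a structured form, the quality checks above can be run automatically before review. This is a rough sketch only: the dictionary keys (`why_ai`, `min_threshold`, `fallback_ux`, `responsible_ai`, `monitoring`) are a hypothetical schema invented for the example, not a format defined by this skill.

```python
from typing import Any, Dict, List

# Answers to "Why AI?" that the canvas explicitly rejects.
REJECTED_WHY_AI = {"because we can", "we want to use ai"}


def failed_quality_checks(canvas: Dict[str, Any]) -> List[str]:
    """Return the names of quality checks that the canvas does not yet pass."""
    failures: List[str] = []

    why_ai = str(canvas.get("why_ai", "")).strip().lower()
    if not why_ai or why_ai in REJECTED_WHY_AI:
        failures.append("why_ai")

    # The threshold must be an actual number, not a phrase like "good enough".
    if type(canvas.get("min_threshold")) not in (int, float):
        failures.append("min_threshold")

    if not str(canvas.get("fallback_ux", "")).strip():
        failures.append("fallback_ux")

    checklist = canvas.get("responsible_ai") or {}
    if not checklist or not all(checklist.values()):
        failures.append("responsible_ai")

    monitoring = canvas.get("monitoring") or {}
    if not (monitoring.get("model_performance") and monitoring.get("user_engagement")):
        failures.append("monitoring")

    return failures


draft = {
    "why_ai": "Because we can",
    "min_threshold": "good enough",
    "fallback_ux": "Route to manual entry form",
    "responsible_ai": {"bias_audit": True, "opt_out": False},
    "monitoring": {"model_performance": "F1 weekly"},
}
print(failed_quality_checks(draft))
# ['why_ai', 'min_threshold', 'responsible_ai', 'monitoring']
```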
