Incident Postmortem

Write a structured incident postmortem or post-incident review. Use when asked to write a postmortem, incident report, P1/P2 review, outage report, or RCA (root cause analysis). Generates a blameless postmortem with timeline, root cause, contributing factors, impact summary, and action items.

Published by @Mohit Aggarwal

Incident Postmortem Skill

This skill produces a complete, blameless incident postmortem document in an industry-standard format. The output keeps blameless framing throughout (system gaps over individual failures) and drives toward specific, closeable action items rather than vague process commitments.

Required Inputs

Ask the user for these if not provided:

  • Incident title / ID
  • Severity (P1 / P2 / P3 or SEV1 / SEV2 / SEV3)
  • Date and duration of the incident
  • What happened (rough notes are fine — the skill will structure them)
  • Services or systems affected
  • Customer impact (how many users, what was degraded)
  • How it was detected
  • How it was resolved
  • Initial thoughts on root cause
  • Action items already identified (optional)
  • Responders (who was on-call or responded — names or roles; used for the timeline, not for blame)
  • Customer or external communications sent (optional — any status page updates, emails, or support messages with timestamps)
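
As a rough illustration of the "ask if not provided" step, the input list above could be modeled as a record whose empty required fields are reported back to the user. This is a minimal sketch: the class name, field names, and `missing_inputs` helper are hypothetical and not part of the skill itself.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class IncidentInputs:
    # Required inputs (field names are illustrative assumptions).
    title_or_id: Optional[str] = None
    severity: Optional[str] = None
    date_and_duration: Optional[str] = None
    what_happened: Optional[str] = None
    affected_services: Optional[str] = None
    customer_impact: Optional[str] = None
    detection: Optional[str] = None
    resolution: Optional[str] = None
    suspected_root_cause: Optional[str] = None
    responders: Optional[str] = None
    # Inputs the list marks as optional.
    known_action_items: Optional[str] = None
    communications_sent: Optional[str] = None


OPTIONAL_FIELDS = {"known_action_items", "communications_sent"}


def missing_inputs(inputs: IncidentInputs) -> List[str]:
    """Return the names of required fields that are empty, in declaration order."""
    return [
        f.name
        for f in fields(inputs)
        if f.name not in OPTIONAL_FIELDS
        and not (getattr(inputs, f.name) or "").strip()
    ]
```

Anything returned by `missing_inputs` would be a question to put to the user before drafting; the two optional fields are never reported.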

Output Format


Incident Postmortem: [Incident Title]

Incident ID: [ID]
Severity: [P1/P2/P3]
Date: [Date]
Duration: [Start time → Resolution time, total duration]
Status: [Resolved / Monitoring / Ongoing]
Author: [Leave blank for user to fill]
Last updated: [Date]


Executive Summary

[3–5 sentences. Describe what happened, who was affected, and what was done to resolve it. Written for a non-technical stakeholder. No jargon. No blame.]


Impact

| Dimension | Details |
| --- | --- |
| Users affected | [Number or percentage] |
| Services degraded | [List affected services] |
| Business impact | [Revenue, SLA breach, support tickets, etc. if known] |
| Duration | [Total time from first detection to full resolution] |

Timeline

List events in chronological order. Each entry: [HH:MM UTC] — [What happened. Who did what. What changed.]

Rules for timeline entries:

  • Use passive or system-focused language — avoid "X made a mistake"
  • Include: first symptom, detection, escalation, hypothesis tested, fix applied, confirmation of resolution
  • Note the time between key events (e.g. "22 minutes between detection and escalation")
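
The gap between consecutive timeline entries can be computed mechanically rather than by hand. The sketch below assumes entries are given as `(HH:MM, label)` pairs in chronological order within a single UTC day; the function name and the sample times are hypothetical.

```python
from datetime import datetime
from typing import List, Tuple


def gaps_between_events(events: List[Tuple[str, str]]) -> List[str]:
    """Describe the minutes elapsed between each pair of consecutive events.

    Assumes all times fall on the same UTC day and are already sorted.
    """
    parsed = [(datetime.strptime(t, "%H:%M"), label) for t, label in events]
    notes = []
    for (t0, first), (t1, second) in zip(parsed, parsed[1:]):
        minutes = int((t1 - t0).total_seconds() // 60)
        notes.append(f"{minutes} minutes between {first} and {second}")
    return notes


# Hypothetical timeline, for illustration only.
timeline = [
    ("14:02", "first symptom"),
    ("14:10", "detection"),
    ("14:32", "escalation"),
]
for note in gaps_between_events(timeline):
    print(note)
```

An incident that crosses midnight would need full dates rather than bare `HH:MM` values, which this sketch does not handle.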

Root Cause

Primary root cause: [One clear sentence. Technical but plain. "A misconfigured deployment config caused..."]

Contributing factors:

  • [Factor 1 — e.g. lack of canary deployment meant change hit 100% of traffic immediately]
  • [Factor 2 — e.g. alert threshold was set too high to catch the initial degradation]
  • [Factor 3 — add as many as are relevant]

Why did our existing safeguards not prevent this? [Honest paragraph explaining why monitoring, tests, or processes didn't catch this earlier. This is where blameless analysis matters most — focus on system gaps, not individual failures.]


Detection

  • How was it first detected? [Customer report / automated alert / internal monitoring / manual observation]
  • Time from incident start to detection: [X minutes]
  • Should we have detected this faster? [Yes / No — and why]

Resolution

What fixed it? [Clear description of the actual fix — one paragraph]

Why did this work? [Brief technical explanation]

Was there a temporary mitigation before full resolution? [Yes/No — describe if yes]


Action Items

| # | Action | Owner | Due Date | Priority |
| --- | --- | --- | --- | --- |
| 1 | [Specific, testable action] | [Team or person] | [Date] | P1/P2/P3 |

Rules for action items:

  • Each action must be specific enough to close as "done" or "not done" — no vague items like "improve monitoring"
  • Distinguish between: Prevent recurrence (fix the root cause), Improve detection (catch it faster next time), Improve response (resolve it faster next time)
  • Assign a real owner (a named person or a specific team), not a generic "team" or "TBD", wherever possible
  • Flag P1 actions as blocking: the incident is not marked fully closed until they are done
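
One way to picture these rules is as a small record per action item, with the three categories as an enumeration and a check for which open P1 items still block closure. This is a sketch under assumptions: the type names, the placeholder-owner set, and the helper functions are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List, Optional


class Category(Enum):
    PREVENT_RECURRENCE = "Prevent recurrence"
    IMPROVE_DETECTION = "Improve detection"
    IMPROVE_RESPONSE = "Improve response"


@dataclass
class ActionItem:
    action: str
    category: Category
    priority: str  # "P1", "P2", or "P3"
    owner: Optional[str] = None
    due: Optional[date] = None
    done: bool = False


# Owner values treated as placeholders rather than real owners (assumed set).
PLACEHOLDER_OWNERS = {"", "team", "tbd"}


def has_real_owner(item: ActionItem) -> bool:
    """True when the owner is something other than an empty or placeholder value."""
    return (item.owner or "").strip().lower() not in PLACEHOLDER_OWNERS


def blocks_closure(items: List[ActionItem]) -> List[ActionItem]:
    """Return the open P1 items, which keep the incident from being fully closed."""
    return [item for item in items if item.priority == "P1" and not item.done]
```

With this shape, an incident would count as fully closed only once `blocks_closure` returns an empty list.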

What Went Well

[3–5 honest observations about the response. Include: fast collaboration, good runbooks used, effective escalation, clear communication. This section builds team confidence and reinforces good habits.]


Lessons Learned

[3–5 key insights from this incident that are worth sharing beyond this team. Write these as transferable lessons — e.g. "Our runbook for database failover didn't account for read-replica lag. All runbooks involving database failover should be reviewed."]


Communication Log

[Optional — list external communications sent: status page updates, customer emails, support responses. Include timestamps.]


Quality Checks

  • Timeline has no blame-focused language
  • Root cause is specific (not "human error")
  • Root cause answers "why did this happen?" not just "what happened?" — it names a system or process gap, not a symptom
  • Contributing factors explain the systemic gaps
  • Every action item has an owner and due date
  • "What went well" section is genuine, not token
  • No action item contains vague language like "improve monitoring", "increase resilience", or "better testing" — each must name a specific change
  • Executive summary is readable by non-technical leadership
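
Two of these checks lend themselves to a crude text screen: the vague action-item phrases quoted above and a root cause stated as "human error". The sketch below is only a first-pass filter with hypothetical function names; a phrase match can flag a specific item that happens to contain one of these phrases, so a human review is still assumed.

```python
from typing import List

# Phrases taken from the quality checks above.
VAGUE_PHRASES = ["improve monitoring", "increase resilience", "better testing"]


def vague_action_items(actions: List[str]) -> List[str]:
    """Return the action items that contain one of the listed vague phrases."""
    return [
        action
        for action in actions
        if any(phrase in action.lower() for phrase in VAGUE_PHRASES)
    ]


def root_cause_is_specific(root_cause: str) -> bool:
    """Reject a root cause that falls back on "human error"."""
    return "human error" not in root_cause.lower()


# Hypothetical draft content, for illustration only.
draft_actions = [
    "Improve monitoring",
    "Add a canary stage that receives a small share of traffic before full rollout",
]
print(vague_action_items(draft_actions))
print(root_cause_is_specific("Human error during deployment"))
```

The remaining checks (genuine "What went well", a readable executive summary) are judgment calls that a string match cannot cover.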

Anti-Patterns

  • Do not assign blame to individuals — postmortems must focus on system and process failures
  • Do not write action items with vague language like "improve monitoring" — each must name a specific, ownable change
  • Do not skip the contributing factors — root cause alone misses the systemic issues that enable incidents
  • Do not omit the detection timeline — how long it took to detect matters as much as how long it took to resolve
  • Do not treat the postmortem as closed until all action items have named owners and due dates

Usage Examples

  • "Write a postmortem for the [incident name] outage"
  • "Help me write a P1 incident report"
  • "Generate an RCA document for [service] going down on [date]"
  • "Draft a blameless postmortem from these notes: [paste notes]"
