Database Schema Design

Document or design a database schema with entity relationships, table definitions, constraints, indexes, and access patterns. Use when asked to design a database, document an existing schema, model entities and relationships, define table structures, plan an index strategy, or produce a data model for review. Produces a structured schema document covering an ER diagram, table DDL definitions, index strategy, access pattern analysis, normalization decisions, and migration notes.

Published by @Mohit Aggarwal·0 agent reads / 30d·0 saves·

Database Schema Design Skill

Produce a complete database schema design document for a given domain. A schema document is not just a list of tables — it is a record of decisions: what was modelled, how entities relate, which queries the schema is optimised for, and what trade-offs were made.

A good schema design document lets an engineer understand the data model, query it correctly, extend it safely, and write migrations without breaking things.

Required Inputs

Ask for these if not already provided:

  • Domain description — what the system does; what business objects are being modelled
  • Entities and relationships — the main things in the domain and how they relate (e.g. "a User has many Orders; an Order has many OrderItems; an OrderItem references a Product")
  • Expected query patterns — the most important read and write queries (e.g. "fetch all orders for a user, sorted by date"; "look up a product by SKU")
  • Database engine — PostgreSQL, MySQL, SQLite, CockroachDB, etc. — this affects DDL syntax and available types
  • Expected data volume — approximate row counts, growth rate, and any partitioning needs
  • Constraints — any existing conventions, naming standards, or migration constraints to respect

Output Format


Database Schema Design: [Domain / Service Name]

Service: [Name] | Team: [Team name] Author: [Name] | Reviewed by: [Name] Date: [Date] | Database engine: [PostgreSQL X.X / MySQL X.X / etc.] Status: [Draft / Reviewed / Approved]


1. Overview

[2–3 sentences describing the domain being modelled, the scope of this schema, and any key design philosophy (e.g. "this schema prioritises read performance for the customer-facing API over write simplicity", or "designed for eventual migration to multi-tenancy")]

In scope:

  • [Entity or subsystem]
  • [Entity or subsystem]

Out of scope:

  • [e.g. Analytics / reporting tables — separate schema]
  • [e.g. Audit log tables — covered in separate design doc]

2. Entity Relationship Diagram

┌───────────────────┐         ┌───────────────────────┐
│      users        │         │       organisations    │
│─────────────────  │         │─────────────────────── │
│ id (PK)           │    ┌───▶│ id (PK)                │
│ org_id (FK)  ─────┼────┘    │ name                   │
│ email             │         │ plan                   │
│ display_name      │         │ created_at             │
│ created_at        │         └───────────────────────┘
│ updated_at        │
└─────────┬─────────┘
          │ 1
          │
          │ N
┌─────────▼─────────┐         ┌───────────────────────┐
│      [table_a]    │         │      [table_b]         │
│─────────────────  │         │─────────────────────── │
│ id (PK)           │    N    │ id (PK)                │
│ user_id (FK) ─────┼────────▶│ [table_a]_id (FK)      │
│ [field]           │    │    │ [field]                │
│ [field]           │    │    │ [field]                │
│ created_at        │         │ created_at             │
└───────────────────┘         └───────────────────────┘

Relationship summary:

Entity ARelationshipEntity BNotes
organisationshas manyusersAn org can have many users
usershas many[table_a]Soft-deleted on user deletion
[table_a]has many[table_b]Cascade delete
[table_b]belongs to[table_a]Non-nullable FK
[table_c]many-to-many (via [join_table])[table_d]Join table with metadata

3. Table Definitions

organisations

[1 sentence describing what this table stores and its role in the domain.]

CREATE TABLE organisations (
    id          UUID            PRIMARY KEY DEFAULT gen_random_uuid(),
    name        VARCHAR(255)    NOT NULL,
    slug        VARCHAR(100)    NOT NULL UNIQUE,
    plan        VARCHAR(50)     NOT NULL DEFAULT 'free'
                                CHECK (plan IN ('free', 'pro', 'enterprise')),
    settings    JSONB           NOT NULL DEFAULT '{}',
    created_at  TIMESTAMPTZ     NOT NULL DEFAULT now(),
    updated_at  TIMESTAMPTZ     NOT NULL DEFAULT now()
);
ColumnTypeNullableDefaultNotes
idUUIDNogen_random_uuid()Surrogate PK — UUID preferred over serial for distributed use
nameVARCHAR(255)NoDisplay name; not unique
slugVARCHAR(100)NoURL-safe identifier; unique across all orgs
planVARCHAR(50)No'free'Constrained to known values via CHECK
settingsJSONBNo{}Flexible config; avoid for queryable fields
created_atTIMESTAMPTZNonow()Always use TIMESTAMPTZ, not TIMESTAMP
updated_atTIMESTAMPTZNonow()Updated via trigger (see below)

users

[1 sentence describing what this table stores.]

CREATE TABLE users (
    id              UUID            PRIMARY KEY DEFAULT gen_random_uuid(),
    org_id          UUID            NOT NULL REFERENCES organisations(id)
                                    ON DELETE RESTRICT,
    email           VARCHAR(254)    NOT NULL,
    display_name    VARCHAR(255)    NOT NULL DEFAULT '',
    role            VARCHAR(50)     NOT NULL DEFAULT 'member'
                                    CHECK (role IN ('owner', 'admin', 'member', 'viewer')),
    email_verified  BOOLEAN         NOT NULL DEFAULT false,
    deleted_at      TIMESTAMPTZ     NULL,
    created_at      TIMESTAMPTZ     NOT NULL DEFAULT now(),
    updated_at      TIMESTAMPTZ     NOT NULL DEFAULT now(),

    CONSTRAINT users_email_org_unique UNIQUE (email, org_id)
);
ColumnTypeNullableDefaultNotes
idUUIDNogen_random_uuid()
org_idUUIDNoFK to organisations; RESTRICT prevents orphaning
emailVARCHAR(254)NoRFC 5321 max length; unique per org (not globally)
roleVARCHAR(50)No'member'Application-level RBAC
deleted_atTIMESTAMPTZYesNULLSoft delete; NULL = active

Soft delete policy: Rows with deleted_at IS NOT NULL are considered deleted. All application queries MUST filter WHERE deleted_at IS NULL unless explicitly fetching deleted records. Use a view or ORM scope to enforce this.


[table_a]

[Description of what this table models.]

CREATE TABLE [table_a] (
    id          UUID            PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id     UUID            NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    [field_1]   VARCHAR(255)    NOT NULL,
    [field_2]   TEXT            NULL,
    [field_3]   INTEGER         NOT NULL DEFAULT 0 CHECK ([field_3] >= 0),
    status      VARCHAR(50)     NOT NULL DEFAULT 'pending'
                                CHECK (status IN ('pending', 'active', 'archived')),
    metadata    JSONB           NOT NULL DEFAULT '{}',
    created_at  TIMESTAMPTZ     NOT NULL DEFAULT now(),
    updated_at  TIMESTAMPTZ     NOT NULL DEFAULT now()
);
ColumnTypeNullableNotes
user_idUUIDNoCASCADE delete — when user is deleted, their [table_a] rows are too
[field_1]VARCHAR(255)No[Reason for length constraint]
statusVARCHAR(50)NoState machine: pending → active → archived (no other transitions)
metadataJSONBNo[What is stored here and why it's not a typed column]

[join_table] (Many-to-many)

[Description of the relationship this table represents.]

CREATE TABLE [join_table] (
    [table_c]_id    UUID        NOT NULL REFERENCES [table_c](id) ON DELETE CASCADE,
    [table_d]_id    UUID        NOT NULL REFERENCES [table_d](id) ON DELETE CASCADE,
    granted_by      UUID        NOT NULL REFERENCES users(id) ON DELETE RESTRICT,
    granted_at      TIMESTAMPTZ NOT NULL DEFAULT now(),

    PRIMARY KEY ([table_c]_id, [table_d]_id)
);

Why a composite PK: The combination of [table_c]_id + [table_d]_id is the natural key — each association is unique and the primary key doubles as the uniqueness constraint without needing a separate index.


4. Index Strategy

For each table, define which indexes are created and why. Include the query they are designed to serve.

TableIndex nameColumnsTypeQuery servedNotes
usersusers_org_id_idx(org_id)B-treeSELECT * FROM users WHERE org_id = $1FK lookup; required for join performance
usersusers_email_lower_idx(lower(email))B-tree (functional)WHERE lower(email) = lower($1)Case-insensitive email lookup
usersusers_active_by_org_idx(org_id, created_at DESC)B-treeWHERE org_id = $1 AND deleted_at IS NULL ORDER BY created_at DESCPartial index candidate (see below)
[table_a][table_a]_user_id_status_idx(user_id, status)B-treeWHERE user_id = $1 AND status = 'active'Compound — order matters
[table_a][table_a]_metadata_gin_idxmetadataGINWHERE metadata @> '{"key": "value"}'Only add if JSONB queried frequently

Partial indexes (PostgreSQL):

-- Index only active (non-deleted) users — dramatically smaller for soft-delete tables
CREATE INDEX users_active_email_idx
    ON users (email, org_id)
    WHERE deleted_at IS NULL;

-- Index only pending items — avoids indexing the majority of rows
CREATE INDEX [table_a]_pending_idx
    ON [table_a] (user_id, created_at)
    WHERE status = 'pending';

Index design principles applied:

  • FKs that appear in JOIN conditions always have an index
  • Compound indexes follow selectivity order: most selective column first
  • Functional indexes for case-insensitive lookups
  • GIN indexes only where JSONB containment queries are frequent
  • Partial indexes for status-filtered queries on large tables

5. Access Pattern Analysis

Document the primary queries this schema is designed to serve. For each, show the query, the indexes used, and any caveats.

AP-1: Fetch all active users for an organisation (paginated)

Frequency: Very high — called on every dashboard load Query:

SELECT id, email, display_name, role, created_at
FROM users
WHERE org_id = $1
  AND deleted_at IS NULL
ORDER BY created_at DESC
LIMIT 50 OFFSET $2;

Index used: users_active_by_org_idx (org_id, created_at DESC) Notes: Use keyset pagination (WHERE created_at < $cursor) at scale; OFFSET degrades past ~10k rows.


AP-2: Look up a user by email (case-insensitive)

Frequency: High — every authentication attempt Query:

SELECT id, org_id, role, email_verified
FROM users
WHERE lower(email) = lower($1)
  AND deleted_at IS NULL;

Index used: users_email_lower_idx Notes: Returns multiple rows if same email exists across orgs. Application resolves by org context.


AP-3: Fetch [table_a] items for a user by status

Frequency: High Query:

SELECT *
FROM [table_a]
WHERE user_id = $1
  AND status = $2
ORDER BY created_at DESC
LIMIT 25;

Index used: [table_a]_user_id_status_idx Notes: Compound index covers both filter columns. Status filter must come second in the index because user_id is more selective.


AP-4: [Add further access patterns as needed]


6. Normalization Decisions

Document deliberate choices to normalize or denormalize, with reasoning.

DecisionApproachReasoning
[e.g. Organisation name on users table?]Not denormalized — always join to organisationsAvoid stale copies; org name changes are infrequent and joining is cheap
[e.g. Status history]Not in this table — separate [table_a]_status_history if neededCurrent status is all that's needed for 99% of queries; history is auditing, not application data
[e.g. JSONB settings column on organisations]Denormalized into JSONBSettings are read together; never queried by field; schema changes don't require migrations
[e.g. Computed aggregate counts]Not stored — computed at query timeCounts are small; maintaining a counter column requires careful locking; use SELECT COUNT(*) with the index

7. Triggers and Automation

-- Automatically update updated_at on any row modification
CREATE OR REPLACE FUNCTION set_updated_at()
RETURNS TRIGGER AS $$
BEGIN
    NEW.updated_at = now();
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Apply to all tables with updated_at
CREATE TRIGGER users_updated_at
    BEFORE UPDATE ON users
    FOR EACH ROW EXECUTE FUNCTION set_updated_at();

CREATE TRIGGER [table_a]_updated_at
    BEFORE UPDATE ON [table_a]
    FOR EACH ROW EXECUTE FUNCTION set_updated_at();

8. Migration Notes

If this schema is being introduced to an existing system, note the migration approach.

StepDescriptionBackward compatibleRisk
1Create organisations tableYes — additiveLow
2Create users tableYes — additiveLow
3Backfill org_id on existing usersRequires dual-write periodMedium
4Add NOT NULL constraint on org_idRequires backfill to be 100% completeMedium
5Remove deprecated columnsRequires app code updated firstLow once app deployed

Backfill strategy: [Describe how to handle existing data — batch size, rate limiting, validation queries]

Rollback: Each migration step should be independently reversible. See [database-migration-plan skill] for the full rollback procedure template.


Quality Checks

  • Every table has a primary key and a created_at column — no implicit ordering by row insertion
  • Every foreign key has a corresponding index — no missing FK indexes that would cause full table scans on joins
  • All TIMESTAMPTZ columns, not TIMESTAMP — timezone awareness is explicit
  • Soft-delete tables document the convention and where the filter is enforced (ORM scope, view, or query standard)
  • Every access pattern in the design has a supporting index or an explicit note that a full table scan is acceptable
  • JSONB columns are justified — not used as a substitute for proper schema design on queryable fields
  • Normalization decisions are documented with reasoning, not just stated
  • Migration notes address existing data if this is a schema change, not a greenfield schema

Anti-Patterns

  • Do not use JSONB columns as a substitute for proper relational schema design on fields that will be queried
  • Do not add indexes speculatively — every index must be justified by a specific access pattern
  • Do not omit timezone-awareness — use TIMESTAMPTZ, never plain TIMESTAMP
  • Do not design without documenting normalization decisions — future maintainers need the reasoning, not just the structure
  • Do not skip the access patterns section — schema without query patterns cannot be evaluated for correctness

Bundled with this artifact

1 file

Reference files that ship alongside this artifact. Agents pull these in only when the task needs them.

More on the bench

SKILL0

Data Quality Auditor

Audit datasets for completeness, consistency, accuracy, and validity. Profile data distributions, detect anomalies and outliers, surface structural issues, and produce an actionable remediation plan. Use when the user asks to check data quality, profile a dataset, hunt outliers or missing values, or validate data before analysis or model training.

data-science-ml+2
0
SKILL0

Tensorflow And Deep Learning Rules

TensorFlow and deep learning rules for building, training, evaluating, and deploying neural network models

data-science-ml+1
0
SKILL0

Ros And Ros2 Rules

ROS and ROS2 rules for packages, nodes, launch files, messages, services, actions, simulation, and testing

software-engineering+1
0