Database Designer

Use when the user asks to design database schemas, plan data migrations, optimize queries, choose between SQL and NoSQL, or model data relationships.

Published by @Alireza Rezvani·0 agent reads / 30d·0 saves·

Database Designer - POWERFUL Tier Skill

Overview

A comprehensive database design skill that provides expert-level analysis, optimization, and migration capabilities for modern database systems. This skill combines theoretical principles with practical tools to help architects and developers create scalable, performant, and maintainable database schemas.

Core Competencies

Schema Design & Analysis

  • Normalization Analysis: Automated detection of normalization levels (1NF through BCNF)
  • Denormalization Strategy: Smart recommendations for performance optimization
  • Data Type Optimization: Identification of inappropriate types and size issues
  • Constraint Analysis: Missing foreign keys, unique constraints, and null checks
  • Naming Convention Validation: Consistent table and column naming patterns
  • ERD Generation: Automatic Mermaid diagram creation from DDL

Index Optimization

  • Index Gap Analysis: Identification of missing indexes on foreign keys and query patterns
  • Composite Index Strategy: Optimal column ordering for multi-column indexes
  • Index Redundancy Detection: Elimination of overlapping and unused indexes
  • Performance Impact Modeling: Selectivity estimation and query cost analysis
  • Index Type Selection: B-tree, hash, partial, covering, and specialized indexes

Migration Management

  • Zero-Downtime Migrations: Expand-contract pattern implementation
  • Schema Evolution: Safe column additions, deletions, and type changes
  • Data Migration Scripts: Automated data transformation and validation
  • Rollback Strategy: Complete reversal capabilities with validation
  • Execution Planning: Ordered migration steps with dependency resolution

Tool Workflow (run these — do not analyze schemas by hand)

All paths relative to this skill folder; sample inputs in assets/.

1. Analyze the schema

python3 schema_analyzer.py --input schema.sql --generate-erd --output-format json -o analysis.json

Accepts SQL DDL or JSON schema (assets/sample_schema.sql / sample_schema.json). Output includes normalization findings, missing constraints, naming issues, and a Mermaid ERD — show the ERD to the user and fix flagged issues before optimizing.

2. Optimize indexes against real query patterns

python3 index_optimizer.py --schema assets/sample_schema.json --queries assets/sample_query_patterns.json --analyze-existing --format json -o indexes.json

Write the user's hot queries into a query-patterns JSON first (copy assets/sample_query_patterns.json). Output is a priority-ordered list of CREATE INDEX recommendations plus redundant-index removals.

3. Generate the migration

python3 migration_generator.py --current current_schema.json --target target_schema.json --zero-downtime --format sql -o migration.sql

--zero-downtime emits an expand-contract plan; --validate-only checks feasibility without generating SQL.

4. Verification loop

Re-run step 1 on the target schema and assert the issues found in the first pass are gone; run migration_generator.py --validate-only before handing over the migration.

Database Design Principles

→ See references/database-design-reference.md for details

Best Practices

Schema Design

  1. Use meaningful names: Clear, consistent naming conventions
  2. Choose appropriate data types: Right-sized columns for storage efficiency
  3. Define proper constraints: Foreign keys, check constraints, unique indexes
  4. Consider future growth: Plan for scale from the beginning
  5. Document relationships: Clear foreign key relationships and business rules

Performance Optimization

  1. Index strategically: Cover common query patterns without over-indexing
  2. Monitor query performance: Regular analysis of slow queries
  3. Partition large tables: Improve query performance and maintenance
  4. Use appropriate isolation levels: Balance consistency with performance
  5. Implement connection pooling: Efficient resource utilization

Security Considerations

  1. Principle of least privilege: Grant minimal necessary permissions
  2. Encrypt sensitive data: At rest and in transit
  3. Audit access patterns: Monitor and log database access
  4. Validate inputs: Prevent SQL injection attacks
  5. Regular security updates: Keep database software current

Query Generation Patterns

SELECT with JOINs

-- INNER JOIN: only matching rows
SELECT o.id, c.name, o.total
FROM orders o
INNER JOIN customers c ON c.id = o.customer_id;

-- LEFT JOIN: all left rows, NULLs for non-matches
SELECT c.name, COUNT(o.id) AS order_count
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.name;

-- Self-join: hierarchical data (employees/managers)
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON m.id = e.manager_id;

Common Table Expressions (CTEs)

-- Recursive CTE for org chart
WITH RECURSIVE org AS (
  SELECT id, name, manager_id, 1 AS depth
  FROM employees WHERE manager_id IS NULL
  UNION ALL
  SELECT e.id, e.name, e.manager_id, o.depth + 1
  FROM employees e INNER JOIN org o ON o.id = e.manager_id
)
SELECT * FROM org ORDER BY depth, name;

Window Functions

-- ROW_NUMBER for pagination / dedup
SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at DESC) AS rn
FROM orders;

-- RANK with gaps, DENSE_RANK without gaps
SELECT name, score, RANK() OVER (ORDER BY score DESC) AS rank FROM leaderboard;

-- LAG/LEAD for comparing adjacent rows
SELECT date, revenue,
  revenue - LAG(revenue) OVER (ORDER BY date) AS daily_change
FROM daily_sales;

Aggregation Patterns

-- FILTER clause (PostgreSQL) for conditional aggregation
SELECT
  COUNT(*) AS total,
  COUNT(*) FILTER (WHERE status = 'active') AS active,
  AVG(amount) FILTER (WHERE amount > 0) AS avg_positive
FROM accounts;

-- GROUPING SETS for multi-level rollups
SELECT region, product, SUM(revenue)
FROM sales
GROUP BY GROUPING SETS ((region, product), (region), ());

Migration Patterns

Up/Down Migration Scripts

Every migration must have a reversible counterpart. Name files with a timestamp prefix for ordering:

migrations/
├── 20260101_000001_create_users.up.sql
├── 20260101_000001_create_users.down.sql
├── 20260115_000002_add_users_email_index.up.sql
└── 20260115_000002_add_users_email_index.down.sql

Zero-Downtime Migrations (Expand/Contract)

Use the expand-contract pattern to avoid locking or breaking running code:

  1. Expand — add the new column/table (nullable, with default)
  2. Migrate data — backfill in batches; dual-write from application
  3. Transition — application reads from new column; stop writing to old
  4. Contract — drop old column in a follow-up migration

Data Backfill Strategies

-- Batch update to avoid long-running locks
UPDATE users SET email_normalized = LOWER(email)
WHERE id IN (SELECT id FROM users WHERE email_normalized IS NULL LIMIT 5000);
-- Repeat in a loop until 0 rows affected

Rollback Procedures

  • Always test the down.sql in staging before deploying up.sql to production
  • Keep rollback window short — if the contract step has run, rollback requires a new forward migration
  • For irreversible changes (dropping columns with data), take a logical backup first

Performance Optimization

Indexing Strategies

Index TypeUse CaseExample
B-tree (default)Equality, range, ORDER BYCREATE INDEX idx_users_email ON users(email);
GINFull-text search, JSONB, arraysCREATE INDEX idx_docs_body ON docs USING gin(to_tsvector('english', body));
GiSTGeometry, range types, nearest-neighborCREATE INDEX idx_locations ON places USING gist(coords);
PartialSubset of rows (reduce size)CREATE INDEX idx_active ON users(email) WHERE active = true;
CoveringIndex-only scansCREATE INDEX idx_cov ON orders(customer_id) INCLUDE (total, created_at);

EXPLAIN Plan Reading

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT ...;

Key signals to watch:

  • Seq Scan on large tables — missing index
  • Nested Loop with high row estimates — consider hash/merge join or add index
  • Buffers shared read much higher than hit — working set exceeds memory

N+1 Query Detection

Symptoms: application issues one query per row (e.g., fetching related records in a loop).

Fixes:

  • Use JOIN or subquery to fetch in one round-trip
  • ORM eager loading (select_related / includes / with)
  • DataLoader pattern for GraphQL resolvers

Connection Pooling

ToolProtocolBest For
PgBouncerPostgreSQLTransaction/statement pooling, low overhead
ProxySQLMySQLQuery routing, read/write splitting
Built-in pool (HikariCP, SQLAlchemy pool)AnyApplication-level pooling

Rule of thumb: Set pool size to (2 * CPU cores) + disk spindles. For cloud SSDs, start with 2 * vCPUs and tune.

Read Replicas and Query Routing

  • Route all SELECT queries to replicas; writes to primary
  • Account for replication lag (typically <1s for async, 0 for sync)
  • Use pg_last_wal_replay_lsn() to detect lag before reading critical data

Multi-Database Decision Matrix

CriteriaPostgreSQLMySQLSQLiteSQL Server
Best forComplex queries, JSONB, extensionsWeb apps, read-heavy workloadsEmbedded, dev/test, edgeEnterprise .NET stacks
JSON supportExcellent (JSONB + GIN)Good (JSON type)MinimalGood (OPENJSON)
ReplicationStreaming, logicalGroup replication, InnoDB clusterN/AAlways On AG
LicensingOpen source (PostgreSQL License)Open source (GPL) / commercialPublic domainCommercial
Max practical sizeMulti-TBMulti-TB~1 TB (single-writer)Multi-TB

When to choose:

  • PostgreSQL — default choice for new projects; best extensibility and standards compliance
  • MySQL — existing MySQL ecosystem; simple read-heavy web applications
  • SQLite — mobile apps, CLI tools, unit test databases, IoT/edge
  • SQL Server — mandated by enterprise policy; deep .NET/Azure integration

NoSQL Considerations

DatabaseModelUse When
MongoDBDocumentSchema flexibility, rapid prototyping, content management
RedisKey-value / cacheSession store, rate limiting, leaderboards, pub/sub
DynamoDBWide-columnServerless AWS apps, single-digit-ms latency at any scale

Use SQL as default. Reach for NoSQL only when the access pattern clearly benefits from it.


Sharding & Replication

Horizontal vs Vertical Partitioning

  • Vertical partitioning: Split columns across tables (e.g., separate BLOB columns). Reduces I/O for narrow queries.
  • Horizontal partitioning (sharding): Split rows across databases/servers. Required when a single node cannot hold the dataset or handle the throughput.

Sharding Strategies

StrategyHow It WorksProsCons
Hashshard = hash(key) % NEven distributionResharding is expensive
RangeShard by date or ID rangeSimple, good for time-seriesHot spots on latest shard
GeographicShard by user regionData locality, complianceCross-region queries are hard

Replication Patterns

PatternConsistencyLatencyUse Case
SynchronousStrongHigher write latencyFinancial transactions
AsynchronousEventualLow write latencyRead-heavy web apps
Semi-synchronousAt-least-one replica confirmedModerateBalance of safety and speed

Cross-References

  • sql-database-assistant — query writing, optimization, and debugging for day-to-day SQL work
  • database-schema-designer — ERD modeling, normalization analysis, and schema generation
  • migration-architect — large-scale migration planning across database engines or major schema overhauls
  • senior-backend — application-layer patterns (connection pooling, ORM best practices)
  • senior-devops — infrastructure provisioning for database clusters and replicas

Bundled with this artifact

15 files

Reference files that ship alongside this artifact. Agents pull these in only when the task needs them.

More on the bench

SKILL0

Postgres

Execute read-only SQL queries against multiple PostgreSQL databases. Use when: (1) querying PostgreSQL databases, (2) exploring database schemas/tables, (3) running SELECT queries for data analysis, (4) checking database contents. Supports multiple database connections with descriptions for intelligent auto-selection. Blocks all write operations (INSERT, UPDATE, DELETE, DROP, etc.) for safety.

software-engineering+2
0
SKILL0

Pylabrobot

Vendor-agnostic lab automation framework. Use when controlling multiple equipment types (Hamilton, Tecan, Opentrons, plate readers, pumps) or needing unified programming across different vendors. Best for complex workflows, multi-vendor setups, simulation. For Opentrons-only protocols with official API, opentrons-integration may be simpler.

software-engineering+2
0
SKILL0

Nextflow

Build, run, and debug Nextflow data pipelines and nf-core workflows end to end. Use whenever the user mentions Nextflow, nf-core, .nf files, nextflow.config, DSL2, processes/channels/operators, samplesheets, or wants to run a community pipeline (e.g. nf-core/rnaseq, nf-core/sarek), write or test a module/subworkflow with nf-test, configure executors/containers (Docker, Singularity/Apptainer, Conda, Wave), scale a workflow to HPC/SLURM or cloud (AWS Batch, Google Batch, Azure, Kubernetes), or debug a failed/-resume run. Make sure to use this skill for any reproducible scientific/bioinformatics workflow work even if the user does not say the word "Nextflow", and for authoring nf-core-compliant pipelines, modules, configs, and linting.

software-engineering+2
0