API Rate Limiting Skill

Design complete rate limiting, quota, and retry systems for any API.

Rate Limiting Algorithms

Algorithm	Best For	Trade-offs
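Token bucket	Bursty traffic with a sustained average rate	Allows bursts up to bucket capacity; two parameters (rate, capacity) to tune
Leaky bucket	Strict, constant-rate enforcement	Smooths bursts into a steady outflow; queued requests see added latency
Fixed window	Simple counting	Boundary spike: a client can send up to 2× the limit across a window boundary
Sliding window log	Precise limiting	Stores one timestamp per request, so memory grows with traffic
Sliding window counter	Balance of precision and memory	Approximates the window from two counters; best fit for most APIs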
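The token bucket row above can be sketched as a minimal single-process class. The `clock` parameter is an illustrative addition so the refill logic can be exercised without sleeping; a multi-instance deployment would keep this state in a shared store instead.

```python
import time


class TokenBucket:
    """Token bucket: refills at `rate` tokens/second, holding at most `capacity` (the burst size)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

One possible mapping onto the tier table is `rate=100 / 60` and `capacity=200` for Starter, treating the Burst column as bucket capacity; that reading is an assumption, not something the table states.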

Recommendation: Use sliding window counter for API endpoints, token bucket for streaming/upload endpoints.
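A sliding window counter keeps only two counts per client and estimates the rolling window as `previous_count * (1 - elapsed_fraction) + current_count`, assuming requests in the previous window were spread evenly. The in-memory dict below is a hypothetical stand-in for a shared store such as Redis; `clock` is injectable for testing.

```python
import time


class SlidingWindowCounter:
    """Approximate sliding window built from two adjacent fixed-window counters."""

    def __init__(self, limit, window_seconds, clock=time.time):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.counts = {}  # (key, window_index) -> request count

    def allow(self, key):
        now = self.clock()
        index = int(now // self.window)
        elapsed_fraction = (now % self.window) / self.window
        # Drop the window that can no longer affect the estimate
        self.counts.pop((key, index - 2), None)
        prev = self.counts.get((key, index - 1), 0)
        curr = self.counts.get((key, index), 0)
        # Weight the previous window by how much of it still overlaps the sliding window
        estimated = prev * (1 - elapsed_fraction) + curr
        if estimated >= self.limit:
            return False
        self.counts[(key, index)] = curr + 1
        return True
```

With a limit of 10 per 60s, a client that used all 10 requests at t=0 is still refused at t=60 (a fixed window would have reset), and gets 5 more at t=90, when half of the old window has slid out.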

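Response Headers

The X-RateLimit-* headers are a widely used convention rather than an RFC standard (the IETF httpapi working group has drafted standardized RateLimit header fields). Retry-After is standardized in RFC 9110, and the 429 status code in RFC 6585. In this example X-RateLimit-Reset is a Unix timestamp in seconds, while Retry-After is a delay in seconds.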

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1700000060
X-RateLimit-Policy: 100;w=60;comment="per minute"
Retry-After: 18
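A server-side helper that emits this header set might look like the sketch below. The function name and parameters are hypothetical; it sends Retry-After only once the client has no requests left, since Retry-After normally accompanies a 429 (or 503) response rather than every response.

```python
import math


def rate_limit_headers(limit, remaining, reset_epoch, now_epoch, window_seconds):
    """Build X-RateLimit-* headers; add Retry-After only when the client is throttled."""
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(0, remaining)),
        "X-RateLimit-Reset": str(int(reset_epoch)),  # Unix timestamp (seconds) of the window reset
        "X-RateLimit-Policy": f"{limit};w={window_seconds}",
    }
    if remaining <= 0:
        # Retry-After is a delay in seconds; round up so clients never retry early
        headers["Retry-After"] = str(max(0, math.ceil(reset_epoch - now_epoch)))
    return headers
```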

429 Response Body

{
  "error": "rate_limit_exceeded",
  "message": "Too many requests. You have exceeded 100 requests per minute.",
  "retry_after_seconds": 18,
  "limit": 100,
  "window": "60s",
  "reset_at": "2024-01-01T00:01:00Z"
}
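The body above can be derived from the same limiter state as the headers, which keeps `retry_after_seconds` and `reset_at` consistent with Retry-After. This is a sketch; `rate_limit_body` and `WINDOW_NAMES` are illustrative names.

```python
import math
from datetime import datetime, timezone

# Human-readable names for common window lengths (illustrative; extend as needed)
WINDOW_NAMES = {60: "minute", 3600: "hour", 86400: "day"}


def rate_limit_body(limit, window_seconds, reset_epoch, now_epoch):
    """Build the JSON-serializable 429 body in the shape shown above."""
    unit = WINDOW_NAMES.get(window_seconds, f"{window_seconds} seconds")
    reset_at = datetime.fromtimestamp(reset_epoch, tz=timezone.utc)
    return {
        "error": "rate_limit_exceeded",
        "message": f"Too many requests. You have exceeded {limit} requests per {unit}.",
        # Same rounding as the Retry-After header so clients see one answer
        "retry_after_seconds": max(0, math.ceil(reset_epoch - now_epoch)),
        "limit": limit,
        "window": f"{window_seconds}s",
        "reset_at": reset_at.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

Calling `rate_limit_body(100, 60, 1704067260, 1704067242)` reproduces the example body exactly (1704067260 is 2024-01-01T00:01:00Z).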

Tiered Quota Design

Tier	Requests/min	Requests/day	Burst	Concurrent
Free	10	1,000	20	2
Starter	100	50,000	200	10
Pro	1,000	500,000	2,000	50
Enterprise	Custom	Unlimited	Custom	Custom
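The tier table can be encoded as configuration and checked per request. This sketch assumes `None` for "Custom"/"Unlimited" values (a real Enterprise account would load contract-specific numbers) and leaves Burst unchecked, since it fits more naturally as a token bucket capacity than as a counter limit.

```python
# Limits from the tier table; None marks a "Custom"/"Unlimited" value (assumption: no fixed cap)
TIERS = {
    "free":       {"per_minute": 10,    "per_day": 1_000,   "burst": 20,    "concurrent": 2},
    "starter":    {"per_minute": 100,   "per_day": 50_000,  "burst": 200,   "concurrent": 10},
    "pro":        {"per_minute": 1_000, "per_day": 500_000, "burst": 2_000, "concurrent": 50},
    "enterprise": {"per_minute": None,  "per_day": None,    "burst": None,  "concurrent": None},
}


def exceeded_limit(tier, minute_count, day_count, in_flight):
    """Return the first limit this request would exceed, or None if it is allowed.

    Counts are the requests already made (or in flight) before this one.
    """
    limits = TIERS[tier]
    checks = [("concurrent", in_flight), ("per_minute", minute_count), ("per_day", day_count)]
    for name, current in checks:
        cap = limits[name]
        if cap is not None and current >= cap:
            return name
    return None
```

Returning the name of the exceeded limit lets the 429 response say which quota was hit rather than a generic error.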

Quota Endpoints

GET  /api/v1/account/quota         — current usage vs limits
GET  /api/v1/account/quota/history — usage over time

Example response for GET /api/v1/account/quota:

{
  "plan": "pro",
  "period": "2024-01",
  "limits": { "requests_per_minute": 1000, "requests_per_day": 500000 },
  "usage": { "requests_today": 12345, "requests_this_minute": 234 },
  "resets_at": "2024-02-01T00:00:00Z"
}
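Reading `period` as the calendar month (UTC) and `resets_at` as the start of the next month, as the example suggests, the payload could be assembled like this. The function name and the shape of the `limits` argument are hypothetical; `now` is expected to be timezone-aware.

```python
from datetime import datetime, timezone


def quota_response(plan, limits, requests_today, requests_this_minute, now):
    """Assemble the quota payload; resets_at is the first instant of the next UTC month."""
    now = now.astimezone(timezone.utc)
    if now.month == 12:
        next_period = datetime(now.year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        next_period = datetime(now.year, now.month + 1, 1, tzinfo=timezone.utc)
    return {
        "plan": plan,
        "period": now.strftime("%Y-%m"),
        "limits": {"requests_per_minute": limits["per_minute"], "requests_per_day": limits["per_day"]},
        "usage": {"requests_today": requests_today, "requests_this_minute": requests_this_minute},
        "resets_at": next_period.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```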

Retry Logic (client-side)

Exponential backoff with jitter

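import random
import time


class RateLimitError(Exception):
    """Minimal stand-in for whatever your HTTP client raises on 429.
    `retry_after` holds the parsed Retry-After value in seconds, or None."""

    def __init__(self, retry_after=None):
        super().__init__("rate limit exceeded")
        self.retry_after = retry_after


def retry_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=60.0):
    # One initial attempt plus up to max_retries retries
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError as e:
            if attempt == max_retries:
                raise  # retries exhausted: surface the error to the caller
            if e.retry_after is not None:
                # The server said when to come back; waiting less just earns another 429
                delay = e.retry_after
            else:
                # Exponential backoff: 1s, 2s, 4s, ... capped at max_delay
                delay = min(base_delay * (2 ** attempt), max_delay)
            # Add up to 10% jitter so many clients don't retry in lockstep (thundering herd)
            delay += random.uniform(0, delay * 0.1)
            time.sleep(delay)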
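The `e.retry_after` value above has to be parsed from the response. RFC 9110 allows Retry-After to be either a number of seconds ("18") or an HTTP-date, so a client that only calls `int(value)` will fail on the date form. A sketch of a parser that handles both, using the standard library's `email.utils.parsedate_to_datetime`:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, now=None):
    """Convert a Retry-After header to seconds to wait, or None if absent or unparseable."""
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)  # delay-seconds form
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):  # older Pythons raise TypeError, 3.10+ ValueError
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # HTTP-dates are always GMT
    if now is None:
        now = datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

Returning None for a malformed header lets the caller fall back to exponential backoff instead of crashing.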

Retryable vs Non-retryable status codes

Status	Retry?	Strategy
429	Yes	Respect `Retry-After` header
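500	Yes	Exponential backoff, but only if the request is idempotent or carries an Idempotency-Key
502/503	Yes	Exponential backoff (same idempotency caveat); honor `Retry-After` if a 503 sends one
504	Yes	Exponential backoff (same idempotency caveat)
400	No	Fix request
401	Once	Refresh token, then retry once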
403	No	Fix permissions
404	No	Fix URL
422	No	Fix payload
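The table can be turned into a lookup that a client consults before retrying. In this sketch the `Action` names are illustrative, and treating any unlisted status as non-retryable is an assumption rather than a rule from the table.

```python
from enum import Enum


class Action(Enum):
    RETRY_AFTER_HEADER = "respect Retry-After"
    BACKOFF = "exponential backoff"
    REFRESH_AUTH_ONCE = "refresh token, then retry once"
    FAIL = "do not retry"


RETRY_POLICY = {
    429: Action.RETRY_AFTER_HEADER,
    500: Action.BACKOFF,
    502: Action.BACKOFF,
    503: Action.BACKOFF,
    504: Action.BACKOFF,
    401: Action.REFRESH_AUTH_ONCE,
}


def retry_action(status, auth_already_refreshed=False):
    """Map an HTTP status to the table's strategy; unlisted statuses are not retried."""
    action = RETRY_POLICY.get(status, Action.FAIL)
    if action is Action.REFRESH_AUTH_ONCE and auth_already_refreshed:
        return Action.FAIL  # a second 401 after refreshing means the credentials are bad
    return action
```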

Circuit Breaker Pattern

States: CLOSED → OPEN → HALF-OPEN → CLOSED

CLOSED: normal operation
  - Track failure rate in rolling window
  - If failure rate > threshold (e.g. 50% in 10s): → OPEN

OPEN: reject all requests immediately (fail-fast)
  - Return 503 without calling downstream
  - After cooldown period (e.g. 30s): → HALF-OPEN

HALF-OPEN: allow limited traffic through
  - If first N requests succeed: → CLOSED
  - If any fail: → OPEN again
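The three states above can be sketched as a single-threaded class. Assumptions beyond the text: any exception from `fn` counts as a failure, `min_calls` keeps the breaker from tripping on a handful of requests, "first N requests" means N consecutive successes, and `clock` is injectable for testing. A concurrent version would also cap how many probe calls are in flight during HALF-OPEN.

```python
import time
from collections import deque


class CircuitOpenError(Exception):
    """Raised instead of calling downstream while the breaker is OPEN (map it to a 503)."""


class CircuitBreaker:
    def __init__(self, failure_threshold=0.5, window_seconds=10.0, min_calls=10,
                 cooldown_seconds=30.0, half_open_successes=3, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.window = window_seconds
        self.min_calls = min_calls  # assumption: ignore failure rates over too few calls
        self.cooldown = cooldown_seconds
        self.half_open_successes = half_open_successes
        self.clock = clock
        self.state = "CLOSED"
        self.results = deque()  # (timestamp, succeeded) inside the rolling window
        self.opened_at = 0.0
        self.trial_successes = 0

    def _trip(self):
        self.state = "OPEN"
        self.opened_at = self.clock()
        self.results.clear()

    def call(self, fn):
        now = self.clock()
        if self.state == "OPEN":
            if now - self.opened_at < self.cooldown:
                raise CircuitOpenError("circuit open")  # fail fast, skip downstream
            self.state = "HALF-OPEN"
            self.trial_successes = 0
        try:
            result = fn()
        except Exception:
            self._record(now, False)
            raise
        self._record(now, True)
        return result

    def _record(self, now, ok):
        if self.state == "HALF-OPEN":
            if not ok:
                self._trip()  # any failure while probing: back to OPEN
            else:
                self.trial_successes += 1
                if self.trial_successes >= self.half_open_successes:
                    self.state = "CLOSED"
            return
        self.results.append((now, ok))
        # Evict results older than the rolling window
        while self.results and self.results[0][0] <= now - self.window:
            self.results.popleft()
        failures = sum(1 for _, good in self.results if not good)
        if len(self.results) >= self.min_calls and failures / len(self.results) > self.failure_threshold:
            self._trip()
```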

Idempotency Keys

For state-changing requests that may be retried:

POST /api/v1/payments
Idempotency-Key: uuid-v4-client-generated

Response includes:
Idempotency-Key: uuid-v4-client-generated
X-Idempotent-Replayed: true  (if this is a duplicate)

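Store: idempotency key → response, scoped to the calling account, and expire entries after 24h. Return the cached response for a duplicate key; if the same key arrives with a different request body, reject it (e.g. 422) instead of replaying.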
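A minimal in-memory version of that store is sketched below. The class name and `clock` parameter are hypothetical, and it is not safe under concurrency: a production store would claim the key atomically before processing (for example with Redis `SET key value NX EX 86400`) and scope keys per account.

```python
import time


class IdempotencyStore:
    """Map of idempotency key -> stored response, expiring after a TTL (24h by default)."""

    def __init__(self, ttl_seconds=24 * 3600, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (stored_at, response)

    def handle(self, key, process):
        """Run `process()` once per key; replay the stored response for duplicates.

        Returns (response, replayed) so the caller can set X-Idempotent-Replayed.
        """
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1], True
        response = process()
        self._entries[key] = (now, response)
        return response, False
```

A retried `POST /api/v1/payments` with the same key then gets the original charge's response instead of creating a second payment.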

After Completing the API Rate Limiting Output

Once the API rate limiting output is delivered, ask the user:

"Would you like me to generate API documentation for this design? (yes/no)"

If the user says yes:

Check if the API Documentation skill is available in the installed skills list
If the skill is available:
- Read and follow the instructions in the API Documentation skill
- Use the API rate limiting output above as the input
If the skill is NOT available:
- Inform the user: "It looks like the API Documentation skill isn't installed. You can install it and re-run."

If the user says no:

End the task here
