API Rate Limiting Skill
Design complete rate limiting, quota, and retry systems for any API.
Rate Limiting Algorithms
| Algorithm | Best For | Trade-offs |
|---|---|---|
| Token bucket | Bursty traffic with sustained avg | Allows bursts; slightly complex |
| Leaky bucket | Strict rate enforcement | Smooths bursts; can feel slow |
| Fixed window | Simple counting | Boundary spike problem |
| Sliding window log | Precise limiting | Memory-intensive |
| Sliding window counter | Balance of precision/memory | Best for most APIs |
Recommendation: Use sliding window counter for API endpoints, token bucket for streaming/upload endpoints.
Response Headers (RFC standard)
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1700000060
X-RateLimit-Policy: 100;w=60;comment="per minute"
Retry-After: 18
429 Response Body
{
"error": "rate_limit_exceeded",
"message": "Too many requests. You have exceeded 100 requests per minute.",
"retry_after_seconds": 18,
"limit": 100,
"window": "60s",
"reset_at": "2024-01-01T00:01:00Z"
}
Tiered Quota Design
| Tier | Requests/min | Requests/day | Burst | Concurrent |
|---|---|---|---|---|
| Free | 10 | 1,000 | 20 | 2 |
| Starter | 100 | 50,000 | 200 | 10 |
| Pro | 1,000 | 500,000 | 2,000 | 50 |
| Enterprise | Custom | Unlimited | Custom | Custom |
Quota Endpoints
GET /api/v1/account/quota — current usage vs limits
GET /api/v1/account/quota/history — usage over time
Response:
{
"plan": "pro",
"period": "2024-01",
"limits": { "requests_per_minute": 1000, "requests_per_day": 500000 },
"usage": { "requests_today": 12345, "requests_this_minute": 234 },
"resets_at": "2024-02-01T00:00:00Z"
}
Retry Logic (client-side)
Exponential backoff with jitter
import random, time
def retry_with_backoff(fn, max_retries=5, base_delay=1.0, max_delay=60.0):
for attempt in range(max_retries):
try:
return fn()
except RateLimitError as e:
if attempt == max_retries - 1:
raise
# Use Retry-After header if present, else exponential backoff
delay = min(
e.retry_after or (base_delay * (2 ** attempt)),
max_delay
)
# Add jitter to prevent thundering herd
delay += random.uniform(0, delay * 0.1)
time.sleep(delay)
Retryable vs Non-retryable status codes
| Status | Retry? | Strategy |
|---|---|---|
| 429 | Yes | Respect Retry-After header |
| 500 | Yes | Exponential backoff |
| 502/503 | Yes | Exponential backoff |
| 504 | Yes | Exponential backoff |
| 400 | No | Fix request |
| 401 | No | Refresh token, then retry once |
| 403 | No | Fix permissions |
| 404 | No | Fix URL |
| 422 | No | Fix payload |
Circuit Breaker Pattern
States: CLOSED → OPEN → HALF-OPEN → CLOSED
CLOSED: normal operation
- Track failure rate in rolling window
- If failure rate > threshold (e.g. 50% in 10s): → OPEN
OPEN: reject all requests immediately (fail-fast)
- Return 503 without calling downstream
- After cooldown period (e.g. 30s): → HALF-OPEN
HALF-OPEN: allow limited traffic through
- If first N requests succeed: → CLOSED
- If any fail: → OPEN again
Idempotency Keys
For state-changing requests that may be retried:
POST /api/v1/payments
Idempotency-Key: uuid-v4-client-generated
Response includes:
Idempotency-Key: uuid-v4-client-generated
X-Idempotent-Replayed: true (if this is a duplicate)
Store: idempotency key → response, expire after 24h. Return cached response for duplicate keys.
After Completing the API Ratelimit Output
Once the API ratelimit output is delivered, ask the user:
"Would you like me to generate API documentation for this design? (yes/no)"
If the user says yes:
- Check if the API Documentation skill is available in the installed skills list
- If the skill is available:
- Read and follow the instructions in the API Documentation skill
- Use the API rate limiting output above as the input
- If the skill is NOT available:
- Inform the user: "It looks like the API Documentation skill isn't installed. You can install it and re-run.
If the user says no:
- End the task here