SEO Firecrawl

Full-site crawling, scraping, and site mapping via Firecrawl MCP. Use when the user says "crawl site", "map site", "full crawl", "find all pages", "broken links", "site structure", "discover pages", or "JS rendering", or needs site-wide analysis.

Published by @AgriciDaniel, from AgriciDaniel/claude-seo

Firecrawl Extension for Claude SEO

This skill requires the Firecrawl extension to be installed:

./extensions/firecrawl/install.sh

Check availability: Before using any Firecrawl tool, verify the MCP server is connected by checking if firecrawl_scrape or any Firecrawl tool is available. If tools are not available, inform the user the extension is not installed and provide install instructions.

Quick Reference

Command                               Purpose
/seo firecrawl crawl <url>            Full-site crawl with content extraction
/seo firecrawl map <url>              Discover site structure (URLs only, fast)
/seo firecrawl scrape <url>           Single-page scrape with JS rendering
/seo firecrawl search <query> <url>   Search within a crawled site

Commands

crawl -- Full-Site Crawl

Crawl an entire website starting from the given URL. Returns page content, metadata, and links for all discovered pages.

MCP Tool: firecrawl_crawl

Parameters:

  • url (required): Starting URL to crawl
  • limit: Max pages to crawl (default: 100, max: 500)
  • maxDepth: Max link depth from start URL (default: 3)
  • includePaths: Array of glob patterns to include (e.g., ["/blog/*"])
  • excludePaths: Array of glob patterns to exclude (e.g., ["/admin/*", "/api/*"])
  • scrapeOptions.formats: Output formats -- ["markdown", "html", "links"]
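
As a rough sketch, the parameters above could be assembled into a tool-call argument object as follows. The helper name build_crawl_args is hypothetical, and the nested scrapeOptions shape is an assumption inferred from the dotted parameter name; only the parameter names and the default and maximum values come from the list above.

```python
from typing import Any, Optional

MAX_LIMIT = 500  # upper bound on pages per crawl, as stated in the parameter list


def build_crawl_args(
    url: str,
    limit: int = 100,
    max_depth: int = 3,
    include_paths: Optional[list[str]] = None,
    exclude_paths: Optional[list[str]] = None,
    formats: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Assemble arguments for a firecrawl_crawl call (hypothetical helper)."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    args: dict[str, Any] = {
        "url": url,
        "limit": min(limit, MAX_LIMIT),  # clamp to the documented maximum
        "maxDepth": max_depth,
        # Assumed nesting for the dotted name scrapeOptions.formats
        "scrapeOptions": {"formats": formats or ["markdown"]},
    }
    # Only send path filters when the caller provided them.
    if include_paths:
        args["includePaths"] = include_paths
    if exclude_paths:
        args["excludePaths"] = exclude_paths
    return args


args = build_crawl_args("https://example.com", limit=800, include_paths=["/blog/*"])
```

Here a requested limit of 800 is clamped to 500, and excludePaths is left out because none was given.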

SEO Usage Patterns:

  1. Comprehensive audit crawl: Crawl full site, extract all pages for subagent analysis
  2. Section-focused crawl: Use includePaths to audit only /blog/* or /products/*
  3. Broken link detection: Crawl with ["links"] format, check all hrefs for 404s
  4. Content inventory: Extract all page titles, meta descriptions, H1s at scale
  5. SPA/JS-rendered sites: Firecrawl renders JavaScript, solving the Issue #11 problem
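
Pattern 3 (broken link detection) might look roughly like the sketch below. It assumes the crawl result has already been reduced to a mapping of page URL to the links found on that page, and that a status lookup is supplied by the caller; find_broken and status_of are hypothetical names, not Firecrawl APIs.

```python
from typing import Callable
from urllib.parse import urljoin, urlparse


def internal_links(page_url: str, links: list[str]) -> set[str]:
    """Resolve links against the page URL, drop fragments, keep same-host URLs."""
    host = urlparse(page_url).netloc
    resolved = {urljoin(page_url, link).split("#")[0] for link in links}
    return {u for u in resolved if urlparse(u).netloc == host}


def find_broken(
    pages: dict[str, list[str]], status_of: Callable[[str], int]
) -> dict[str, list[str]]:
    """Map each internal URL that returns 404 to the pages linking to it."""
    cache: dict[str, int] = {}  # avoid checking the same target twice
    broken: dict[str, list[str]] = {}
    for page_url, links in pages.items():
        for target in sorted(internal_links(page_url, links)):
            if target not in cache:
                cache[target] = status_of(target)
            if cache[target] == 404:
                broken.setdefault(target, []).append(page_url)
    return broken


# Illustrative data only; a real run would use crawl output and live status checks.
pages = {
    "https://example.com/": ["/about", "/gone", "https://other.com/x", "#top"],
    "https://example.com/about": ["/gone#section"],
}
statuses = {"https://example.com/gone": 404}
broken = find_broken(pages, lambda url: statuses.get(url, 200))
```

Reporting the linking pages alongside each broken target makes it easier to say where a fix is needed.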

Example orchestration for /seo audit:

1. firecrawl_map(url) -> get all URLs (fast, no content)
2. Filter to top 50 most important pages (homepage, key sections)
3. firecrawl_crawl(url, limit=50) -> get full content
4. Feed content to seo-technical, seo-content, seo-schema agents
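
Step 2 leaves "most important" undefined. One possible heuristic, offered only as an assumption, is to rank URLs by path depth so that the homepage and top-level sections come first. select_top_pages is a hypothetical helper and not part of Firecrawl.

```python
from urllib.parse import urlparse


def path_depth(url: str) -> int:
    """Count non-empty path segments: '/' is 0, '/blog/post' is 2."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])


def select_top_pages(urls: list[str], limit: int = 50) -> list[str]:
    """Return up to `limit` unique URLs, shallowest paths first."""
    unique = sorted(set(urls))  # de-duplicate and make ordering deterministic
    return sorted(unique, key=path_depth)[:limit]


# Illustrative map output
mapped = [
    "https://example.com/blog/2024/some-post",
    "https://example.com/",
    "https://example.com/products",
    "https://example.com/blog",
    "https://example.com/",
]
top = select_top_pages(mapped, limit=3)
```

Other signals, such as internal link counts or traffic data, could replace path depth where they are available.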

Cost awareness:

  • Free tier: 500 credits/month
  • 1 credit = 1 page crawled or scraped
  • Map operations are cheaper (0.5 credits per URL discovered)
  • Always inform user of estimated credit usage before large crawls
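
The estimate to show the user before a large crawl can be computed from the rates listed above. This is a minimal sketch; the function names are hypothetical and the rates are simply the ones quoted in this section.

```python
FREE_TIER_CREDITS = 500       # monthly free-tier allowance quoted above
CREDITS_PER_PAGE = 1.0        # one credit per page crawled or scraped
CREDITS_PER_MAPPED_URL = 0.5  # map rate quoted above


def estimate_credits(pages_to_crawl: int, urls_to_map: int = 0) -> float:
    """Estimate credits for a crawl plus an optional preceding map."""
    return pages_to_crawl * CREDITS_PER_PAGE + urls_to_map * CREDITS_PER_MAPPED_URL


def credit_warning(pages_to_crawl: int, urls_to_map: int = 0) -> str:
    """Build a one-line message to show the user before starting."""
    needed = estimate_credits(pages_to_crawl, urls_to_map)
    share = needed / FREE_TIER_CREDITS * 100
    return f"Estimated usage: {needed:g} credits ({share:.0f}% of the free tier)"


message = credit_warning(pages_to_crawl=50, urls_to_map=342)
# message == "Estimated usage: 221 credits (44% of the free tier)"
```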

map -- Site Structure Discovery

Discover all URLs on a website without fetching content. Fast and credit-efficient.

MCP Tool: firecrawl_map

Parameters:

  • url (required): Website URL to map
  • limit: Max URLs to discover (default: 5000)
  • search: Optional search term to filter URLs

SEO Usage Patterns:

  1. Sitemap comparison: Map site, compare discovered URLs vs XML sitemap
  2. Orphan page detection: URLs in sitemap but not linked from any page
  3. Crawl budget analysis: Total indexable pages vs pages linked from homepage
  4. URL pattern analysis: Identify URL structure patterns, duplicates, parameter bloat
  5. Pre-audit discovery: Run map first, then targeted crawl on key sections
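
For pattern 4, one simple way to surface parameter bloat is to group mapped URLs that share a path but differ in their query string. The sketch below assumes the map output is a plain list of URL strings; parameter_variants is a hypothetical helper.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def parameter_variants(urls: list[str], min_variants: int = 2) -> dict[str, list[str]]:
    """Return base URLs that appear with at least `min_variants` distinct query strings."""
    groups: dict[str, set[str]] = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        if parts.query:
            base = f"{parts.scheme}://{parts.netloc}{parts.path}"
            groups[base].add(parts.query)
    return {base: sorted(qs) for base, qs in groups.items() if len(qs) >= min_variants}


# Illustrative map output
urls = [
    "https://example.com/shoes?color=red",
    "https://example.com/shoes?color=blue",
    "https://example.com/shoes",
    "https://example.com/hats?page=2",
]
variants = parameter_variants(urls)
```

Paths with many query-string variants are candidates for canonical tags or parameter handling rules.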

Output: Array of URLs. Present as:

Site: example.com
Pages discovered: 342

URL Pattern Breakdown:
  /blog/*          - 128 pages (37%)
  /products/*      - 89 pages (26%)
  /category/*      - 45 pages (13%)
  /pages/*         - 32 pages (9%)
  / (root pages)   - 48 pages (14%)
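
A report in the format above could be generated from the URL array along these lines. The grouping rule (first path segment, with single-segment paths counted as root pages) is an assumption made for illustration, and both function names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def pattern_for(url: str) -> str:
    """Group by first path segment; paths with one segment or none count as root pages."""
    segments = [seg for seg in urlparse(url).path.split("/") if seg]
    return f"/{segments[0]}/*" if len(segments) > 1 else "/ (root pages)"


def format_breakdown(site: str, urls: list[str]) -> str:
    """Render the site summary and per-pattern counts, largest group first."""
    counts = Counter(pattern_for(u) for u in urls)
    total = len(urls)
    lines = [f"Site: {site}", f"Pages discovered: {total}", "", "URL Pattern Breakdown:"]
    for pattern, count in counts.most_common():
        pct = round(count / total * 100)
        lines.append(f"  {pattern:<16} - {count} pages ({pct}%)")
    return "\n".join(lines)


# Illustrative map output
urls = [
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/blog/a",
    "https://example.com/blog/b",
    "https://example.com/blog/c",
    "https://example.com/products/x",
]
report = format_breakdown("example.com", urls)
```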

scrape -- Single-Page Deep Scrape

Scrape a single page with full JavaScript rendering. More thorough than fetch_page.py because it executes JS and waits for dynamic content.

MCP Tool: firecrawl_scrape

Parameters:

  • url (required): Page URL to scrape
  • formats: Output formats -- ["markdown", "html", "links", "screenshot"]
  • onlyMainContent: Strip nav/footer/sidebar (default: true)
  • waitFor: Milliseconds to wait for content before scraping (to wait for a CSS selector, use a wait action in actions)
  • timeout: Request timeout in ms (default: 30000)
  • actions: Browser actions before scraping (click, scroll, wait)

SEO Usage Patterns:

  1. SPA content extraction: Scrape JS-rendered React/Vue/Angular pages
  2. Dynamic content audit: Pages with lazy-loaded content below the fold
  3. Paywall/login detection: Identify content behind authentication walls
  4. Main content extraction: Use onlyMainContent for clean E-E-A-T analysis
  5. Screenshot capture: Use screenshot format for visual analysis

When to use scrape vs fetch_page.py:

Scenario                Use
Static HTML page        fetch_page.py (no API cost)
JS-rendered SPA         firecrawl_scrape (renders JS)
Need response headers   fetch_page.py (returns headers)
Need clean markdown     firecrawl_scrape (better extraction)
Rate-limited/blocked    firecrawl_scrape (handles anti-bot)

search -- Site-Scoped Search

Search within a website for specific content. Useful for finding pages related to a topic without crawling everything.

MCP Tool: firecrawl_search

Parameters:

  • query (required): Search query
  • url (required): Website to search within
  • limit: Max results (default: 10)
  • scrapeOptions.formats: Output format for matched pages

SEO Usage Patterns:

  1. Content gap validation: Search for a keyword on the site to check if content exists
  2. Internal linking opportunities: Find pages mentioning a topic that could link to each other
  3. Duplicate content detection: Search for key phrases to find near-duplicates
  4. Competitor content research: Search competitor site for specific topics

Cross-Skill Integration

With seo-audit (full audit)

When Firecrawl is available during /seo audit:

  1. Use firecrawl_map to discover all site URLs
  2. Compare with XML sitemap (seo-sitemap) to find orphan/missing pages
  3. Select top pages for deep analysis
  4. Feed crawled content to all subagents (technical, content, schema, geo)
  5. Report total crawlable pages, URL patterns, and crawl depth

With seo-technical

  • Broken link detection: crawl all internal links, check for 404s
  • Redirect chain mapping: follow all redirects, flag chains > 2 hops
  • Mixed content detection: check HTTP resources on HTTPS pages
  • Canonical verification: compare canonical URLs with actual URLs
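
The redirect chain check can be sketched as follows. It assumes the redirect hops have already been collected into a source-to-target mapping (for example from response headers); how that mapping is gathered is outside this sketch, and the function names are hypothetical.

```python
def redirect_chain(start: str, redirects: dict[str, str], max_hops: int = 10) -> list[str]:
    """Follow `redirects` (source -> target) from `start`; stop on loops or after max_hops."""
    chain = [start]
    while chain[-1] in redirects and len(chain) <= max_hops:
        nxt = redirects[chain[-1]]
        if nxt in chain:  # redirect loop: record the repeated URL and stop
            chain.append(nxt)
            break
        chain.append(nxt)
    return chain


def long_chains(redirects: dict[str, str], max_allowed_hops: int = 2) -> dict[str, list[str]]:
    """Return chains with more hops than allowed, keyed by starting URL."""
    flagged: dict[str, list[str]] = {}
    for start in redirects:
        chain = redirect_chain(start, redirects)
        if len(chain) - 1 > max_allowed_hops:
            flagged[start] = chain
    return flagged


# Illustrative redirect data
observed = {
    "http://example.com/old": "https://example.com/old",
    "https://example.com/old": "https://example.com/new",
    "https://example.com/new": "https://example.com/new/",
    "http://example.com/a": "https://example.com/a",
}
flagged = long_chains(observed)
```

Only the chain starting at http://example.com/old is flagged, since it takes three hops to reach its final URL.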

With seo-sitemap

  • Sitemap coverage: % of crawled pages present in sitemap
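  • Missing sitemap entries: pages found by crawl but absent from the sitemap (orphan pages are the reverse case: listed in the sitemap but not linked from any crawled page)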
  • Stale sitemap entries: URLs in sitemap that return 404/410
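
These three checks reduce to set operations once the crawled URLs and sitemap URLs are available as sets. A minimal sketch, assuming URLs are already normalized and that HTTP status codes for sitemap entries were collected separately; compare_with_sitemap is a hypothetical helper.

```python
def compare_with_sitemap(
    crawled: set[str], sitemap: set[str], status: dict[str, int]
) -> dict[str, object]:
    """Summarize how a crawl and an XML sitemap differ."""
    coverage = len(crawled & sitemap) / len(crawled) * 100 if crawled else 0.0
    return {
        "coverage_pct": round(coverage, 1),                   # crawled pages present in sitemap
        "in_crawl_not_sitemap": sorted(crawled - sitemap),    # found by crawl, absent from sitemap
        "in_sitemap_not_crawl": sorted(sitemap - crawled),    # listed, but never reached by links
        "stale": sorted(u for u in sitemap if status.get(u) in (404, 410)),
    }


# Illustrative data
crawled = {"https://example.com/", "https://example.com/a",
           "https://example.com/b", "https://example.com/c"}
sitemap = {"https://example.com/", "https://example.com/a",
           "https://example.com/old", "https://example.com/hidden"}
status = {"https://example.com/old": 404, "https://example.com/hidden": 200}
summary = compare_with_sitemap(crawled, sitemap, status)
```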

With seo-content

  • Content extraction: feed clean markdown to E-E-A-T analysis
  • Thin content detection: identify pages with < 300 words at scale
  • Duplicate content: compare content across pages for near-duplicates
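
Thin content detection over crawled markdown might be sketched like this. The word-counting rule is a deliberate simplification (it strips link targets and counts word-like tokens), and the function names are hypothetical; only the 300-word threshold comes from the list above.

```python
import re

THIN_WORD_THRESHOLD = 300  # threshold quoted above


def word_count(markdown: str) -> int:
    """Approximate word count: keep link/image text, drop their URLs, count word tokens."""
    text = re.sub(r"!?\[([^\]]*)\]\([^)]*\)", r"\1", markdown)
    return len(re.findall(r"\w+(?:['-]\w+)*", text))


def thin_pages(pages: dict[str, str], threshold: int = THIN_WORD_THRESHOLD) -> dict[str, int]:
    """Return URL -> word count for pages below the threshold."""
    counts = {url: word_count(md) for url, md in pages.items()}
    return {url: n for url, n in counts.items() if n < threshold}


# Illustrative crawl output: URL -> markdown
pages = {
    "https://example.com/short": "# Hello\n\nJust a [few](https://example.com/x) words here.",
    "https://example.com/long": "word " * 350,
}
thin = thin_pages(pages)
```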

With seo-schema

  • Schema extraction: pull JSON-LD from all crawled pages
  • Schema coverage: % of pages with structured data
  • Schema validation: batch-validate extracted schemas
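
Schema extraction and coverage could be sketched with the standard library as below. This assumes pages were crawled with the "html" format, since JSON-LD script tags would not be expected to survive markdown conversion; the class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect parsed JSON-LD payloads from <script type="application/ld+json"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.blocks: list = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # invalid JSON-LD is skipped here; a validator would report it


def extract_jsonld(html: str) -> list:
    parser = JsonLdParser()
    parser.feed(html)
    return parser.blocks


def schema_coverage(pages: dict[str, str]) -> float:
    """Percentage of pages with at least one parseable JSON-LD block."""
    if not pages:
        return 0.0
    with_schema = sum(1 for html in pages.values() if extract_jsonld(html))
    return round(with_schema / len(pages) * 100, 1)


# Illustrative crawl output: URL -> HTML
pages = {
    "https://example.com/post": '<html><head><script type="application/ld+json">'
                                '{"@type": "Article"}</script></head></html>',
    "https://example.com/plain": "<html><body><p>No schema</p></body></html>",
}
coverage = schema_coverage(pages)
```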

Error Handling

Error                       Cause                      Resolution
FIRECRAWL_API_KEY not set   MCP not configured         Run ./extensions/firecrawl/install.sh
402 Payment Required        Credits exhausted          Check usage at firecrawl.dev/app, upgrade plan
429 Too Many Requests       Rate limited               Wait 60s, reduce crawl concurrency
408 Timeout                 Page too slow to render    Increase timeout, try without JS rendering
403 Forbidden               Site blocks crawling       Check robots.txt, may need to skip this site
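
The 429 resolution (wait 60s, then retry) can be sketched as a small wrapper. The shape of the wrapped call, a function returning a (status, payload) pair, is an assumption for illustration and not how the MCP tools report errors; the sleep function is injectable so the example runs without waiting.

```python
import time
from typing import Any, Callable

RETRY_WAIT_SECONDS = 60  # wait time suggested for 429 responses


def call_with_rate_limit_retry(
    call: Callable[[], tuple[int, Any]],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, Any]:
    """Retry `call` while it reports 429, waiting between attempts."""
    status, payload = call()
    attempts = 1
    while status == 429 and attempts < max_attempts:
        sleep(RETRY_WAIT_SECONDS)
        status, payload = call()
        attempts += 1
    return status, payload


# Demonstration with a fake call and a fake sleep that only records the wait.
responses = iter([(429, None), (200, "ok")])
waits: list[float] = []
result = call_with_rate_limit_retry(lambda: next(responses), sleep=waits.append)
```

If every attempt returns 429, the last 429 result is handed back so the caller can inform the user.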

Graceful fallback: If Firecrawl is unavailable, inform the user and suggest:

  1. Use fetch_page.py for single-page analysis (no API cost)
  2. Use WebFetch tool for basic HTML retrieval
  3. Install Firecrawl: ./extensions/firecrawl/install.sh
