
Fetcher MCP: Enterprise Web Scraping & Bulletproof Automation

Fetcher MCP puts headless Playwright to work for reliable, enterprise-grade web scraping. Built for developers who are tired of flaky tools and want bulletproof automation.


About Fetcher MCP

What is Fetcher MCP?

Fetcher MCP is a server-side solution built on Playwright, designed to extract content from dynamic web pages at scale. Unlike traditional scrapers, it leverages headless browser automation to handle JavaScript-rendered content, offering enterprise-grade reliability through intelligent resource blocking, error recovery, and parallel processing. Ideal for scenarios requiring high concurrency and precision in content extraction.

How to use Fetcher MCP?

Deploy in three steps:

  1. Quick launch: Run npx -y fetcher-mcp for immediate use
  2. Configuration: Add the server to your Claude Desktop config JSON; request-level options (e.g., timeout values, media blocking) are passed as tool parameters
  3. Execution: Use fetch_url for single requests or fetch_urls for batch processing

Debugging can be enabled via the --debug command-line flag or the per-request debug parameter, which supports manual authentication workflows.
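As a sketch of step 2, a Claude Desktop entry might look like the following (the --debug argument is optional and shows the browser window; it can be dropped for normal headless operation):

```json
{
  "mcpServers": {
    "fetcher": {
      "command": "npx",
      "args": ["-y", "fetcher-mcp", "--debug"]
    }
  }
}
```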

Fetcher MCP Features

Key Features of Fetcher MCP

  • Dynamic Content Mastery: Renders modern SPAs and JS-heavy sites using Playwright's Chromium engine
  • Smart Extraction: Automatically isolates article bodies using optimized readability algorithms, removing ads and boilerplate
  • Performance Tuning: Blocks non-critical resources (images, CSS) by default, configurable via disableMedia parameter
  • Enterprise Workflows: Process 100+ URLs simultaneously with fetch_urls, returning structured HTML/Markdown outputs
  • Fallback Strategies: Built-in retries, timeout escalation, and navigation wait modes for unstable sites

Use cases of Fetcher MCP

Common scenarios include:

  • Competitive Intelligence: Monitoring e-commerce product pages with automatic price/stock tracking
  • Content Aggregation: Building news digests by extracting articles from 100+ sources in parallel
  • Internal Tools: Scraping intranet apps requiring manual login via debug mode
  • Risk Management: Monitoring financial sites with anti-bot protections using extended timeouts

Fetcher MCP FAQ

FAQ about Fetcher MCP

  • Q: How do I handle sites with CAPTCHAs?
    Use waitForNavigation: true to wait for post-load redirects, then manually complete verification in debug mode.
  • Q: Can I customize timeout values?
    Set timeout (page load) and navigationTimeout (post-load interactions) up to 120 seconds for problematic sites.
  • Q: How does parallel processing work?
    The fetch_urls endpoint opens multiple browser tabs, managing 20+ concurrent requests automatically.
  • Q: What's the difference between HTML and Markdown output?
    HTML preserves raw formatting for technical content, while Markdown simplifies text-based analysis workflows.
  • Q: Is authentication supported?
    Use debug mode to manually log in, then store session cookies for subsequent automated requests.
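Combining the timeout and navigation answers above, a fetch_url call for a slow site behind bot verification might pass arguments like this sketch (the URL is a placeholder):

```json
{
  "url": "https://example.com/protected-page",
  "timeout": 120000,
  "navigationTimeout": 120000,
  "waitForNavigation": true
}
```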


Fetcher MCP

An MCP server for fetching web page content using a Playwright headless browser.

Advantages

  • JavaScript Support: Unlike traditional web scrapers, Fetcher MCP uses Playwright to execute JavaScript, making it capable of handling dynamic web content and modern web applications.

  • Intelligent Content Extraction: Built-in Readability algorithm automatically extracts the main content from web pages, removing ads, navigation, and other non-essential elements.

  • Flexible Output Format: Supports both HTML and Markdown output formats, making it easy to integrate with various downstream applications.

  • Parallel Processing: The fetch_urls tool enables concurrent fetching of multiple URLs, significantly improving efficiency for batch operations.

  • Resource Optimization: Automatically blocks unnecessary resources (images, stylesheets, fonts, media) to reduce bandwidth usage and improve performance.

  • Robust Error Handling: Comprehensive error handling and logging ensure reliable operation even when dealing with problematic web pages.

  • Configurable Parameters: Fine-grained control over timeouts, content extraction, and output formatting to suit different use cases.

Quick Start

Run directly with npx:

npx -y fetcher-mcp

Debug Mode

Run with the --debug option to show the browser window for debugging:

npx -y fetcher-mcp --debug

MCP Configuration

Configure this MCP server in Claude Desktop:

On macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

On Windows: %APPDATA%/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "fetcher": {
      "command": "npx",
      "args": ["-y", "fetcher-mcp"]
    }
  }
}

Features

  • fetch_url - Retrieve web page content from a specified URL

    • Uses Playwright headless browser to parse JavaScript
    • Supports intelligent extraction of main content and conversion to Markdown
    • Supports the following parameters:
      • url: The URL of the web page to fetch (required parameter)
      • timeout: Page loading timeout in milliseconds, default is 30000 (30 seconds)
      • waitUntil: Specifies when navigation is considered complete, options: 'load', 'domcontentloaded', 'networkidle', 'commit', default is 'load'
      • extractContent: Whether to intelligently extract the main content, default is true
      • maxLength: Maximum length of returned content (in characters), default is no limit
      • returnHtml: Whether to return HTML content instead of Markdown, default is false
      • waitForNavigation: Whether to wait for additional navigation after initial page load (useful for sites with anti-bot verification), default is false
      • navigationTimeout: Maximum time to wait for additional navigation in milliseconds, default is 10000 (10 seconds)
      • disableMedia: Whether to disable media resources (images, stylesheets, fonts, media), default is true
      • debug: Whether to enable debug mode (showing browser window), overrides the --debug command line flag if specified
  • fetch_urls - Batch retrieve web page content from multiple URLs in parallel

    • Uses multi-tab parallel fetching for improved performance
    • Returns combined results with clear separation between webpages
    • Supports the following parameters:
      • urls: Array of URLs to fetch (required parameter)
      • Other parameters are the same as fetch_url
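To illustrate the parameter shapes documented above, a fetch_urls call might receive arguments like this sketch (URLs are placeholders; all options shown are the documented parameters with their documented meanings):

```json
{
  "urls": [
    "https://example.com/article-1",
    "https://example.com/article-2"
  ],
  "timeout": 30000,
  "waitUntil": "load",
  "extractContent": true,
  "returnHtml": false,
  "maxLength": 20000
}
```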

Tips

Handling Special Website Scenarios

Dealing with Anti-Crawler Mechanisms

  • Wait for Complete Loading : For websites using CAPTCHA, redirects, or other verification mechanisms, include in your prompt:

    Please wait for the page to fully load
    

This will use the waitForNavigation: true parameter.

  • Increase Timeout Duration : For websites that load slowly:

    Please set the page loading timeout to 60 seconds
    

This adjusts both timeout and navigationTimeout parameters accordingly.

Content Retrieval Adjustments

  • Preserve Original HTML Structure : When content extraction might fail:

    Please preserve the original HTML content
    

Sets extractContent: false and returnHtml: true.

  • Fetch Complete Page Content : When extracted content is too limited:

    Please fetch the complete webpage content instead of just the main content
    

Sets extractContent: false.

  • Return Content as HTML : When HTML format is needed instead of default Markdown:

    Please return the content in HTML format
    

Sets returnHtml: true.
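The adjustments above translate into fetch_url parameters; this sketch shows the "preserve original HTML" case with a placeholder URL:

```json
{
  "url": "https://example.com/hard-to-parse-page",
  "extractContent": false,
  "returnHtml": true
}
```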

Debugging and Authentication

Enabling Debug Mode

  • Dynamic Debug Activation : To display the browser window during a specific fetch operation:

    Please enable debug mode for this fetch operation
    

This sets debug: true even if the server was started without the --debug flag.

Using Custom Cookies for Authentication

  • Manual Login : To login using your own credentials:

    Please run in debug mode so I can manually log in to the website
    

Sets debug: true or uses the --debug flag, keeping the browser window open for manual login.

  • Interacting with Debug Browser : When debug mode is enabled:

    1. The browser window remains open
    2. You can manually log into the website using your credentials
    3. After login is complete, content will be fetched with your authenticated session
  • Enable Debug for Specific Requests : Even if the server is already running, you can enable debug mode for a specific request:

    Please enable debug mode for this authentication step
    

Sets debug: true for this specific request only, opening the browser window for manual login.
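Concretely, a per-request override is just the documented debug parameter alongside the target URL (placeholder shown):

```json
{
  "url": "https://example.com/login-protected-page",
  "debug": true
}
```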

Development

Install Dependencies

npm install

Install Playwright Browser

Install the browsers needed for Playwright:

npm run install-browser

Build the Server

npm run build

Debugging

Use MCP Inspector for debugging:

npm run inspector

You can also enable visible browser mode for debugging:

node build/index.js --debug

License

Licensed under the MIT License
