Playwright-MCP-Fetch: Parse & Convert Web Content

Fetch, parse, and convert web content with Playwright-MCP-Fetch, an MCP server that uses a headless browser to turn raw web pages into structured formats.

About Playwright-MCP-Fetch

What is Playwright-MCP-Fetch: Parse & Convert Web Content?

Playwright-MCP-Fetch is a Model Context Protocol (MCP) server that fetches web pages with Playwright and returns them as HTML, Markdown, plain text, or JSON. Because pages are loaded in a real browser, it suits developers who need to process dynamic web content programmatically.

How to use Playwright-MCP-Fetch: Parse & Convert Web Content?

Start by installing the tool from source or via Docker. First, clone the repository and set up dependencies:

git clone https://github.com/kevinwatt/playwright-mcp-fetch.git
cd playwright-mcp-fetch
pip install -r requirements.txt
playwright install

Next, run the MCP server using playwright-mcp-fetch-sse. To interact with it, use HTTP POST requests or the SSE client example provided. For instance, converting a webpage to Markdown requires a simple API call:

curl -X POST http://localhost:3000/api/call-tool \
-H "Content-Type: application/json" \
-d '{"name": "fetch_markdown", "arguments": {"url": "https://example.com"}}'

Playwright-MCP-Fetch Features

Key Features of Playwright-MCP-Fetch: Parse & Convert Web Content

  • Format Flexibility: Choose between HTML, Markdown, plain text, or JSON outputs
  • Real-Time Updates: SSE (Server-Sent Events) support for streaming responses
  • Modular Configuration: Enable/disable tools via environment variables like fetch_html
  • Containerization: Docker support for easy deployment

Use cases of Playwright-MCP-Fetch: Parse & Convert Web Content

Common scenarios include:

  • Automatically generating documentation from live web articles
  • Creating content scrapers for dynamic e-commerce sites
  • Building chatbots that process web content into readable formats

For example, a news aggregator can use fetch_markdown to convert articles to Markdown.

Playwright-MCP-Fetch FAQ

FAQ from Playwright-MCP-Fetch: Parse & Convert Web Content

Q: How do I troubleshoot connection issues?
Check that the server is running on http://localhost:3000 (or on the port set via the PORT environment variable) and verify your firewall settings. Run curl -X POST http://localhost:3000/api/list-tools to confirm that the endpoints respond.
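
Before testing the HTTP endpoints, it can help to confirm that anything is listening on the port at all. The following is a minimal sketch using only the Python standard library; the helper name is_port_open is hypothetical, and the defaults simply mirror the localhost:3000 address mentioned above.

```python
import socket


def is_port_open(host="localhost", port=3000, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

If is_port_open() returns False, the server is probably not running or is bound to a different port; if it returns True but the API calls still fail, the problem is more likely at the HTTP level.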

Q: Can I customize output formatting?
While core formats are fixed, you can modify the source code for custom parsing logic. The fetch_json tool is particularly adaptable for structured data needs.

Q: What browsers are supported?
Playwright's default browsers (Chromium, Firefox, WebKit) are available. To install only a specific browser, name it in the playwright install command, for example playwright install chromium.

playwright-mcp-fetch

Current version: 0.1.5

This tool provides a Model Context Protocol (MCP) server for fetching content from websites and converting it to different formats using Playwright.

Requirements

  • Python 3.10 or higher

Features

  • fetch_html: Fetch the raw HTML content from a website
  • fetch_markdown: Fetch content from a website and convert it to Markdown format
  • fetch_txt: Fetch and return plain text content from a website (HTML tags removed)
  • fetch_json: Fetch and parse JSON content
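
To illustrate what "HTML tags removed" means for a tool like fetch_txt, here is a rough sketch built on the standard library's html.parser. It is not the project's implementation, which this page does not show; the names _TextExtractor and html_to_text are hypothetical, and skipping script and style content is an assumption about what a reader would want.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of script and style tags."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the chunks, then collapse every run of whitespace to one space.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Joining chunks with a space keeps adjacent paragraphs apart, at the cost of occasionally splitting a word that has inline markup in the middle of it.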

Installation

From Source

git clone https://github.com/kevinwatt/playwright-mcp-fetch.git
cd playwright-mcp-fetch
pip install -e .

Install Dependencies

pip install -r requirements.txt
# Install Playwright browsers
playwright install

Usage

Run as MCP Server (SSE Transport)

# Run the MCP server with SSE transport
playwright-mcp-fetch-sse

Environment Variables

  • fetch_html: Set to "Enable" to enable the fetch_html tool (default: "Disable")
  • PORT: Set the HTTP port for the SSE server (default: 3000)
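
If you start the server from a Python script, the two variables above can be set on the child process rather than in the shell. This is a sketch under stated assumptions: build_server_env and start_server are hypothetical helper names, and the only facts taken from this README are the variable names, their "Enable"/"Disable" values, the default port, and the playwright-mcp-fetch-sse command.

```python
import os
import subprocess


def build_server_env(enable_fetch_html=False, port=3000, base_env=None):
    """Return a copy of the environment with fetch_html and PORT set."""
    env = dict(os.environ if base_env is None else base_env)
    # The README documents the literal values "Enable" and "Disable".
    env["fetch_html"] = "Enable" if enable_fetch_html else "Disable"
    env["PORT"] = str(port)
    return env


def start_server(enable_fetch_html=False, port=3000):
    """Launch the SSE server as a child process and return the Popen handle."""
    return subprocess.Popen(
        ["playwright-mcp-fetch-sse"],
        env=build_server_env(enable_fetch_html, port),
    )
```

Calling start_server(enable_fetch_html=True, port=8080) would start the server with the fetch_html tool enabled on port 8080, provided the package is installed and the command is on PATH.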

MCP Client Configuration

To use this server with an MCP client, configure the client to connect to the SSE endpoint:

{
  "mcpServers": {
    "fetch-tools": {
      "enabled": true,
      "transport": "sse",
      "url": "http://localhost:3000/sse",
      "postUrl": "http://localhost:3000/messages"
    }
  }
}
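
Both URLs in this configuration contain the port, so they need to change together whenever PORT is changed. A small sketch that generates the same structure from a host and port is shown below; make_client_config is a hypothetical helper, and the assumption is that the /sse and /messages paths stay as shown above.

```python
import json


def make_client_config(host="localhost", port=3000, name="fetch-tools"):
    """Build the MCP client configuration shown above for a given host and port."""
    base = f"http://{host}:{port}"
    return {
        "mcpServers": {
            name: {
                "enabled": True,
                "transport": "sse",
                "url": f"{base}/sse",
                "postUrl": f"{base}/messages",
            }
        }
    }


def dump_client_config(**kwargs):
    """Return the configuration as indented JSON text."""
    return json.dumps(make_client_config(**kwargs), indent=2)
```

For example, dump_client_config(port=8080) produces a configuration whose url and postUrl both point at port 8080.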

API Examples

List Tools

curl -X POST http://localhost:3000/api/list-tools

Call a Tool

curl -X POST http://localhost:3000/api/call-tool \
  -H "Content-Type: application/json" \
  -d '{"name": "fetch_markdown", "arguments": {"url": "https://example.com"}}'
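
The same two calls can be made from Python without extra dependencies. The sketch below mirrors the curl commands above using urllib; build_request and post are hypothetical helper names, and because the response format is not documented here, the body is returned as raw text rather than parsed.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"  # default PORT from this README


def build_request(path, payload=None):
    """Build a POST request for one of the /api endpoints shown above."""
    headers = {}
    body = None
    if payload is not None:
        body = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=body, headers=headers, method="POST"
    )


def post(path, payload=None, timeout=30):
    """Send the request and return the raw response body as text."""
    request = build_request(path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server running, post("/api/list-tools") corresponds to the List Tools call, and post("/api/call-tool", {"name": "fetch_markdown", "arguments": {"url": "https://example.com"}}) corresponds to the Call a Tool example.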

SSE Client Example

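import asyncio
import json

import aiohttp

SSE_URL = "http://localhost:3000/sse"

async def sse_client():
    # SSE streams stay open indefinitely, so disable aiohttp's default total timeout.
    timeout = aiohttp.ClientTimeout(total=None)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.get(SSE_URL) as response:
            response.raise_for_status()
            # response.content yields the stream one line at a time.
            async for line in response.content:
                if not line.startswith(b"data: "):
                    continue  # skip "event:" lines, comments, and blank separators
                payload = line[6:].decode("utf-8").strip()
                try:
                    data = json.loads(payload)
                except json.JSONDecodeError:
                    data = payload  # not every event necessarily carries JSON
                print(f"Received event: {data}")

asyncio.run(sse_client())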
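
The client above only looks at data: lines. Server-Sent Events can also carry an event: field, and a blank line marks the end of each event. The following sketch parses already-decoded lines into (event, data) pairs following the general SSE format; parse_sse is a hypothetical helper, and the sample lines in the usage note are illustrative rather than captured from this server.

```python
def parse_sse(lines):
    """Yield (event, data) tuples from an iterable of decoded SSE lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
```

Feeding it the lines "event: endpoint", "data: /messages?session_id=abc", and an empty line yields one pair, ("endpoint", "/messages?session_id=abc"). Events without an event: field fall back to the default name "message".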

Development

# Install in development mode
pip install -e .

# Run tests
pytest

Docker Support

Build and run with Docker:

docker build -t playwright-mcp-fetch .
docker run -p 3000:3000 -e TRANSPORT_TYPE=sse playwright-mcp-fetch

Or use Docker Compose:

docker-compose up -d

License

MIT
