MCP Webscan Server

A Model Context Protocol (MCP) server for web content scanning and analysis. This server provides tools for fetching, analyzing, and extracting information from web pages.

Features
- Page Fetching: Convert web pages to Markdown for easy analysis
- Link Extraction: Extract and analyze links from web pages
- Site Crawling: Recursively crawl websites to discover content
- Link Checking: Identify broken links on web pages
- Pattern Matching: Find URLs matching specific patterns
- Sitemap Generation: Generate XML sitemaps for websites
Installation
Installing via Smithery
To install Webscan for Claude Desktop automatically via Smithery:
npx -y @smithery/cli install mcp-server-webscan --client claude
Manual Installation
# Clone the repository
git clone <repository-url>
cd mcp-server-webscan
# Install dependencies
npm install
# Build the project
npm run build
Usage
Starting the Server
npm start
The server communicates over the stdio transport, which makes it compatible with MCP clients such as Claude Desktop.
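To make the stdio transport more concrete, here is a minimal sketch of the kind of message a client writes to the server's stdin when it invokes a tool. It assumes the standard MCP JSON-RPC `tools/call` request shape and newline-delimited framing; the helper names `buildToolCall` and `frame` are hypothetical and are not part of this project. A real client would also complete the MCP `initialize` handshake before calling any tool.

```typescript
// Shape of a JSON-RPC 2.0 request for the MCP "tools/call" method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: build a tools/call request for one of the tools in this README.
function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Hypothetical helper: serialize to a single line, since stdio messages are newline-delimited.
function frame(message: ToolCallRequest): string {
  return JSON.stringify(message) + "\n";
}

const line = frame(buildToolCall(1, "fetch_page", { url: "https://example.com" }));
console.log(line.trimEnd());
```

In practice an MCP client library handles this framing, so the sketch is mainly useful for debugging the raw stream.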
Available Tools
fetch_page
* Fetches a web page and converts it to Markdown
* Parameters:
  * `url` (required): URL of the page to fetch
  * `selector` (optional): CSS selector to target specific content
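The parameter list above could be enforced with a small validator like the following. This is a sketch under assumptions, not the server's actual code: the function name `parseFetchPageArgs`, the restriction to `http:` and `https:` URLs, and the error messages are all illustrative.

```typescript
interface FetchPageArgs {
  url: string;
  selector?: string;
}

// Hypothetical validator: checks raw tool arguments and throws on bad input.
function parseFetchPageArgs(raw: unknown): FetchPageArgs {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("arguments must be an object");
  }
  const { url, selector } = raw as Record<string, unknown>;
  if (typeof url !== "string") {
    throw new Error("`url` is required and must be a string");
  }
  let parsed: URL;
  try {
    parsed = new URL(url);
  } catch {
    throw new Error(`invalid URL: ${url}`);
  }
  // Assumption: only web URLs are meaningful for page fetching.
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error(`unsupported protocol: ${parsed.protocol}`);
  }
  if (selector !== undefined && typeof selector !== "string") {
    throw new Error("`selector` must be a string when provided");
  }
  // parsed.href is the normalized form of the URL.
  return selector === undefined ? { url: parsed.href } : { url: parsed.href, selector };
}
```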
extract_links
* Extracts all links from a web page with their text
* Parameters:
  * `url` (required): URL of the page to analyze
  * `baseUrl` (optional): Base URL to filter links
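As a rough illustration of link extraction, the sketch below pulls `<a href>` links out of an HTML string and resolves relative URLs. It is deliberately simplified: it uses a regular expression rather than a real HTML parser, and it assumes `baseUrl` acts as a prefix filter, which this README does not specify. The name `extractLinks` is hypothetical.

```typescript
interface Link {
  url: string;
  text: string;
}

// Hypothetical sketch: extract anchors from HTML, resolving hrefs against pageUrl.
function extractLinks(html: string, pageUrl: string, baseUrl?: string): Link[] {
  const anchor = /<a\b[^>]*\bhref\s*=\s*["']([^"']+)["'][^>]*>([\s\S]*?)<\/a>/gi;
  const links: Link[] = [];
  let match: RegExpExecArray | null;
  while ((match = anchor.exec(html)) !== null) {
    let resolved: string;
    try {
      resolved = new URL(match[1] ?? "", pageUrl).href;
    } catch {
      continue; // skip hrefs that cannot be parsed as URLs
    }
    // Assumption: baseUrl keeps only links that start with the given prefix.
    if (baseUrl !== undefined && !resolved.startsWith(baseUrl)) continue;
    // Strip nested tags so only the visible link text remains.
    const text = (match[2] ?? "").replace(/<[^>]+>/g, "").trim();
    links.push({ url: resolved, text });
  }
  return links;
}
```

A production implementation would normally use an HTML parser, since regular expressions miss unquoted attributes and other edge cases.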
crawl_site
* Recursively crawls a website up to a specified depth
* Parameters:
  * `url` (required): Starting URL to crawl
  * `maxDepth` (optional, default: 2): Maximum crawl depth
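A depth-limited crawl of this kind can be sketched as a breadth-first traversal. The example assumes that `maxDepth` counts link hops from the starting URL and that the crawl stays on the starting origin; neither detail is stated in this README. The `getLinks` callback is a hypothetical stand-in for fetching a page and extracting its links, and it is synchronous here only to keep the sketch short.

```typescript
// Hypothetical link source: returns the absolute outgoing links of a page.
type LinkSource = (url: string) => string[];

// Breadth-first crawl from startUrl, following same-origin links up to maxDepth hops.
function crawl(startUrl: string, getLinks: LinkSource, maxDepth = 2): string[] {
  const origin = new URL(startUrl).origin;
  const visited = new Set<string>([startUrl]);
  let frontier = [startUrl];
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const page of frontier) {
      for (const link of getLinks(page)) {
        if (visited.has(link)) continue; // already seen
        if (new URL(link).origin !== origin) continue; // stay on the starting site
        visited.add(link);
        next.push(link);
      }
    }
    frontier = next; // pages discovered at this depth are expanded next
  }
  return Array.from(visited);
}
```

The `visited` set is what keeps cyclic link structures from causing an endless crawl.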
check_links
* Checks for broken links on a page
* Parameters:
  * `url` (required): URL to check links for
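One way to think about link checking is sketched below. It assumes a link counts as broken when the request fails or returns a 4xx or 5xx status; the README does not define "broken", so treat that rule as an assumption. The `Probe` callback and the names `isBroken` and `checkLinks` are hypothetical.

```typescript
interface LinkStatus {
  url: string;
  ok: boolean;
  status: number | null; // null when the request itself failed
}

// Hypothetical probe: resolves to an HTTP status code, rejects on network failure.
type Probe = (url: string) => Promise<number>;

// Assumption: 4xx/5xx responses and network failures count as broken.
function isBroken(status: number | null): boolean {
  return status === null || status >= 400;
}

// Probe all URLs concurrently and report a status for each one.
async function checkLinks(urls: string[], probe: Probe): Promise<LinkStatus[]> {
  return Promise.all(
    urls.map(async (url) => {
      let status: number | null = null;
      try {
        status = await probe(url);
      } catch {
        status = null; // DNS failure, timeout, refused connection, and so on
      }
      return { url, ok: !isBroken(status), status };
    })
  );
}
```

A probe could be built on an HTTP client of your choice, for example by issuing a `HEAD` request and returning the response status.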
find_patterns
* Finds URLs matching a specific pattern
* Parameters:
  * `url` (required): URL to search in
  * `pattern` (required): Regex pattern to match URLs against
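The matching step can be illustrated with a short filter over a list of URLs. This sketch assumes the pattern uses JavaScript regular expression syntax and is applied to each full URL string; the function name `findMatchingUrls` is hypothetical.

```typescript
// Hypothetical sketch: keep only the URLs that match the given regex pattern.
function findMatchingUrls(urls: string[], pattern: string): string[] {
  let regex: RegExp;
  try {
    regex = new RegExp(pattern); // no "g" flag, so test() carries no state between calls
  } catch (err) {
    throw new Error(`invalid pattern "${pattern}": ${(err as Error).message}`);
  }
  return urls.filter((url) => regex.test(url));
}
```

For example, the pattern `\.pdf$` would keep only URLs that end in `.pdf`.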
generate_sitemap
* Generates a simple XML sitemap
* Parameters:
  * `url` (required): Root URL for sitemap
  * `maxUrls` (optional, default: 100): Maximum number of URLs to include
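For reference, a simple sitemap in the sitemaps.org format can be assembled as follows. This is a minimal sketch, assuming the tool emits only `<loc>` entries and truncates the list at `maxUrls`; the names `escapeXml` and `buildSitemap` are hypothetical.

```typescript
// Escape the characters that are special in XML text and attribute values.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical sketch: build a sitemap from discovered URLs, capped at maxUrls entries.
function buildSitemap(urls: string[], maxUrls = 100): string {
  // Deduplicate first so repeated URLs do not consume the cap.
  const unique = Array.from(new Set(urls)).slice(0, maxUrls);
  const entries = unique.map((u) => `  <url><loc>${escapeXml(u)}</loc></url>`);
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}
```

Escaping matters because query strings often contain `&`, which is not valid unescaped inside XML.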
Example Usage with Claude Desktop
- Configure the server in your Claude Desktop settings:
{
"mcpServers": {
"webscan": {
"command": "node",
"args": ["path/to/mcp-server-webscan/dist/index.js"],
"env": {
"NODE_ENV": "development"
}
}
}
}
- Use the tools in your conversations:
Could you fetch the content from https://example.com and convert it to Markdown?
Development
Prerequisites
- Node.js
- npm
Project Structure
mcp-server-webscan/
├── src/
│ └── index.ts # Main server implementation
├── dist/ # Compiled JavaScript
├── package.json
└── tsconfig.json
Building
npm run build
Development Mode
npm run dev
Error Handling
The server handles errors in the following categories:
- Invalid parameters
- Network errors
- Content parsing errors
- URL validation failures
All errors are properly formatted according to the MCP specification.
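As an illustration of MCP-style error formatting, the sketch below converts a thrown error into a tool result flagged with `isError: true`, which is how the MCP specification represents tool execution failures. Whether this server reports a given failure this way or as a JSON-RPC error response is not stated here, and the names `toErrorResult` and `safely` are hypothetical.

```typescript
// Minimal shape of an MCP tool result carrying text content.
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

// Hypothetical helper: turn any thrown value into an error result.
function toErrorResult(err: unknown): ToolResult {
  const message = err instanceof Error ? err.message : String(err);
  return { content: [{ type: "text", text: `Error: ${message}` }], isError: true };
}

// Hypothetical wrapper: run a tool handler and convert failures into error results.
async function safely(handler: () => Promise<ToolResult>): Promise<ToolResult> {
  try {
    return await handler();
  } catch (err) {
    return toErrorResult(err);
  }
}
```

Returning the failure as a result rather than throwing lets the calling model read the message and decide how to proceed.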
Contributing
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
License
MIT License - see the LICENSE file for details