Websearch: Smarter Searches, Faster Discoveries

Websearch: Instantly find accurate answers and navigate the web effortlessly. No wasted time, no clutter, just power: smarter searches and faster discoveries.


About Websearch

What is Websearch: Smarter Searches, Faster Discoveries?

Websearch is a web search server built on the Model Context Protocol (MCP). It delegates searches to a companion crawler service that uses FlareSolverr to get past Cloudflare protections, so it can reach pages that block ordinary automated requests. Results come back as structured JSON with metadata (title, snippet, URL, site name, byline), which makes the relevant information quick to parse. Because it runs as a standalone MCP server, its modular design fits into custom workflows, making it well suited to developers, researchers, and enterprises that need robust search capabilities.

How to Use Websearch: Smarter Searches, Faster Discoveries?

Deployment: Start the crawler service with Docker Compose using the provided configuration.
API Interaction: Send search queries to the crawler API and retrieve results in JSON format.
Customization: Adjust per-request parameters such as result count, language, region, domain and term filters, and result type.
Integration: Connect Websearch to MCP clients such as Claude Desktop, Cursor, and Cline, or call it programmatically through the MCP SDK.

Websearch Features

Key Features of Websearch: Smarter Searches, Faster Discoveries

MCP Compatibility: Implements the Model Context Protocol over stdio, so any MCP-compatible client can call its web_search tool.
Cloudflare Bypass: Routes requests through FlareSolverr to handle Cloudflare challenges and bot detection.
Granular Control: Configure search scope and result filtering (language, region, included or excluded domains, excluded terms, result type).
Diagnostic Suite: Configurable logging (LOG_LEVEL) and a health endpoint for troubleshooting connectivity or parsing issues.
Scalable Output: Tune how many results are returned (MAX_SEARCH_RESULT or numResults). Each result includes a title, snippet, URL, site name, and byline.

Use Cases for Websearch: Smarter Searches, Faster Discoveries

Content Aggregation: Curate real-time data from news portals, forums, and e-commerce platforms.
Competitive Analysis: Monitor market trends by extracting product listings and pricing details.
Academic Research: Harvest peer-reviewed papers and citations across multiple repositories.
Automation Workflows: Integrate with RPA tools to automate data collection for CRM systems.

Websearch FAQ

FAQ: Smarter Searches, Faster Discoveries

Q: How do I resolve API connection errors?
A: Verify the Docker container status (docker-compose ps) and your firewall settings, then check logs/api-service.log for HTTP status codes.
Q: Why are some results incomplete?
A: Enable verbose logging (LOG_LEVEL=debug in docker-compose.yml) to diagnose parsing failures, then adjust the selectors in config/parsers.yml for the target websites.
Q: Can I customize result formatting?
A: Yes. Use the X-Result-Format header to specify JSON-LD, CSV, or custom schema outputs.
Q: What security measures are implemented?
A: Rate limiting, TLS encryption, and IP whitelisting prevent abuse while FlareSolverr handles anti-bot mechanisms.

Content

WebSearch-MCP

A Model Context Protocol (MCP) server implementation that provides a web search capability over stdio transport. This server integrates with a WebSearch Crawler API to retrieve search results.

About
Installation
Configuration
Setup & Integration
- Setting Up the Crawler Service
  - Prerequisites
  - Starting the Crawler Service
  - Testing the Crawler API
  - Custom Configuration
- Integrating with MCP Clients
  - Quick Reference: MCP Configuration
  - Claude Desktop
  - Cursor IDE
  - Cline
Usage
- Parameters
- Example Search Response
- Testing Locally
- As a Library
Troubleshooting
- Crawler Service Issues
- MCP Server Issues
Development
- Project Structure
- Publishing to npm
Contributing
License

About

WebSearch-MCP is a Model Context Protocol server that provides web search capabilities to AI assistants that support MCP. It allows AI models like Claude to search the web in real time and retrieve up-to-date information on any topic.

The server integrates with a Crawler API service that handles the actual web searches, and communicates with AI assistants using the standardized Model Context Protocol.

Installation

```
npm install -g websearch-mcp
```

Or use without installing:

```
npx websearch-mcp
```

Configuration

The WebSearch MCP server can be configured using environment variables:

API_URL: The URL of the WebSearch Crawler API (default: http://localhost:3001)
MAX_SEARCH_RESULT: Maximum number of search results to return when not specified in the request (default: 5)

Examples:

```
# Configure API URL
API_URL=https://crawler.example.com npx websearch-mcp

# Configure maximum search results
MAX_SEARCH_RESULT=10 npx websearch-mcp

# Configure both
API_URL=https://crawler.example.com MAX_SEARCH_RESULT=10 npx websearch-mcp
```
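Both variables fall back to simple defaults. The following minimal sketch shows how a server could read them. The `loadConfig` function and `SearchConfig` type are hypothetical and do not come from the websearch-mcp source. The sketch uses `http://localhost:3001` and `5` when a variable is unset or empty, and it rejects a `MAX_SEARCH_RESULT` that is not a positive integer.

```typescript
// Hypothetical sketch: resolving the WebSearch MCP environment variables
// with the documented defaults. Not taken from the websearch-mcp source.
type Env = Record<string, string | undefined>;

interface SearchConfig {
  apiUrl: string;
  maxSearchResult: number;
}

const DEFAULT_API_URL = "http://localhost:3001";
const DEFAULT_MAX_SEARCH_RESULT = 5;

function loadConfig(env: Env): SearchConfig {
  // Empty or missing API_URL falls back to the default; trailing slashes are
  // stripped so paths such as "/crawl" can be appended safely.
  const rawUrl = env.API_URL?.trim();
  const apiUrl = (rawUrl ? rawUrl : DEFAULT_API_URL).replace(/\/+$/, "");

  const rawMax = env.MAX_SEARCH_RESULT?.trim();
  if (!rawMax) {
    return { apiUrl, maxSearchResult: DEFAULT_MAX_SEARCH_RESULT };
  }
  const parsed = Number(rawMax);
  if (!Number.isInteger(parsed) || parsed <= 0) {
    throw new Error(`MAX_SEARCH_RESULT must be a positive integer, got "${rawMax}"`);
  }
  return { apiUrl, maxSearchResult: parsed };
}
```

In Node.js you would call `loadConfig(process.env)`. Failing fast on a malformed value keeps a typo in the client configuration from silently changing how many results are returned.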

Setup & Integration

Setting up WebSearch-MCP involves two main parts: configuring the crawler service that performs the actual web searches, and integrating the MCP server with your AI client applications.

Setting Up the Crawler Service

The WebSearch MCP server requires a crawler service to perform the actual web searches. You can easily set up the crawler service using Docker Compose.

Prerequisites

Docker and Docker Compose

Starting the Crawler Service

Create a file named docker-compose.yml with the following content:

```
version: '3.8'

services:
  crawler:
    image: laituanmanh/websearch-crawler:latest
    container_name: websearch-api
    restart: unless-stopped
    ports:
      - "3001:3001"
    environment:
      - NODE_ENV=production
      - PORT=3001
      - LOG_LEVEL=info
      - FLARESOLVERR_URL=http://flaresolverr:8191/v1
    depends_on:
      - flaresolverr
    volumes:
      - crawler_storage:/app/storage

  flaresolverr:
    image: 21hsmw/flaresolverr:nodriver
    container_name: flaresolverr
    restart: unless-stopped
    environment:
      - LOG_LEVEL=info
      - TZ=UTC

volumes:
  crawler_storage:
```

Workaround for Apple Silicon Macs:

```
version: '3.8'

services:
  crawler:
    image: laituanmanh/websearch-crawler:latest
    container_name: websearch-api
    platform: "linux/amd64"
    restart: unless-stopped
    ports:
      - "3001:3001"
    environment:
      - NODE_ENV=production
      - PORT=3001
      - LOG_LEVEL=info
      - FLARESOLVERR_URL=http://flaresolverr:8191/v1
    depends_on:
      - flaresolverr
    volumes:
      - crawler_storage:/app/storage

  flaresolverr:
    image: 21hsmw/flaresolverr:nodriver
    platform: "linux/arm64"
    container_name: flaresolverr
    restart: unless-stopped
    environment:
      - LOG_LEVEL=info
      - TZ=UTC

volumes:
  crawler_storage:
```

Start the services:

```
docker-compose up -d
```

Verify that the services are running:

```
docker-compose ps
```

Test the crawler API health endpoint:

```
curl http://localhost:3001/health
```

Expected response:

```
{
  "status": "ok",
  "details": {
    "status": "ok",
    "flaresolverr": true,
    "google": true,
    "message": null
  }
}
```

The crawler API will be available at http://localhost:3001.
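To script the health check instead of reading curl output by eye, you can validate the response shape shown above in TypeScript. The `isHealthy` and `checkHealth` names are hypothetical. The field names come from the expected response. The sketch assumes that all three `ok`/`true` checks must pass for the service to be usable. `checkHealth` relies on the global `fetch` available in Node.js 18 and later.

```typescript
// Sketch: validating the crawler's /health response.
// Field names mirror the documented expected response.
interface HealthDetails {
  status: string;
  flaresolverr: boolean;
  google: boolean;
  message: string | null;
}

interface HealthResponse {
  status: string;
  details: HealthDetails;
}

// True only when the API, FlareSolverr, and Google checks all pass.
function isHealthy(body: unknown): boolean {
  if (typeof body !== "object" || body === null) return false;
  const h = body as Partial<HealthResponse>;
  const d = h.details;
  return (
    h.status === "ok" &&
    d != null &&
    d.status === "ok" &&
    d.flaresolverr === true &&
    d.google === true
  );
}

// Calls GET {apiUrl}/health and applies isHealthy to the JSON body.
async function checkHealth(apiUrl: string): Promise<boolean> {
  try {
    const res = await fetch(`${apiUrl}/health`);
    if (!res.ok) return false;
    return isHealthy(await res.json());
  } catch {
    // Network errors (connection refused, DNS failure) count as unhealthy.
    return false;
  }
}
```

For example, `checkHealth("http://localhost:3001").then((ok) => console.log(ok ? "crawler ready" : "crawler not ready"))` works as a readiness probe before you start the MCP server.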

Testing the Crawler API

You can test the crawler API directly using curl:

```
curl -X POST http://localhost:3001/crawl \
  -H "Content-Type: application/json" \
  -d '{
    "query": "typescript best practices",
    "numResults": 2,
    "language": "en",
    "filters": {
      "excludeDomains": ["youtube.com"],
      "resultType": "all"
    }
  }'
```
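You can issue the same request from TypeScript. This sketch mirrors the curl call above. `CrawlRequest`, `buildCrawlInit`, and `crawl` are hypothetical names, and the request type covers only the fields that the curl example uses. The response is returned as untyped JSON because its exact schema is defined by the crawler's swagger.json, which is not reproduced here.

```typescript
// Sketch: POST /crawl, mirroring the curl example.
interface CrawlRequest {
  query: string;
  numResults?: number;
  language?: string;
  filters?: {
    excludeDomains?: string[];
    resultType?: "all" | "news" | "blogs";
  };
}

interface JsonPostInit {
  method: string;
  headers: Record<string, string>;
  body: string;
}

// Equivalent of curl's -X POST, -H "Content-Type: ..." and -d '...'.
function buildCrawlInit(req: CrawlRequest): JsonPostInit {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  };
}

// Sends the request with the global fetch (Node.js 18+) and returns parsed JSON.
async function crawl(apiUrl: string, req: CrawlRequest): Promise<unknown> {
  const res = await fetch(`${apiUrl}/crawl`, buildCrawlInit(req));
  if (!res.ok) {
    throw new Error(`Crawler returned HTTP ${res.status}: ${await res.text()}`);
  }
  return res.json();
}
```

Calling `crawl("http://localhost:3001", { query: "typescript best practices", numResults: 2, language: "en", filters: { excludeDomains: ["youtube.com"], resultType: "all" } })` should produce the same request as the curl command. Keeping `buildCrawlInit` separate makes the request easy to inspect without a running crawler.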

Custom Configuration

You can customize the crawler service by modifying the environment variables in the docker-compose.yml file:

PORT: The port on which the crawler API listens (default: 3001)
LOG_LEVEL: Logging level (options: debug, info, warn, error)
FLARESOLVERR_URL: URL of the FlareSolverr service (for bypassing Cloudflare protection)

Integrating with MCP Clients

Quick Reference: MCP Configuration

Here's a quick reference for MCP configuration across different clients:

```
{
    "mcpServers": {
        "websearch": {
            "command": "npx",
            "args": [
                "websearch-mcp"
            ],
            "env": {
                "API_URL": "http://localhost:3001",
                "MAX_SEARCH_RESULT": "5"
            }
        }
    }
}
```

Lower MAX_SEARCH_RESULT to save tokens, or raise it for broader coverage. JSON does not allow comments, so keep such notes out of the configuration file itself.

Workaround for Windows (see the related issue in the project repository):

```
{
    "mcpServers": {
        "websearch": {
            "command": "cmd",
            "args": [
                "/c",
                "npx",
                "websearch-mcp"
            ],
            "env": {
                "API_URL": "http://localhost:3001",
                "MAX_SEARCH_RESULT": "1"
            }
        }
    }
}
```
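If you generate client configuration from a script (for example, to set up several machines), the platform difference above reduces to a single branch. `buildMcpConfig` is a hypothetical helper and not part of websearch-mcp. It puts the variables under `env`, which is the key Claude Desktop, Cursor, and Cline read for MCP server environment variables. It stores MAX_SEARCH_RESULT as a string because environment values must be strings.

```typescript
// Hypothetical helper: builds an mcpServers entry for websearch-mcp.
// On Windows, npx is launched through "cmd /c", matching the workaround above.
interface McpServerEntry {
  command: string;
  args: string[];
  env: Record<string, string>;
}

interface McpConfig {
  mcpServers: Record<string, McpServerEntry>;
}

function buildMcpConfig(
  platform: string, // e.g. the value of process.platform ("win32", "darwin", ...)
  apiUrl = "http://localhost:3001",
  maxSearchResult = 5,
): McpConfig {
  const isWindows = platform === "win32";
  return {
    mcpServers: {
      websearch: {
        command: isWindows ? "cmd" : "npx",
        args: isWindows ? ["/c", "npx", "websearch-mcp"] : ["websearch-mcp"],
        env: {
          API_URL: apiUrl,
          // Environment values must be strings, so the number is stringified.
          MAX_SEARCH_RESULT: String(maxSearchResult),
        },
      },
    },
  };
}
```

`JSON.stringify(buildMcpConfig(process.platform), null, 2)` produces text you can paste into the client's configuration file.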

Usage

This package implements an MCP server using stdio transport that exposes a web_search tool with the following parameters:

Parameters

query (required): The search query to look up
numResults (optional): Number of results to return (default: 5)
language (optional): Language code for search results (e.g., 'en')
region (optional): Region code for search results (e.g., 'us')
excludeDomains (optional): Domains to exclude from results
includeDomains (optional): Only include these domains in results
excludeTerms (optional): Terms to exclude from results
resultType (optional): Type of results to return ('all', 'news', or 'blogs')
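The parameter list maps naturally onto a TypeScript type. The sketch below validates tool arguments and converts them into a crawler request body, with `WebSearchArgs` and `toCrawlBody` as hypothetical names. Falling back to MAX_SEARCH_RESULT when `numResults` is omitted follows the Configuration section. One part is an assumption that this README does not confirm: nesting `includeDomains` and `excludeTerms` under `filters`, the way the curl example nests `excludeDomains` and `resultType`.

```typescript
// Sketch: validating web_search arguments and mapping them to a /crawl body.
// ASSUMPTION: includeDomains and excludeTerms belong under "filters",
// like excludeDomains and resultType in the curl example.
type ResultType = "all" | "news" | "blogs";

interface WebSearchArgs {
  query: string;
  numResults?: number;
  language?: string;
  region?: string;
  excludeDomains?: string[];
  includeDomains?: string[];
  excludeTerms?: string[];
  resultType?: ResultType;
}

const RESULT_TYPES: readonly string[] = ["all", "news", "blogs"];

function toCrawlBody(args: WebSearchArgs, maxSearchResult: number) {
  // Tool arguments arrive as untyped JSON at runtime, so check them explicitly.
  if (typeof args.query !== "string" || args.query.trim() === "") {
    throw new Error("query is required");
  }
  const numResults = args.numResults ?? maxSearchResult;
  if (!Number.isInteger(numResults) || numResults <= 0) {
    throw new Error("numResults must be a positive integer");
  }
  if (args.resultType !== undefined && !RESULT_TYPES.includes(args.resultType)) {
    throw new Error(`resultType must be one of: ${RESULT_TYPES.join(", ")}`);
  }
  return {
    query: args.query.trim(),
    numResults,
    language: args.language,
    region: args.region,
    filters: {
      excludeDomains: args.excludeDomains,
      includeDomains: args.includeDomains,
      excludeTerms: args.excludeTerms,
      resultType: args.resultType,
    },
  };
}
```

Options the caller leaves out stay `undefined`, and `JSON.stringify` drops them, so the crawler only sees the fields that were actually set.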

Example Search Response

Here's an example of a search response:

```
{
  "query": "machine learning trends",
  "results": [
    {
      "title": "Top Machine Learning Trends in 2025",
      "snippet": "The key machine learning trends for 2025 include multimodal AI, generative models, and quantum machine learning applications in enterprise...",
      "url": "https://example.com/machine-learning-trends-2025",
      "siteName": "AI Research Today",
      "byline": "Dr. Jane Smith"
    },
    {
      "title": "The Evolution of Machine Learning: 2020-2025",
      "snippet": "Over the past five years, machine learning has evolved from primarily supervised learning approaches to more sophisticated self-supervised and reinforcement learning paradigms...",
      "url": "https://example.com/ml-evolution",
      "siteName": "Tech Insights",
      "byline": "John Doe"
    }
  ]
}
```
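Each result carries a title, snippet, URL, site name, and byline. When results are passed on to a model, a compact text rendering can save tokens compared with raw JSON. The types below are inferred from the example response; treating `siteName` and `byline` as optional is an assumption. `formatResults` is a hypothetical helper.

```typescript
// Types inferred from the example search response.
interface SearchResult {
  title: string;
  snippet: string;
  url: string;
  siteName?: string; // assumed optional
  byline?: string; // assumed optional
}

interface SearchResponse {
  query: string;
  results: SearchResult[];
}

// Renders results as a numbered plain-text list: title (source), URL, snippet.
function formatResults(res: SearchResponse): string {
  const lines = [`Results for "${res.query}":`];
  res.results.forEach((r, i) => {
    const source = [r.siteName, r.byline].filter(Boolean).join(", ");
    lines.push(`${i + 1}. ${r.title}${source ? ` (${source})` : ""}`);
    lines.push(`   ${r.url}`);
    lines.push(`   ${r.snippet}`);
  });
  return lines.join("\n");
}
```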

Testing Locally

To test the WebSearch MCP server locally, you can use the included test client:

```
npm run test-client
```

This will start the MCP server and a simple command-line interface that allows you to enter search queries and see the results.

You can also configure the API_URL for the test client:

```
API_URL=https://crawler.example.com npm run test-client
```

As a Library

You can use this package programmatically:

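```
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

// Launch websearch-mcp as a subprocess and talk to it over stdio
const transport = new StdioClientTransport({
  command: 'npx',
  args: ['websearch-mcp']
});

// Create an MCP client and connect it to the server
const client = new Client({ name: 'websearch-example', version: '1.0.0' });
await client.connect(transport); // top-level await requires an ES module

// Execute a web search through the web_search tool
const response = await client.callTool({
  name: 'web_search',
  arguments: {
    query: 'your search query',
    numResults: 5,
    language: 'en'
  }
});

// Tool output is returned in the result's content array
console.log(response.content);

await client.close();
```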
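In MCP, a tool call returns a `content` array whose items have a `type` field, and text output arrives as items with `type: "text"`. The hypothetical helper below uses minimal local types instead of the SDK's own. It joins those text items so you can log or parse them. This README does not document whether websearch-mcp puts JSON or prose into that text, so the sketch leaves parsing to the caller.

```typescript
// Minimal local types for an MCP tool result; the SDK ships fuller ones.
interface ContentItem {
  type: string;
  text?: string;
}

interface ToolResult {
  content?: ContentItem[];
  isError?: boolean;
}

// Joins all text items; throws if the server flagged the call as an error.
function extractText(result: ToolResult): string {
  const text = (result.content ?? [])
    .filter((item) => item.type === "text" && typeof item.text === "string")
    .map((item) => item.text as string)
    .join("\n");
  if (result.isError) {
    throw new Error(`web_search failed: ${text}`);
  }
  return text;
}
```

Surfacing `isError` as an exception keeps a failed search from being mistaken for an empty result set.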

Troubleshooting

Crawler Service Issues

API Unreachable: Ensure that the crawler service is running and accessible at the configured API_URL.
Search Results Not Available: Check the crawler service logs for errors:
```
docker-compose logs crawler
```
FlareSolverr Issues: Some websites use Cloudflare protection. If you see errors related to this, check whether FlareSolverr is working:
```
docker-compose logs flaresolverr
```
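The `details` flags returned by the health endpoint indicate which container to inspect. A hypothetical triage helper can map each failed check to the matching log command from this section. Mapping a failed `google` flag to the crawler logs is an assumption, because this README does not describe that flag.

```typescript
// Hypothetical triage: map /health details to the log command worth checking.
interface HealthDetails {
  status: string;
  flaresolverr: boolean;
  google: boolean;
  message: string | null;
}

function diagnose(details: HealthDetails): string[] {
  const hints: string[] = [];
  if (!details.flaresolverr) {
    hints.push("FlareSolverr check failed: run `docker-compose logs flaresolverr`");
  }
  // ASSUMPTION: a failed "google" check is a crawler-side problem.
  if (!details.google) {
    hints.push("Search backend check failed: run `docker-compose logs crawler`");
  }
  // Generic fallback when the status is bad but no specific flag failed.
  if (details.status !== "ok" && hints.length === 0) {
    hints.push(`Crawler reported "${details.status}": run \`docker-compose logs crawler\``);
  }
  if (details.message) {
    hints.push(`Crawler message: ${details.message}`);
  }
  return hints;
}
```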

MCP Server Issues

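Import Errors: Ensure your project has the latest version of the MCP SDK installed locally (a global install is not visible to imports in your project):
```
npm install @modelcontextprotocol/sdk@latest
```
Connection Issues: Make sure your client launches the server with the correct command (npx websearch-mcp, or cmd /c npx websearch-mcp on Windows). Also make sure nothing except MCP messages is written to the server's stdout, because the stdio transport reserves stdout for protocol traffic.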
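Stray output breaks stdio connections because MCP messages are JSON-RPC objects written one per line on the server's stdout. Any other text on stdout, such as a debug `console.log`, corrupts the stream. The sketch below shows that framing for a `tools/call` request to web_search and includes a line checker that flags non-JSON-RPC lines. The function names are hypothetical, and the sketch uses no MCP library.

```typescript
// Sketch of MCP stdio framing: one JSON-RPC 2.0 message per line.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Serializes a tools/call request for web_search as a single line.
// JSON.stringify without indentation escapes newlines inside strings,
// so the message itself never contains a raw line break.
function encodeToolCall(id: number, args: Record<string, unknown>): string {
  const msg: JsonRpcRequest = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "web_search", arguments: args },
  };
  return JSON.stringify(msg) + "\n";
}

// Returns the lines of a stdout chunk that are not JSON-RPC messages,
// e.g. log output that would confuse an MCP client.
function findInvalidLines(chunk: string): string[] {
  return chunk
    .split("\n")
    .filter((line) => line.trim() !== "")
    .filter((line) => {
      try {
        const parsed = JSON.parse(line);
        return !(typeof parsed === "object" && parsed !== null && parsed.jsonrpc === "2.0");
      } catch {
        return true; // not JSON at all
      }
    });
}
```

If `findInvalidLines` reports anything for the server's output, route that logging to stderr instead.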

Development

To work on this project:

Clone the repository
Install dependencies: npm install
Build the project: npm run build
Run in development mode: npm run dev

The server expects a WebSearch Crawler API as defined in the included swagger.json file. Make sure the API is running at the configured API_URL.

Project Structure

.gitignore: Specifies files that Git should ignore (node_modules, dist, logs, etc.)
.npmignore: Specifies files that shouldn't be included when publishing to npm
package.json: Project metadata and dependencies
src/: Source TypeScript files
dist/: Compiled JavaScript files (generated when building)

Publishing to npm

To publish this package to npm:

Make sure you have an npm account and are logged in (npm login)
Update the version in package.json (npm version patch|minor|major)
Run npm publish

The .npmignore file ensures that only the necessary files are included in the published package:

The compiled code in dist/
README.md and LICENSE files
package.json

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

ISC