Chrome API MCP Server: AI-Driven Chrome Automation & Control

Empower AI assistants to control Chrome via the DevTools Protocol. The MCP server turns the browser into an AI-driven automation target: your commands, its actions.

About Chrome API MCP Server

What is Chrome API MCP Server: AI-Driven Chrome Automation & Control?

This server acts as an intelligent automation layer for Chrome browsers, leveraging semantic analysis to interpret web page structures. It replaces traditional coordinate-based interactions with context-aware commands, enabling AI systems to navigate, extract data, and execute actions based on semantic element recognition. Core capabilities include real-time DOM analysis, interactive element mapping, and adaptive error handling strategies.

How to Use Chrome API MCP Server: AI-Driven Chrome Automation & Control?

Deployment involves three core steps:
1. Configure environment variables and dependencies.
2. Launch the server with optional debug logging.
3. Interact via API endpoints to trigger page loads, semantic queries, and element actions.

Use the provided startup script to manage Chrome instances and port configuration; its defaults can be overridden with optional parameters.

Chrome API MCP Server Features

Key Features of Chrome API MCP Server: AI-Driven Chrome Automation & Control

Semantic DOM Analysis: Classifies elements into navigation, forms, buttons, and content groups
Adaptive Interaction Engine: Tries multiple interaction methods (click, JavaScript execution, keyboard input) in turn before reporting a failure
Self-optimizing Cache: Manages element state persistence with configurable TTL and eviction policies
Granular Error Handling: Captures detailed failure contexts for debugging automation workflows
Performance Tracing: Logs page load metrics and interaction timings for optimization analysis
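
The "try several methods before failing" behavior of the Adaptive Interaction Engine can be pictured with a small sketch. This is not the server's code: the function name interactWithFallback, the strategy names, and the shape of the result object are all assumptions made for illustration.

```javascript
// Hypothetical sketch: run interaction strategies in order until one succeeds.
// Names and result shape are illustrative assumptions, not the server's API.
async function interactWithFallback(strategies) {
  const failures = [];
  for (const { name, run } of strategies) {
    try {
      await run();
      return { ok: true, strategy: name, failures };
    } catch (err) {
      // Record why this strategy failed so the context can be logged later.
      const message = err instanceof Error ? err.message : String(err);
      failures.push({ strategy: name, message });
    }
  }
  return { ok: false, strategy: null, failures };
}

// Stand-in strategies; real ones would drive Chrome.
const demoStrategies = [
  { name: 'click', run: async () => { throw new Error('element is covered by an overlay'); } },
  { name: 'javascript', run: async () => {} },
  { name: 'keyboard', run: async () => {} },
];

interactWithFallback(demoStrategies).then((result) => console.log(result));
```

Returning the list of failed attempts alongside the winning strategy keeps the failure context available for debugging even when the interaction eventually succeeds.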

Use Cases for Chrome API MCP Server

Automated accessibility testing using semantic element validation
Dynamic content extraction from complex SPA applications
AI-powered chatbots managing user sessions across web apps
Continuous integration testing for browser-based dashboards
Compliance monitoring of dynamically rendered web content

Chrome API MCP Server FAQ

FAQ: Common Questions About the Chrome API MCP Server

Q: What dependencies are required?
Requires Node.js 16+, an installed Chrome browser, and npm or yarn.

Q: How do I handle port conflicts?
Set the PORT environment variable before startup to move the MCP server off its default port 3001, for example PORT=3002 ./start-custom-mcp.sh. Chrome's remote debugging port defaults to 9222.

Q: Can I customize semantic analysis rules?
Yes, through the config/semantic-mapping.js file where element classification logic can be extended.

Q: What logging mechanisms exist?
Set DEBUG=true for verbose debug output. Logs are written to ./logs/chrome-mcp.log by default.

Content

Chrome API MCP Server

A Chrome API MCP (Model Context Protocol) server that provides semantic understanding of web pages for AI assistants like Claude, enabling DOM-based browsing without relying on screenshots.

Features

Semantic DOM Analysis : Build structured representations of web pages
Efficient Browsing : Provides content extraction without relying on screenshots
Interactive Navigation : Identify and interact with elements based on semantics
Reliable Element Selection : Multiple strategies for finding and interacting with page elements
Cache Optimization : Smart caching system for improved performance
Error Handling : Robust error management for reliable operation
Detailed Logging : Comprehensive logging system for debugging

Requirements

Node.js 16+
Chrome browser (must be installed)
npm or yarn

Quick Start

Make the startup script executable:

chmod +x start-custom-mcp.sh

Start the server:

./start-custom-mcp.sh

This script will:

* Check if Chrome is running with remote debugging enabled
* Start Chrome with the correct flags if needed
* Build the TypeScript code
* Start the MCP server on port 3001
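
To check by hand whether Chrome is already listening for remote debugging, one option is to query Chrome's DevTools HTTP endpoint /json/version. The sketch below is an illustration rather than what the startup script does internally: the helper names versionEndpoint and isChromeDebuggable are made up, and it assumes a Node.js version with a global fetch (18 or later).

```javascript
// Build the URL of Chrome's DevTools HTTP endpoint that reports version info.
function versionEndpoint(port = 9222, host = 'localhost') {
  return `http://${host}:${port}/json/version`;
}

// Resolve to true if the endpoint answers with a WebSocket debugger URL.
// Helper names are hypothetical; only the /json/version path comes from Chrome.
async function isChromeDebuggable(port = 9222, timeoutMs = 1000) {
  try {
    const res = await fetch(versionEndpoint(port), {
      signal: AbortSignal.timeout(timeoutMs),
    });
    if (!res.ok) return false;
    const info = await res.json();
    return typeof info.webSocketDebuggerUrl === 'string';
  } catch {
    return false; // connection refused, timeout, or a non-JSON reply
  }
}

isChromeDebuggable().then((up) =>
  console.log(up ? 'Chrome is ready' : 'Start Chrome with --remote-debugging-port=9222')
);
```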

Run the example to test functionality:

npx ts-node examples/analyze-page.ts https://example.com

API Methods

Basic Methods

initialize: Initialize the connection
navigate: Open a URL in a new tab
getContent: Get the raw HTML content of a page
executeScript: Execute JavaScript code in a tab
clickElement: Click on an element matching a CSS selector
takeScreenshot: Capture a screenshot (optional)
closeTab: Close a tab

Semantic Understanding Methods

getStructuredContent: Get a structured representation of the page content
analyzePageSemantics: Analyze the page and build a semantic DOM model
findElementsByText: Find elements containing specific text
findClickableElements: Find all interactive elements on the page
clickSemanticElement: Click an element by its semantic ID
fillFormField: Fill a form field by its semantic ID
performSearch: Use the page's search functionality

Example Usage

// Navigate to a page (opens a new tab) and keep the returned tabId
const response1 = await fetch('http://localhost:3001', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    jsonrpc: '2.0',
    method: 'navigate',
    params: { url: 'https://example.com' },
    id: 1
  })
});
const { result: { tabId } } = await response1.json();

// Get structured content
const response2 = await fetch('http://localhost:3001', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    jsonrpc: '2.0',
    method: 'getStructuredContent',
    params: { tabId },
    id: 2
  })
});
const { result: { content } } = await response2.json();
console.log(content);

// Find and click a button
const response3 = await fetch('http://localhost:3001', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    jsonrpc: '2.0',
    method: 'findElementsByText',
    params: { tabId, text: 'Login' },
    id: 3
  })
});
const { result: { elements } } = await response3.json();

if (elements.length > 0) {
  await fetch('http://localhost:3001', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      jsonrpc: '2.0',
      method: 'clickSemanticElement',
      params: { tabId, semanticId: elements[0].semanticId },
      id: 4
    })
  });
}
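
The four calls above repeat the same JSON-RPC envelope, so a small wrapper can cut the repetition. The following is a sketch under assumptions: buildRequest and callMcp are hypothetical names, and it assumes the server reports failures in the standard JSON-RPC 2.0 `error` member, which the original example does not show.

```javascript
// Hypothetical helper around the request envelope used in the example above.
let nextId = 1;

// Create a JSON-RPC 2.0 request object with an auto-incrementing id.
function buildRequest(method, params = {}) {
  return { jsonrpc: '2.0', method, params, id: nextId++ };
}

// POST one request and return its `result`, throwing on HTTP or RPC errors.
async function callMcp(method, params = {}, endpoint = 'http://localhost:3001') {
  const res = await fetch(endpoint, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildRequest(method, params)),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status} from ${endpoint}`);
  const payload = await res.json();
  // Assumption: errors arrive in the JSON-RPC `error` member instead of `result`.
  if (payload.error) throw new Error(`${method} failed: ${payload.error.message}`);
  return payload.result;
}

// With a running server, the first two calls above would become:
// const { tabId } = await callMcp('navigate', { url: 'https://example.com' });
// const { content } = await callMcp('getStructuredContent', { tabId });
```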

Configuration

Chrome debugging port : 9222 (default)
MCP server port : 3001 (configurable via PORT environment variable)
Debug mode : Set environment variable DEBUG=true to enable verbose logging
Cache settings : Configured in config.ts
- Default TTL: 10 seconds
- Max cache size: 200 entries
Connection timeouts : Configurable in config.ts
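
To make the two cache settings concrete, here is a minimal sketch of a cache with a time-to-live and a size cap, using the defaults listed above (10 seconds, 200 entries). The class name TtlCache, its option names, and the oldest-inserted eviction policy are assumptions; the actual implementation and the layout of config.ts may differ.

```javascript
// Hypothetical TTL cache with a size cap; not the server's implementation.
class TtlCache {
  constructor({ ttlMs = 10_000, maxEntries = 200, now = () => Date.now() } = {}) {
    this.ttlMs = ttlMs;
    this.maxEntries = maxEntries;
    this.now = now; // injectable clock, handy for tests
    this.entries = new Map(); // Map iterates in insertion order
  }

  set(key, value) {
    this.entries.delete(key); // re-inserting moves the key to the newest slot
    if (this.entries.size >= this.maxEntries) {
      // Evict the oldest inserted entry to stay within the cap.
      const oldestKey = this.entries.keys().next().value;
      this.entries.delete(oldestKey);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired entries are dropped lazily on read
      return undefined;
    }
    return entry.value;
  }
}
```

Lowering maxEntries or ttlMs in a cache like this trades hit rate for memory, which is the kind of adjustment the cache settings are meant for.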

Debugging

Enable debug mode by setting the DEBUG environment variable:

DEBUG=true ./start-custom-mcp.sh

For more granular debugging of specific modules, use:

DEBUG=chrome-page-analyzer:* ./start-custom-mcp.sh

Log Files

The server creates log files in the logs directory with detailed information about all operations:

Log location : ./logs/chrome-mcp.log
Log rotation : Automatically rotates logs when they reach 10MB (configurable)
Log levels : ERROR, WARN, INFO, DEBUG, TRACE
Log content : Timestamps, request IDs, method calls, parameters, and results

You can view logs in real-time using:

tail -f logs/chrome-mcp.log
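
The relationship between the log levels and the rotation threshold can be sketched as follows. The ordering of LEVELS mirrors the list above, but the function names, the exact line layout, and the reading of 10MB as 10 * 1024 * 1024 bytes are assumptions for illustration only.

```javascript
// Levels from most to least severe, in the order listed above.
const LEVELS = ['ERROR', 'WARN', 'INFO', 'DEBUG', 'TRACE'];

// A message is written when it is at least as severe as the configured level.
function shouldLog(messageLevel, configuredLevel) {
  const m = LEVELS.indexOf(messageLevel);
  const c = LEVELS.indexOf(configuredLevel);
  if (m === -1 || c === -1) throw new RangeError('unknown log level');
  return m <= c;
}

// Hypothetical line layout: timestamp, level, request ID, method, message.
function formatLine({ level, requestId, method, message }, date = new Date()) {
  return `${date.toISOString()} [${level}] [${requestId}] ${method}: ${message}`;
}

// Assumes "10MB" means 10 * 1024 * 1024 bytes.
function needsRotation(currentBytes, maxBytes = 10 * 1024 * 1024) {
  return currentBytes >= maxBytes;
}

console.log(formatLine({ level: 'INFO', requestId: 'req-1', method: 'navigate', message: 'ok' }));
```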

How It Works

1. When a page is loaded, the server builds a semantic model of the page.
2. The model includes:
* Semantic element types (navigation, button, link, form, content, etc.)
* Text content and structure
* Interactive elements
* Hierarchical relationships
3. AI systems can query this model to understand the page content and structure.
4. Actions can be performed through semantic references rather than coordinates.

This approach is more efficient than using screenshots and provides better context for AI assistants to understand and interact with web pages.
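
As a rough illustration of what a semantic model might look like, the sketch below classifies simplified element descriptions into the types named above and assigns each an ID that a client could pass back. The classification rules, the function names, and the semanticId format are all assumptions; the server's SemanticAnalyzer may work quite differently, and hierarchical relationships are omitted for brevity.

```javascript
// Hypothetical classifier: map a simplified element description to a semantic type.
function classifyElement({ tag, role = '', type = '' }) {
  const t = tag.toLowerCase();
  if (t === 'nav' || role === 'navigation') return 'navigation';
  const isButtonInput = t === 'input' && ['button', 'submit', 'reset'].includes(type);
  if (t === 'button' || role === 'button' || isButtonInput) return 'button';
  if (t === 'a' || role === 'link') return 'link';
  if (['form', 'input', 'select', 'textarea'].includes(t)) return 'form';
  return 'content';
}

// Build a flat model; the "<type>-<index>" ID format is made up for this sketch.
function buildSemanticModel(elements) {
  return elements.map((el, index) => {
    const semanticType = classifyElement(el);
    return {
      semanticId: `${semanticType}-${index}`,
      semanticType,
      text: (el.text || '').trim(),
    };
  });
}

const model = buildSemanticModel([
  { tag: 'NAV', text: ' Home | Docs ' },
  { tag: 'input', type: 'submit', text: 'Login' },
  { tag: 'p', text: 'Welcome back.' },
]);
console.log(model);
```

A client working against a model like this refers to the login button by its ID rather than by screen position, so the reference does not depend on where the element is drawn.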

Architecture

The server is built with a modular architecture:

ChromeAPI : Main API class exposing methods to clients
DOMInteractionLayer : Core DOM interaction functionality
SemanticAnalyzer : Semantic understanding of page structure
ContentExtractor : Extract structured content from pages
Error Handler : Centralized error management
DOM Helpers : Utility functions for DOM manipulation
Cache : Optimized caching system for improved performance
Logger : Comprehensive logging system for debugging

Development

Install dependencies:

npm install

Build TypeScript code:

npm run build

Run the server with debugging:

DEBUG=true node custom-mcp-server.js

Run tests:

npm test

Troubleshooting

Chrome connection issues : Make sure Chrome is running with the remote debugging port open. You can start it manually with google-chrome --remote-debugging-port=9222.
Port conflicts : If port 3001 is already in use, set a different port with PORT=3002 ./start-custom-mcp.sh.
TypeScript build errors : Check for any type errors in the source code and fix them before building.
Element interaction failures : If clicking elements fails, the server attempts multiple strategies (mouse events, JavaScript). Check the debug logs for details.
Memory issues : If you encounter memory problems, adjust the cache settings in config.ts.
Log file access : If you encounter permission issues with log files, make sure the user running the server has write access to the logs directory.

License

MIT
