Omniparser-AutoGUI-MCP: Precision Automation & Workflow Acceleration

Omniparser-AutoGUI-MCP is an MCP server that analyzes the screen with OmniParser and operates the GUI automatically, reducing the manual effort of repetitive desktop tasks.

About Omniparser-AutoGUI-MCP

What is Omniparser-AutoGUI MCP?

This open-source project is an MCP server that combines screen analysis with automated GUI operation. Built on OmniParser, it lets an MCP client interact with applications by detecting on-screen elements and executing actions on them. It has been confirmed on Windows, and the OCR language is selected with the OCR_LANG environment variable.

How to Deploy & Configure

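  1. Clone the repository with its submodules: git clone --recursive https://github.com/NON906/omniparser-autogui-mcp.git
  2. Install dependencies and download the models: cd omniparser-autogui-mcp, uv sync, set OCR_LANG=en, then uv run download_models.py
  3. Add the server to your client configuration (for example claude_desktop_config.json) and set options in its env block:
    • Target window via TARGET_WINDOW_NAME
    • OCR language via OCR_LANG (for example, en)
  4. The client launches the server with: uv --directory <cloned path> run omniparser-autogui-mcp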

Note: macOS/Linux users should use export instead of set for environment configuration
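The set/export difference can be avoided by passing the variable to the child processes directly. The following is a minimal Python sketch of the installation steps, using the repository URL and commands given in the project README; the function names and the --run flag are hypothetical helpers for illustration, not part of the project. By default it only prints the commands it would run.

```python
import os
import subprocess
import sys

# Repository URL and directory name as given in the project README.
REPO_URL = "https://github.com/NON906/omniparser-autogui-mcp.git"
REPO_DIR = "omniparser-autogui-mcp"


def install_steps(repo_url=REPO_URL, repo_dir=REPO_DIR):
    """Return (command, working_directory) pairs mirroring the README steps."""
    return [
        (["git", "clone", "--recursive", repo_url], None),
        (["uv", "sync"], repo_dir),
        (["uv", "run", "download_models.py"], repo_dir),
    ]


def install(ocr_lang="en", dry_run=True):
    """Print the install steps, and run them unless dry_run is True."""
    # Passing env= stands in for `set OCR_LANG=en` on Windows
    # and `export OCR_LANG=en` on other platforms.
    env = dict(os.environ, OCR_LANG=ocr_lang)
    for command, cwd in install_steps():
        print("$", " ".join(command), f"(cwd={cwd or '.'})")
        if not dry_run:
            subprocess.run(command, cwd=cwd, env=env, check=True)
    return env


if __name__ == "__main__":
    # "--run" is a hypothetical flag for this sketch; without it nothing is executed.
    install(dry_run="--run" not in sys.argv)
```

Because the environment is handed to subprocess.run, the same script works unchanged on Windows, macOS, and Linux, provided git and uv are on the PATH.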

Omniparser-AutoGUI-MCP Features

Core Capabilities

  • Automatic element detection using computer vision
  • Remote processing via OMNI_PARSER_SERVER configuration
  • Client compatibility with tools like LibreChat
  • Context-aware action execution (clicks, typing, drag-and-drop)

Practical Use Cases

Developers commonly use this framework to:

  • Automate repetitive desktop workflows (e.g., data entry)
  • Create accessibility tools for vision-impaired users
  • Build cross-application integration solutions
  • Test UI/UX consistency across different screen configurations

Omniparser-AutoGUI-MCP FAQ

Why use OMNI_PARSER_BACKEND_LOAD=1?

Set it to 1 if the server does not work with a particular client; the README names LibreChat as an example.

Can I automate window switching?

The README does not document a dedicated window-switching option. TARGET_WINDOW_NAME restricts operation to a named window; when it is not set, the server operates on the entire screen.

How to update OCR models?

Run uv run download_models.py with OCR_LANG set to the desired language to download the models.

Does it work with virtual machines?

The project is confirmed on Windows only; operation inside Windows or Linux virtual machines is not documented.

omniparser-autogui-mcp

A Japanese version of this README is also available.

This is an MCP server that analyzes the screen with OmniParser and automatically operates the GUI.
It has been confirmed to work on Windows.

License notes

This project is released under the MIT license, excluding submodules and subpackages.
The OmniParser repository is licensed under CC-BY-4.0.
Each OmniParser model has a different license (reference).

Installation

  1. Run the following commands:
git clone --recursive https://github.com/NON906/omniparser-autogui-mcp.git
cd omniparser-autogui-mcp
uv sync
set OCR_LANG=en
uv run download_models.py

(On platforms other than Windows, use export instead of set.)
(To run langchain_example.py, use uv sync --extra langchain instead of uv sync.)

  2. Add this to your claude_desktop_config.json:
{
  "mcpServers": {
    "omniparser_autogui_mcp": {
      "command": "uv",
      "args": [
        "--directory",
        "D:\\CLONED_PATH\\omniparser-autogui-mcp",
        "run",
        "omniparser-autogui-mcp"
      ],
      "env": {
        "PYTHONIOENCODING": "utf-8",
        "OCR_LANG": "en"
      }
    }
  }
}

(Replace D:\\CLONED_PATH\\omniparser-autogui-mcp with the path to your cloned directory, keeping the backslashes doubled as JSON requires.)
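Hand-editing Windows paths in JSON is error-prone because every backslash must be doubled. As a minimal sketch, the configuration above can be generated with Python's json module, which handles the escaping. The build_config helper is hypothetical and only reproduces the structure shown above; merging the result into an existing claude_desktop_config.json is left out.

```python
import json


def build_config(cloned_path, ocr_lang="en", extra_env=None):
    """Build the mcpServers entry shown above for a given clone location."""
    env = {"PYTHONIOENCODING": "utf-8", "OCR_LANG": ocr_lang}
    # extra_env can carry optional settings such as TARGET_WINDOW_NAME.
    env.update(extra_env or {})
    return {
        "mcpServers": {
            "omniparser_autogui_mcp": {
                "command": "uv",
                "args": [
                    "--directory",
                    str(cloned_path),
                    "run",
                    "omniparser-autogui-mcp",
                ],
                "env": env,
            }
        }
    }


if __name__ == "__main__":
    # A raw string keeps the single backslashes; json.dumps doubles them on output.
    config = build_config(r"D:\CLONED_PATH\omniparser-autogui-mcp")
    print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into the client configuration as is.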

The env block accepts the following additional settings:

  • OMNI_PARSER_BACKEND_LOAD
    Set this to 1 if the server does not work with other clients (such as LibreChat).

  • TARGET_WINDOW_NAME
    The name of the window to operate on.
    If not specified, the server operates on the entire screen.

  • OMNI_PARSER_SERVER
    To run OmniParser processing on another device, specify that server's address and port, such as 127.0.0.1:8000.
    The server can be started with uv run omniparserserver.

  • SSE_HOST, SSE_PORT
    If specified, communication will be done via SSE instead of stdio.

  • SOM_MODEL_PATH, CAPTION_MODEL_NAME, CAPTION_MODEL_PATH, OMNI_PARSER_DEVICE, BOX_TRESHOLD
    These configure OmniParser itself.
    They are usually not necessary.
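The optional settings above can be assembled and sanity-checked before they go into the env block. The sketch below is illustrative only: the optional_env helper and its parameter names are hypothetical, the window name in the usage note is an example value, and requiring SSE_HOST and SSE_PORT together is an assumption of this sketch rather than documented behavior.

```python
def optional_env(backend_load=False, target_window=None,
                 parser_server=None, sse_host=None, sse_port=None):
    """Return a dict of optional environment settings for the env block."""
    env = {}
    if backend_load:
        env["OMNI_PARSER_BACKEND_LOAD"] = "1"
    if target_window:
        env["TARGET_WINDOW_NAME"] = target_window
    if parser_server:
        # Expect "address:port", for example 127.0.0.1:8000.
        host, sep, port = parser_server.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected address:port, got {parser_server!r}")
        env["OMNI_PARSER_SERVER"] = parser_server
    if sse_host is not None or sse_port is not None:
        # Assumption of this sketch: both SSE values are supplied together.
        if sse_host is None or sse_port is None:
            raise ValueError("set SSE_HOST and SSE_PORT together")
        env["SSE_HOST"] = str(sse_host)
        env["SSE_PORT"] = str(int(sse_port))
    return env


if __name__ == "__main__":
    print(optional_env(target_window="Notepad", parser_server="127.0.0.1:8000"))
```

All values are converted to strings, since environment variables in a JSON env block are passed as strings.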

Usage Examples

  • Search for "MCP server" in the on-screen browser.

Similar natural-language instructions can be given for other on-screen tasks.
