UI-TARS Desktop: Natural Language Control & AI-Driven Actions

UI-TARS Desktop: Command your PC with natural language. Say goodbye to clunky clicks - our Vision-Language AI turns words into actions. Smarter, faster, seamless control.


About UI-TARS Desktop

What is UI-TARS Desktop?

UI-TARS Desktop is a groundbreaking GUI Agent application built on the Vision-Language Model UI-TARS, enabling users to control their computers through natural language commands. This multimodal AI agent visually interprets web pages, integrates seamlessly with command lines and file systems, and operates locally for privacy. Launched as a technical preview, it empowers users to automate tasks with human-like interaction flows.

How to Use UI-TARS Desktop?

Getting started is straightforward: install the desktop app and begin issuing commands via a user-friendly interface. For example, you can say “Check the current weather in San Francisco using the browser” or “Draft a Twitter post with ‘hello world’.” The system processes requests in real time, providing visual feedback through screen captures and action logs. Detailed guidance is available in our Quick Start Guide.
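Conceptually, each command drives a perceive-plan-act loop: capture the screen, ask the model for the next step, execute it, and repeat until the task is done. The TypeScript sketch below illustrates that loop with a hypothetical `planAction` stub; all names are illustrative, and the real UI-TARS pipeline queries its Vision-Language Model with a screenshot at the "plan" step.

```typescript
// Minimal sketch of a perceive-plan-act loop for a natural-language command.
// Everything here is illustrative: the real UI-TARS pipeline sends a screenshot
// plus the instruction to its Vision-Language Model instead of this stub.

type Action =
  | { kind: "click"; x: number; y: number }
  | { kind: "type"; text: string }
  | { kind: "done" };

// Hypothetical planner stub: returns a fixed three-step action sequence.
function planAction(instruction: string, step: number): Action {
  if (step === 0) return { kind: "click", x: 640, y: 32 };    // focus address bar
  if (step === 1) return { kind: "type", text: instruction }; // type the query
  return { kind: "done" };
}

function runCommand(instruction: string): Action[] {
  const log: Action[] = []; // doubles as the visible action log
  for (let step = 0; ; step++) {
    const action = planAction(instruction, step);
    log.push(action);
    if (action.kind === "done") break;
  }
  return log;
}

const trace = runCommand("Check the current weather in San Francisco");
console.log(trace.map((a) => a.kind).join(" -> ")); // click -> type -> done
```

The logged actions correspond to the screen captures and action logs the app shows as real-time feedback.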

UI-TARS Desktop Features

Intuitive Natural Language Processing

Communicate with your computer using everyday language, reducing the need for manual clicks or scripting.

Visual Intelligence

Automatically captures screenshots and analyzes UI elements to execute precise mouse/keyboard actions. Supports cross-platform operation on Windows and macOS.

Enterprise-Grade Security

All processing occurs locally by default, ensuring sensitive data never leaves your device.

Scalable Automation

Leverage the UI-TARS SDK to build custom agents for enterprise workflows or personal productivity tasks.

Use Cases of UI-TARS Desktop

From simple tasks like social media posting to complex workflows involving browser automation and file management, here are key scenarios:

Web Automation

Automatically fill forms, scrape data, or navigate multi-step processes without manual intervention.

Cross-Platform Control

Issue commands that span applications, browsers, and command-line tools in a single workflow.

Accessibility Enhancement

Users with motor impairments can perform actions through voice or text commands with minimal setup.

UI-TARS Desktop FAQ

Is UI-TARS Desktop secure?

Yes. All processing occurs locally by default. Optional cloud deployment via ModelScope is also available for enterprise use cases.

Does it work on Linux?

Currently supported on Windows and macOS. Community-driven Linux support is planned through the open-source SDK.

Can I modify the AI model?

Customization is possible through the Hugging Face model hub or ModelScope platform.

How is it different from voice assistants?

UI-TARS combines visual context awareness with precise GUI control, enabling tasks that require pixel-level accuracy and cross-application coordination.

> [!IMPORTANT]
> [2025-03-18] We released a technical preview of a new desktop app, Agent TARS: a multimodal AI agent that performs browser operations by visually interpreting web pages and integrates seamlessly with command lines and file systems.

UI-TARS Desktop

UI-TARS Desktop is a GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.

📑 Paper | 🤗 Hugging Face Models | 🫨 Discord | 🤖 ModelScope
🖥️ Desktop Application | 👓 Midscene (use in browser)

Showcases

Example instructions (each paired with a demo video in the repository):

  • Get the current weather in SF using the web browser
  • Send a tweet with the content "hello world"

News

  • [2025-02-20] - 📦 Introduced the UI-TARS SDK, a powerful cross-platform toolkit for building GUI automation agents.
  • [2025-01-23] - 🚀 We updated the Cloud Deployment section of the Chinese-language guide "GUI Model Deployment Tutorial" (GUI模型部署教程) with new information about the ModelScope platform. You can now use ModelScope for deployment.

Features

  • 🤖 Natural language control powered by Vision-Language Model
  • 🖥️ Screenshot and visual recognition support
  • 🎯 Precise mouse and keyboard control
  • 💻 Cross-platform support (Windows/macOS)
  • 🔄 Real-time feedback and status display
  • 🔐 Private and secure - fully local processing
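Precise mouse control requires converting the model's predicted coordinates into physical screen pixels. The sketch below assumes, purely for illustration, that the model emits points normalized to a 0-1000 grid; the actual UI-TARS coordinate convention may differ.

```typescript
// Sketch: converting model-predicted coordinates to physical screen pixels.
// Assumption (illustrative only): the model emits points normalized to a
// 0-1000 grid; the real UI-TARS coordinate convention may differ.

interface Point { x: number; y: number }
interface Screen { width: number; height: number }

function toScreenPixels(normalized: Point, screen: Screen, grid = 1000): Point {
  return {
    x: Math.round((normalized.x / grid) * screen.width),
    y: Math.round((normalized.y / grid) * screen.height),
  };
}

// Center of the grid lands on the center of a 2560x1440 display.
console.log(toScreenPixels({ x: 500, y: 500 }, { width: 2560, height: 1440 }));
// { x: 1280, y: 720 }
```

Normalizing in the model and rescaling at execution time is what lets the same predictions work across displays with different resolutions.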

Quick Start

See Quick Start.

Deployment

See Deployment.

Contributing

See CONTRIBUTING.md.

SDK (Experimental)

See @ui-tars/sdk
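Custom agents plug platform-specific screenshot and input capabilities into the agent loop through an operator abstraction. The sketch below is a simplified reconstruction of that pattern, not the exact @ui-tars/sdk contract; consult the SDK documentation for the real interfaces and signatures.

```typescript
// Sketch of a custom agent built on an operator abstraction. The Operator
// shape below is a simplified reconstruction for illustration, not the exact
// @ui-tars/sdk contract; consult the SDK docs for the real signatures.

interface Operator {
  screenshot(): Promise<string>;          // base64-encoded screen image
  execute(action: string): Promise<void>; // perform a GUI action
}

// Mock operator standing in for a platform-specific implementation.
class MockOperator implements Operator {
  public executed: string[] = [];
  async screenshot(): Promise<string> {
    return "<base64-image>";
  }
  async execute(action: string): Promise<void> {
    this.executed.push(action);
  }
}

// One perceive-decide-act round; the "decide" step is stubbed where the
// real agent would query the Vision-Language Model with the screenshot.
async function runAgent(op: Operator, instruction: string): Promise<string[]> {
  await op.screenshot();                                 // perceive
  const action = `type(${JSON.stringify(instruction)})`; // decide (stub)
  await op.execute(action);                              // act
  return [action];
}

const demo = new MockOperator();
runAgent(demo, "hello world").then((actions) => console.log(actions)); // logs the action trace
```

Keeping the operator behind an interface is what makes the toolkit cross-platform: the same agent loop runs against a Windows, macOS, or browser operator.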

License

UI-TARS Desktop is licensed under the Apache License 2.0.

Citation

If you find our paper and code useful in your research, please consider giving a star :star: and a citation :pencil:

@article{qin2025ui,
  title={UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
  author={Qin, Yujia and Ye, Yining and Fang, Junjie and Wang, Haoming and Liang, Shihao and Tian, Shizuo and Zhang, Junda and Li, Jiahao and Li, Yunxin and Huang, Shijue and others},
  journal={arXiv preprint arXiv:2501.12326},
  year={2025}
}
