codex-cua-mcp
Enables AI agents to control Windows desktop applications by wrapping Codex's Computer Use capability.
Codex CUA MCP
💬 LINUX DO Discussion
MCP server that wraps Codex's Computer Use capability, enabling AI agents to control Windows desktop applications.
Features
- List and control Windows desktop applications
- Capture screenshots and accessibility trees
- Click, type, press keys, scroll, drag
- Launch apps and activate windows
- Ready to use - exe bundled, no extra setup needed
Quick Start (Claude Code)
git clone <repo-url>
cd codex-cua-mcp
.\setup.ps1
Restart Claude Code to start using the server.
Other Agents
Works with any MCP-compatible agent (Cursor, Windsurf, Cline, etc.):
{
  "mcpServers": {
    "codex-cua": {
      "command": "node",
      "args": ["PATH/codex-cua-mcp/bin/codex-cua-mcp.js"]
    }
  }
}
Replace PATH with the directory where you cloned the repository. Check your agent's documentation for the config file location.
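If you would rather generate that entry than edit the path by hand, a minimal sketch follows. The helper name `buildMcpConfig` and the example checkout directory are assumptions made for illustration; only the `codex-cua` key, the `node` command, and the `bin/codex-cua-mcp.js` entry point come from the config above.

```javascript
const path = require("node:path");

// Build the "mcpServers" entry for a given checkout directory.
// `repoDir` is wherever codex-cua-mcp was cloned (hypothetical input).
// path.win32 is used so the result is a Windows path on any host.
function buildMcpConfig(repoDir) {
  const entry = path.win32.join(repoDir, "bin", "codex-cua-mcp.js");
  return {
    mcpServers: {
      "codex-cua": { command: "node", args: [entry] },
    },
  };
}

// Example checkout location, chosen only for this sketch.
console.log(JSON.stringify(buildMcpConfig("C:\\tools\\codex-cua-mcp"), null, 2));
```

Paste the printed JSON into your agent's MCP config file. `JSON.stringify` takes care of escaping the backslashes.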
How It Works
AI Agent (Claude Code, Cursor, etc.)
    ↓ MCP protocol (stdio)
MCP Server (codex-cua-mcp)
    ↓ JSON-RPC (stdin/stdout)
codex-computer-use.exe
    ↓ Windows APIs
Desktop Applications
The MCP server communicates with codex-computer-use.exe via JSON-RPC over stdin/stdout. The exe uses Windows APIs (SendInput, UI Automation, Windows.Graphics.Capture) to interact with desktop applications.
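To make the stdin/stdout exchange more concrete, here is a small sketch of one common way to frame JSON-RPC 2.0 over a pipe: one JSON message per line. The newline-delimited framing and the `listWindows` method name are assumptions for illustration; this README does not document the exe's actual wire format or method names.

```javascript
// Encode one JSON-RPC 2.0 request as a single newline-terminated line.
function encodeRequest(id, method, params) {
  return JSON.stringify({ jsonrpc: "2.0", id, method, params }) + "\n";
}

// Returns a function that accepts stdout chunks (strings) and calls
// onMessage once per complete line, even when a message is split across chunks.
function createLineDecoder(onMessage) {
  let buffer = "";
  return (chunk) => {
    buffer += chunk;
    let newline;
    while ((newline = buffer.indexOf("\n")) !== -1) {
      const line = buffer.slice(0, newline).trim();
      buffer = buffer.slice(newline + 1);
      if (line) onMessage(JSON.parse(line));
    }
  };
}

// Usage: a response that arrives in two chunks is delivered as one message.
const received = [];
const feed = createLineDecoder((msg) => received.push(msg));
process.stdout.write(encodeRequest(1, "listWindows", {})); // hypothetical method
feed('{"jsonrpc":"2.0","id":1,"res');
feed('ult":{"windows":[]}}\n');
console.log(received);
```

In a real server, `feed` would be attached to the child process's stdout `data` events, and the output of `encodeRequest` would be written to its stdin.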
The exe requires approval the first time an action targets a given app. By default the server approves these requests automatically, so no manual confirmation is needed.
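The per-app approval behavior can be modeled roughly as a decision cache keyed by app. This is a sketch of the idea only, not the server's actual code: `createApprovalPolicy`, the `autoApprove` option, and the `ask` callback are hypothetical names.

```javascript
// Remember one approval decision per app. The first request for an app
// triggers a decision; later requests for the same app reuse it.
function createApprovalPolicy({ autoApprove = true, ask = () => false } = {}) {
  const decided = new Map(); // app id -> boolean
  return function isApproved(appId) {
    if (!decided.has(appId)) {
      // With autoApprove on, approve without consulting `ask`.
      decided.set(appId, autoApprove ? true : Boolean(ask(appId)));
    }
    return decided.get(appId);
  };
}

// Default policy: every app is approved on first use.
const isApproved = createApprovalPolicy();
console.log(isApproved("notepad.exe"));
```

Passing `autoApprove: false` together with an `ask` callback would route the first request for each app through a manual decision instead.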
📖 Want to understand the design in depth? Read the Architecture Deep Dive. (Chinese version)
Available Tools
| Tool | Description |
|---|---|
| list_windows | List all controllable windows |
| list_apps | List installed apps |
| get_window | Rehydrate a window object |
| launch_app | Launch an application |
| activate_window | Bring window to foreground |
| get_window_state | Capture screenshot + accessibility tree |
| click | Click at coordinates or element |
| type_text | Type text |
| press_key | Press keyboard key |
| scroll | Scroll |
| drag | Drag |
| set_value | Set editable element value |
| perform_secondary_action | Secondary action (right-click menu, etc.) |
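MCP clients invoke tools with a `tools/call` request that carries the tool name and an `arguments` object. The sketch below builds such a request and rejects names that are not in the table above. The helper `buildToolCall` and the `x`/`y` argument names are assumptions; this README does not document each tool's input schema, so query the server's `tools/list` response for the real parameters.

```javascript
// Tool names taken from the table above.
const TOOL_NAMES = new Set([
  "list_windows", "list_apps", "get_window", "launch_app",
  "activate_window", "get_window_state", "click", "type_text",
  "press_key", "scroll", "drag", "set_value", "perform_secondary_action",
]);

// Build an MCP "tools/call" JSON-RPC request for one of the listed tools.
function buildToolCall(id, name, args = {}) {
  if (!TOOL_NAMES.has(name)) {
    throw new Error(`Unknown tool: ${name}`);
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Hypothetical arguments: the real schema may use different field names.
console.log(JSON.stringify(buildToolCall(1, "click", { x: 120, y: 48 })));
```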
Disclaimer
The core functionality comes from codex-computer-use.exe. Actual results depend on the AI agent and model being used; usability is not guaranteed.
Requirements
- Windows 10/11
- Node.js 18+
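A quick way to check these requirements from Node.js itself is sketched below. The function name `meetsRequirements` is made up for this example. The check only confirms that the platform is Windows and that the Node.js major version is at least 18; it does not distinguish Windows 10/11 from older releases.

```javascript
// Returns true when the platform string and Node.js version satisfy the
// requirements listed above. It cannot tell Windows 10/11 apart from
// older Windows releases.
function meetsRequirements(platform, nodeVersion) {
  const major = Number.parseInt(nodeVersion.split(".")[0], 10);
  return platform === "win32" && major >= 18;
}

if (!meetsRequirements(process.platform, process.versions.node)) {
  console.warn("codex-cua-mcp expects Windows and Node.js 18 or newer.");
}
```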
License
MIT