AI Gaming Agent MCP Server
Enables AI agents to remotely control gaming PCs for automated gameplay with tools for screen capture, mouse/keyboard control, workflows, and system operations.
MCP (Model Context Protocol) server that enables AI agents like Claude to remotely control gaming PCs for automated gameplay.
Master Claude (Orchestrator)
|
| MCP Protocol (JSON-RPC over HTTP/SSE)
|
+---> PC #1 (MCP Server :8765) --> PyAutoGUI + Optional VLM
+---> PC #2 (MCP Server :8765) --> PyAutoGUI + Optional VLM
+---> PC #N (MCP Server :8765) --> PyAutoGUI + Optional VLM
Features
- 24 MCP Tools: Screen capture, mouse/keyboard control, file operations, system commands, workflow automation
- Workflow Automation: Chain multiple actions into single commands with run_workflow and demo_terminal_workflow
- Multi-Monitor Support: Target specific monitors for screenshots and actions
- Dual Transport Modes: HTTP/SSE for remote control, stdio for local clients
- Optional Local VLM: Use Ollama (Qwen2.5-VL, Moondream) for fast local screen analysis
- Security-First: Bearer token auth, path restrictions, command blocklist, audit logging
- Cross-Platform: Windows, Linux, macOS with auto-detection
Quick Start
Installation
pip install ai-gaming-agent
Or install from source:
git clone https://github.com/developerz-ai/ai-gaming-agent-mcp.git
cd ai-gaming-agent-mcp
pip install -e .
Start the Server
The server supports two transport modes:
HTTP/SSE Transport (Recommended for Remote Control):
gaming-agent serve --transport http --port 8765 --password your-secret-password
Stdio Transport (For Local MCP Clients):
gaming-agent serve --transport stdio
Connect from Claude Desktop
For HTTP/SSE Transport:
Add to your Claude Desktop config (~/.config/claude/claude_desktop_config.json on Linux, ~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
{
"mcpServers": {
"gaming-pc": {
"transport": "sse",
"url": "http://YOUR-PC-IP:8765/mcp",
"headers": {
"Authorization": "Bearer your-secret-password"
}
}
}
}
For Stdio Transport:
{
"mcpServers": {
"gaming-pc": {
"command": "gaming-agent",
"args": ["serve", "--transport", "stdio"]
}
}
}
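For clients other than Claude Desktop, the HTTP/SSE settings above amount to an authenticated JSON-RPC request. The sketch below only builds such a request with the Python standard library and does not send it. The helper name build_tool_call is hypothetical, the /mcp path and Bearer header are taken from the config above, and it is an assumption that the server accepts a plain POST of a standard MCP tools/call message on that path.

```python
import json
import urllib.request


def build_tool_call(base_url, password, tool, arguments=None, request_id=1):
    """Build (but do not send) a JSON-RPC 2.0 request that calls one MCP tool."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",  # standard MCP method for invoking a tool
        "params": {"name": tool, "arguments": arguments or {}},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/mcp",  # path assumed from the config above
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {password}",
        },
        method="POST",
    )


req = build_tool_call("http://YOUR-PC-IP:8765", "your-secret-password", "get_screen_size")
```

Passing req to urllib.request.urlopen would send it. A full MCP session also involves an initialize handshake, which this sketch leaves out.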
Quick Test
Test the complete automation capability with a single command:
# Start the server
gaming-agent serve --transport http --password test123
# In Claude Desktop (after connecting), ask:
"Use the demo_terminal_workflow tool to open a terminal, type 'echo hello world', and close it"
This will:
- ✓ Auto-detect your terminal (gnome-terminal, konsole, xterm, Terminal.app, cmd)
- ✓ Open a new terminal window
- ✓ Type "echo hello world"
- ✓ Press Enter to execute
- ✓ Capture a screenshot for verification
- ✓ Close the terminal with the appropriate hotkey
Available Tools
Total: 24 MCP Tools
Workflow Tools (New!)
| Tool | Description |
|---|---|
| run_workflow | Execute a sequence of tool actions with optional delays |
| demo_terminal_workflow | Complete demo: open terminal, type command, execute, screenshot, close |
Screen Tools
| Tool | Description |
|---|---|
| screenshot | Capture current screen (returns base64 PNG) |
| get_screen_size | Get screen dimensions |
| analyze_screen | Use local VLM to analyze screen content (requires Ollama) |
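Because screenshot returns a base64-encoded PNG, a client has to decode it before saving or inspecting the image. The following is a minimal sketch. The field names success and image are borrowed from the demo_terminal_workflow return example elsewhere in this README and are assumed to apply to the standalone screenshot tool as well. decode_screenshot is a hypothetical helper.

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def decode_screenshot(result):
    """Return raw PNG bytes from a screenshot result dict, or raise ValueError."""
    if not result.get("success"):
        raise ValueError(f"screenshot failed: {result.get('error')}")
    png = base64.b64decode(result["image"], validate=True)
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG")
    return png


# Stand-in payload for demonstration: a PNG signature only, not a real image.
fake_result = {"success": True, "image": base64.b64encode(PNG_SIGNATURE).decode("ascii")}
png_bytes = decode_screenshot(fake_result)
```

The returned bytes can be written to disk with open(path, "wb") or handed to an image library.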
Mouse Tools
| Tool | Description |
|---|---|
| click | Click at coordinates |
| double_click | Double-click at coordinates |
| move_to | Move mouse cursor |
| drag_to | Drag from current position |
| scroll | Scroll mouse wheel |
| get_mouse_position | Get current cursor location |
Keyboard Tools
| Tool | Description |
|---|---|
| type_text | Type a string of text (supports fast paste mode via clipboard) |
| press_key | Press a single key |
| hotkey | Press key combination |
Fast Text Input with Paste
The type_text tool supports fast clipboard-based paste mode for significantly faster text input:
{
"tool": "type_text",
"args": {
"text": "long command or text here",
"use_paste": true
}
}
Benefits:
- 10x faster than character-by-character typing for long text
- Ideal for: Pasting long commands, scripts, credentials
- How it works: Copies text to clipboard, then uses Ctrl+V (Linux/Windows) or Cmd+V (macOS) to paste
- Default: use_paste=false (uses character-by-character typing)
When to use:
- ✓ Large blocks of text
- ✓ Complex commands with special characters
- ✓ When speed matters more than realtime character visibility
- ✗ Games that don't support paste input
- ✗ When character-by-character input is explicitly required
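The guidance above can be folded into a small client-side helper that decides when to set use_paste. This is only a sketch: build_type_text_args, the paste_supported flag, and the 40-character threshold are illustrative assumptions, not part of the server.

```python
def build_type_text_args(text, paste_supported=True, min_paste_length=40):
    """Build a type_text call, enabling paste mode only for long text.

    paste_supported should be False for targets (e.g. some games) that
    ignore clipboard paste. The length threshold is an arbitrary example.
    """
    use_paste = paste_supported and len(text) >= min_paste_length
    return {"tool": "type_text", "args": {"text": text, "use_paste": use_paste}}


short_call = build_type_text_args("ls -la")
long_call = build_type_text_args("x" * 200)
game_call = build_type_text_args("x" * 200, paste_supported=False)
```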
File Tools
| Tool | Description |
|---|---|
| read_file | Read file contents |
| write_file | Write content to file |
| list_files | List directory contents |
| upload_file | Upload file to PC |
| download_file | Download file from PC |
System Tools
| Tool | Description |
|---|---|
| execute_command | Run shell command |
| get_system_info | Get CPU/RAM/GPU usage |
| list_windows | List open windows |
| focus_window | Bring window to foreground |
Workflow Automation
run_workflow - Composite Command Execution
Execute multiple tools in sequence with a single command. Perfect for complex automation tasks.
Example: Open terminal and run command
{
"steps": [
{
"tool": "execute_command",
"args": {"command": "gnome-terminal"},
"wait_ms": 1500,
"description": "Open terminal"
},
{
"tool": "type_text",
"args": {"text": "ls -la"},
"wait_ms": 200,
"description": "Type command"
},
{
"tool": "press_key",
"args": {"key": "enter"},
"wait_ms": 1000,
"description": "Execute command"
},
{
"tool": "screenshot",
"args": {},
"description": "Capture result"
}
]
}
Step Fields:
- tool (required): Name of the tool to execute
- args (optional): Arguments to pass to the tool
- wait_ms (optional): Milliseconds to wait after this step
- description (optional): Human-readable step description
- continue_on_error (optional): Continue workflow if this step fails
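When workflows are assembled programmatically, a small builder keeps steps limited to the fields listed above. The sketch below is an assumption about how a client might do this. make_step is a hypothetical helper, and omitting unset optional fields is a choice made here, not a server requirement.

```python
def make_step(tool, args=None, wait_ms=0, description=None, continue_on_error=False):
    """Build one run_workflow step using only the documented step fields."""
    if not tool:
        raise ValueError("tool is required")
    if wait_ms < 0:
        raise ValueError("wait_ms must not be negative")
    step = {"tool": tool, "args": args or {}}
    # Optional fields are included only when they differ from their defaults.
    if wait_ms:
        step["wait_ms"] = wait_ms
    if description:
        step["description"] = description
    if continue_on_error:
        step["continue_on_error"] = True
    return step


workflow = {
    "steps": [
        make_step("type_text", {"text": "ls -la"}, wait_ms=200, description="Type command"),
        make_step("press_key", {"key": "enter"}, wait_ms=1000),
        make_step("screenshot"),
    ]
}
```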
Returns:
{
"success": true,
"total_steps": 4,
"completed_steps": 4,
"failed_step": null,
"results": [...],
"total_time_ms": 3523,
"error": null
}
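To make the result shape above concrete, here is a rough local simulation of a sequential runner. It is not the server's implementation. simulate_workflow and the tools mapping are hypothetical, and several details are assumptions: failed_step is a zero-based index, a step that fails with continue_on_error set does not mark the whole run as failed, and the wait still happens after a tolerated failure.

```python
import time


def simulate_workflow(steps, tools, sleep=time.sleep):
    """Run steps in order and return a report shaped like the result above."""
    start = time.monotonic()
    results, failed_step, error = [], None, None
    for index, step in enumerate(steps):
        try:
            output = tools[step["tool"]](**step.get("args", {}))
            results.append({"step": index, "success": True, "result": output})
        except Exception as exc:  # an unknown tool name lands here too (KeyError)
            results.append({"step": index, "success": False, "error": str(exc)})
            if not step.get("continue_on_error", False):
                failed_step, error = index, str(exc)
                break
        sleep(step.get("wait_ms", 0) / 1000)  # wait_ms is in milliseconds
    return {
        "success": failed_step is None,
        "total_steps": len(steps),
        "completed_steps": sum(1 for r in results if r["success"]),
        "failed_step": failed_step,
        "results": results,
        "total_time_ms": int((time.monotonic() - start) * 1000),
        "error": error,
    }


def fail():
    raise RuntimeError("boom")


tools = {"ok": lambda: "done", "fail": fail}
steps = [{"tool": "ok"}, {"tool": "fail"}, {"tool": "ok"}]
report = simulate_workflow(steps, tools, sleep=lambda seconds: None)
```

In this run the second step raises, so the third step never executes and the report names the failing step.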
demo_terminal_workflow - Ready-Made Terminal Demo
A convenience tool that demonstrates the full automation capability in one call.
Usage:
{
"text": "echo hello world",
"terminal_wait_ms": 2000,
"post_type_wait_ms": 500,
"post_enter_wait_ms": 1000,
"capture_screenshot": true,
"close_terminal": true
}
What it does:
- Auto-detects platform terminal (gnome-terminal, konsole, xterm, Terminal.app, cmd)
- Opens the terminal application
- Waits for terminal to fully load
- Types the provided command
- Presses Enter to execute
- Waits for command output
- Captures screenshot for verification (optional)
- Closes terminal with platform-appropriate hotkey (optional)
Returns:
{
"success": true,
"terminal_command": "gnome-terminal",
"platform": "Linux",
"text_typed": "echo hello world",
"screenshot": {"success": true, "image": "...", ...},
"steps_completed": ["detect_terminal", "open_terminal", "wait_for_terminal",
"type_text", "press_enter", "capture_screenshot",
"close_terminal"],
"total_time_ms": 4523,
"error": null
}
Platform Support:
- Linux: gnome-terminal, konsole, xfce4-terminal, mate-terminal, tilix, terminator, xterm
- macOS: Terminal.app
- Windows: cmd.exe
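Terminal auto-detection along these lines can be approximated with the standard library. This sketch assumes the Linux candidates are tried in the order listed above and that the first one found on PATH wins. The actual detection logic in the project may differ, and detect_terminal is a hypothetical name.

```python
import platform
import shutil

# Candidate order mirrors the Platform Support list above (an assumption).
LINUX_TERMINALS = [
    "gnome-terminal", "konsole", "xfce4-terminal", "mate-terminal",
    "tilix", "terminator", "xterm",
]


def detect_terminal(system=None, which=shutil.which):
    """Return a terminal name for the platform, or None if none is found."""
    system = system or platform.system()
    if system == "Darwin":  # platform.system() reports macOS as "Darwin"
        return "Terminal.app"
    if system == "Windows":
        return "cmd.exe"
    if system == "Linux":
        for candidate in LINUX_TERMINALS:
            if which(candidate):  # first candidate found on PATH wins
                return candidate
    return None


terminal = detect_terminal()
```

The which parameter exists so the lookup can be replaced in tests without touching the real PATH.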
Configuration
Create ~/.gaming-agent/config.json:
{
"server": {
"host": "0.0.0.0",
"port": 8765,
"password": "your-secure-password"
},
"vlm": {
"enabled": false,
"provider": "ollama",
"model": "qwen2.5-vl:3b",
"endpoint": "http://localhost:11434"
},
"security": {
"allowed_paths": ["/home/user/games", "C:\\Games"],
"blocked_commands": ["rm -rf", "format", "del /f"],
"max_command_timeout": 30
}
}
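The security section of this config suggests two checks: whether a path falls under allowed_paths and whether a command contains a blocked pattern. The sketch below shows one plausible way such checks could work. It is an assumption, not the server's code. It handles POSIX-style paths only, does not resolve symlinks, and uses plain substring matching for commands.

```python
import posixpath


def is_path_allowed(path, allowed_paths):
    """True if the normalized POSIX path is an allowed directory or inside one."""
    normalized = posixpath.normpath(path)  # collapses ".." and duplicate slashes
    for root in allowed_paths:
        root = posixpath.normpath(root)
        if normalized == root or normalized.startswith(root + "/"):
            return True
    return False


def is_command_blocked(command, blocked_commands):
    """True if any blocked pattern appears in the command (case-insensitive)."""
    lowered = command.lower()
    return any(pattern.lower() in lowered for pattern in blocked_commands)


security = {
    "allowed_paths": ["/home/user/games"],
    "blocked_commands": ["rm -rf", "format", "del /f"],
}
inside = is_path_allowed("/home/user/games/save.dat", security["allowed_paths"])
escaped = is_path_allowed("/home/user/games/../.ssh/id_rsa", security["allowed_paths"])
blocked = is_command_blocked("sudo rm -rf /", security["blocked_commands"])
```

Substring matching is crude: a pattern such as format would also match harmless commands that merely contain that word, so a real blocklist would likely need stricter matching.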
Enabling VLM (Optional)
To use the analyze_screen tool with local vision models:
- Install Ollama (if not already installed):
curl -fsSL https://ollama.com/install.sh | sh
- Pull a vision model:
ollama pull qwen2.5-vl:3b  # Lightweight, fast
# or
ollama pull moondream  # Alternative
- Install VLM dependencies:
pip install ai-gaming-agent[vlm]
- Enable in config (~/.gaming-agent/config.json):
{
  "vlm": {
    "enabled": true,
    "provider": "ollama",
    "model": "qwen2.5-vl:3b",
    "endpoint": "http://localhost:11434"
  }
}
- Use in workflows:
{
  "tool": "analyze_screen",
  "args": {
    "prompt": "What is the current health percentage?"
  }
}
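For orientation, a screen-analysis call to a local Ollama instance is typically a single POST to its /api/generate endpoint with the image attached as base64. The sketch below builds such a request without sending it. How analyze_screen talks to Ollama internally is not documented here, so build_ollama_request is a hypothetical helper, and the defaults simply reuse the model and endpoint values from the config above.

```python
import base64
import json
import urllib.request


def build_ollama_request(png_bytes, prompt, model="qwen2.5-vl:3b",
                         endpoint="http://localhost:11434"):
    """Build (but do not send) an Ollama /api/generate request with one image."""
    payload = {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(png_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON reply instead of a stream
    }
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder bytes stand in for a real screenshot.
vlm_req = build_ollama_request(b"\x89PNG\r\n\x1a\n", "What is the current health percentage?")
```

With Ollama running, urllib.request.urlopen(vlm_req) would return JSON whose response field holds the model's answer.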
Deployment Options
Option A: Claude Does All Vision (Simplest)
- Claude analyzes all screenshots
- Gaming PCs are simple executors
- No GPU needed on gaming PCs
- Use HTTP/SSE transport with Bearer auth
Option B: Hybrid with Local VLM (Recommended)
- Claude for high-level decisions and orchestration
- Local VLM (Qwen2.5-VL, Moondream) for fast visual processing
- Best balance of speed and intelligence
- Reduces API costs and latency
Option C: Full Local (Privacy)
- Local orchestrator (e.g., Qwen2.5-72B via Ollama)
- No cloud APIs, complete privacy
- Requires powerful hardware (GPU recommended)
- Use stdio transport for local control
Practical Examples
Example 1: Terminal Automation
# Ask Claude: "Run the demo_terminal_workflow with the command 'uname -a'"
# Result: Opens terminal, runs command, captures output, closes
Example 2: Multi-Step Workflow
# Ask Claude: "Create a workflow that:
# 1. Opens a file browser
# 2. Navigates to Downloads
# 3. Takes a screenshot
# 4. Closes the window"
# Claude will use run_workflow with execute_command, type_text, screenshot, hotkey
Example 3: Game Automation with VLM
# Ask Claude: "Use analyze_screen to check if the game menu is visible,
# then click the 'Start Game' button at coordinates you detect"
# Claude will:
# 1. Call analyze_screen with prompt "Is there a Start Game button? Where?"
# 2. Use VLM response to determine coordinates
# 3. Call click tool with detected coordinates
Example 4: Batch File Operations
# Ask Claude: "Create a workflow that backs up all .save files from
# C:\Games\MyGame to C:\Backups\saves-{date}"
# Claude will use run_workflow with list_files, read_file, write_file
Security
- Always use strong, unique passwords (min 16 chars, random)
- Limit file access to game directories only in allowed_paths
- Use a VPN if accessing over the internet (never expose the server to the public internet)
- Enable TLS/HTTPS for production (use a reverse proxy like nginx)
- Regularly rotate passwords (weekly for high-security environments)
- Monitor ~/.gaming-agent/audit.log for suspicious activity
- Set an appropriate max_command_timeout to prevent runaway processes
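A password that meets the "min 16 chars, random" guidance can be generated with Python's secrets module. This is a minimal sketch. generate_password is a hypothetical helper and the 24-byte default is an arbitrary choice.

```python
import secrets


def generate_password(num_bytes=24):
    """Return a random URL-safe password of at least 16 characters."""
    if num_bytes < 12:
        # 12 random bytes encode to 16 base64 characters, the stated minimum.
        raise ValueError("num_bytes must be at least 12")
    return secrets.token_urlsafe(num_bytes)


password = generate_password()
```

The result can be passed to --password or stored as server.password in ~/.gaming-agent/config.json.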
Development
# Install with uv (recommended)
uv sync --extra dev
# Or with pip
pip install -e ".[dev]"
# Lint
uv run ruff check src tests
# Run unit tests
uv run pytest tests/ --ignore=tests/integration -v
Testing
Unit Tests (CI)
Unit tests run in CI on every push/PR. They test configuration, file operations, and tool interfaces without requiring a display.
# Run unit tests only
uv run pytest tests/ --ignore=tests/integration -v
Integration Tests (Local Only)
Integration tests perform real GUI automation and require:
- A real display (X11, Wayland, Windows, macOS)
- tesseract-ocr for OCR verification
- pyautogui working with your display
These tests CANNOT run in CI because they need a real desktop environment.
# Install integration test dependencies
uv sync --extra integration
# Install tesseract (Linux)
sudo apt install tesseract-ocr
# Install tesseract (macOS)
brew install tesseract
# Run integration tests locally
uv run pytest tests/integration -v
What Integration Tests Do
| Test | Description |
|---|---|
| test_screenshot_returns_image | Captures real screen content |
| test_ocr_screen_content | Uses OCR to read text from screen |
| test_mouse_move | Moves mouse cursor to position |
| test_mouse_click | Performs real mouse click |
| test_type_text | Types actual text |
| test_terminal_workflow | Opens terminal, types command, closes |
| test_terminal_with_ocr_verification | Opens terminal, runs command, verifies output with OCR |
| test_batch_gui_operations | Runs 7 GUI operations in sequence |
Terminal Workflow Test
The most comprehensive test opens a terminal, types a command, and verifies the output:
# What the test does:
1. Opens system terminal (gnome-terminal, konsole, xterm, etc.)
2. Types: echo "AGENT_TEST_abc12345"
3. Presses Enter
4. Takes screenshot
5. Runs OCR on screenshot
6. Verifies "AGENT_TEST_abc12345" appears in OCR output
7. Closes terminal with Alt+F4
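The marker-and-verify idea in the steps above can be sketched as follows. The helper names are hypothetical. The whitespace-insensitive, case-insensitive comparison is an assumption added here to tolerate common OCR noise, and the real test may compare differently.

```python
import uuid


def make_marker(prefix="AGENT_TEST_"):
    """Return a unique marker such as AGENT_TEST_ followed by 8 hex characters."""
    return prefix + uuid.uuid4().hex[:8]


def marker_in_ocr(marker, ocr_text):
    """Check for the marker while ignoring whitespace and letter case."""
    def squash(text):
        return "".join(text.split()).lower()
    return squash(marker) in squash(ocr_text)


marker = make_marker()
# Simulated OCR output with a stray space inside the marker.
found = marker_in_ocr("AGENT_TEST_abc12345", "$ echo AGENT_TEST_ abc12345\n")
```

A fresh marker per run avoids matching stale text left on screen by an earlier test.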
CI Configuration
CI runs on GitHub Actions with Python 3.12 only:
# .github/workflows/ci.yml
- Checkout code
- Install uv
- Install Python 3.12
- Install system deps (xvfb, scrot, python3-tk)
- Run ruff lint
- Run unit tests (integration tests excluded)
Integration tests are auto-skipped in CI via the CI=true environment variable.
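A skip condition based on that variable might look like the sketch below. running_in_ci is a hypothetical helper, the project's actual check may differ, and treating the value case-insensitively is an assumption.

```python
import os


def running_in_ci(environ=None):
    """True when the CI environment variable is set to "true"."""
    environ = os.environ if environ is None else environ
    return environ.get("CI", "").lower() == "true"


in_ci = running_in_ci({"CI": "true"})
local = running_in_ci({})
```

With pytest, the result could feed a marker such as pytest.mark.skipif(running_in_ci(), reason="needs a real display") applied to the integration test modules.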
License
MIT