Screen Agent

A Windows desktop automation MCP server that enables UI recognition through OCR, UIA controls, and multi-point color matching. It allows agents to interact with desktop applications via actions like clicking and typing while using a learning system to track and improve operation success.

In short: a Windows desktop automation MCP server that recognizes UI elements through OCR, UIA controls, and multi-point color matching.

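Features

  • Multiple UI recognition methods

    • OCR text recognition (RapidOCR)
    • Windows UIA control recognition
    • Multi-point color feature matching
  • Intelligent operations

    • Window binding and automatic focus
    • Pop-up detection and handling
    • Operation verification and error recovery
  • Learning and evolution

    • Skill learning system
    • Operation success-rate tracking
    • Experience stored in a vector database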

Installation

Requirements

  • Windows 10/11
  • Python 3.12+
  • Ollama (optional, used for visual recognition)
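
Before installing, it can help to confirm that the interpreter and platform meet the requirements listed above. The following is a minimal sketch, not part of the project: check_environment is a hypothetical helper name, and it only checks the operating system family and the Python version (it cannot tell Windows 10 from older releases and does not look for Ollama).

```python
import platform
import sys


def check_environment(version=sys.version_info[:2], system=platform.system()):
    """Return a list of problems; an empty list means the basics look fine.

    Hypothetical helper: `version` is a (major, minor) tuple and `system`
    is the value reported by platform.system().
    """
    problems = []
    if system != "Windows":
        problems.append(f"Windows 10/11 is required, found {system or 'unknown'}")
    if tuple(version) < (3, 12):
        problems.append(f"Python 3.12+ is required, found {version[0]}.{version[1]}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```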

Installation steps

# Clone the repository
git clone https://github.com/lqszhsp/screen-agent.git
cd screen-agent

# Create and activate a virtual environment
python -m venv venv
venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Configuration

  1. Copy the configuration template:
copy config\settings.example.py config\settings.py
  2. Edit config/settings.py and set the API key (only needed if you use a cloud vision API)

Usage

As an MCP server

Add the server to the configuration of Claude Desktop or another MCP client:

{
  "mcpServers": {
    "screen-agent": {
      "command": "python",
      "args": ["C:\\path\\to\\screen_agent\\mcp_server.py"]
    }
  }
}
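
Backslashes in Windows paths must be doubled inside JSON, which is easy to get wrong by hand. The sketch below generates the same configuration entry with Python's json module so the escaping is handled automatically. build_mcp_config and its python_exe parameter are hypothetical names used only for illustration; the path is the placeholder from the example above and should be replaced with the real location of mcp_server.py.

```python
import json
from pathlib import PureWindowsPath


def build_mcp_config(server_script: str, python_exe: str = "python") -> str:
    """Return an MCP client config snippet as a JSON string.

    Hypothetical helper: json.dumps takes care of escaping the
    backslashes in the Windows path.
    """
    config = {
        "mcpServers": {
            "screen-agent": {
                "command": python_exe,
                "args": [str(PureWindowsPath(server_script))],
            }
        }
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # Placeholder path taken from the README example.
    print(build_mcp_config(r"C:\path\to\screen_agent\mcp_server.py"))
```

If the dependencies were installed into a virtual environment, pointing python_exe at that environment's interpreter (for example venv\Scripts\python.exe) is a common way to make sure the client launches the server with the right packages; whether this is needed depends on your setup.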

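Available tools

Tool                      Description
screen_get_layout         Bind a window and get its layout information
screen_click              Click a screen element
screen_input_text         Type text
screen_scroll             Scroll the screen
screen_hotkey             Press a keyboard shortcut
screen_capture            Take a screenshot and recognize elements
screen_wait               Wait for a specified time
screen_explore            Automatically explore the interface
screen_detect_ui          Detect the positions of UI elements
screen_scan_ui_elements   Scan and generate icon features
screen_ask_user_locate    Ask the user to help locate an element
screen_learn_success      Record a successful operation
screen_query_knowledge    Query learned knowledge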
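
MCP clients invoke these tools with a JSON-RPC "tools/call" request. An MCP client library normally builds this message for you, so the sketch below is only meant to show what travels over the wire. make_tool_call is a hypothetical helper; the argument names target and mode are taken from the click examples in this README, and the parameters of the other tools are not documented here, so none are assumed.

```python
import json


def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # ensure_ascii=False keeps non-ASCII UI labels readable in the payload.
    return json.dumps(request, ensure_ascii=False)


if __name__ == "__main__":
    print(make_tool_call(1, "screen_click", {"target": "Settings", "mode": "ocr"}))
```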

Click modes

# OCR mode (default) - locate by visible text
screen_click(target="Settings", mode="ocr")

# UIA mode - locate by UI Automation control
screen_click(target="OK", mode="ui", control_type="Button")

# Multi-point color mode - locate by color features
screen_click(mode="multipoint", features={"0|0": "#07c160", "10|10": "#ffffff"})
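
To make the multi-point color mode more concrete, here is a minimal sketch of how such a match could work. It is an illustration under stated assumptions, not the project's implementation: it assumes each features key is an "dx|dy" pixel offset from an anchor point and each value is the expected color at that offset, and it scans a plain in-memory grid of RGB tuples instead of a real screenshot. parse_features, find_multipoint, and the tolerance parameter are hypothetical names.

```python
def parse_features(features):
    """Turn {"dx|dy": "#rrggbb"} into a list of (dx, dy, (r, g, b)) tuples."""
    parsed = []
    for key, hex_color in features.items():
        dx, dy = (int(part) for part in key.split("|"))
        value = hex_color.lstrip("#")
        rgb = tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))
        parsed.append((dx, dy, rgb))
    return parsed


def find_multipoint(pixels, features, tolerance=0):
    """Return the first (x, y) anchor where every feature point matches, else None.

    `pixels` is a list of rows of (r, g, b) tuples, indexed as pixels[y][x].
    `tolerance` is the allowed per-channel difference.
    """
    points = parse_features(features)
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    for y in range(height):
        for x in range(width):
            if all(
                # Bounds are checked first so the lookup never goes out of range.
                0 <= x + dx < width
                and 0 <= y + dy < height
                and all(
                    abs(a - b) <= tolerance
                    for a, b in zip(pixels[y + dy][x + dx], rgb)
                )
                for dx, dy, rgb in points
            ):
                return (x, y)
    return None


if __name__ == "__main__":
    WHITE = (255, 255, 255)
    GREEN = (0x07, 0xC1, 0x60)
    image = [[WHITE] * 4 for _ in range(4)]
    image[1][1] = GREEN  # one green pixel with a white pixel diagonally below it
    print(find_multipoint(image, {"0|0": "#07c160", "1|1": "#ffffff"}))
```

Checking a handful of pixels at fixed offsets is much cheaper than template matching a whole icon, which is the usual motivation for this kind of feature; a nonzero tolerance makes it less sensitive to anti-aliasing and color scaling.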

Project structure

screen_agent/
├── mcp_server.py          # MCP server entry point
├── actions/               # Action modules
│   ├── click.py          # Click actions
│   ├── input_text.py     # Text input
│   ├── scroll.py         # Scroll actions
│   └── ...
├── core/                  # Core modules
│   ├── perception.py     # OCR perception
│   ├── window_manager.py # Window management
│   ├── evolution.py      # Evolution mechanism
│   └── ...
├── app_layouts/           # Application layout files
│   ├── _guidelines.md    # Operating guidelines
│   ├── _template.md      # Layout template
│   ├── 微信.md           # WeChat layout
│   └── ...
└── config/               # Configuration files
    └── settings.py

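Layout files

Each application can have its own layout file (app_layouts/{app_name}.md), which contains:

  • Window structure and region definitions
  • Positions of frequently used elements
  • Operating rules and restrictions
  • A list of keyboard shortcuts

Use app_layouts/_template.md as the starting point for a new layout file.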
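
Starting a new layout file amounts to copying the template under the application's name. The sketch below shows one way to script that; create_layout is a hypothetical helper that is not part of the project, and the demo runs against a temporary directory rather than the real app_layouts folder.

```python
import shutil
import tempfile
from pathlib import Path


def create_layout(layouts_dir, app_name):
    """Copy _template.md to {app_name}.md inside layouts_dir and return the new path.

    Hypothetical helper: refuses to overwrite an existing layout file.
    """
    layouts = Path(layouts_dir)
    template = layouts / "_template.md"
    target = layouts / f"{app_name}.md"
    if not template.is_file():
        raise FileNotFoundError(f"template not found: {template}")
    if target.exists():
        raise FileExistsError(f"layout already exists: {target}")
    shutil.copyfile(template, target)
    return target


if __name__ == "__main__":
    # Demo in a throwaway directory with a stand-in template.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "_template.md").write_text("# Layout template\n", encoding="utf-8")
        print(create_layout(tmp, "Notepad").name)
```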

Technical documentation

License

MIT License
