
Screen Agent

A Windows desktop automation MCP server that recognizes UI elements through OCR, UIA controls, and multi-point color matching. Agents interact with desktop applications through actions such as clicking and typing, and a learning system tracks how often each operation succeeds so that later attempts improve.


Screen Agent

A Windows desktop automation MCP server that supports UI recognition via OCR, UIA controls, and multi-point color matching.

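Features

Multiple UI recognition methods
- OCR text recognition (RapidOCR)
- Windows UIA control recognition
- Multi-point color feature matching
Smart operations
- Window binding and automatic focusing
- Pop-up detection and handling
- Operation verification and error recovery
Learning and evolution
- Skill learning system
- Operation success-rate tracking
- Experience stored in a vector database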
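The learning features above include success-rate tracking. As a rough illustration only, the sketch below keeps in-memory attempt and success counts per (application, action, target) and picks the candidate action with the best observed rate. The names `SuccessTracker` and `best_action` and action strings such as `"click:ocr"` are hypothetical; the project itself stores experience in a vector database, which this sketch does not model.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Stats:
    """Attempt and success counts for one operation (hypothetical structure)."""

    attempts: int = 0
    successes: int = 0

    @property
    def rate(self) -> float:
        # Operations that were never attempted report a rate of 0.0.
        return self.successes / self.attempts if self.attempts else 0.0


class SuccessTracker:
    """Hypothetical in-memory tracker keyed by (app, action, target)."""

    def __init__(self) -> None:
        self._stats: defaultdict[tuple[str, str, str], Stats] = defaultdict(Stats)

    def record(self, app: str, action: str, target: str, ok: bool) -> None:
        """Count one attempt, plus one success if ok is True."""
        stats = self._stats[(app, action, target)]
        stats.attempts += 1
        if ok:
            stats.successes += 1

    def rate(self, app: str, action: str, target: str) -> float:
        # .get() avoids creating an empty entry just by querying.
        stats = self._stats.get((app, action, target))
        return stats.rate if stats is not None else 0.0

    def best_action(self, app: str, target: str, actions: list[str]) -> str:
        """Return the candidate action with the highest observed success rate."""
        return max(actions, key=lambda action: self.rate(app, action, target))


if __name__ == "__main__":
    tracker = SuccessTracker()
    tracker.record("WeChat", "click:ocr", "Settings", ok=True)
    tracker.record("WeChat", "click:ocr", "Settings", ok=False)
    tracker.record("WeChat", "click:ui", "Settings", ok=True)
    print(tracker.rate("WeChat", "click:ocr", "Settings"))  # 0.5
    print(tracker.best_action("WeChat", "Settings", ["click:ocr", "click:ui"]))  # click:ui
```

Keying on the target as well as the action lets the tracker prefer UIA for one button and OCR for another within the same application.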

Installation

Requirements

- Windows 10/11
- Python 3.12+
- Ollama (optional, for vision-based recognition)

Installation steps

# Clone the repository
git clone https://github.com/lqszhsp/screen-agent.git
cd screen-agent

# Create a virtual environment
python -m venv venv
venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

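Configuration

Copy the configuration template:

copy config\settings.example.py config\settings.py

Edit config/settings.py to set your API key (needed only if you use a cloud vision API).

Usage

As an MCP server

Add the following to the configuration of Claude Desktop or another MCP client: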

{
  "mcpServers": {
    "screen-agent": {
      "command": "python",
      "args": ["C:\\path\\to\\screen_agent\\mcp_server.py"]
    }
  }
}
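In the example above, `"command": "python"` launches whichever interpreter comes first on the client's PATH, which is not necessarily the virtual environment created during installation. One way to avoid that is to point `command` at the venv interpreter directly. The helper below is a sketch under that assumption: `make_client_config` is a hypothetical name, and the `Scripts\python.exe` location is the standard layout of a Windows venv rather than something this project specifies.

```python
import json
from pathlib import PureWindowsPath


def make_client_config(venv_dir: str, server_script: str) -> dict:
    """Build an mcpServers entry that launches the server with the venv interpreter.

    Hypothetical helper; both paths are supplied by the caller.
    """
    # On Windows, venv places the interpreter at <venv>\Scripts\python.exe.
    python_exe = PureWindowsPath(venv_dir) / "Scripts" / "python.exe"
    return {
        "mcpServers": {
            "screen-agent": {
                "command": str(python_exe),
                "args": [str(PureWindowsPath(server_script))],
            }
        }
    }


if __name__ == "__main__":
    config = make_client_config(
        r"C:\path\to\screen-agent\venv",
        r"C:\path\to\screen_agent\mcp_server.py",
    )
    # json.dumps doubles each backslash, matching the escaped paths in the example above.
    print(json.dumps(config, indent=2))
```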

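Available tools

Tool	Description
`screen_get_layout`	Bind a window and get its layout information
`screen_click`	Click a screen element
`screen_input_text`	Type text
`screen_scroll`	Scroll the screen
`screen_hotkey`	Press a keyboard shortcut
`screen_capture`	Take a screenshot and recognize elements
`screen_wait`	Wait for a specified time
`screen_explore`	Automatically explore the interface
`screen_detect_ui`	Detect UI element positions
`screen_scan_ui_elements`	Scan UI elements and generate icon features
`screen_ask_user_locate`	Ask the user to help locate an element
`screen_learn_success`	Record a successful operation
`screen_query_knowledge`	Query learned knowledge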
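An MCP client invokes any of these tools by sending a JSON-RPC 2.0 request with method `tools/call`, whose `params` hold the tool `name` and an `arguments` object. The sketch below only builds that message and does not open a connection. The argument names used for `screen_click` are copied from the click-mode examples in this README; the exact parameters of each tool should be confirmed from the server's `tools/list` response. The helper name `tools_call_request` is hypothetical.

```python
import itertools
import json

# JSON-RPC request ids only need to be unique per session; a counter suffices.
_request_ids = itertools.count(1)


def tools_call_request(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # ensure_ascii=False keeps Chinese targets such as "设置" readable.
    return json.dumps(message, ensure_ascii=False)


if __name__ == "__main__":
    print(tools_call_request("screen_click", {"target": "设置", "mode": "ocr"}))
```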

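Click modes

# OCR mode (default): locate by on-screen text ("设置" = "Settings")
screen_click(target="设置", mode="ocr")

# UIA mode: locate by control ("确定" = "OK")
screen_click(target="确定", mode="ui", control_type="Button")

# Multi-point color mode: locate by color features
# (keys appear to be "x|y" pixel offsets, values are hex colors)
screen_click(mode="multipoint", features={"0|0": "#07c160", "10|10": "#ffffff"})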

Project structure

screen_agent/
├── mcp_server.py          # MCP server entry point
├── actions/               # Action modules
│   ├── click.py           # Click actions
│   ├── input_text.py      # Text input
│   ├── scroll.py          # Scroll actions
│   └── ...
├── core/                  # Core modules
│   ├── perception.py      # OCR perception
│   ├── window_manager.py  # Window management
│   ├── evolution.py       # Evolution mechanism
│   └── ...
├── app_layouts/           # Per-application layout files
│   ├── _guidelines.md     # Operation manual
│   ├── _template.md       # Layout template
│   ├── 微信.md            # WeChat layout
│   └── ...
└── config/                # Configuration files
    └── settings.py

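Layout files

Each application can have its own layout file (app_layouts/{app_name}.md) containing:

- Window structure and region definitions
- Positions of commonly used elements
- Operating rules and restrictions
- A list of keyboard shortcuts

Use app_layouts/_template.md as the starting point for a new layout file.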

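Technical documentation

- Operation manual: agent operating guide
- UI scanning skill: workflow for scanning UI elements
- Technical reference: code implementation details

License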

MIT License
