agent-eyes

MCP server that wraps Playwright to give AI agents eyes on the web, enabling browser search, navigation, extraction, and interaction with intelligent LLM-based DOM extraction and skill caching.

Agent Eye

Give AI agents eyes on the web. Agent Eye is an MCP server that wraps Playwright for local browser search, navigation, extraction, and interaction. It features intelligent LLM-based DOM extraction with multi-pass reconciliation, consensus voting, DOM caching, and field-specific confidence thresholds.

What Is Implemented

Browser navigation with persistent or incognito modes
Search, interact, screenshot, content extraction, JS evaluation tools
5-Phase LLM-based DOM extraction (chunking, reconciliation, consensus voting, caching, field thresholds)
AI workflow orchestration tool powered by Ollama
Session persistence with cookie save/load per profile
Domain allowlist/denylist checks
Memory pressure protection
.env support via dotenv bootstrap at startup
Pre-commit git hooks (Husky) for test validation
AGPL-3.0 license (free for non-commercial/personal use)

Requirements

Node >= 22.14.0
Playwright Chromium installed

Setup

Install dependencies

npm install

Install Chromium

npx playwright install chromium

Configure environment

cp .env.example .env

Run

Development:

npm run dev

Build + start:

npm run build
npm run start

Type check:

npm run typecheck

Tests:

npm run test

Environment Variables

Core:

AGENT_EYE_HEADLESS=true|false
AGENT_EYE_NAV_TIMEOUT_MS=30000
AGENT_EYE_MAX_CHARS=50000
AGENT_EYE_PROFILE_ROOT=~/.agent-eye/profiles
AGENT_EYE_MODEL=deep-seek|llama2|default (content-budget profile for browser_get_content, not Ollama model name)
AGENT_EYE_SKILL_STALE_FOCUSED_CHARS=400 (refresh skill if cached focused content is too small)
AGENT_EYE_SKILL_STALE_SELECTOR_MATCHES=1 (refresh skill if cached selectors barely match DOM)
AGENT_EYE_SKILL_STALE_FAILURE_THRESHOLD=2 (refresh skill after repeated selector failures)

Ollama:

OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=gemma4:e4b-it-q4_K_M

Recommended with your local model:

Keep AGENT_EYE_MODEL=default
Set OLLAMA_MODEL to your installed model tag (for example gemma4:e4b-it-q4_K_M)

Note: OLLAMA_MODEL from .env is loaded automatically because src/index.ts imports dotenv/config before server startup.
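
The variables above could be read into a typed config object along these lines. This is a minimal sketch, not the project's actual config module: the `AgentEyeConfig` field names, the `loadConfig` and `intFrom` helpers, and the assumption that headless mode is on unless `AGENT_EYE_HEADLESS=false` are all hypothetical. The numeric defaults are the values listed above.

```typescript
// Sketch only: names and the headless default are assumptions, not the project's real code.
type Env = Record<string, string | undefined>;

interface AgentEyeConfig {
  headless: boolean;
  navTimeoutMs: number;
  maxChars: number;
  profileRoot: string;
  model: string;
  staleFocusedChars: number;
  staleSelectorMatches: number;
  staleFailureThreshold: number;
  ollamaBaseUrl: string;
  ollamaModel: string;
}

// Parse an integer variable, falling back when it is missing or not a number.
function intFrom(env: Env, key: string, fallback: number): number {
  const parsed = Number.parseInt(env[key] ?? "", 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

function loadConfig(env: Env): AgentEyeConfig {
  return {
    // Assumption: anything other than the literal "false" keeps the browser headless.
    headless: (env["AGENT_EYE_HEADLESS"] ?? "true") !== "false",
    navTimeoutMs: intFrom(env, "AGENT_EYE_NAV_TIMEOUT_MS", 30000),
    maxChars: intFrom(env, "AGENT_EYE_MAX_CHARS", 50000),
    // Note: "~" is not expanded here; a real loader would resolve it to the home directory.
    profileRoot: env["AGENT_EYE_PROFILE_ROOT"] ?? "~/.agent-eye/profiles",
    model: env["AGENT_EYE_MODEL"] ?? "default",
    staleFocusedChars: intFrom(env, "AGENT_EYE_SKILL_STALE_FOCUSED_CHARS", 400),
    staleSelectorMatches: intFrom(env, "AGENT_EYE_SKILL_STALE_SELECTOR_MATCHES", 1),
    staleFailureThreshold: intFrom(env, "AGENT_EYE_SKILL_STALE_FAILURE_THRESHOLD", 2),
    ollamaBaseUrl: env["OLLAMA_BASE_URL"] ?? "http://localhost:11434",
    ollamaModel: env["OLLAMA_MODEL"] ?? "gemma4:e4b-it-q4_K_M",
  };
}
```

In a Node process this would typically be called as `loadConfig(process.env)` after dotenv has populated the environment.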

MCP Tools

browser_navigate
browser_get_content
browser_search
browser_interact
browser_screenshot (with intelligent SPA hydration waits)
browser_evaluate_js
browser_analyze_page (with skill cache + stale-skill auto-refresh)
browser_extract_section (token-efficient targeted extraction)
browser_ocr_chunk (visual fallback when selector extraction is weak)
ai_orchestrate_workflow

AI Orchestration

ai_orchestrate_workflow accepts a high-level goal and lets the orchestrator decide the next action step by step. The built-in extraction uses a 5-phase LLM system:

Phase 1: Chunked per-field extraction with confidence scoring
Phase 2: Global reconciliation pass for conflict resolution
Phase 3: Consensus voting across multiple runs for reliability
Phase 4: DOM fingerprint caching to skip redundant extraction
Phase 5: Field-specific confidence thresholds for domain-aware validation

See EXTRACTION_PHASES.md for detailed architecture.
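
To make Phases 3 and 5 more concrete, here is a rough sketch of how consensus voting and per-field thresholds could fit together. It is not taken from the project's source: the `FieldResult` and `ExtractionRun` shapes, the tie-breaking rule, and the default threshold of 0.5 are assumptions made for illustration.

```typescript
// Hypothetical shapes: one extraction run maps field names to a value plus a confidence in [0, 1].
interface FieldResult {
  value: string;
  confidence: number;
}
type ExtractionRun = Record<string, FieldResult>;

// Phase 3 sketch: per field, keep the value most runs agree on and average its confidence.
// Ties are broken by the higher mean confidence (an assumption).
function consensusVote(runs: ExtractionRun[]): ExtractionRun {
  const tallies: Record<string, Record<string, number[]>> = {};
  for (const run of runs) {
    for (const [field, result] of Object.entries(run)) {
      const byValue = tallies[field] ?? {};
      const confidences = byValue[result.value] ?? [];
      confidences.push(result.confidence);
      byValue[result.value] = confidences;
      tallies[field] = byValue;
    }
  }

  const merged: ExtractionRun = {};
  for (const [field, byValue] of Object.entries(tallies)) {
    let best: FieldResult | undefined;
    let bestVotes = 0;
    for (const [value, confidences] of Object.entries(byValue)) {
      const votes = confidences.length;
      const mean = confidences.reduce((sum, c) => sum + c, 0) / votes;
      const winsTie = votes === bestVotes && best !== undefined && mean > best.confidence;
      if (votes > bestVotes || winsTie) {
        best = { value, confidence: mean };
        bestVotes = votes;
      }
    }
    if (best !== undefined) merged[field] = best;
  }
  return merged;
}

// Phase 5 sketch: drop fields whose confidence falls below their field-specific threshold.
function applyThresholds(
  result: ExtractionRun,
  thresholds: Record<string, number>,
  fallbackThreshold = 0.5,
): ExtractionRun {
  const kept: ExtractionRun = {};
  for (const [field, fieldResult] of Object.entries(result)) {
    if (fieldResult.confidence >= (thresholds[field] ?? fallbackThreshold)) {
      kept[field] = fieldResult;
    }
  }
  return kept;
}
```

With this shape, a field such as a price or a view count can be given a stricter threshold than a free-text description, which is one plausible reading of "domain-aware validation".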

Input:

goal (required)
context (optional)
maxSteps (optional)
model (optional; defaults to OLLAMA_MODEL, then falls back to gemma4:e4b-it-q4_K_M)

Output:

success
steps_executed
history of chosen actions
result object
error (when present)
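
The input and output lists above might be typed roughly as follows. The field names come from those lists; the TypeScript types, the `history` key name, and the helper functions `resolveModel`, `orchestrateRequest`, and `describeOutcome` are assumptions added for illustration.

```typescript
// Field names follow the lists above; the types themselves are assumptions.
interface OrchestrateInput {
  goal: string;
  context?: string;
  maxSteps?: number;
  model?: string;
}

interface OrchestrateOutput {
  success: boolean;
  steps_executed: number;
  history: unknown[]; // chosen actions; the element shape is not documented here
  result: unknown;
  error?: string;
}

const FALLBACK_MODEL = "gemma4:e4b-it-q4_K_M";

// Mirrors the documented default chain: explicit model, then OLLAMA_MODEL, then the built-in tag.
function resolveModel(input: OrchestrateInput, env: Record<string, string | undefined>): string {
  return input.model ?? env["OLLAMA_MODEL"] ?? FALLBACK_MODEL;
}

// Build the request object an MCP client would pass to callTool.
function orchestrateRequest(input: OrchestrateInput): { name: string; arguments: OrchestrateInput } {
  if (input.goal.trim() === "") {
    throw new Error("goal is required");
  }
  return { name: "ai_orchestrate_workflow", arguments: { ...input } };
}

// Turn a tool result into a one-line status string for logs.
function describeOutcome(output: OrchestrateOutput): string {
  return output.success
    ? `ok after ${output.steps_executed} step(s)`
    : `failed after ${output.steps_executed} step(s): ${output.error ?? "unknown error"}`;
}
```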

Use cases and integration guidance:

AI_ORCHESTRATION_USE_CASES.md

Session Persistence

Persistent profiles are stored under AGENT_EYE_PROFILE_ROOT.

Cookies are loaded when a persistent profile starts
Cookies are saved when the session manager closes
The profile name keeps session histories separate per workflow or site

Avoiding Bot Detection & Human Verification

Some websites employ aggressive bot detection (such as CAPTCHAs, Cloudflare walls, or login requirements). To prevent your agent from being blocked:

Warm the session: Run the warming script to manually open a non-headless browser window:
```
npm run warm-session
```
Solve challenges manually: Interact with the page (e.g. solve the CAPTCHA, log in, or accept cookies) in the opened browser window. This saves the verified session cookies to your persistent profile.
Run your agent: The agent will automatically reuse these saved cookies on subsequent runs to bypass the verification check. Note that cookies and sessions expire over time, so you may need to run npm run warm-session again if the agent starts encountering bot blocks.
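
Because saved cookies expire, a client may want to check a profile before each run and prompt for another warm-up when needed. The sketch below assumes the saved cookies follow the shape Playwright returns from `BrowserContext.cookies()`, where `expires` is a Unix timestamp in seconds and `-1` marks a session cookie. The `dropExpired` and `needsWarmup` helpers are hypothetical and not part of Agent Eye.

```typescript
// Subset of Playwright's cookie shape; `expires` is Unix seconds, -1 for session cookies.
interface StoredCookie {
  name: string;
  value: string;
  domain: string;
  path: string;
  expires: number;
}

// Keep session cookies and cookies whose expiry is still in the future.
function dropExpired(cookies: StoredCookie[], nowSeconds: number): StoredCookie[] {
  return cookies.filter((cookie) => cookie.expires === -1 || cookie.expires > nowSeconds);
}

// True when any cookie the site depends on is missing or expired,
// which suggests running `npm run warm-session` again.
function needsWarmup(cookies: StoredCookie[], requiredNames: string[], nowSeconds: number): boolean {
  const live = new Set(dropExpired(cookies, nowSeconds).map((cookie) => cookie.name));
  return requiredNames.some((name) => !live.has(name));
}
```

For a Cloudflare-protected site, for example, `requiredNames` could contain `cf_clearance`; which cookies actually matter depends on the target site.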

Skill Management

Agent Eye features a project-local skill caching system that maps CSS selectors to page structures to avoid redundant LLM page-analysis queries.

1. How Skills Are Learned & Saved

When the agent visits a page and calls the browser_analyze_page tool, the system checks if a matching skill pattern already exists.
If no skill exists, the tool splits the page into chunks, summarizes them using the local LLM, and builds a structural PageMap (identifying main content vs noise selectors).
This PageMap is automatically saved as a JSON skill file at ./agent-skills/<domain>.json.
Consecutive page loads on the same domain increment the successCount of the skill.
Each pattern also tracks success/failure metadata so stale selectors can be detected and refreshed automatically.
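
The bookkeeping described above could look something like this. Only `successCount` and the PageMap concept are named in this README; the remaining field names (`failureCount`, `lastUsedAt`, `mainSelectors`, `noiseSelectors`) and the `recordOutcome` helper are guesses at a plausible shape, so the real skill file schema may differ.

```typescript
// Hypothetical shape of one pattern stored in ./agent-skills/<domain>.json.
interface PageMap {
  mainSelectors: string[]; // selectors believed to hold the main content
  noiseSelectors: string[]; // selectors believed to be navigation, ads, and similar noise
}

interface SkillPattern {
  pattern: string; // e.g. "vnexpress.net/the-thao"
  pageMap: PageMap;
  successCount: number;
  failureCount: number;
  lastUsedAt: string; // ISO 8601 timestamp
}

// Return an updated copy of the pattern after one use, without mutating the input.
function recordOutcome(skill: SkillPattern, succeeded: boolean, now: Date): SkillPattern {
  return {
    ...skill,
    successCount: skill.successCount + (succeeded ? 1 : 0),
    failureCount: skill.failureCount + (succeeded ? 0 : 1),
    lastUsedAt: now.toISOString(),
  };
}
```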

2. Matching and URL Routing

Skill matching is entirely URL-driven (not prompt-driven).
When a page is analyzed, the system extracts the domain and matches specific path patterns using the first two directory segments (e.g., vnexpress.net/the-thao).
If no path-level pattern matches the current URL, the system falls back to the domain-level pattern (vnexpress.net).
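
The routing rules above can be sketched as a lookup over candidate keys, from most to least specific. This is an illustration rather than the project's matcher: `candidateSkillKeys` and `findSkill` are hypothetical names, and trying the one-segment key before the bare domain is an assumption about how the fallback is ordered.

```typescript
// Build lookup keys from a URL: domain plus up to two path segments, then shorter fallbacks.
function candidateSkillKeys(pageUrl: string): string[] {
  const url = new URL(pageUrl);
  const segments = url.pathname
    .split("/")
    .filter((segment) => segment !== "")
    .slice(0, 2);

  const keys: string[] = [];
  for (let count = segments.length; count >= 1; count--) {
    keys.push(`${url.hostname}/${segments.slice(0, count).join("/")}`);
  }
  keys.push(url.hostname); // domain-level fallback
  return keys;
}

// Return the most specific stored skill for a URL, or undefined when none matches.
function findSkill<T>(skills: Record<string, T>, pageUrl: string): T | undefined {
  for (const key of candidateSkillKeys(pageUrl)) {
    const skill = skills[key];
    if (skill !== undefined) return skill;
  }
  return undefined;
}
```

Since matching depends only on the URL, two different prompts that land on the same page resolve to the same skill.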

3. Managing Skills via CLI

You can inspect, refresh, or remove learned domain skills using the Skill CLI:

# List all learned domain skills
npx tsx examples/skill-cli.ts list

# View the full JSON pattern for a specific domain
npx tsx examples/skill-cli.ts show vnexpress.net

# Delete a skill (forces the agent to re-analyze page structure on next visit)
npx tsx examples/skill-cli.ts delete vnexpress.net

Alternatively, you can force the agent to overwrite an existing skill dynamically by calling browser_analyze_page with the "forceReanalyze": true argument.

4. Stale Skill Auto-Refresh

When a cached pattern returns very little focused content and few selector matches, the system treats the skill as stale.
The stale pattern's failure count is incremented, then browser_analyze_page immediately falls back to a fresh analysis in the same call.
For general-agent loops, repeated low-quality browser_extract_section results trigger a deterministic escalation:
1. Force one browser_analyze_page refresh
2. If still failing, escalate to browser_ocr_chunk
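
A staleness check and the escalation ladder above might be expressed as follows. The default numbers match the `AGENT_EYE_SKILL_STALE_*` variables documented earlier, but the comparison operators, the way the three signals are combined, and the choice of two low-quality results as "repeated" are assumptions. `isStale` and `nextTool` are hypothetical helpers.

```typescript
interface StaleThresholds {
  focusedChars: number;
  selectorMatches: number;
  failures: number;
}

// Defaults taken from the documented AGENT_EYE_SKILL_STALE_* values.
const DEFAULT_THRESHOLDS: StaleThresholds = { focusedChars: 400, selectorMatches: 1, failures: 2 };

interface SkillObservation {
  focusedChars: number; // characters of focused content the cached selectors produced
  selectorMatches: number; // how many cached selectors matched the live DOM
  failureCount: number; // failures recorded so far for this pattern
}

// Assumed rule: stale when content is thin AND selectors barely match, or after repeated failures.
function isStale(observation: SkillObservation, thresholds: StaleThresholds = DEFAULT_THRESHOLDS): boolean {
  const weakExtraction =
    observation.focusedChars < thresholds.focusedChars &&
    observation.selectorMatches <= thresholds.selectorMatches;
  return weakExtraction || observation.failureCount >= thresholds.failures;
}

type ExtractionTool = "browser_extract_section" | "browser_analyze_page" | "browser_ocr_chunk";

// Escalation ladder for a general-agent loop: retry, then one forced refresh, then OCR.
function nextTool(lowQualityResults: number, alreadyRefreshed: boolean): ExtractionTool {
  if (lowQualityResults < 2) return "browser_extract_section";
  return alreadyRefreshed ? "browser_ocr_chunk" : "browser_analyze_page";
}
```

When `nextTool` returns `browser_analyze_page`, the call would carry `forceReanalyze: true` so the cached pattern is overwritten.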

Security & Quality

Code Quality:

Pre-commit git hooks (Husky) validate all tests before commit
Enable with npm run prepare
Hook lives at .husky/pre-commit and runs npm test

License:

AGPL-3.0: Free for non-commercial and personal use
Commercial use requires explicit permission

Extraction Safety:

LLM confidence scoring prevents low-reliability extractions
Field-specific thresholds apply per-field validation
Evidence tracking for debugging extraction failures
browser_evaluate_js runs in restricted mode by default and blocks dangerous API patterns unless the call is explicitly marked unsafe
Optional domain security blocks navigation to disallowed domains
Memory guard blocks new navigations when critical threshold is reached
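
As an illustration of the domain check, here is one way an allowlist/denylist policy could be evaluated before navigation. The precedence rules are assumptions (deny wins over allow, an empty allowlist permits everything, and a rule also covers its subdomains), and `DomainPolicy`, `matchesDomain`, and `isNavigationAllowed` are hypothetical names.

```typescript
interface DomainPolicy {
  allow: string[]; // empty means "no allowlist restriction" (assumption)
  deny: string[];
}

// A rule matches the exact host or any subdomain of it.
function matchesDomain(hostname: string, rule: string): boolean {
  const host = hostname.toLowerCase();
  const normalizedRule = rule.toLowerCase();
  return host === normalizedRule || host.endsWith(`.${normalizedRule}`);
}

function isNavigationAllowed(targetUrl: string, policy: DomainPolicy): boolean {
  let hostname: string;
  try {
    hostname = new URL(targetUrl).hostname;
  } catch {
    return false; // unparseable URLs are rejected outright
  }
  if (policy.deny.some((rule) => matchesDomain(hostname, rule))) return false;
  if (policy.allow.length === 0) return true;
  return policy.allow.some((rule) => matchesDomain(hostname, rule));
}
```

Matching on a leading dot avoids a common pitfall where a rule for `example.com` would also accept `notexample.com`.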

Integration Tests

Relevant integration coverage:

tests/integration/browser-tools.test.ts
tests/integration/huntrix-golden-e2e.test.ts
tests/integration/ai-orchestrator-live.test.ts
tests/integration/ai-orchestrate-mcp-client.test.ts
tests/integration/youtube-search.test.ts

The integration tests validate:

ai-orchestrator-live: High-level prompt execution with JSON extraction (requires Ollama; opt in via RUN_LIVE_AI_ORCHESTRATOR_TEST=true)
browser-tools: Screenshot waits for dynamic content (YouTube SPA hydration)
dom extraction: 5-phase system (chunking, reconciliation, consensus, caching, thresholds)

Client Integration & Production Simulation

ai_orchestrate_workflow is optimized specifically for YouTube metadata extraction. For general browser automation goals, drive the browser step by step from your own orchestration service instead.

1. General Agent Loop Client Script

We provide a production-ready simulation client script in examples/general-agent-client.ts that demonstrates how to:

Connect to the agent-eyes MCP server over stdio transport.
Query a local Ollama model (e.g. gemma4:e4b-it-q4_K_M) using simplified HTML context and action history.
Run actions step-by-step using MCP tools.
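
The overall shape of such a loop is sketched below with the MCP client and the model call injected as plain functions, so the control flow can be read on its own. This is not the contents of examples/general-agent-client.ts: the `AgentAction` shape, the `LoopDeps` interface, and the `runAgentLoop` function are assumptions. Only the tool name `browser_get_content` and its arguments are taken from this README.

```typescript
// Hypothetical action format the model is asked to produce each step.
interface AgentAction {
  tool: string;
  args: Record<string, unknown>;
  done?: boolean;
  answer?: string;
}

// Dependencies are injected so the loop does not care how MCP or Ollama are reached.
interface LoopDeps {
  callTool: (name: string, args: Record<string, unknown>) => Promise<string>;
  decide: (goal: string, pageContent: string, history: AgentAction[]) => Promise<AgentAction>;
}

interface LoopResult {
  answer: string | undefined;
  history: AgentAction[];
}

async function runAgentLoop(goal: string, deps: LoopDeps, maxSteps = 10): Promise<LoopResult> {
  const history: AgentAction[] = [];
  for (let step = 0; step < maxSteps; step++) {
    // Observe: fetch simplified HTML for the model's context.
    const page = await deps.callTool("browser_get_content", {
      format: "simplified-html",
      includeInteractiveMap: true,
    });
    // Decide: ask the model for the next action given the goal and history.
    const action = await deps.decide(goal, page, history);
    history.push(action);
    if (action.done) {
      return { answer: action.answer, history };
    }
    // Act: run the chosen MCP tool.
    await deps.callTool(action.tool, action.args);
  }
  return { answer: undefined, history }; // step budget exhausted
}
```

In a real client, `callTool` would wrap `client.callTool(...)` from the MCP SDK and `decide` would send the prompt to Ollama and parse its reply.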

2. Running Simulations

You can evaluate the system on any prompt goal using:

# Run with default goal (VNExpress news summary)
npx tsx examples/general-agent-client.ts

# Run with a custom prompt
npx tsx examples/general-agent-client.ts "search what is the best trend in github.com"

3. Capturing Debug Screenshots

For easier debugging, you can enable step-by-step browser screenshots by appending the --screenshot CLI option. This will capture viewports at each step using the browser_screenshot tool and write them to a local debug-screenshots/ directory:

npx tsx examples/general-agent-client.ts "search what is the current stock for VCB (HOSE vn)" --screenshot

3.1 Example: CNN News Summary (10 items)

Example command:

npx tsx examples/general-agent-client.ts "go to cnn.com and summary for me list 10 of news" --screenshot

Expected behavior:

Agent navigates to https://edition.cnn.com/
Runs browser_analyze_page and extracts structured items[] with source URLs/images
Returns exactly 10 news summaries with source links

Sample output excerpt:

🎉 Goal Reached!

Based on the visible content of CNN.com from June 28, 2026, here are 10 news summaries visible on CNN.com:

1. Germany's new Nazi party databases are challenging decades-held sanitized family narratives
  Source: https://edition.cnn.com/world/germans-nazi-past-far-right-intl

2. Israel's military and tech industry race to counter Hezbollah's latest threat
  Source: https://edition.cnn.com/world/middleeast/israel-tech-hezbollah-drone-threat-intl

3. The US and Iran have a deal on paper. At sea, the Strait of Hormuz is 'chaotic'
  Source: https://edition.cnn.com/world/iran-us-hormuz-agreement-mou-intl

4. Live updates: Death toll climbs to over 1,400 in Venezuela quakes
  Source: https://edition.cnn.com/world/live-news/venezuela-earthquake-hnk

3.2 Example: YouTube Video Metadata (likes, views, release date)

Example command:

npx tsx examples/general-agent-client.ts "search Golden Official, to see that video has how many like, view and release date" --screenshot

Expected behavior:

Agent runs YouTube-focused search via browser_search with site="youtube"
Navigates to the selected watch URL
Uses browser_analyze_page on the watch page to extract metadata
Returns video title, views, release date, and likes (if visible)

Sample output excerpt:

🎉 Goal Reached!

Based on the provided YouTube page content for "Golden" Official Lyric Video | KPop Demon Hunters | Sony Animation:

- Video Title: “Golden” Official Lyric Video | KPop Demon Hunters | Sony Animation
- Views: 1,531,178,811 views
- Release Date: Jun 23, 2025
- Likes: The explicit number of likes is not visible in the provided focused content.

3.3 Example: OpenAI vs Claude News + Benchmark Table

Example command:

npx tsx examples/general-agent-client.ts "list me 5 news about AI models from OpenAI and Claude, which one is better, I need some benkmark or even the table for easy compare features" --screenshot

Expected behavior:

Agent starts from Google search and gathers relevant comparison sources
Navigates into a detailed article and runs browser_analyze_page
Returns a structured comparison with benchmark values and an easy-to-read table
Includes a practical recommendation on “which one is better” by use case

Sample output excerpt:

🎉 Goal Reached!

Comparative Overview (Benchmarks Table)

| Category | GPT-5.4 (OpenAI) | Claude Opus 4.6 (Anthropic) | Gemini 3.1 Pro | Grok 4 |
| :--- | :--- | :--- | :--- | :--- |
| Coding | Strong (74.9% SWE-bench) | Strong (74%+; powers Cursor) | Good (63.8%, 1M context) | Leader (75%) |
| Reasoning | 92.8% GPQA | 91.3% GPQA | Leader (94.3% GPQA) | Competitive |
| Writing | Good (Canvas editor) | Leader (128K output, natural prose) | Good (Docs integration) | Uncensored style |
| API Price (in/out per 1M) | $2.50/$15 | $15/$75 (Opus), $3/$15 (Sonnet) | $2/$12 | $2/$15 |

Summary on which is better:
- Claude Opus 4.6 is stronger for long-form writing and natural prose.
- OpenAI GPT-5.4 remains a strong all-rounder with broad ecosystem support.

3.4 Example: Company Information Discovery (Alphaus Cloud)

Example command:

npx tsx examples/general-agent-client.ts "help me to find information about Alphaus Cloud company" --screenshot

Expected behavior:

Agent starts with browser_search on Google to gather initial company sources
Navigates to the official website (https://www.alphaus.cloud/en)
Runs browser_analyze_page and refreshes stale skill selectors when needed
Returns a structured company profile: mission, credentials, products, and services

Sample output excerpt:

🎉 Goal Reached!

Based on the Alphaus Cloud landing page, here is detailed information about the company:

Company overview and mission:
- Alphaus Cloud focuses on cloud cost intelligence and FinOps-driven optimization.

Key credentials:
- Claims No. 1 position in Japan with over $200M+ cloud resources managed annually.
- Preferred FinOps solution by 25% of AWS Premier Partners in Japan.
- Reports 3000+ monthly active users on Ripple.

Core products and services:
- Octo: cost visibility and optimization for cloud users.
- Ripple: billing automation and reseller-focused cloud cost management.
- Professional Services: FinOps implementation and cross-team enablement.

3.5 Example: Localized Product + Price Discovery (Lynk & Co 08 in HCMC)

Example command:

npx tsx examples/general-agent-client.ts "find information about Lynk & co 08 in Vietnam? and current price for that car in Ho Chi Minh?" --screenshot

Expected behavior:

Agent runs localized search with Vietnamese keywords via browser_search
Uses focused content to extract product profile and local on-road pricing
Returns model details and city-specific price estimates for Ho Chi Minh City

Sample output excerpt:

🎉 Goal Reached!

General information on Lynk & Co 08:
- D-segment SUV, PHEV (1.5 Turbo + electric motor)
- EV range up to 200 km, total range over 1,400 km

Current price in Ho Chi Minh City (HCMC):
- Lynk & Co 08 Pro:
  - Factory list price: 1,299,000,000 VND
  - Estimated on-road price (HCMC): 1.43-1.45 billion VND
- Lynk & Co 08 Halo:
  - Factory list price: 1,389,000,000 VND
  - Estimated on-road price (HCMC): 1.53-1.55 billion VND

4. Integration Code Pattern

To call this MCP server from your own Node.js service, use the following code pattern:

import fs from 'node:fs';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

const transport = new StdioClientTransport({
  command: 'npx',
  args: ['tsx', 'src/index.ts'],
  cwd: '/path/to/agent-eyes',
});

const client = new Client({ name: 'my-orchestrator', version: '1.0.0' }, { capabilities: {} });

await client.connect(transport);

// 1. Navigate
await client.callTool({
  name: 'browser_navigate',
  arguments: { url: 'https://vnexpress.net' },
});

// 2. Get simplified page content for LLM context
const contentResult = await client.callTool({
  name: 'browser_get_content',
  arguments: { format: 'simplified-html', includeInteractiveMap: true },
});
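const textContent = (contentResult as any).content?.find((c: any) => c.type === 'text')?.text;
if (!textContent) {
  // Guard against tool errors or empty responses before parsing
  throw new Error('browser_get_content returned no text content');
}
const payload = JSON.parse(textContent);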
console.log(payload.content); // Simplified HTML content
console.log(payload.interactiveMap); // Actionable ID mapping

// 3. Interact (e.g. click element with data-agent-id="34")
await client.callTool({
  name: 'browser_interact',
  arguments: { action: 'click', target: '34' },
});

// 4. Capture screenshot for debugging
const screenshotResult = await client.callTool({
  name: 'browser_screenshot',
  arguments: { fullPage: false },
});
const imageItem = (screenshotResult as any).content?.find((c: any) => c.type === 'image');
if (imageItem?.data) {
  // imageItem.data contains raw base64 PNG data
  fs.writeFileSync('debug-step.png', Buffer.from(imageItem.data, 'base64'));
}

await client.close();
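
The `as any` casts in the pattern above can be replaced by a small typed helper. The sketch below assumes only what the pattern already relies on: a tool result carries a `content` array whose items have a `type` plus either `text` or base64 `data`. The names `ToolContentItem`, `contentItems`, and `parseTextPayload` are hypothetical.

```typescript
interface ToolContentItem {
  type: string;
  text?: string;
  data?: string;
}

function isContentItem(value: unknown): value is ToolContentItem {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { type?: unknown }).type === "string"
  );
}

// Safely pull the content array out of an untyped tool result.
function contentItems(result: unknown): ToolContentItem[] {
  const content = (result as { content?: unknown } | null | undefined)?.content;
  return Array.isArray(content) ? content.filter(isContentItem) : [];
}

// Parse the first text item as JSON; returns undefined instead of throwing on bad input.
function parseTextPayload<T>(result: unknown): T | undefined {
  const text = contentItems(result).find((item) => item.type === "text")?.text;
  if (text === undefined) return undefined;
  try {
    return JSON.parse(text) as T;
  } catch {
    return undefined;
  }
}
```

With this helper, step 2 becomes `const payload = parseTextPayload<{ content: string; interactiveMap: unknown }>(contentResult);` followed by an explicit check for `undefined`.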
