agent-eyes
MCP server that wraps Playwright to give AI agents eyes on the web, enabling browser search, navigation, extraction, and interaction with intelligent LLM-based DOM extraction and skill caching.
Agent Eye
Give AI agents eyes on the web. Agent Eye is an MCP server that wraps Playwright for local browser search, navigation, extraction, and interaction. It features intelligent LLM-based DOM extraction with multi-pass reconciliation, consensus voting, DOM caching, and field-specific confidence thresholds.
What Is Implemented
- Browser navigation with persistent or incognito modes
- Search, interact, screenshot, content extraction, JS evaluation tools
- 5-Phase LLM-based DOM extraction (chunking, reconciliation, consensus voting, caching, field thresholds)
- AI workflow orchestration tool powered by Ollama
- Session persistence with cookie save/load per profile
- Domain allowlist/denylist checks
- Memory pressure protection
- .env support via dotenv bootstrap at startup
- Pre-commit git hooks (Husky) for test validation
- AGPL-3.0 license (free for non-commercial/personal use)
Requirements
- Node >= 22.14.0
- Playwright Chromium installed
Setup
- Install dependencies
npm install
- Install Chromium
npx playwright install chromium
- Configure environment
cp .env.example .env
Run
Development:
npm run dev
Build + start:
npm run build
npm run start
Type check:
npm run typecheck
Tests:
npm run test
Environment Variables
Core:
- AGENT_EYE_HEADLESS=true|false
- AGENT_EYE_NAV_TIMEOUT_MS=30000
- AGENT_EYE_MAX_CHARS=50000
- AGENT_EYE_PROFILE_ROOT=~/.agent-eye/profiles
- AGENT_EYE_MODEL=deep-seek|llama2|default (content-budget profile for browser_get_content, not Ollama model name)
- AGENT_EYE_SKILL_STALE_FOCUSED_CHARS=400 (refresh skill if cached focused content is too small)
- AGENT_EYE_SKILL_STALE_SELECTOR_MATCHES=1 (refresh skill if cached selectors barely match DOM)
- AGENT_EYE_SKILL_STALE_FAILURE_THRESHOLD=2 (refresh skill after repeated selector failures)
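As a rough illustration of how the three stale-skill variables above could be read with their documented defaults, here is a minimal sketch. The variable names and default values come from the list above; the helper names (`readInt`, `loadStaleThresholds`) and the fallback-on-invalid-input behavior are assumptions, not the server's actual implementation.

```typescript
// Hypothetical helper: only the variable names and defaults are from the README.
interface StaleThresholds {
  focusedChars: number;
  selectorMatches: number;
  failureThreshold: number;
}

type Env = Record<string, string | undefined>;

// Parse a non-negative integer, falling back when the value is missing or invalid.
function readInt(env: Env, key: string, fallback: number): number {
  const parsed = Number.parseInt(env[key] ?? '', 10);
  return Number.isFinite(parsed) && parsed >= 0 ? parsed : fallback;
}

function loadStaleThresholds(env: Env): StaleThresholds {
  return {
    focusedChars: readInt(env, 'AGENT_EYE_SKILL_STALE_FOCUSED_CHARS', 400),
    selectorMatches: readInt(env, 'AGENT_EYE_SKILL_STALE_SELECTOR_MATCHES', 1),
    failureThreshold: readInt(env, 'AGENT_EYE_SKILL_STALE_FAILURE_THRESHOLD', 2),
  };
}

// In a Node process you would typically pass process.env here.
const thresholds = loadStaleThresholds({ AGENT_EYE_SKILL_STALE_FOCUSED_CHARS: '250' });
console.log(thresholds);
```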
Ollama:
- OLLAMA_BASE_URL=http://localhost:11434
- OLLAMA_MODEL=gemma4:e4b-it-q4_K_M
Recommended with your local model:
- Keep AGENT_EYE_MODEL=default
- Set OLLAMA_MODEL to your installed model tag (for example gemma4:e4b-it-q4_K_M)
Note: OLLAMA_MODEL from .env is loaded automatically because src/index.ts imports dotenv/config before server startup.
MCP Tools
- browser_navigate
- browser_get_content
- browser_search
- browser_interact
- browser_screenshot (with intelligent SPA hydration waits)
- browser_evaluate_js
- browser_analyze_page (with skill cache + stale-skill auto-refresh)
- browser_extract_section (token-efficient targeted extraction)
- browser_ocr_chunk (visual fallback when selector extraction is weak)
- ai_orchestrate_workflow
AI Orchestration
ai_orchestrate_workflow accepts a high-level goal and lets the orchestrator decide the next action step by step. Built-in extraction uses a 5-phase LLM system:
- Phase 1: Chunked per-field extraction with confidence scoring
- Phase 2: Global reconciliation pass for conflict resolution
- Phase 3: Consensus voting across multiple runs for reliability
- Phase 4: DOM fingerprint caching to skip redundant extraction
- Phase 5: Field-specific confidence thresholds for domain-aware validation
See EXTRACTION_PHASES.md for detailed architecture.
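To make Phases 3 and 5 more concrete, the sketch below shows one plausible way to combine majority voting across runs with per-field confidence thresholds. Every type and function name here (`FieldCandidate`, `voteField`, `reconcile`) and the 0.5 default threshold are hypothetical; the actual design is described in EXTRACTION_PHASES.md and may differ.

```typescript
// Illustrative only: names, shapes, and threshold values are assumptions.
interface FieldCandidate {
  value: string;
  confidence: number; // 0..1, as reported by a single extraction run
}

type ExtractionRun = Record<string, FieldCandidate>;

// Majority vote for one field; ties go to the value with the higher mean confidence.
function voteField(candidates: FieldCandidate[]): FieldCandidate | undefined {
  const tally = new Map<string, { votes: number; total: number }>();
  for (const c of candidates) {
    const entry = tally.get(c.value) ?? { votes: 0, total: 0 };
    entry.votes += 1;
    entry.total += c.confidence;
    tally.set(c.value, entry);
  }
  let best: FieldCandidate | undefined;
  let bestVotes = 0;
  for (const [value, stats] of tally) {
    const mean = stats.total / stats.votes;
    const winsTie = stats.votes === bestVotes && best !== undefined && mean > best.confidence;
    if (stats.votes > bestVotes || winsTie) {
      best = { value, confidence: mean };
      bestVotes = stats.votes;
    }
  }
  return best;
}

// Keep a field only if its voted value clears that field's threshold.
function reconcile(
  runs: ExtractionRun[],
  thresholds: Record<string, number>,
  defaultThreshold = 0.5,
): Record<string, string> {
  const fields = new Set(runs.flatMap((run) => Object.keys(run)));
  const accepted: Record<string, string> = {};
  for (const field of fields) {
    const candidates = runs.flatMap((run) => {
      const candidate = run[field];
      return candidate ? [candidate] : [];
    });
    const winner = voteField(candidates);
    const threshold = thresholds[field] ?? defaultThreshold;
    if (winner && winner.confidence >= threshold) accepted[field] = winner.value;
  }
  return accepted;
}
```

With this shape, a stricter threshold can be set for fields where a wrong value is costly (for example a numeric count), while free-text fields use the default.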
Input:
- goal (required)
- context (optional)
- maxSteps (optional)
- model (optional; defaults to OLLAMA_MODEL then gemma4:e4b-it-q4_K_M)
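The inputs above can be assembled on the client side as in the following sketch. It mirrors the documented model fallback order (explicit argument, then OLLAMA_MODEL, then gemma4:e4b-it-q4_K_M). The helper name `buildOrchestrateArgs` and the assumed types (`context` as a string, `maxSteps` as a number) are hypothetical, since the README does not state them.

```typescript
// Hypothetical helper: argument names and the default tag are from the README,
// the TypeScript types are assumptions.
interface OrchestrateArgs {
  goal: string;
  context?: string;
  maxSteps?: number;
  model: string;
}

const FALLBACK_MODEL = 'gemma4:e4b-it-q4_K_M';

function buildOrchestrateArgs(
  goal: string,
  options: { context?: string; maxSteps?: number; model?: string } = {},
  env: Record<string, string | undefined> = {},
): OrchestrateArgs {
  if (goal.trim() === '') throw new Error('goal is required');
  return {
    goal,
    // Only include optional keys that were actually provided.
    ...(options.context !== undefined ? { context: options.context } : {}),
    ...(options.maxSteps !== undefined ? { maxSteps: options.maxSteps } : {}),
    model: options.model ?? env['OLLAMA_MODEL'] ?? FALLBACK_MODEL,
  };
}

console.log(buildOrchestrateArgs('summarize the top story', { maxSteps: 5 }));
```

The resulting object can be passed as the `arguments` of a `client.callTool({ name: 'ai_orchestrate_workflow', ... })` request.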
Output:
- success
- steps_executed
- history of chosen actions
- result object
- error (when present)
Use cases and integration guidance:
Session Persistence
Persistent profiles are stored under AGENT_EYE_PROFILE_ROOT.
- Cookies are loaded when a persistent profile starts
- Cookies are saved when session manager closes
- Profile name allows separate session histories per workflow/site
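The load-on-start and save-on-close cycle described above might look roughly like the sketch below. The on-disk layout (`<profileRoot>/<profileName>/cookies.json`), the `StoredCookie` shape, and all function names are assumptions for illustration; the README does not specify how the session manager stores cookies.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Hypothetical cookie record, loosely modeled on browser cookie fields.
interface StoredCookie {
  name: string;
  value: string;
  domain: string;
  path: string;
  expires?: number; // seconds since the Unix epoch; missing or negative means session cookie
}

// Expand a leading "~" such as the default ~/.agent-eye/profiles.
function expandHome(p: string): string {
  return p === '~' || p.startsWith('~/') ? path.join(os.homedir(), p.slice(1)) : p;
}

// Assumed layout: one cookies.json per profile directory.
function cookieFile(profileRoot: string, profileName: string): string {
  return path.join(expandHome(profileRoot), profileName, 'cookies.json');
}

function saveCookies(file: string, cookies: StoredCookie[]): void {
  fs.mkdirSync(path.dirname(file), { recursive: true });
  fs.writeFileSync(file, JSON.stringify(cookies, null, 2), 'utf8');
}

function loadCookies(file: string, nowSeconds: number = Date.now() / 1000): StoredCookie[] {
  if (!fs.existsSync(file)) return [];
  const cookies = JSON.parse(fs.readFileSync(file, 'utf8')) as StoredCookie[];
  // Skip cookies that have already expired; keep session cookies.
  return cookies.filter((c) => c.expires === undefined || c.expires < 0 || c.expires > nowSeconds);
}
```

In a Playwright-based server, the loaded array would typically be handed to the browser context's `addCookies` method on startup, and `cookies()` would supply the array to save on close.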
Avoiding Bot Detection & Human Verification
Some websites employ aggressive bot detection (such as CAPTCHAs, Cloudflare walls, or login requirements). To prevent your agent from being blocked:
- Warm the session: Run the warming script to manually open a non-headless browser window:
npm run warm-session
- Solve challenges manually: Interact with the page (e.g. solve the CAPTCHA, log in, or accept cookies) in the opened browser window. This saves the verified session cookies to your persistent profile.
- Run your agent: The agent will automatically reuse these saved cookies on subsequent runs to bypass the verification check. Note that cookies and sessions expire over time, so you may need to run npm run warm-session again if the agent starts encountering bot blocks.
Skill Management
Agent Eye features a project-local skill caching system that maps CSS selectors to page structures to avoid redundant LLM page-analysis queries.
1. How Skills Are Learned & Saved
- When the agent visits a page and calls the browser_analyze_page tool, the system checks if a matching skill pattern already exists.
- If no skill exists, the tool splits the page into chunks, summarizes them using the local LLM, and builds a structural PageMap (identifying main content vs noise selectors).
- This PageMap is automatically saved as a JSON skill file at ./agent-skills/<domain>.json.
- Consecutive page loads on the same domain increment the successCount of the skill.
- Each pattern also tracks success/failure metadata so stale selectors can be detected and refreshed automatically.
2. Matching and URL Routing
- Skill matching is entirely URL-driven (not prompt-driven).
- When a page is analyzed, the system extracts the domain and matches specific path patterns using the first two directory segments (e.g., vnexpress.net/the-thao).
- If no exact path-level pattern matches the current URL, the system falls back to the top-level domain pattern (vnexpress.net).
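The URL-driven lookup described above can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, and the README's "first two directory segments" is read here as "up to two path segments after the domain" (the segment count is a parameter so the other reading is easy to try).

```typescript
// Hypothetical helpers; the real key format used in the skill files is not shown in this README.
function skillLookupKeys(pageUrl: string, maxSegments = 2): string[] {
  const { hostname, pathname } = new URL(pageUrl);
  const segments = pathname.split('/').filter(Boolean).slice(0, maxSegments);
  const keys: string[] = [];
  // Path-level pattern first, when the URL has a path at all.
  if (segments.length > 0) keys.push(`${hostname}/${segments.join('/')}`);
  // Domain-level pattern as the fallback.
  keys.push(hostname);
  return keys;
}

// Return the first stored pattern that matches, most specific key first.
function findSkill<T>(pageUrl: string, patterns: Record<string, T>): T | undefined {
  for (const key of skillLookupKeys(pageUrl)) {
    const hit = patterns[key];
    if (hit !== undefined) return hit;
  }
  return undefined;
}

console.log(skillLookupKeys('https://vnexpress.net/the-thao'));
```

Because the lookup depends only on the URL, two different prompts against the same page resolve to the same cached pattern.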
3. Managing Skills via CLI
You can inspect, refresh, or remove learned domain skills using the Skill CLI:
# List all learned domain skills
npx tsx examples/skill-cli.ts list
# View the full JSON pattern for a specific domain
npx tsx examples/skill-cli.ts show vnexpress.net
# Delete a skill (forces the agent to re-analyze page structure on next visit)
npx tsx examples/skill-cli.ts delete vnexpress.net
Alternatively, you can force the agent to overwrite an existing skill dynamically by calling browser_analyze_page with the "forceReanalyze": true argument.
4. Stale Skill Auto-Refresh
- When a cached pattern returns very low focused content and poor selector matches, the system treats the skill as stale.
- The stale pattern is marked with a failure increment, then browser_analyze_page immediately falls back to fresh analysis in the same call.
- For general-agent loops, repeated low-quality browser_extract_section results trigger a deterministic escalation:
  - Force one browser_analyze_page refresh
  - If still failing, escalate to browser_ocr_chunk
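The stale check and the escalation ladder above could be expressed as two small pure functions, sketched below. The default numbers match the AGENT_EYE_SKILL_STALE_* defaults listed under Environment Variables, but the comparison operators, the meaning of "repeated" (taken here as two consecutive low-quality results), and all names are assumptions.

```typescript
// Hypothetical names throughout; the exact comparisons used by the server are assumptions.
interface SkillStats {
  focusedChars: number; // size of the focused content produced by the cached pattern
  selectorMatches: number; // how many cached selectors matched the current DOM
  failureCount: number; // recorded selector failures for this pattern
}

interface StaleLimits {
  focusedChars: number;
  selectorMatches: number;
  failureThreshold: number;
}

const DEFAULT_LIMITS: StaleLimits = { focusedChars: 400, selectorMatches: 1, failureThreshold: 2 };

// Stale when content is thin AND selectors barely match, or after repeated failures.
function isSkillStale(stats: SkillStats, limits: StaleLimits = DEFAULT_LIMITS): boolean {
  const weakContent = stats.focusedChars < limits.focusedChars;
  const weakSelectors = stats.selectorMatches <= limits.selectorMatches;
  return (weakContent && weakSelectors) || stats.failureCount >= limits.failureThreshold;
}

type NextTool = 'browser_extract_section' | 'browser_analyze_page' | 'browser_ocr_chunk';

// Escalation ladder for repeated low-quality browser_extract_section results.
function nextTool(lowQualityStreak: number, refreshAlreadyForced: boolean): NextTool {
  if (lowQualityStreak < 2) return 'browser_extract_section';
  return refreshAlreadyForced ? 'browser_ocr_chunk' : 'browser_analyze_page';
}

console.log(isSkillStale({ focusedChars: 120, selectorMatches: 0, failureCount: 0 }));
console.log(nextTool(2, false));
```

Keeping the ladder deterministic means the agent loop never depends on the LLM to decide when to abandon a failing selector set.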
Security & Quality
Code Quality:
- Pre-commit git hooks (Husky) validate all tests before commit
- Enable with npm run prepare
- Hook lives at .husky/pre-commit and runs npm test
License:
- AGPL-3.0: Free for non-commercial and personal use
- Commercial use requires explicit permission
Extraction Safety:
- LLM confidence scoring prevents low-reliability extractions
- Field-specific thresholds apply per-field validation
- Evidence tracking for debugging extraction failures
- browser_evaluate_js restricted mode blocks dangerous API patterns unless explicitly unsafe
- Optional domain security blocks navigation to disallowed domains
- Memory guard blocks new navigations when critical threshold is reached
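As an illustration of the optional domain security check, the sketch below shows one common way to evaluate an allowlist and denylist before navigating. The policy shape, the rule that deny entries take precedence, the treatment of an empty allowlist as "allow everything", and the subdomain matching are all assumptions; the README does not document the exact semantics.

```typescript
// Hypothetical policy shape and semantics, for illustration only.
interface DomainPolicy {
  allow?: string[];
  deny?: string[];
}

// A rule matches the exact host or any of its subdomains.
function matchesDomain(hostname: string, rule: string): boolean {
  const host = hostname.toLowerCase();
  const normalizedRule = rule.toLowerCase();
  return host === normalizedRule || host.endsWith(`.${normalizedRule}`);
}

function isNavigationAllowed(targetUrl: string, policy: DomainPolicy): boolean {
  let hostname: string;
  try {
    hostname = new URL(targetUrl).hostname;
  } catch {
    return false; // unparseable URLs are rejected
  }
  // Deny rules win over allow rules.
  if ((policy.deny ?? []).some((rule) => matchesDomain(hostname, rule))) return false;
  const allow = policy.allow ?? [];
  return allow.length === 0 || allow.some((rule) => matchesDomain(hostname, rule));
}

console.log(isNavigationAllowed('https://e.vnexpress.net/news', { allow: ['vnexpress.net'] }));
```

Matching on a leading dot avoids treating a lookalike host such as evil-vnexpress.net as part of vnexpress.net.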
Integration Tests
Relevant integration coverage:
- tests/integration/browser-tools.test.ts
- tests/integration/huntrix-golden-e2e.test.ts
- tests/integration/ai-orchestrator-live.test.ts
- tests/integration/ai-orchestrate-mcp-client.test.ts
- tests/integration/youtube-search.test.ts
The integration tests validate:
- ai-orchestrator-live: High-level prompt execution with JSON extraction (requires Ollama; opt in via RUN_LIVE_AI_ORCHESTRATOR_TEST=true)
- browser-tools: Screenshot waits for dynamic content (YouTube SPA hydration)
- dom extraction: 5-phase system (chunking, reconciliation, consensus, caching, thresholds)
Client Integration & Production Simulation
For general browser automation goals, you should drive the browser step-by-step from your own orchestration service (as ai_orchestrate_workflow is optimized specifically for YouTube metadata extraction).
1. General Agent Loop Client Script
We provide a production-ready simulation client script in examples/general-agent-client.ts that demonstrates how to:
- Connect to the agent-eyes MCP server over stdio transport.
- Query a local Ollama model (e.g. gemma4:e4b-it-q4_K_M) using simplified HTML context and action history.
- Run actions step-by-step using MCP tools.
2. Running Simulations
You can evaluate the system on any prompt goal using:
# Run with default goal (VNExpress news summary)
npx tsx examples/general-agent-client.ts
# Run with a custom prompt
npx tsx examples/general-agent-client.ts "search what is the best trend in github.com"
3. Capturing Debug Screenshots
For easier debugging, you can enable step-by-step browser screenshots by appending the --screenshot CLI option. This will capture viewports at each step using the browser_screenshot tool and write them to a local debug-screenshots/ directory:
npx tsx examples/general-agent-client.ts "search what is the current stock for VCB (HOSE vn)" --screenshot
3.1 Example: CNN News Summary (10 items)
Example command:
npx tsx examples/general-agent-client.ts "go to cnn.com and summary for me list 10 of news" --screenshot
Expected behavior:
- Agent navigates to https://edition.cnn.com/
- Runs browser_analyze_page and extracts structured items[] with source URLs/images
- Returns exactly 10 news summaries with source links
Sample output excerpt:
🎉 Goal Reached!
Based on the visible content of CNN.com from June 28, 2026, here are 10 news summaries visible on CNN.com:
1. Germany's new Nazi party databases are challenging decades-held sanitized family narratives
Source: https://edition.cnn.com/world/germans-nazi-past-far-right-intl
2. Israel's military and tech industry race to counter Hezbollah's latest threat
Source: https://edition.cnn.com/world/middleeast/israel-tech-hezbollah-drone-threat-intl
3. The US and Iran have a deal on paper. At sea, the Strait of Hormuz is 'chaotic'
Source: https://edition.cnn.com/world/iran-us-hormuz-agreement-mou-intl
4. Live updates: Death toll climbs to over 1,400 in Venezuela quakes
Source: https://edition.cnn.com/world/live-news/venezuela-earthquake-hnk
3.2 Example: YouTube Video Metadata (likes, views, release date)
Example command:
npx tsx examples/general-agent-client.ts "search Golden Official, to see that video has how many like, view and release date" --screenshot
Expected behavior:
- Agent runs YouTube-focused search via browser_search with site="youtube"
- Navigates to the selected watch URL
- Uses browser_analyze_page on the watch page to extract metadata
- Returns video title, views, release date, and likes (if visible)
Sample output excerpt:
🎉 Goal Reached!
Based on the provided YouTube page content for "Golden" Official Lyric Video | KPop Demon Hunters | Sony Animation:
- Video Title: “Golden” Official Lyric Video | KPop Demon Hunters | Sony Animation
- Views: 1,531,178,811 views
- Release Date: Jun 23, 2025
- Likes: The explicit number of likes is not visible in the provided focused content.
3.3 Example: OpenAI vs Claude News + Benchmark Table
Example command:
npx tsx examples/general-agent-client.ts "list me 5 news about AI models from OpenAI and Claude, which one is better, I need some benkmark or even the table for easy compare features" --screenshot
Expected behavior:
- Agent starts from Google search and gathers relevant comparison sources
- Navigates into a detailed article and runs browser_analyze_page
- Returns a structured comparison with benchmark values and an easy-to-read table
- Includes a practical recommendation on “which one is better” by use case
Sample output excerpt:
🎉 Goal Reached!
Comparative Overview (Benchmarks Table)
| Category | GPT-5.4 (OpenAI) | Claude Opus 4.6 (Anthropic) | Gemini 3.1 Pro | Grok 4 |
| :--- | :--- | :--- | :--- | :--- |
| Coding | Strong (74.9% SWE-bench) | Strong (74%+; powers Cursor) | Good (63.8%, 1M context) | Leader (75%) |
| Reasoning | 92.8% GPQA | 91.3% GPQA | Leader (94.3% GPQA) | Competitive |
| Writing | Good (Canvas editor) | Leader (128K output, natural prose) | Good (Docs integration) | Uncensored style |
| API Price (in/out per 1M) | $2.50/$15 | $15/$75 (Opus), $3/$15 (Sonnet) | $2/$12 | $2/$15 |
Summary on which is better:
- Claude Opus 4.6 is stronger for long-form writing and natural prose.
- OpenAI GPT-5.4 remains a strong all-rounder with broad ecosystem support.
3.4 Example: Company Information Discovery (Alphaus Cloud)
Example command:
npx tsx examples/general-agent-client.ts "help me to find information about Alphaus Cloud company" --screenshot
Expected behavior:
- Agent starts with browser_search on Google to gather initial company sources
- Navigates to the official website (https://www.alphaus.cloud/en)
- Runs browser_analyze_page and refreshes stale skill selectors when needed
- Returns a structured company profile: mission, credentials, products, and services
Sample output excerpt:
🎉 Goal Reached!
Based on the Alphaus Cloud landing page, here is detailed information about the company:
Company overview and mission:
- Alphaus Cloud focuses on cloud cost intelligence and FinOps-driven optimization.
Key credentials:
- Claims No. 1 position in Japan with over $200M+ cloud resources managed annually.
- Preferred FinOps solution by 25% of AWS Premier Partners in Japan.
- Reports 3000+ monthly active users on Ripple.
Core products and services:
- Octo: cost visibility and optimization for cloud users.
- Ripple: billing automation and reseller-focused cloud cost management.
- Professional Services: FinOps implementation and cross-team enablement.
3.5 Example: Localized Product + Price Discovery (Lynk & Co 08 in HCMC)
Example command:
npx tsx examples/general-agent-client.ts "find information about Lynk & co 08 in Vietnam? and current price for that car in Ho Chi Minh?" --screenshot
Expected behavior:
- Agent runs localized search with Vietnamese keywords via browser_search
- Uses focused content to extract product profile and local on-road pricing
- Returns model details and city-specific price estimates for Ho Chi Minh City
Sample output excerpt:
🎉 Goal Reached!
General information on Lynk & Co 08:
- D-segment SUV, PHEV (1.5 Turbo + electric motor)
- EV range up to 200 km, total range over 1,400 km
Current price in Ho Chi Minh City (HCMC):
- Lynk & Co 08 Pro:
- Factory list price: 1,299,000,000 VND
- Estimated on-road price (HCMC): 1.43-1.45 billion VND
- Lynk & Co 08 Halo:
- Factory list price: 1,389,000,000 VND
- Estimated on-road price (HCMC): 1.53-1.55 billion VND
4. Integration Code Pattern
To call this MCP server from your own Node.js service, use the following code pattern:
import * as fs from 'node:fs';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

// Launch the server as a child process and talk to it over stdio
const transport = new StdioClientTransport({
  command: 'npx',
  args: ['tsx', 'src/index.ts'],
  cwd: '/path/to/agent-eyes',
});
const client = new Client({ name: 'my-orchestrator', version: '1.0.0' }, { capabilities: {} });
await client.connect(transport);

// 1. Navigate
await client.callTool({
  name: 'browser_navigate',
  arguments: { url: 'https://vnexpress.net' },
});

// 2. Get simplified page content for LLM context
const contentResult = await client.callTool({
  name: 'browser_get_content',
  arguments: { format: 'simplified-html', includeInteractiveMap: true },
});
const textContent = (contentResult as any).content?.find((c: any) => c.type === 'text')?.text;
// Guard against a result without a text item before parsing
if (!textContent) throw new Error('browser_get_content returned no text content');
const payload = JSON.parse(textContent);
console.log(payload.content); // Simplified HTML content
console.log(payload.interactiveMap); // Actionable ID mapping

// 3. Interact (e.g. click element with data-agent-id="34")
await client.callTool({
  name: 'browser_interact',
  arguments: { action: 'click', target: '34' },
});

// 4. Capture screenshot for debugging
const screenshotResult = await client.callTool({
  name: 'browser_screenshot',
  arguments: { fullPage: false },
});
const imageItem = (screenshotResult as any).content?.find((c: any) => c.type === 'image');
if (imageItem?.data) {
  // imageItem.data contains raw base64 PNG data
  fs.writeFileSync('debug-step.png', Buffer.from(imageItem.data, 'base64'));
}

await client.close();
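The pattern above reads the first text item of a tool result and parses it as JSON. If you do this for several tools, a small typed helper avoids repeating the `as any` casts. The sketch below is one possible shape: `parseJsonPayload` and the two interfaces are hypothetical names, and the result shape is inferred from how the pattern above reads results rather than taken from the server's source.

```typescript
// Minimal structural types for a tool result; treat them as assumptions.
interface ToolContentItem {
  type: string;
  text?: string;
  data?: string;
}

interface ToolResult {
  content?: ToolContentItem[];
  isError?: boolean;
}

// Extract the first text item and parse it as JSON, with explicit failure modes.
function parseJsonPayload<T>(result: ToolResult): T {
  if (result.isError) throw new Error('Tool call reported an error');
  const text = result.content?.find((item) => item.type === 'text')?.text;
  if (text === undefined) throw new Error('Tool result has no text content');
  try {
    return JSON.parse(text) as T;
  } catch {
    throw new Error(`Tool result is not valid JSON: ${text.slice(0, 80)}`);
  }
}

// Example with a hand-built result that mimics a browser_get_content response.
const sample: ToolResult = {
  content: [{ type: 'text', text: '{"content":"<main>...</main>","interactiveMap":{}}' }],
};
const parsed = parseJsonPayload<{ content: string; interactiveMap: Record<string, unknown> }>(sample);
console.log(parsed.content);
```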