surf-mcp
Enables visual browser automation through natural language descriptions, allowing AI to click, type, and navigate web pages by seeing the page.
Surf MCP
MCP server for visual browser automation via Fara.
Overview
Surf provides browser automation through visual grounding - you describe what you see, and it clicks, types, and navigates based on that description. No CSS selectors, no DOM traversal, just natural language.
The core insight: an AI that can see the page doesn't need to parse HTML.
Features
- Visual grounding: Click/type by natural language description ("the blue Submit button")
- Direct Fara execution: Fara decides the action, we execute it
- Autonomous mode: Multi-step goal completion with progress tracking
- Multi-server LM Studio: Auto-discovery and failover across GPU servers
- Session persistence: Storage state (cookies, localStorage) round-trips through tool calls
- Security controls: Domain allowlists, rate limiting, audit logging
Security
surf-mcp is designed for LOCAL use only via stdio transport.
Not Suitable For
- Multi-tenant environments - trust boundary is the machine
- Untrusted networks without SSH tunneling
- Compliance-sensitive contexts - no formal security audit
- Untrusted MCP clients - surf-mcp trusts its client completely
Residual Risks
- No encryption at MCP protocol level
- LLM responses (Fara/Gemini) executed without verification
- Browser automation can click/type anything visible
Remote Execution
Use SSH as the transport - surf-mcp sees normal stdio:
{
  "mcpServers": {
    "surf-remote": {
      "command": "ssh",
      "args": ["-i", "~/.ssh/key", "user@gpu-box", "surf-mcp"]
    }
  }
}
See SECURITY.md for the full threat model and security controls.
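To make the "domain allowlist" control listed under Features more concrete, here is a minimal sketch of how such a check might look. It is not surf-mcp's actual implementation: the helper name `is_allowed` and the rule that subdomains of an allowlisted domain also pass are assumptions made for illustration. SECURITY.md remains the reference for the real controls.

```python
from urllib.parse import urlsplit


def is_allowed(url: str, allowlist: set[str]) -> bool:
    """Return True if the URL's host is an allowlisted domain or a subdomain of one.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # Only web URLs are candidates; file:, data:, etc. are rejected outright.
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    # Match the domain itself or any subdomain ("." prefix avoids look-alikes).
    return any(host == domain or host.endswith("." + domain) for domain in allowlist)
```

A check like this would typically run before any navigation, so a goal that steers the browser off the allowlist fails early instead of loading the page.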
How It Works
Surf uses Fara-7B (Microsoft's agentic vision model) to understand web pages:
sequenceDiagram
participant Client as MCP Client
participant Surf as surf-mcp
participant PW as Playwright
participant Fara as Fara-7B
Client->>Surf: act("click the search button")
activate Surf
Surf->>PW: screenshot()
PW-->>Surf: PNG image
Surf->>Fara: analyze(image, goal)
Note right of Fara: Visual grounding
Fara-->>Surf: FaraToolCall{left_click, [624,280]}
Surf->>PW: click(624, 280)
PW-->>Surf: done
deactivate Surf
Surf-->>Client: Result + new screenshot
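The round trip in the diagram can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the server's code: `ToolCall` is a hypothetical stand-in for `FaraToolCall`, and the screenshot, model, and click steps are passed in as plain callables so the flow can be followed (and tested) without Playwright or a running model.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class ToolCall:
    """Hypothetical stand-in for the FaraToolCall shown in the diagram."""
    action: str
    coordinate: Tuple[int, int]


def act_once(
    goal: str,
    screenshot: Callable[[], bytes],
    analyze: Callable[[bytes, str], ToolCall],
    click: Callable[[int, int], None],
) -> bytes:
    """Run one screenshot -> model -> click round trip and return a fresh screenshot."""
    image = screenshot()            # Surf asks the browser for a PNG
    call = analyze(image, goal)     # the model grounds the goal in the image
    if call.action != "left_click":
        raise ValueError(f"this sketch only handles left_click, got {call.action!r}")
    x, y = call.coordinate
    click(x, y)                     # Surf executes the model's decision
    return screenshot()             # the client receives the updated view
```

Injecting the three steps as callables is a design choice for the sketch only; it keeps the control flow visible while leaving the browser and model integration out of scope.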
Supported Actions
| Action | Description |
|---|---|
| left_click | Click at coordinates |
| double_click | Double-click at coordinates |
| type | Type text (optionally at coordinates) |
| scroll | Scroll page up/down |
| key | Press keyboard keys |
| visit_url | Navigate to URL |
| terminate | Task complete signal (agent mode) |
| wait | Wait for page to load |
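Because model output is executed directly, it can help to validate an action before acting on it. The sketch below checks a raw reply against the action names in the table. The `{"action": ..., "coordinate": [x, y]}` layout, the `ParsedAction` type, and the `parse_action` helper are all assumptions for illustration; the real Fara tool-call format may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Action names taken from the table above.
SUPPORTED_ACTIONS = frozenset({
    "left_click", "double_click", "type", "scroll",
    "key", "visit_url", "terminate", "wait",
})

# Actions the table describes as happening "at coordinates".
NEEDS_COORDINATE = frozenset({"left_click", "double_click"})


@dataclass(frozen=True)
class ParsedAction:
    name: str
    arguments: Dict[str, Any] = field(default_factory=dict)


def parse_action(raw: Dict[str, Any]) -> ParsedAction:
    """Validate a raw model reply; the {"action": ..., ...} layout is assumed."""
    name = raw.get("action")
    if name not in SUPPORTED_ACTIONS:
        raise ValueError(f"unsupported action: {name!r}")
    # Everything except the action name is treated as an argument.
    arguments = {key: value for key, value in raw.items() if key != "action"}
    if name in NEEDS_COORDINATE:
        coordinate = arguments.get("coordinate")
        if not (isinstance(coordinate, (list, tuple)) and len(coordinate) == 2):
            raise ValueError(f"{name} needs a two-element coordinate")
    return ParsedAction(name, arguments)
```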
Installation
# Install from source
pip install -e .
# Install Playwright browsers
playwright install chromium
# Optional: Install harness dependencies
pip install -e ".[harness]"
Quick Start
As MCP Server
Add to your MCP client configuration:
{
  "mcpServers": {
    "surf": {
      "command": "surf-mcp"
    }
  }
}
Docker
# Recommended: use docker compose (reads .env automatically)
cp .env.example .env
# Edit .env with your settings
docker compose up
# Or build and run directly (note: --env-file doesn't strip quotes)
docker build -t surf-mcp .
docker run -it --rm \
--add-host=host.docker.internal:host-gateway \
-e LMSTUDIO_SERVERS=default=http://host.docker.internal:1234/v1 \
surf-mcp
Fara Test Harness
Interactive UI for testing visual grounding:
cd tools/fara-harness
./run.sh # Linux/Mac
run.bat # Windows
See tools/fara-harness/CHEATSHEET.md for command reference.
Usage Examples
Browser Navigation with Visual Grounding
# Create session
session = await mcp.call("session_create", {
    "drivers": {
        "web": {
            "type": "browser",
            "headless": False,
            "storage_state": saved_state  # Optional: restore cookies saved by an earlier session_destroy
        }
    }
})
# Navigate to page
await mcp.call("goto", {
    "session_id": session["session_id"],
    "driver": "web",
    "location": "https://example.com"
})
# Click element by description
await mcp.call("click", {
    "session_id": session["session_id"],
    "driver": "web",
    "description": "the blue Submit button"
})
# Direct Fara execution (recommended)
await mcp.call("act", {
    "session_id": session["session_id"],
    "driver": "web",
    "goal": "type 'hello world' into the search box"
})
# Autonomous multi-step execution
await mcp.call("act_autonomous", {
    "session_id": session["session_id"],
    "driver": "web",
    "goal": "log in with username 'demo' and password 'demo123'"
})
# Destroy session and capture storage_state
result = await mcp.call("session_destroy", {"session_id": session["session_id"]})
saved_state = result["summary"]["web"]["storage_state"]
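To carry `saved_state` from one run to the next, it has to be stored somewhere between sessions. The sketch below writes it to a JSON file and reads it back. It assumes the storage state is a JSON-serializable dict (as Playwright's storage state is); the helper names and the file location are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def save_storage_state(state: Dict[str, Any], path: Path) -> None:
    """Write the storage_state returned by session_destroy to a JSON file."""
    path.write_text(json.dumps(state), encoding="utf-8")


def load_storage_state(path: Path) -> Optional[Dict[str, Any]]:
    """Return the saved state, or None on a first run when no file exists yet."""
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))
```

On a first run `load_storage_state` returns `None`; in that case it is probably safest to leave the `storage_state` key out of `session_create` rather than pass `None`, since the README does not say how a null value is handled. The file contains session cookies, so it deserves the same care as a credential.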
Configuration
Environment Variables
# Multi-server LM Studio (visual grounding)
LMSTUDIO_SERVERS="rtx3090=http://localhost:1234/v1,rtx8000=http://192.168.1.100:1234/v1"
FARA_MODEL_IDS="microsoft_fara-7b,fara-7b-gguf,gao-zijian/fara-7b"
FARA_MAX_FAILURES=2
FARA_PROBE_TIMEOUT=2.0
# Confidence and Agent Mode
FARA_MIN_CONFIDENCE=0.7
FARA_CONFIDENCE_RETRIES=2
FARA_MAX_AGENT_STEPS=20
# Alternative: Single OpenAI-compatible endpoint
OPENAI_API_KEY=lm-studio
OPENAI_BASE_URL=http://localhost:1234/v1
SURF_LLM_MODEL=microsoft_fara-7b
# Alternative: Gemini
GOOGLE_API_KEY=...
SURF_LLM_PROVIDER=gemini
SURF_LLM_MODEL=gemini-2.0-flash
# Browser defaults
SURF_BROWSER_HEADLESS=true
SURF_BROWSER_VIEWPORT_WIDTH=1920
SURF_BROWSER_VIEWPORT_HEIGHT=1080
# Session management
SURF_MAX_SESSIONS=10
SURF_SESSION_TIMEOUT_SECONDS=3600
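Environment variables arrive as strings, so the numeric settings above need converting before use. The following sketch shows one way to read the confidence and agent-mode variables into a typed object. `FaraSettings` and `load_fara_settings` are hypothetical names; using the example values as fallbacks and treating confidence as a 0 to 1 score are assumptions, not documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class FaraSettings:
    min_confidence: float
    confidence_retries: int
    max_agent_steps: int


def load_fara_settings(env: Mapping[str, str] = os.environ) -> FaraSettings:
    """Read the confidence and agent-mode variables, converting them to numbers.

    The fallback values mirror the example above; whether surf-mcp uses the
    same defaults is an assumption.
    """
    settings = FaraSettings(
        min_confidence=float(env.get("FARA_MIN_CONFIDENCE", "0.7")),
        confidence_retries=int(env.get("FARA_CONFIDENCE_RETRIES", "2")),
        max_agent_steps=int(env.get("FARA_MAX_AGENT_STEPS", "20")),
    )
    # Assumed range: confidence is treated as a score between 0 and 1.
    if not 0.0 <= settings.min_confidence <= 1.0:
        raise ValueError("FARA_MIN_CONFIDENCE must be between 0 and 1")
    return settings
```

Passing the environment in as a mapping keeps the function easy to exercise with a plain dict.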
Multi-Server LM Studio
Surf supports multiple LM Studio instances for redundancy:
LMSTUDIO_SERVERS="gpu1=http://localhost:1234/v1,gpu2=http://192.168.1.50:1234/v1"
Behavior:
- Auto-discovery: Probes each server's /v1/models to find loaded Fara model
- Prefer loaded: Prioritizes servers with Fara already in VRAM
- Failover: Automatically retries on another server if one fails
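The three behaviors above can be sketched as small functions. This is an illustration only, not surf-mcp's implementation: all function names are hypothetical, the `loaded` set stands in for whatever the /v1/models probe reports, and the per-server failure budget (`FARA_MAX_FAILURES`) is left out for brevity.

```python
from typing import Callable, Dict, List, Set, TypeVar

T = TypeVar("T")


def parse_servers(spec: str) -> Dict[str, str]:
    """Split 'name=url,name=url' into an ordered {name: url} mapping."""
    servers: Dict[str, str] = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the first "=" only, so URLs containing "=" stay intact.
        name, separator, url = item.partition("=")
        if not separator or not name or not url:
            raise ValueError(f"expected name=url, got {item!r}")
        servers[name.strip()] = url.strip()
    return servers


def order_servers(servers: Dict[str, str], loaded: Set[str]) -> List[str]:
    """Put servers that already have the model loaded first, keeping config order otherwise."""
    # sorted() is stable, and False sorts before True.
    return sorted(servers, key=lambda name: name not in loaded)


def call_with_failover(names: List[str], request: Callable[[str], T]) -> T:
    """Try each server in turn and return the first successful result."""
    errors: List[str] = []
    for name in names:
        try:
            return request(name)
        except ConnectionError as exc:
            errors.append(f"{name}: {exc}")
    raise ConnectionError("all servers failed: " + "; ".join(errors))
```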
MCP Tools
Session Lifecycle
| Tool | Description |
|---|---|
| session_create | Create browser session |
| session_destroy | Cleanup session, returns storage_state |
| session_list | List active sessions |
Navigation
| Tool | Description |
|---|---|
| goto | Navigate to URL |
| current | Get current URL |
| back / forward | Navigate history |
| history | Get navigation history |
Content
| Tool | Description |
|---|---|
| list | Extract page links |
| read | Read page content |
| snapshot | Capture screenshot |
Visual Grounding
| Tool | Description |
|---|---|
| locate | Find element by description, return coordinates |
| click | Click element by description |
| type | Type into element by description |
| scroll | Scroll page up/down |
| wait | Wait for element or delay |
| act | Direct Fara execution - Fara decides the action |
| act_autonomous | Multi-step autonomous execution until task complete |
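The difference between `act` and `act_autonomous` is essentially a loop with a stop condition. The sketch below shows that loop shape: keep asking for the next action until the model emits `terminate` or a step budget (compare `FARA_MAX_AGENT_STEPS`) runs out. The function name, the string-based actions, and raising on an exhausted budget are assumptions for illustration; the server may report that case differently.

```python
from typing import Callable, List


def run_autonomous(
    goal: str,
    next_action: Callable[[str, List[str]], str],
    execute: Callable[[str], None],
    max_steps: int = 20,
) -> List[str]:
    """Repeat decide -> execute until the model says 'terminate' or the budget runs out.

    Returns the list of executed actions as a simple progress record.
    """
    history: List[str] = []
    for _ in range(max_steps):
        # The model sees the goal plus what has been done so far.
        action = next_action(goal, history)
        if action == "terminate":
            return history
        execute(action)
        history.append(action)
    raise RuntimeError(f"goal not completed within {max_steps} steps")
```

The step cap matters because a model that never emits `terminate` would otherwise click and type indefinitely.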
Architecture
See docs/ARCHITECTURE.md for detailed architecture documentation.
Design decisions are recorded in docs/adr/.
Development
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest # All tests
pytest -m "not live" # Skip LLM tests (for CI)
pytest -m live # Only live LLM tests
# Type checking
mypy src/
# Linting
ruff check src/
License
MIT
© 2025 Shane V Cantwell | reflectiveattention.ai