Helm

Semantic browser automation MCP server. Tell it what to do in plain English — it figures out the selectors.

Helm gives AI agents a full browser through 27 tools: navigate pages, fill forms, click buttons, extract structured data, capture network traffic, and profile performance. No CSS selectors or XPaths required.

Quick Start

bun install
bun run src/server.ts

Connect to Claude Code

claude mcp add --transport stdio helm -- bun run /path/to/helm/src/server.ts

Connect to Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "helm": {
      "command": "bun",
      "args": ["run", "/path/to/helm/src/server.ts"]
    }
  }
}

Connect to any MCP client

Helm uses stdio transport. Point your client at:

command: bun
args: ["run", "/path/to/helm/src/server.ts"]
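
If you are writing your own client, MCP's stdio transport exchanges newline-delimited JSON-RPC 2.0 messages over the server's stdin and stdout. The sketch below frames a `tools/call` request for `act_fill_form`. The `fields` argument name is an assumption, because this README does not document the tool's input schema, so check the schema Helm reports from `tools/list` before relying on it. A real client must also complete the `initialize` handshake before calling tools.

```typescript
// Shape of a JSON-RPC 2.0 request, which MCP uses for client messages.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextId = 1;

// Build a `tools/call` request for one Helm tool.
// `args` must match the tool's input schema as reported by `tools/list`.
function toolCall(name: string, args: Record<string, unknown>): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// stdio framing: one JSON message per line. JSON.stringify without an indent
// argument escapes newlines inside strings, so the message stays on one line.
function frame(msg: JsonRpcRequest): string {
  return JSON.stringify(msg) + "\n";
}

// Hypothetical argument shape: act_fill_form's real schema is not documented here.
const line = frame(
  toolCall("act_fill_form", {
    fields: { Email: "test@example.com", Password: "secret123" },
  }),
);

// A real client writes `line` to the server process's stdin (after the
// initialize handshake) and reads newline-delimited responses from stdout.
console.log(line.trimEnd());
```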

Tools

Navigation

Tool	Description
`nav_goto`	Navigate to a URL and wait for the page to be ready
`nav_back`	Navigate back in browser history
`nav_forward`	Navigate forward in browser history
`nav_reload`	Reload the current page

Observation

Tool	Description
`obs_observe`	Get a filtered, task-relevant snapshot of interactive elements on the page
`obs_screenshot`	Take a screenshot, optionally with numbered Set-of-Mark overlays
`obs_extract`	Extract a specific piece of information by natural language description

Interaction

Tool	Description
`act_click`	Click an element by its visible label or Set-of-Mark ID
`act_fill`	Fill a single input field by its label
`act_fill_form`	Fill multiple form fields at once
`act_select`	Select a dropdown option by the dropdown's label and option text
`act_press`	Press a keyboard key or shortcut

Composite

Tool	Description
`act_login`	Complete a full login flow in one call
`act_submit_form`	Find and click the primary submit button
`page_wait_for`	Wait until a condition is true on the page

Session

Tool	Description
`page_new_tab`	Open a new browser tab
`page_close_tab`	Close a browser tab
`page_switch_tab`	Switch to a different tab by ID
`page_get_cookies`	Get cookies for the current page or domain
`page_set_cookie`	Set a cookie for a domain

Data

Tool	Description
`data_query`	Run a SQL-like query against the page DOM
`data_analyze_page`	Auto-detect repeating data patterns and infer a schema
`data_extract`	Extract structured data matching a caller-defined schema

DevTools (CDP)

Tool	Description
`cdp_evaluate`	Evaluate JavaScript via Chrome DevTools Protocol
`cdp_performance`	Snapshot browser performance metrics (DOM nodes, heap, layout count)
`cdp_network_start`	Start capturing network requests
`cdp_network_stop`	Stop capturing and return the captured requests, optionally filtered by URL

Examples

Navigate and extract structured data:

Navigate to https://books.toscrape.com and extract the book titles, prices,
and availability from the listing page.

Helm auto-detects the repeating pattern, maps your requested fields to DOM elements, and returns typed data (prices as floats, not strings).
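
A sketch of the field-mapping idea. The README states only that fields are mapped by token overlap plus type compatibility, so this version tokenizes each requested field's name and description, scores it against hint words for each candidate element, skips candidates whose text cannot be coerced to the requested type, and keeps the best match. The `Candidate` shape, the Jaccard scoring, the number-parsing rule, and the hint strings (loosely modeled on a product listing such as books.toscrape.com) are assumptions for illustration, not Helm's extractor.

```typescript
type FieldType = "string" | "number";

// A caller-defined field: (name, description, type), as described for data_extract.
interface FieldSpec {
  name: string;
  description: string;
  type: FieldType;
}

// Hypothetical candidate element inside one repeating container:
// `hints` holds class names or attribute words, `text` the visible text.
interface Candidate {
  hints: string;
  text: string;
}

function tokens(s: string): Set<string> {
  return new Set(s.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0));
}

// Jaccard overlap between two token sets (0 = disjoint, 1 = identical).
function overlap(a: Set<string>, b: Set<string>): number {
  let shared = 0;
  for (const t of a) if (b.has(t)) shared++;
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Coerce visible text to the requested type; undefined means type-incompatible.
function coerce(text: string, type: FieldType): string | number | undefined {
  const t = text.trim();
  if (type === "string") return t;
  // Drop currency symbols and thousands separators: "£1,051.77" -> 1051.77
  const m = t.replace(/,/g, "").match(/-?\d+(\.\d+)?/);
  return m ? parseFloat(m[0]) : undefined;
}

// For each field, keep the type-compatible candidate with the highest overlap.
function mapRecord(fields: FieldSpec[], candidates: Candidate[]): Record<string, string | number | null> {
  const out: Record<string, string | number | null> = {};
  for (const f of fields) {
    const wanted = tokens(`${f.name} ${f.description}`);
    let best: { score: number; value: string | number } | null = null;
    for (const c of candidates) {
      const value = coerce(c.text, f.type);
      if (value === undefined) continue; // wrong type, skip
      const score = overlap(wanted, tokens(c.hints));
      if (score > 0 && (best === null || score > best.score)) best = { score, value };
    }
    out[f.name] = best === null ? null : best.value;
  }
  return out;
}

const record = mapRecord(
  [
    { name: "title", description: "book title", type: "string" },
    { name: "price", description: "price in pounds", type: "number" },
    { name: "availability", description: "stock availability", type: "string" },
  ],
  [
    { hints: "product_pod h3 a title", text: "A Light in the Attic" },
    { hints: "price_color", text: "£51.77" },
    { hints: "instock availability", text: "In stock" },
  ],
);
console.log(record); // price comes back as the number 51.77, not the string "£51.77"
```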

Document a website for scraper development:

Navigate to the court case search page. Dump all forms and their fields.
Search for "Smith", capture network traffic during the search, then extract
the results into a structured table. Write the full spec to a markdown file.

Profile a web app:

Start network capture, navigate to localhost:3000, log in, then stop capture.
Show me all API calls made during login. Also grab performance metrics —
I want to know DOM node count and JS heap usage.

Fill forms by label, not selector:

Fill the form: {"Email": "test@example.com", "Password": "secret123"}
and click "Sign In".

Architecture

src/
  server.ts              MCP server entrypoint (stdio)
  types.ts               Shared type definitions
  core/
    browser.ts           Playwright browser/tab management
    resolver.ts          Label -> element resolution (role, text, fuzzy, memory)
    observer.ts          Page observation and element filtering
    som.ts               Set-of-Mark screenshot annotation
    memory.ts            SQLite site memory (bun:sqlite)
    fingerprint.ts       DOM fingerprinting for stale selector detection
    recovery.ts          Auto-retry with backoff, overlay dismissal
    schemasniff.ts       Automatic DOM pattern detection
    extractor.ts         Structured data extraction engine
    domql.ts             SQL-like DOM query engine
    cdp.ts               Chrome DevTools Protocol wrapper
  tools/
    navigation.ts        nav_goto, nav_back, nav_forward, nav_reload
    observation.ts       obs_observe, obs_screenshot, obs_extract
    interaction.ts       act_click, act_fill, act_fill_form, act_select, act_press
    composite.ts         act_login, act_submit_form, page_wait_for
    session.ts           page_new_tab, page_close_tab, page_switch_tab, page_get_cookies, page_set_cookie
    data.ts              data_query, data_analyze_page, data_extract
    devtools.ts          cdp_evaluate, cdp_performance, cdp_network_start, cdp_network_stop

Key design decisions

Semantic resolution. Tools take human-readable labels ("Sign In", "Email"), not CSS selectors. The resolver tries getByRole, getByLabel, getByText, fuzzy matching, and site memory — in parallel.
Site memory. Successful actions are recorded in SQLite keyed by domain. On revisit, known selectors are tried first. DOM fingerprinting detects when cached selectors are stale.
DOM fingerprinting. Each resolved element gets a hash of its tag, role, text, attributes, parent, and siblings. If the hash changes, the cached selector is discarded and re-resolved.
Set-of-Mark fallback. For sites with poor ARIA, obs_screenshot(overlay=true) annotates every interactive element with a number. Then act_click(mark_id=7) clicks element 7 by coordinates.
Structured extraction. data_extract takes a field schema (name, description, type), auto-detects repeating containers via sniffPage, maps fields by token overlap + type compatibility, and returns typed data.
CDP layer. Direct Chrome DevTools Protocol access for network capture, performance profiling, and raw JS eval — things that are awkward through Playwright's abstraction.
Error recovery. Automatic retry with exponential backoff. Cookie banners and modal overlays are dismissed between retries.
Token efficiency. obs_observe returns only task-relevant elements, not the full accessibility tree. Extraction results are capped at 15KB.
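
The site-memory and fingerprinting decisions above can be sketched together. This version assumes an in-memory `Map` in place of the SQLite table and a 32-bit FNV-1a hash; the README names neither the hash nor the table schema, and `SiteMemory`, `ElementFeatures`, and the `domain::label` key format are hypothetical. A lookup returns the remembered selector only while the element it currently points at still hashes to the stored fingerprint; otherwise the entry is dropped so the resolver runs again.

```typescript
// Features the README lists as inputs to a fingerprint.
interface ElementFeatures {
  tag: string;
  role: string;
  text: string;
  attributes: Record<string, string>;
  parentTag: string;
  siblingTags: string[];
}

// 32-bit FNV-1a over UTF-16 code units (an assumed choice; any stable hash works).
function fnv1a(input: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    h ^= input.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0).toString(16).padStart(8, "0");
}

function fingerprint(f: ElementFeatures): string {
  // Sort attribute keys so attribute order does not change the hash.
  const attrs = Object.keys(f.attributes)
    .sort()
    .map((k) => `${k}=${f.attributes[k]}`)
    .join("&");
  return fnv1a([f.tag, f.role, f.text.trim(), attrs, f.parentTag, f.siblingTags.join(",")].join("|"));
}

interface MemoryEntry {
  selector: string;
  fingerprint: string;
}

// Stand-in for the SQLite table, keyed by domain and label.
class SiteMemory {
  private entries = new Map<string, MemoryEntry>();

  private key(domain: string, label: string): string {
    return `${domain}::${label.toLowerCase()}`;
  }

  record(domain: string, label: string, selector: string, features: ElementFeatures): void {
    this.entries.set(this.key(domain, label), { selector, fingerprint: fingerprint(features) });
  }

  // `current` describes whatever the cached selector matches now (null if nothing).
  lookup(domain: string, label: string, current: ElementFeatures | null): string | null {
    const k = this.key(domain, label);
    const entry = this.entries.get(k);
    if (!entry) return null;
    if (current === null || fingerprint(current) !== entry.fingerprint) {
      this.entries.delete(k); // stale: discard so the label is re-resolved
      return null;
    }
    return entry.selector;
  }
}

const memory = new SiteMemory();
const signIn: ElementFeatures = {
  tag: "button",
  role: "button",
  text: "Sign In",
  attributes: { type: "submit", class: "btn primary" },
  parentTag: "form",
  siblingTags: ["input", "input"],
};
memory.record("example.com", "Sign In", "form > button.primary", signIn);

const hit = memory.lookup("example.com", "Sign In", signIn); // cached selector reused
const redesigned: ElementFeatures = { ...signIn, text: "Log In" };
const miss = memory.lookup("example.com", "Sign In", redesigned); // fingerprint changed: null
console.log({ hit, miss });
```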
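
Error recovery, sketched as a generic retry helper: exponential backoff between attempts, with an optional hook that clears overlays before the next try. The attempt count, the delays, and the `dismissOverlays` callback are assumptions; in Helm such a hook would presumably close cookie banners and modals through Playwright, which this self-contained sketch only simulates with a counter.

```typescript
interface RetryOptions {
  attempts: number; // total tries including the first; must be >= 1
  baseDelayMs: number; // wait before the second try
  maxDelayMs: number; // upper bound on any single wait
  dismissOverlays?: () => Promise<void>; // hypothetical hook run between tries
}

// Exponential backoff: base, 2*base, 4*base, ... capped at maxMs.
function backoffDelay(attempt: number, baseMs: number, maxMs: number): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function withRetry<T>(action: () => Promise<T>, opts: RetryOptions): Promise<T> {
  let lastError: unknown = new Error("withRetry: attempts must be >= 1");
  for (let attempt = 0; attempt < opts.attempts; attempt++) {
    try {
      return await action();
    } catch (err) {
      lastError = err;
      if (attempt === opts.attempts - 1) break; // out of tries
      await sleep(backoffDelay(attempt, opts.baseDelayMs, opts.maxDelayMs));
      // Clear whatever might be blocking the target before retrying.
      if (opts.dismissOverlays) await opts.dismissOverlays();
    }
  }
  throw lastError;
}

// Simulated flaky click: fails twice (as if a banner intercepts it), then succeeds.
let calls = 0;
let dismissals = 0;
const outcome = withRetry(
  async () => {
    calls++;
    if (calls < 3) throw new Error("click intercepted by overlay");
    return "clicked";
  },
  {
    attempts: 4,
    baseDelayMs: 10,
    maxDelayMs: 100,
    dismissOverlays: async () => {
      dismissals++;
    },
  },
);
outcome.then((r) => console.log(r, { calls, dismissals }));
```

Running the overlay hook only after a failure keeps the happy path free of extra page work.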

Helm vs Playwright MCP

Playwright MCP is Microsoft's official MCP server for Playwright. Both give AI agents a browser — here's why they exist and when to pick each.

Philosophy

Playwright MCP exposes Playwright's API almost directly. Tools like browser_click and browser_type target elements by a ref identifier taken from the page's accessibility snapshot, so the caller has to read the snapshot and choose the right element first. It's a thin wrapper: powerful if you already know the page structure.

Helm is semantic-first. You say act_click("Sign In") or act_fill("Email", "test@example.com") and the resolver figures out the selector through role matching, label association, text search, fuzzy matching, and site memory. The agent never needs to inspect the DOM.
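
A minimal sketch of that resolution step, under stated assumptions: each strategy is an async function returning a selector or null, all strategies run concurrently via `Promise.allSettled`, and the winner is chosen by a fixed priority order rather than by whichever settles first, so a loose fuzzy match never overrides an exact one. The strategy names, the in-memory `buttons` map standing in for a real page, and the Levenshtein-based fuzzy score with a 0.7 threshold are hypothetical; the README says Helm's resolver combines getByRole, getByLabel, getByText, fuzzy matching, and site memory.

```typescript
type Strategy = {
  name: string;
  run: (label: string) => Promise<string | null>;
};

type Resolution = { via: string; selector: string };

// Run all strategies concurrently; array order doubles as priority order,
// so the first non-null result in that order wins regardless of timing.
async function resolve(label: string, strategies: Strategy[]): Promise<Resolution | null> {
  const results = await Promise.allSettled(strategies.map((s) => s.run(label)));
  for (let i = 0; i < results.length; i++) {
    const r = results[i];
    if (r.status === "fulfilled" && r.value !== null) {
      return { via: strategies[i].name, selector: r.value };
    }
  }
  return null; // nothing matched; a caller could fall back to Set-of-Mark
}

// Two-row Levenshtein edit distance.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const cur = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      cur[j] = Math.min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = cur;
  }
  return prev[b.length];
}

// Case-insensitive similarity in [0, 1].
function similarity(a: string, b: string): number {
  const x = a.trim().toLowerCase();
  const y = b.trim().toLowerCase();
  const longest = Math.max(x.length, y.length);
  return longest === 0 ? 1 : 1 - levenshtein(x, y) / longest;
}

// Hypothetical page: visible button text mapped to a selector.
const buttons: Record<string, string> = { "Sign in": "#login-btn", Register: "#signup" };

const strategies: Strategy[] = [
  { name: "role", run: async () => null }, // stand-in: pretend the ARIA lookup found nothing
  { name: "text", run: async (label) => buttons[label] ?? null }, // exact text only
  {
    name: "fuzzy",
    run: async (label) => {
      let best: { selector: string; score: number } | null = null;
      for (const [text, selector] of Object.entries(buttons)) {
        const score = similarity(label, text);
        if (score >= 0.7 && (best === null || score > best.score)) best = { selector, score };
      }
      return best === null ? null : best.selector;
    },
  },
];

const pending = Promise.all([
  resolve("Sign In", strategies), // case differs, so only the fuzzy strategy hits
  resolve("Register", strategies), // text and fuzzy both hit; text wins on priority
  resolve("Checkout", strategies), // no strategy hits
]);
pending.then((rs) => console.log(rs));
```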

Feature comparison

Capability	Helm	Playwright MCP
Element targeting	By visible label, auto-resolved	By `ref` from the page's accessibility snapshot
Structured extraction	`data_extract` with field schema + auto-detection	Manual — read page, write selectors yourself
DOM query language	SQL-like `data_query` against the page	Not included
Pattern detection	`data_analyze_page` finds repeating structures	Not included
Site memory	SQLite — remembers working selectors per domain	None
DOM fingerprinting	Detects stale cached selectors automatically	N/A
CDP access	`cdp_evaluate`, `cdp_performance`, network capture	Not exposed
Set-of-Mark	Screenshot overlay with numbered elements	Snapshot with `ref` attributes
Error recovery	Auto-retry, cookie/modal dismissal between retries	Basic error messages
Login flows	`act_login` handles navigate + fill + submit + wait	Manual multi-step
Screenshot	PNG with optional SoM overlay	PNG
Multi-tab	Yes — open, close, switch tabs	Yes
Headless	No, runs headed for visual debugging	Headed by default; `--headless` flag available
Browser engine	Chromium (via Playwright)	Chromium (via Playwright)

When to use Playwright MCP

You want a minimal, official tool with a stable API surface
Your agent is good at constructing CSS selectors from page snapshots
You need headless operation in CI/CD
You're already building on Playwright and want consistent abstractions

When to use Helm

Your agent should describe what to interact with, not how to find it
You're extracting structured data from pages (products, listings, records, tables)
You need network capture or performance profiling alongside automation
You want the server to learn from past visits and get faster over time
You're building scrapers and need the site documented automatically

Can I use both?

Yes. They're independent MCP servers. Some teams use Playwright MCP for simple navigation and Helm for extraction and form-heavy workflows.

Development

bun test                 # Run tests
bunx tsc --noEmit        # Typecheck
bun run --watch src/server.ts  # Dev mode with auto-reload

License

MIT
