mcp-helm

Attaches to your existing Chrome browser so Claude can see the active tab, interact via the accessibility tree, and detect handoff triggers such as 2FA prompts and captchas.

Drive your real Chrome session from Claude — copilot mode, with handoff awareness.

Most browser-automation MCP servers spawn a fresh Playwright Chromium. That's useless when the work is "log into Stripe and click 5 things": you don't have your cookies, your 2FA, your bookmarks. mcp-helm attaches to a Chrome you run and stay signed into (a dedicated profile, set up under Usage) and lets Claude run a small set of tools against the active tab.

It also knows when to step back: when the page shows a 2FA prompt, captcha, payment confirmation, or biometric request, the screenshot tool flags it and Claude can call handoff() to wait for you.

Why this exists

The eyes-and-hands problem: Claude tells you "click Settings → API access" and you click Settings and there is no API access, so you screenshot back to Claude, which guesses again. That's 5 minutes of round-trips for a 5-second task, and it happens on every Stripe / Apple / Play Console / Cloudflare / Vercel setup.

mcp-helm cuts that loop. Claude sees the actual page, picks elements from the accessibility tree (no coordinate guessing), and stops when it would do something it shouldn't.

Install

npm install -g mcp-helm

Add to ~/.claude.json (or your MCP client's config):

{
  "mcpServers": {
    "helm": {
      "command": "mcp-helm"
    }
  }
}

Usage

1. Launch a driveable Chrome

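On macOS, add this alias to your shell rc. The -n flag makes open start a new Chrome instance even if your main Chrome is already running; without it, the arguments are ignored and the existing window is simply brought forward:

alias chrome-pilot='open -na "Google Chrome" --args --remote-debugging-port=9222 --user-data-dir="$HOME/.chrome-pilot"'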

Run it once: chrome-pilot. A separate Chrome profile opens. Sign into everything you'd want Claude to drive (Play Console, Stripe, etc.). Cookies persist across launches — you only sign in once per service.

Why a separate profile? Your main Chrome can't be launched in remote-debugging mode while it's already running. The dedicated profile lives in ~/.chrome-pilot and stays separate from your daily browsing.
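
Before calling attach, it can help to confirm that the chrome-pilot instance is actually listening. Chrome's remote-debugging port serves a small HTTP API, and a GET on /json/version returns a JSON object that includes the browser version and a webSocketDebuggerUrl. The sketch below only builds that URL and parses a response body. The helper names (versionUrl, parseVersion) and the DebugInfo type are illustrative and not part of mcp-helm; how you fetch the body (curl, fetch, a browser tab) is up to you.

```typescript
// Hypothetical helpers, not part of mcp-helm: sanity-check the debugging port.

interface DebugInfo {
  browser: string;      // value of the "Browser" field, e.g. "Chrome/<version>"
  webSocketUrl: string; // DevTools WebSocket endpoint for the browser target
}

// URL of Chrome's version endpoint on the remote-debugging port.
function versionUrl(port: number = 9222): string {
  return `http://127.0.0.1:${port}/json/version`;
}

// Parse the JSON body returned by /json/version.
// Returns null if the body is not valid JSON or lacks the expected fields.
function parseVersion(body: string): DebugInfo | null {
  let data: unknown;
  try {
    data = JSON.parse(body);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const rec = data as Record<string, unknown>;
  const browser = rec["Browser"];
  const ws = rec["webSocketDebuggerUrl"];
  if (typeof browser !== "string" || typeof ws !== "string") return null;
  return { browser, webSocketUrl: ws };
}
```

If parseVersion returns null, or the request to versionUrl() is refused, the most likely cause is that Chrome was not started with --remote-debugging-port=9222.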

2. From Claude

You: Upload the AAB at <path> to Play Store internal testing.
Claude: [calls helm.attach] → [helm.navigate to play.google.com/console]
        [helm.screenshot] → sees the dashboard
        [helm.click "Personalized AI Portfolio Bot"]
        ... etc

If a 2FA prompt appears, screenshot returns handoffTriggers: ["2FA prompt"] and Claude calls handoff to wait.
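
As a rough illustration of how regex-based trigger detection can work, the sketch below scans visible page text against a few patterns and returns labels in the same shape as handoffTriggers. The patterns, the function name, and every label other than "2FA prompt" are assumptions made for illustration; mcp-helm's actual rules are not shown in this README.

```typescript
// Illustrative only: these patterns are assumptions, not mcp-helm's real rules.
const HANDOFF_PATTERNS: Array<{ label: string; pattern: RegExp }> = [
  { label: "2FA prompt", pattern: /\b(two-factor|2fa|verification code|authenticator app)\b/i },
  { label: "captcha", pattern: /\b(captcha|i'm not a robot|verify you are human)\b/i },
  { label: "payment confirmation", pattern: /\b(confirm (your )?payment|place order|pay now)\b/i },
  { label: "biometric request", pattern: /\b(touch id|face id|passkey|fingerprint)\b/i },
];

// Return the label of every pattern that matches the page's visible text.
// The regexes carry no "g" flag, so test() keeps no state between calls.
function detectHandoffTriggers(pageText: string): string[] {
  return HANDOFF_PATTERNS
    .filter(({ pattern }) => pattern.test(pageText))
    .map(({ label }) => label);
}
```

A page reading "Enter the verification code we sent to your phone" would yield ["2FA prompt"] under these patterns, while an ordinary dashboard yields an empty list and Claude keeps going.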

Tools

Tool         Purpose
attach       Connect to Chrome on port 9222. Always call first.
list_tabs    List all open tabs.
focus_tab    Switch active tab by index or URL substring.
screenshot   PNG + URL + title + handoff triggers detected.
inspect      Numbered list of interactive elements (a11y tree).
click        Click by id (from inspect), text, or CSS selector. Returns changed: bool from screenshot diff.
type         Type into a field. submit: true presses Enter after.
navigate     Go to a URL.
wait_for     Wait for text or selector.
handoff      Pause and ask the human to take over.
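
The changed flag returned by click can be pictured as a before/after comparison of screenshot bytes. The sketch below is a minimal exact-equality check under that assumption. The function name is hypothetical, and the real tool may well use a tolerance or a perceptual diff rather than byte equality.

```typescript
// Hypothetical sketch: report a change unless the two screenshots are byte-identical.
function screenshotChanged(before: Uint8Array, after: Uint8Array): boolean {
  // Different sizes can never be the same image data.
  if (before.length !== after.length) return true;
  for (let i = 0; i < before.length; i++) {
    if (before[i] !== after[i]) return true;
  }
  return false;
}
```

Exact equality is strict: a blinking cursor or a running animation would report a change even when the click did nothing, so a check like this only reliably catches no-ops on static pages.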

Design choices

  • Accessibility tree, not coordinates. Vision-grounded clicking (Anthropic computer use) is great but flaky on Retina displays and high-DPR scaling. The a11y tree gives stable, semantic IDs — and is what screen readers use.
  • Screenshot diff after every click. If changed: false, the click was a no-op. Saves Claude from cheerfully reporting success.
  • Handoff detection is regex-based, not LLM-based. It is cheap and fast, and the patterns are written to avoid false positives on common login phrases.
  • No tab-management heuristics. attach picks the first non-blank tab; use list_tabs + focus_tab to be precise. Predictable beats clever.
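
The tab-selection rule in the last bullet is simple enough to sketch. The TabInfo shape, the definition of "blank" (an empty URL, about:blank, or Chrome's new-tab page), and the function names are all assumptions made for illustration, not mcp-helm's actual code.

```typescript
// Hypothetical tab record; real tab listings carry more fields.
interface TabInfo {
  title: string;
  url: string;
}

// Assumption: "blank" means an empty URL, about:blank, or the new-tab page.
function isBlank(tab: TabInfo): boolean {
  return tab.url === "" || tab.url === "about:blank" || tab.url.startsWith("chrome://newtab");
}

// What attach is described as doing: take the first non-blank tab, in order.
// Returns -1 when every tab is blank.
function pickDefaultTab(tabs: TabInfo[]): number {
  return tabs.findIndex((tab) => !isBlank(tab));
}

// What focus_tab is described as doing: select by index or by URL substring.
// Returns -1 when nothing matches.
function findTab(tabs: TabInfo[], target: number | string): number {
  if (typeof target === "number") {
    const valid = Number.isInteger(target) && target >= 0 && target < tabs.length;
    return valid ? target : -1;
  }
  return tabs.findIndex((tab) => tab.url.includes(target));
}
```

Because the default is purely positional, the same window layout always selects the same tab, which is the "predictable beats clever" trade-off: when the first non-blank tab is not the one you want, an explicit focus_tab call is the fix.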

Status

v0.1 — works for simple flows (Play Console, Stripe dashboard, Vercel, Cloudflare). Edge cases this doesn't handle yet:

  • Shadow DOM components (some web-component-heavy sites)
  • iframes (need to surface frame switching)
  • File uploads from disk
  • Keyboard shortcuts beyond Enter

License

MIT
