mcprune

MCP middleware that prunes Playwright accessibility snapshots for LLM agents, reducing tokens by 75-95% while preserving all references, enabling agents to interact with web pages efficiently.

MCP middleware that prunes Playwright accessibility snapshots for LLM agents. Zero ML, 75-95% token reduction, all refs preserved.

The problem

Playwright MCP gives LLM agents browser control via accessibility snapshots: YAML trees of every element on the page. But real pages produce 100K-400K+ tokens per snapshot. That can fill or exceed an LLM's context window in a single response, leaving little room for the task itself.

mcprune sits between the agent and Playwright MCP, intercepting every response and pruning snapshots down to only what the agent needs: interactive elements, prices, headings, and refs to click.

Agent  ←→  mcprune (proxy)  ←→  Playwright MCP  ←→  Browser
              ↓
         prune() + summarize()
         75-95% token reduction

Before / After

Amazon search page — raw Playwright snapshot: ~100,000 tokens. Includes every pixel-level wrapper, tracking URL, sidebar filter, energy label, legal footer, and duplicated link.

After mcprune: ~14,000 tokens. Product titles, prices, ratings, color options, "Add to basket" buttons, and clickable refs. Everything an agent needs to shop.

Amazon product page: ~28,000 tokens → ~3,300 tokens (88% reduction). Full buy flow preserved.

Quick start

As an MCP server (recommended)

Add to your Claude Code, Cursor, or any MCP client config:

{
  "mcpServers": {
    "browser": {
      "command": "node",
      "args": ["/path/to/mcprune/mcp-server.js"]
    }
  }
}

That's it. The agent gets all Playwright browser tools (browser_navigate, browser_click, browser_type, browser_snapshot, etc.) with automatic pruning on every response.

Options:

  • --headless — run browser without visible window
  • --mode auto|act|browse|navigate|full — pruning mode (default: auto, which picks act or browse per page from the URL and snapshot content)
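A sketch of how the options might be passed from a client config. This assumes the flags are appended to args after the script path and that --mode takes its value as a separate argument; neither detail is confirmed above, so adjust to what mcp-server.js actually accepts.

```json
{
  "mcpServers": {
    "browser": {
      "command": "node",
      "args": ["/path/to/mcprune/mcp-server.js", "--headless", "--mode", "browse"]
    }
  }
}
```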

As a library

import { prune, summarize } from 'mcprune';

// `page` is an existing Playwright Page (for example from browser.newPage())
const snapshot = await page.locator('body').ariaSnapshot();

const pruned = prune(snapshot, {
  mode: 'act',
  context: 'iPhone 15 price'  // optional: keywords for relevance filtering
});

const summary = summarize(snapshot);
// → "Apple iPhone 15 (128GB) - Black | pick color(5), set quantity, add to basket, buy now, 91 links"

How it works

A 9-step rule-based pipeline. No ML, no embeddings, no API calls.

| Step | What | Why |
|---|---|---|
| 1. Extract regions | Keep landmarks matching the mode (act → main only) | Drop banner, footer, sidebar in action mode |
| 2. Prune nodes | Drop paragraphs, images, descriptions. Keep interactive elements, prices, short labels | Core reduction: 50-60% happens here |
| 3. Collapse wrappers | generic > generic > button "Buy" → button "Buy" | Playwright trees are deeply nested |
| 4. Clean up | Trim combobox options, drop orphaned headings | A 50-option dropdown → just the combobox name |
| 5. Dedup links | One link per unique text per product card | Amazon cards have 3+ links to the same product |
| 6. Drop noise | Energy labels, product sheets, ad feedback, "view options" | These repeat 10-30x per search page |
| 7. Truncate footer | Everything after "back to top" is noise | Corporate links, legal text, subsidiaries |
| 8. Drop filters | Sidebar refinement panels | 20+ collapsible filter groups on Amazon |
| 9. Serialize | Back to YAML, strip URLs, clean tracking params | URLs were 62% of output; agents click by ref |
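To make step 3 concrete, here is a minimal sketch of wrapper collapsing on a toy tree. The node shape ({ role, name, children }) and the function name are assumptions for illustration; they are not taken from mcprune's source, and the real step may apply further conditions.

```javascript
// Hypothetical node shape: { role, name, children }. Not mcprune's actual data model.
function collapseWrappers(node) {
  // Collapse bottom-up so nested wrappers disappear in one pass.
  const children = (node.children ?? []).map(collapseWrappers);
  // A nameless "generic" node with exactly one child carries no information:
  // replace it with that child.
  if (node.role === 'generic' && !node.name && children.length === 1) {
    return children[0];
  }
  return { ...node, children };
}

const tree = {
  role: 'generic',
  children: [
    { role: 'generic', children: [{ role: 'button', name: 'Buy', children: [] }] },
  ],
};

const result = collapseWrappers(tree);
console.log(result.role, result.name); // button Buy
```

A generic node with two or more children is left in place, since removing it would change how its children are grouped.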

Context-aware pruning

When the agent types a search query, mcprune captures it as context. Product cards that don't match any keywords are collapsed to just their title, while matching products keep full details.

Agent types "iPhone 15" in search box
  → mcprune captures context: ["iphone", "15"]
  → Matching cards: full price, rating, colors, buttons
  → Non-matching cards: title only
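The flow above can be sketched roughly as follows. The helper names, the card shape, and the whole-word matching rule are assumptions; mcprune's real tokenizer and matcher may differ (for instance, this sketch only handles ASCII letters and digits).

```javascript
// Hypothetical helpers illustrating keyword capture and card matching.
function extractKeywords(query) {
  // Lowercase, then split on anything that is not an ASCII letter or digit.
  return query.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

function matchesContext(cardText, keywords) {
  const words = new Set(extractKeywords(cardText));
  // A card matches if it contains at least one keyword as a whole word.
  return keywords.some((k) => words.has(k));
}

function pruneCard(card, keywords) {
  // Matching cards keep everything; the rest collapse to their title.
  return matchesContext(card.title, keywords) ? card : { title: card.title };
}

const keywords = extractKeywords('iPhone 15');
console.log(keywords); // [ 'iphone', '15' ]

const cards = [
  { title: 'Apple iPhone 15 (128GB) - Black', price: '€799', rating: '4.6' },
  { title: 'Samsung Galaxy S24 case', price: '€19', rating: '4.2' },
];
const prunedCards = cards.map((card) => pruneCard(card, keywords));
```

The prices and ratings in the example are made up for illustration.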

Pruning modes

The mode controls only how mcprune prunes the snapshot. Playwright MCP executes all browser actions identically regardless of mode.

| Mode | Regions kept | Pipeline | Use case |
|---|---|---|---|
| auto (default) | per detection | picks act or browse per page | Mixed browsing; let mcprune choose |
| act | main only | All 9 steps | Shopping, forms, taking actions |
| browse | main only | Steps 1-4 + 9 (skip e-commerce noise removal) | Docs, articles, reading content |
| navigate | main + banner + nav + search | All 9 steps | Site exploration |
| full | All landmarks | All 9 steps | Debugging, full page view |

Browse mode preserves paragraphs, code blocks, term/definition pairs, inline links, all headings, and figure captions — content that act mode drops because agents taking actions don't need article text.
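As an illustration of the "Regions kept" column, region extraction could look something like this. The constant and function names are hypothetical, and the ARIA role names (navigation for "nav") are an assumption about how landmarks appear in the snapshot. auto is left out because it resolves to act or browse before this step.

```javascript
// Landmark roles kept per mode, transcribed from the modes table.
// null means "keep every landmark".
const MODE_LANDMARKS = {
  act: ['main'],
  browse: ['main'],
  navigate: ['main', 'banner', 'navigation', 'search'],
  full: null,
};

// Hypothetical step 1: filter top-level nodes by landmark role.
function extractRegions(topLevelNodes, mode) {
  const keep = MODE_LANDMARKS[mode];
  if (keep === undefined) throw new Error(`unknown mode: ${mode}`);
  if (keep === null) return topLevelNodes;
  return topLevelNodes.filter((node) => keep.includes(node.role));
}

const page = [{ role: 'banner' }, { role: 'main' }, { role: 'contentinfo' }];
console.log(extractRegions(page, 'act').map((n) => n.role)); // [ 'main' ]
console.log(extractRegions(page, 'navigate').map((n) => n.role)); // [ 'banner', 'main' ]
```

This sketch does not handle pages without a main landmark; a real implementation would need a fallback for that case.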

Performance

Tested live via MCP proxy:

| Page | Raw | Pruned | Reduction |
|---|---|---|---|
| Amazon NL search (30 products) | ~100K tokens | ~14K tokens | 85.8% |
| Amazon NL product page | ~28K tokens | ~3.3K tokens | 88.0% |
| Wikipedia article (browse) | ~54K tokens | ~8.6K tokens | 84.0% |
| MDN docs (browse) | ~10K tokens | ~5.5K tokens | ~43% |
| Python docs (browse) | ~22K tokens | ~17K tokens | ~23% |
| Amazon product (fixture) | ~1.2K tokens | ~289 tokens | 76.5% |

All refs ([ref=eN]) are preserved. The agent can click, type, and interact with every element in the pruned output.
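One way to sanity-check pruned output yourself is to confirm that every ref it contains also exists in the raw snapshot, and to compute the reduction from token counts. The helpers below are a sketch with hypothetical names; they are not part of mcprune's API, and the sample snapshots are illustrative rather than real Playwright output.

```javascript
// Collect every ref id (e1, e2, ...) that appears in a snapshot string.
function extractRefs(snapshot) {
  return new Set(Array.from(snapshot.matchAll(/\[ref=(e\d+)\]/g), (m) => m[1]));
}

// Refs present in the pruned output but missing from the raw snapshot.
// An empty result means every ref the agent sees is still valid.
function danglingRefs(raw, pruned) {
  const rawRefs = extractRefs(raw);
  return [...extractRefs(pruned)].filter((ref) => !rawRefs.has(ref));
}

// Percentage of tokens removed by pruning.
function reductionPercent(rawTokens, prunedTokens) {
  return (1 - prunedTokens / rawTokens) * 100;
}

// Illustrative snapshots, not real Playwright output.
const raw = [
  '- main [ref=e1]:',
  '  - generic [ref=e2]:',
  '    - button "Buy" [ref=e3]',
].join('\n');
const pruned = '- button "Buy" [ref=e3]';

console.log(danglingRefs(raw, pruned)); // []
console.log(reductionPercent(1000, 250)); // 75
```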

Install

git clone https://github.com/hamr0/mcprune.git
cd mcprune
npm install
npx playwright install chromium

Test

npm test  # 148 tests

Project structure

mcprune/
  mcp-server.js       MCP proxy — entry point, spawns Playwright MCP
  src/
    prune.js           9-step pruning pipeline + summarize(), mode-aware filtering
    parse.js           Playwright ariaSnapshot YAML → tree
    serialize.js       Tree → YAML, URL cleaning
    roles.js           ARIA role taxonomy (LANDMARKS, INTERACTIVE, STRUCTURAL, ...)
    proxy-utils.js     Extracted proxy logic (snapshot detection, context, stats)
  test/
    parse.test.js      8 parser tests
    prune.test.js      12 prune + summarize tests
    proxy.test.js      51 proxy utility + auto-detection tests
    edge-cases.test.js 77 edge case + browse mode + regression tests
    fixtures/          9 real-world page snapshots (e-commerce, docs, forums, gov)
  scripts/             Dev tools for capturing live snapshots
  blueprint.md         Detailed technical documentation
  docs/                Structured project documentation

How the MCP proxy works

  1. Spawns @playwright/mcp as a child process over stdio
  2. Forwards all JSON-RPC messages bidirectionally
  3. Tracks context from browser_type text and browser_navigate URL params
  4. Intercepts all tool responses (not just browser_snapshot — Playwright embeds snapshots in browser_click, browser_type, etc.)
  5. Detects snapshots via regex, runs prune() + summarize()
  6. Prepends a stats header: [mcprune: 85.8% reduction, ~100K → ~14K tokens | page summary]
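Steps 4 to 6 can be sketched as a single function over an MCP tool result, whose content is an array of { type, text } items. Several details here are assumptions rather than mcprune's actual code: the detection regex (any text carrying a [ref=eN] marker), the 4-characters-per-token estimate, and passing prune and summarize in as parameters so the sketch runs without the library installed. It also shows the fail-open behaviour: if pruning throws, the original item is returned untouched.

```javascript
// Assumption: a snapshot is any text block carrying at least one [ref=eN] marker.
const SNAPSHOT_RE = /\[ref=e\d+\]/;

// Rough token estimate (about 4 characters per token), used only for the stats header.
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Hypothetical handler for one tool result. `prune` and `summarize` are injected.
function processToolResult(result, prune, summarize) {
  if (!Array.isArray(result?.content)) return result;
  const content = result.content.map((item) => {
    // Leave non-text items and text without a snapshot alone.
    if (item.type !== 'text' || !SNAPSHOT_RE.test(item.text)) return item;
    try {
      const pruned = prune(item.text);
      const before = estimateTokens(item.text);
      const after = estimateTokens(pruned);
      const pct = ((1 - after / before) * 100).toFixed(1);
      const header =
        `[mcprune: ${pct}% reduction, ~${before} → ~${after} tokens | ${summarize(item.text)}]`;
      return { ...item, text: `${header}\n${pruned}` };
    } catch {
      return item; // fail open: forward the original block unchanged
    }
  });
  return { ...result, content };
}
```

In a real proxy this would run on every tool response read from the child process before it is forwarded to the agent, which is why detection has to be cheap.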

Robustness & security

mcprune processes whatever the open web hands back through Playwright, so the pipeline is built to fail safe:

  • Zero runtime dependencies beyond @playwright/mcp; npm audit is clean.
  • Bounded tree depth — pathological/malicious nesting can't crash the pruner (depth is capped; no refs are lost).
  • Fail-open proxy — if a snapshot can't be parsed or pruned, the original response is forwarded unchanged rather than dropped, so the agent never wedges.
  • Injection-safe stats header — page-derived text (titles, labels) is sanitized so it can't break out of the [mcprune: …] frame.
  • Domain-anchored mode detection — look-alike hosts (e.g. wikipedia.org.attacker.net) can't spoof a pruning mode.
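Two of the safeguards above lend themselves to a short sketch: domain-anchored host matching and header sanitization. Function names and the exact sanitization rule are assumptions for illustration, not mcprune's implementation.

```javascript
// Domain-anchored match: the hostname must equal the domain or be a subdomain of it.
// A plain substring or prefix check would accept wikipedia.org.attacker.net.
function hostMatches(url, domain) {
  let hostname;
  try {
    hostname = new URL(url).hostname.toLowerCase();
  } catch {
    return false; // unparseable URL: treat as no match
  }
  return hostname === domain || hostname.endsWith(`.${domain}`);
}

// Strip characters that could close or forge the [mcprune: ...] frame,
// collapse whitespace, and cap the length.
function sanitizeForHeader(text, maxLength = 120) {
  return text
    .replace(/[\[\]\r\n]/g, ' ')
    .replace(/\s+/g, ' ')
    .trim()
    .slice(0, maxLength);
}

console.log(hostMatches('https://en.wikipedia.org/wiki/Web', 'wikipedia.org')); // true
console.log(hostMatches('https://wikipedia.org.attacker.net/', 'wikipedia.org')); // false
console.log(sanitizeForHeader('Sale] [mcprune: fake\nheader')); // Sale mcprune: fake header
```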

License

MIT
