playwright-autopilot

Playwright E2E test runner with deep debugging: action capture, DOM snapshots, network analysis, error pattern matching, behavioral bisection, and interactive browser exploration.

Playwright Autopilot

A Claude Code plugin that debugs and fixes Playwright E2E tests autonomously. It runs your tests with full action capture — DOM snapshots, network requests, console output, screenshots — then investigates failures like a senior QA engineer and ships the fix.

https://github.com/user-attachments/assets/26f734a5-d05e-41c9-bc3f-2b58561c2ce0

Quick Start

# Add the marketplace
/plugin marketplace add kaizen-yutani/playwright-autopilot

# Install the plugin
/plugin install kaizen-yutani/playwright-autopilot

Then ask Claude to fix a failing test or triage your whole suite:

/playwright-autopilot:fix-e2e tests/checkout.spec.ts
/playwright-autopilot:triage-e2e e2e

Or just describe what you need — Claude will use the MCP tools automatically:

Fix all failing e2e tests in the "e2e" project

What It Does

Every browser action during a test run is captured with:

  • Before/after DOM snapshots — aria tree of the page before and after each click, fill, navigation
  • Network requests — URL, method, status, timing, request/response bodies
  • Console output — errors, warnings, logs tied to the action that produced them
  • Screenshots — captured at point of failure

When a test fails, Claude doesn't guess — it reads the actual page state, checks for failed API calls, and traces the root cause through the action timeline.
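
As a rough illustration of the kind of record described above, the sketch below models one captured action and pulls failed API calls out of a timeline. The type and field names (`CapturedAction`, `NetworkCall`, `failedCalls`) are assumptions made for this example; the plugin's actual schema is not documented in this README.

```typescript
// Hypothetical shapes: field names are illustrative, not the plugin's real schema.
interface NetworkCall {
  url: string;
  method: string;
  status: number;
  durationMs: number;
}

interface CapturedAction {
  index: number;
  name: string; // e.g. "click", "fill", "goto"
  domBefore: string; // aria tree before the action
  domAfter: string; // aria tree after the action
  network: NetworkCall[];
  console: { level: "error" | "warning" | "log"; text: string }[];
  error?: string; // set only on the action that failed
}

interface FailedCall {
  actionIndex: number;
  actionName: string;
  call: NetworkCall;
}

// Collect every request with a 4xx/5xx status, tagged with the action that triggered it.
function failedCalls(timeline: CapturedAction[]): FailedCall[] {
  const failed: FailedCall[] = [];
  for (const action of timeline) {
    for (const call of action.network) {
      if (call.status >= 400) {
        failed.push({ actionIndex: action.index, actionName: action.name, call });
      }
    }
  }
  return failed;
}
```

Tying each request to the action that produced it is what lets a failure be traced through the timeline rather than guessed at from the final error message.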

How It Works

1. Capture Hook

A lightweight CJS hook (captureHook.cjs) is injected via NODE_OPTIONS --require into Playwright's test worker processes. It monkey-patches BrowserContext._initialize to add an instrumentation listener that captures every browser action with full context. No modifications to Playwright's source code required — works with any Playwright installation.
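
The monkey-patching pattern described above can be sketched in isolation. This is not the contents of captureHook.cjs: `FakeBrowserContext` is a hypothetical stand-in for Playwright's internal class (made synchronous for brevity), and `installCaptureHook` and `captured` are names invented for the example. In the real setup the hook file would be preloaded into worker processes through `NODE_OPTIONS` with `--require`.

```typescript
// Stand-in for Playwright's internal BrowserContext; the real class is not imported here.
class FakeBrowserContext {
  listeners: Array<(action: string) => void> = [];
  initialized = false;

  _initialize(): void {
    this.initialized = true;
  }

  // Simulates the context reporting an action to its instrumentation listeners.
  emit(action: string): void {
    for (const listener of this.listeners) listener(action);
  }
}

const captured: string[] = [];

// Wrap the original method: run it unchanged, then attach a capture listener.
function installCaptureHook(): void {
  const original = FakeBrowserContext.prototype._initialize;
  FakeBrowserContext.prototype._initialize = function (this: FakeBrowserContext): void {
    original.call(this);
    this.listeners.push((action) => {
      captured.push(action);
    });
  };
}

installCaptureHook();
```

Because the wrapper calls the original method first and only adds a listener afterwards, the patched class behaves as before from the caller's point of view.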

2. MCP Tools

The plugin exposes 37 tools via the Model Context Protocol that Claude calls on-demand. This is token-efficient by design — instead of dumping entire traces into context, Claude pulls only what it needs:

Test Execution & Debugging:

| Tool | Purpose |
| --- | --- |
| e2e_list_projects | List Playwright projects from config |
| e2e_list_tests | Discover test files and cases |
| e2e_run_test | Run tests with action capture, flaky detection (retries, repeatEach) |
| e2e_get_failure_report | Error + DOM + network + console summary |
| e2e_get_evidence_bundle | All failure evidence in one call, ready for Jira |
| e2e_generate_report | Self-contained HTML or JSON report file |
| e2e_suggest_tests | Test coverage gap analysis |
| e2e_get_actions | Step-by-step action timeline |
| e2e_get_action_detail | Deep dive into a single action |
| e2e_get_dom_snapshot | Aria tree before/after an action |
| e2e_get_dom_diff | What changed in the DOM |
| e2e_get_network | Network requests with filtering |
| e2e_get_console | Console output with filtering |
| e2e_get_screenshot | Failure screenshot as image |
| e2e_get_test_source | Test file with failing line highlighted |
| e2e_find_elements | Search DOM for specific elements |
| e2e_scan_page_objects | Index all page objects and methods |
| e2e_get_app_flows | Read stored application flows |
| e2e_save_app_flow | Save a verified user journey |
| e2e_get_context | Flows + page object index in one call |
| e2e_discover_flows | Auto-scan specs for draft flow map |
| e2e_build_flows | Auto-run uncovered tests and save their flows |
| e2e_get_stats | Suite health dashboard: pass rate trends, flaky scores, category breakdowns |
| e2e_save_triage_run | Save a categorized triage run for trend tracking |
| e2e_get_triage_config | Read triage settings (Jira config, flaky threshold) |

Interactive Browser Exploration:

| Tool | Purpose |
| --- | --- |
| browser_navigate | Open a URL (launches browser automatically) |
| browser_navigate_back | Go back in browser history |
| browser_snapshot | Capture ARIA accessibility tree with [ref=X] markers |
| browser_click | Click an element by ref |
| browser_type | Type into an input field, optionally submit |
| browser_fill_form | Fill multiple form fields in one call |
| browser_select_option | Select a dropdown option |
| browser_press_key | Press a key (Enter, Escape, Tab, etc.) |
| browser_hover | Hover over an element |
| browser_take_screenshot | Capture a PNG screenshot |
| browser_set_headers | Set custom HTTP headers (same-origin only for CORS safety) |
| browser_close | Close the browser |

The browser_* tools launch a real Chrome instance and let Claude explore your application interactively — navigate pages, click elements, fill forms, and observe page state through ARIA snapshots. Each interaction returns timing, network requests, DOM changes, and an updated snapshot. Use this to understand an app before writing tests, debug UI issues visually, or verify fixes.
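
To show how `[ref=X]` markers connect a snapshot to a later `browser_click`, here is a minimal parsing sketch. The snapshot line format used below (`- role "name" [ref=e3]`) is an assumption for illustration; the actual output of `browser_snapshot` may differ, and `parseRefs` is a hypothetical helper, not part of the plugin.

```typescript
interface RefEntry {
  ref: string;
  role: string;
  name: string;
}

// Extract role, accessible name, and ref from lines such as: - button "Submit" [ref=e3]
function parseRefs(snapshot: string): RefEntry[] {
  const entries: RefEntry[] = [];
  const pattern = /^\s*-\s+(\w+)(?:\s+"([^"]*)")?.*\[ref=([^\]]+)\]/;
  for (const line of snapshot.split("\n")) {
    const match = pattern.exec(line);
    if (match) {
      entries.push({ role: match[1], name: match[2] || "", ref: match[3] });
    }
  }
  return entries;
}
```

Looking up the entry whose role is `button` and whose name is `Submit` yields the ref that a click call would then target.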

3. Flow Memory

After fixing (or verifying) a test, the plugin saves the confirmed application flow — the sequence of user interactions that make up the happy path. These flows persist in .e2e-flows.json and accumulate across sessions.

Next time that test breaks, Claude already knows the intended user journey and jumps straight to identifying what changed. The agent gets faster over time.
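
A stored flow makes it possible to compare the intended journey with what a failing run actually did. The sketch below assumes a simple JSON array of flows; the real layout of .e2e-flows.json is not specified in this README, and `AppFlow`, `loadFlows`, and `firstDivergence` are hypothetical names.

```typescript
// Hypothetical schema for one stored flow.
interface AppFlow {
  name: string;
  specFile: string;
  steps: string[]; // ordered user interactions on the happy path
}

function loadFlows(json: string): AppFlow[] {
  return JSON.parse(json) as AppFlow[];
}

// Index of the first step where the observed run departs from the stored flow,
// or -1 if every stored step was observed in order.
function firstDivergence(flow: AppFlow, observed: string[]): number {
  for (let i = 0; i < flow.steps.length; i++) {
    if (observed[i] !== flow.steps[i]) return i;
  }
  return -1;
}
```

A divergence at index 1, for instance, would point at the second step of the journey as the place where the test and the application stopped agreeing.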

4. Flaky Detection

Two complementary modes for identifying flaky tests:

retries: N — Run the test N+1 times in separate Playwright processes. Each run gets its own runId with full action capture. Returns a verdict: FLAKY, CONSISTENT PASS, or CONSISTENT FAIL. Best for debugging with 2-3 retries.

e2e_run_test(location: "tests/checkout.spec.ts:15", retries: 2)

repeatEach: N — Native Playwright --repeat-each. All iterations in one process. Fast stress-test for confirming flakiness — use 30-100 for confidence.

e2e_run_test(location: "tests/checkout.spec.ts:15", repeatEach: 40)
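
The three verdicts named above follow directly from the set of run outcomes. A minimal sketch, assuming each run is reduced to a pass/fail boolean (`classifyRuns` is a hypothetical helper, not the plugin's code):

```typescript
type Verdict = "FLAKY" | "CONSISTENT PASS" | "CONSISTENT FAIL";

// Classify a series of runs of the same test; true means the run passed.
function classifyRuns(results: boolean[]): Verdict {
  if (results.length === 0) throw new Error("at least one run is required");
  const passes = results.filter(Boolean).length;
  if (passes === results.length) return "CONSISTENT PASS";
  if (passes === 0) return "CONSISTENT FAIL";
  return "FLAKY"; // mixed outcomes across identical runs
}
```

With `retries: 2` the input would hold three outcomes, one per run.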

5. Evidence Bundles

e2e_get_evidence_bundle packages all failure evidence into a single response — error, steps to reproduce, action timeline, failed network requests with bodies, console errors, DOM snapshot, and screenshots. Replaces calling 6+ tools separately.

Pass outputFile: true to write a markdown file to test-reports/ for Jira attachments.

6. HTML Reports

Batch runs (e2e_run_test calls without a location) automatically generate a self-contained HTML report with:

  • Pass/fail summary with status badges
  • Collapsible per-test sections
  • Action timelines, failed network requests, console errors
  • DOM snapshots at failure points
  • Screenshots as inline base64 images

Reports are written to test-reports/report-<runId>.html. You can also call e2e_generate_report manually for any run.
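
What "self-contained" means in practice can be sketched as follows: each test becomes a collapsible section and screenshots are embedded as data URIs, so the file needs no external assets. The helper names and the `TestResult` shape are assumptions for this example, not the plugin's report generator.

```typescript
interface TestResult {
  title: string;
  passed: boolean;
  screenshotBase64?: string; // PNG bytes, already base64-encoded
}

// Escape text before placing it inside HTML.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// One collapsible section per test, with the screenshot inlined as a data URI.
function renderTest(result: TestResult): string {
  const badge = result.passed ? "PASS" : "FAIL";
  const image = result.screenshotBase64
    ? `<img alt="failure screenshot" src="data:image/png;base64,${result.screenshotBase64}">`
    : "";
  return `<details><summary>[${badge}] ${escapeHtml(result.title)}</summary>${image}</details>`;
}

// Path convention stated above: test-reports/report-<runId>.html
function reportPath(runId: string): string {
  return `test-reports/report-${runId}.html`;
}
```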

7. Suite Triage & Health Tracking

Run your entire suite, classify every failure, and produce a management-ready report:

/playwright-autopilot:triage-e2e e2e

Claude classifies each failure as Known Issue, App Bug, Test Update, Flaky, or New Failure — cross-references Jira for existing tickets, creates new tickets for app bugs with evidence bundles, and saves the triage run for trend tracking.

e2e_get_stats provides a suite health dashboard — pass rate trends, flaky tests ranked by score, failure category breakdowns, and new failures — all from local history without running tests.
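
The README does not say how the flaky score is computed. As one plausible sketch, the code below derives a pass rate per run and scores a test by how often its outcome flips between consecutive runs. Both the formula and the `RunRecord` shape are assumptions for illustration.

```typescript
interface RunRecord {
  runId: string;
  results: Record<string, boolean>; // test title -> passed
}

// Fraction of tests in a run that passed (0 for an empty run).
function passRate(run: RunRecord): number {
  const titles = Object.keys(run.results);
  if (titles.length === 0) return 0;
  const passed = titles.filter((title) => run.results[title]).length;
  return passed / titles.length;
}

// Assumed scoring: share of consecutive run pairs in which the outcome flipped.
function flakyScore(history: RunRecord[], test: string): number {
  const outcomes: boolean[] = [];
  for (const run of history) {
    if (test in run.results) outcomes.push(run.results[test]);
  }
  if (outcomes.length < 2) return 0;
  let flips = 0;
  for (let i = 1; i < outcomes.length; i++) {
    if (outcomes[i] !== outcomes[i - 1]) flips++;
  }
  return flips / (outcomes.length - 1);
}
```

Under this scoring a test that alternates between pass and fail scores 1, while a test that always passes or always fails scores 0.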

9. Coverage Analysis

e2e_suggest_tests scans your entire project to find coverage gaps:

  1. Untested page object methods — methods in .page.ts / .service.ts files that no spec calls
  2. Missing flow variants — flows with pre-conditions (e.g. "no draft exists") that lack a continuation variant
  3. Uncovered flow steps — actions listed in confirmed flows that no spec exercises
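
The first kind of gap, untested page object methods, amounts to a set difference between the methods that exist and the methods that specs call. A minimal text-based sketch follows; `untestedMethods` is a hypothetical helper, and a real implementation might use an AST rather than regular expressions.

```typescript
// Return page object methods that never appear as a `.method(` call in any spec source.
function untestedMethods(pageObjectMethods: string[], specSources: string[]): string[] {
  return pageObjectMethods.filter((method) => {
    // Escape regex metacharacters such as `$`, which is legal in identifiers.
    const escaped = method.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    const call = new RegExp("\\." + escaped + "\\s*\\(");
    return !specSources.some((source) => call.test(source));
  });
}
```

Requiring a leading dot and a following parenthesis keeps `login` from matching a call to `loginAs`.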

10. Architecture Awareness

Before writing any fix, the plugin scans your project for page objects, service layers, and test fixtures. It follows your existing patterns:

  • Uses your Page Object Model methods instead of writing raw Playwright calls
  • Respects your business/service layer separation
  • Uses getByRole(), getByTestId(), web-first assertions
  • Produces minimal diffs — typically one or two lines added

Debugging Philosophy

The plugin follows a strict diagnostic methodology:

Think in user flows, not selectors. Before touching code, it maps the intended user journey. When a step is missing — a dropdown never selected, a required field never filled — it finds the existing page object method and adds the call.

Four root cause categories:

  1. Missing test step — the test skips a UI interaction the app requires
  2. Test code bug — wrong selector, stale assertion, bad test data
  3. Application bug — the app itself is broken (reported, not worked around)
  4. Dirty state — leftovers from previous test runs interfering

No hacks. The plugin will never use page.evaluate(), page.route(), page.addInitScript(), or any JavaScript injection to work around a failing test. If the fix requires those, it's solving the wrong problem.

Configuration

Multi-project setup

If your Playwright project lives in a different directory from the one where Claude Code runs, set the PW_PROJECT_DIR environment variable in .mcp.json:

{
  "mcpServers": {
    "playwright-autopilot": {
      "command": "node",
      "args": ["path/to/plugin/server/mcp-server.js"],
      "env": {
        "PW_PROJECT_DIR": "/path/to/your/playwright/project"
      }
    }
  }
}
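
How a server might consume this variable can be sketched as a fallback to the current working directory. This is an assumed behavior written for illustration, not code taken from the plugin, and `resolveProjectDir` is a hypothetical name.

```typescript
// Prefer PW_PROJECT_DIR when it is set and non-empty; otherwise fall back to the working directory.
function resolveProjectDir(env: Record<string, string | undefined>, cwd: string): string {
  const configured = env["PW_PROJECT_DIR"];
  return configured !== undefined && configured.trim() !== "" ? configured : cwd;
}
```

In a Node process this would be called as `resolveProjectDir(process.env, process.cwd())`.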

Requirements

License

MIT
