Browserose MCP
MCP server for AI agents to control Google Chrome via Playwright with full iframe support, enabling end-to-end browser automation including navigation, interaction within iframes, and diagnostic tools.
MCP Tool for Agents to control the Google Chrome browser.
Browserose MCP is an MCP (Model Context Protocol) server that lets AI agents and IDEs control Google Chrome via Playwright, with full iframe support: snapshot and interact inside iframes (e.g. ALM SCORM players, embedded apps).
The toolset is sufficient for full end-to-end automation. You can complete entire flows—login, multi-step navigation, nested iframes, quizzes, and course completion—without manual steps or switching tabs. Use navigation, locators, frame selectors, coordinate fallbacks, and diagnostics as needed.
Requirements
- Node.js 18+
- Chrome installed (or set `PLAYWRIGHT_MCP_USE_CHROMIUM=1` to use Chromium)
- Playwright browsers: `npx playwright install chromium` (or `chrome` if available)
Install and build
npm install
npx playwright install chromium
npm run build
Cursor / IDE configuration
Add to ~/.cursor/mcp.json (or project .cursor/mcp.json):
{
"mcpServers": {
"playwright-chrome": {
"command": "node",
"args": ["/absolute/path/to/Browserose-MCP/build/index.js"]
}
}
}
Restart Cursor after editing mcp.json.
Environment variables
- `PLAYWRIGHT_MCP_HEADLESS`: set to `1` or `true` to run the browser headless (default: headed, so you see the window).
- `PLAYWRIGHT_MCP_USE_CHROMIUM`: set to `1` or `true` to use Playwright's Chromium instead of system Chrome.
- `PLAYWRIGHT_MCP_VIEWPORT_MAXIMIZED`: the window opens maximized by default. Set to `0` or `false` to use a fixed size (see width/height below).
- `PLAYWRIGHT_MCP_VIEWPORT_WIDTH`: viewport width in pixels (default: `1280`). Used only when maximized is off.
- `PLAYWRIGHT_MCP_VIEWPORT_HEIGHT`: viewport height in pixels (default: `800`). Used only when maximized is off.
Examples: To use a fixed size instead of maximized, set PLAYWRIGHT_MCP_VIEWPORT_MAXIMIZED=0 and optionally PLAYWRIGHT_MCP_VIEWPORT_WIDTH=1920, PLAYWRIGHT_MCP_VIEWPORT_HEIGHT=1080.
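As a rough illustration of how these variables could be interpreted, here is a minimal sketch. The helper names (`readFlag`, `readPixels`, `loadConfig`) and the `BrowserConfig` shape are hypothetical and are not taken from the project source; only the variable names and defaults come from the list above.

```typescript
// Hypothetical config shape; the real server may structure this differently.
interface BrowserConfig {
  headless: boolean;
  useChromium: boolean;
  maximized: boolean;
  width: number;
  height: number;
}

type Env = Record<string, string | undefined>;

// "1"/"true" switch a flag on, "0"/"false" switch it off, anything else keeps the default.
function readFlag(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined) return fallback;
  const v = value.trim().toLowerCase();
  if (v === "1" || v === "true") return true;
  if (v === "0" || v === "false") return false;
  return fallback;
}

// Parses a positive pixel count, falling back to the documented default.
function readPixels(value: string | undefined, fallback: number): number {
  const n = Number.parseInt(value ?? "", 10);
  return Number.isFinite(n) && n > 0 ? n : fallback;
}

function loadConfig(env: Env): BrowserConfig {
  return {
    headless: readFlag(env["PLAYWRIGHT_MCP_HEADLESS"], false),
    useChromium: readFlag(env["PLAYWRIGHT_MCP_USE_CHROMIUM"], false),
    maximized: readFlag(env["PLAYWRIGHT_MCP_VIEWPORT_MAXIMIZED"], true),
    width: readPixels(env["PLAYWRIGHT_MCP_VIEWPORT_WIDTH"], 1280),
    height: readPixels(env["PLAYWRIGHT_MCP_VIEWPORT_HEIGHT"], 800),
  };
}

// The fixed-size example from the text above.
const fixedSize = loadConfig({
  PLAYWRIGHT_MCP_VIEWPORT_MAXIMIZED: "0",
  PLAYWRIGHT_MCP_VIEWPORT_WIDTH: "1920",
  PLAYWRIGHT_MCP_VIEWPORT_HEIGHT: "1080",
});
```

In an MCP client config, these values would typically be supplied through the server entry's environment rather than a shell export.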
Popup handler
When a link or button opens a new tab (e.g. Go to activity on IBM SkillsBuild opens the SCORM player in a new window), the MCP automatically attaches to that new tab. The browser context listens for the page event; when a new page is created, the internal page reference is updated so that all subsequent tool calls (navigate, click, snapshot, frame_probe, etc.) run in the new tab, not the original one. You do not need to switch tabs manually or use a separate “switch to tab” tool: click Go to activity, wait for the player to load, then continue with the same tools—they will already target the player tab. Implementation: in src/browser.ts, context.on("page", (newPage) => { page = newPage; }) runs on every new page (popup or target=_blank). Restart the MCP server (or Cursor) after pulling changes so the handler is active.
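The pattern described above can be sketched without a browser. `FakeContext` and `PageLike` below are hypothetical stand-ins that only imitate the `page` event; they are not Playwright classes, and the real handler lives in `src/browser.ts`.

```typescript
// Stand-in types (hypothetical): a minimal imitation of a browser context's "page" event.
interface PageLike { url: string }
type PageListener = (page: PageLike) => void;

class FakeContext {
  private listeners: PageListener[] = [];

  on(_event: "page", listener: PageListener): void {
    this.listeners.push(listener);
  }

  // Simulates a click that opens a popup or a target=_blank tab.
  openTab(url: string): PageLike {
    const created: PageLike = { url };
    for (const listener of this.listeners) listener(created);
    return created;
  }
}

const context = new FakeContext();
let page: PageLike = { url: "https://example.com/plan" };

// Same shape as the handler described above: always follow the newest tab.
context.on("page", (newPage) => { page = newPage; });

// Tools read the shared reference at call time, so they see the new tab.
function currentUrl(): string {
  return page.url;
}

const before = currentUrl();
context.openTab("https://example.com/player");
const after = currentUrl();
```

One consequence of this design is that the original tab is no longer reachable through the shared reference once a new tab opens.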
Tools (quick reference)
| Tool | Purpose |
|---|---|
| `browser_navigate` | Go to URL |
| `browser_go_back` / `browser_go_forward` | History |
| `browser_snapshot` | Accessibility-like tree of the page; use `includeFrames: true` to include same-origin iframes |
| `browser_snapshot_frame` | Snapshot a single iframe by selector (e.g. `iframe`, `iframe#pplayer_iframe`) |
| `browser_click` | Click by ref; optional `frameSelector` for elements inside an iframe |
| `browser_type` | Type text; optional `ref`, `frameSelector`, `submit` |
| `browser_type_locator` | Type into element by role/text/css; optional `frameSelector` (omit = main page) |
| `browser_hover` | Hover by ref; optional `frameSelector` |
| `browser_select_option` | Select option(s) by ref; optional `frameSelector` |
| `browser_press_key` | Press a key; optional `frameSelector` |
| `browser_screenshot` | Page or iframe screenshot; optional `frameSelector` |
| `browser_click_at` | Click at (x, y) relative to a frame's viewport (for canvas/cross-origin when refs fail); requires `frameSelector`, `x`, `y` |
| `browser_click_locator` | Click by locator: role+name, text, or css. Optional: `nth` (0-based index), `enabledOnly` (first enabled match), `scopeCss` (within container). No snapshot needed. |
| `browser_list_clickables` | List clickables in a frame. Requires `frameSelector` when targeting an iframe (e.g. content frame). Optional filters: `role`, `name`, `text`, `css`; `enabledOnly`; `scopeCss` (within container). Use to discover what to click. |
| `browser_frame_probe` | Diagnostic: run inside frame to get url, title, readyState, button/clickable counts, textSample. If probe fails or counts=0, UI may be canvas. |
| `browser_frame_bbox` | Get frame bounding box (x, y, width, height) in page coordinates. |
| `browser_click_at_rel` | Click at relative (rx, ry) in 0..1 inside the frame (e.g. 0.5, 0.9 = center-bottom). Uses Playwright `page.mouse.click`. |
| `browser_frame_inventory` | Inside frame: list child iframes (id, name, src, rect), canvas (rect), shadowHosts count, bodyRect. Use to see if UI is in nested iframe or canvas. |
| `browser_hit_test_rel` | `elementFromPoint` at (rx, ry) in frame; returns tag, id, class, rect, pointerEvents, cursor (and iframe src/name). Confirms where clicks land. |
| `browser_click_at_rel_debug` | Frame screenshot + text with page coords and in-frame pixel where `click_at_rel(rx, ry)` would click. |
| `browser_evaluate_click` | Click element in frame via DOM `.click()` (`frameSelector` + `css`; optional `nth`). Bypasses visibility/actionability. Use when quiz SUBMIT or other button fails. |
| `browser_evaluate_click_by_text` | Click by text content in frame (`frameSelector` + `text`; optional `match` exact/contains, `scopeCss`, `nth`). Bypasses visibility/overlays. Use when quiz options or labels don’t respond to locator/click_at_rel. |
| `browser_wait` | Sleep (seconds) |
Tools reference and use cases
Detailed description of each tool and when to use it. Optional frameSelector uses >> for nested iframes (e.g. iframe#a >> iframe#b).
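To make the `>>` chaining concrete, the sketch below splits a chained `frameSelector` into one selector per nesting level and appends a further level. Both helpers are hypothetical illustrations, not functions from this project, and they assume `>>` never appears inside an individual CSS selector.

```typescript
// Hypothetical helper: one selector per nesting level, outermost first.
function splitFrameChain(frameSelector: string): string[] {
  return frameSelector
    .split(">>")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
}

// Hypothetical helper: append one more level, e.g. a child iframe found by browser_frame_inventory.
function extendFrameChain(frameSelector: string, child: string): string {
  return [...splitFrameChain(frameSelector), child.trim()].join(" >> ");
}

const levels = splitFrameChain("iframe#a >> iframe#b");
const deeper = extendFrameChain(
  "iframe#pplayer_iframe >> iframe#modulePlayerIframe",
  "iframe#content-frame",
);
```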
Navigation
| Tool | What it does | Use cases |
|---|---|---|
| `browser_navigate` | Opens a URL in the current tab. | Starting a flow, opening login pages, course URLs, or any target site. |
| `browser_go_back` | Goes back in history. | Undoing a navigation or returning from a redirect. |
| `browser_go_forward` | Goes forward in history. | Repeating a step after going back. |
Snapshot and ref-based interaction
These tools use an accessibility/DOM snapshot to get refs (e.g. s1e2). You then pass that ref to click, type, hover, or select. Best when the page (or iframe) is same-origin and has a normal DOM.
| Tool | What it does | Use cases |
|---|---|---|
| `browser_snapshot` | Captures an accessibility-like tree of the current page. Use `includeFrames: true` to include same-origin iframes. | Discovering structure and getting refs for main page and embedded frames in one call. |
| `browser_snapshot_frame` | Snapshots a single iframe by selector. For chained selectors (`>>`), uses CDP when in-frame evaluate isn't possible. | Inspecting one frame's tree; getting refs for elements inside that frame (same-origin or when CDP can provide refs). |
| `browser_click` | Clicks the element identified by ref. Optional `frameSelector` when the element is inside an iframe. | Buttons, links, checkboxes: any clickable from the snapshot. |
| `browser_type` | Types text into the focused element or the element identified by ref. Optional `frameSelector`, `submit` (press Enter). | Text inputs, search boxes, login fields. |
| `browser_hover` | Hovers over the element identified by ref. Optional `frameSelector`. | Opening dropdowns or tooltips before clicking. |
| `browser_select_option` | Selects option(s) in a dropdown by ref. Optional `frameSelector`. | Select elements, language pickers, filters. |
| `browser_press_key` | Sends a key (e.g. Enter, Tab, ArrowRight). Optional `frameSelector` to target a frame. | Submitting forms, keyboard navigation, escaping modals. |
Locator-based interaction (no snapshot)
These tools use Playwright locators (role+name, text, or css) and do not require a snapshot. They work in cross-origin iframes and are the main escape hatch when refs aren't available or snapshot is empty.
| Tool | What it does | Use cases |
|---|---|---|
| `browser_click_locator` | Clicks an element by locator: role+name, text, or css. Optional: `nth` (0-based index of match), `enabledOnly` (click first enabled match when several are disabled), `scopeCss` (resolve only within this container selector). | Cross-origin iframes, login buttons, SCORM "Next", quiz "SUBMIT" when multiple SUBMITs exist (use `enabledOnly: true` or `nth`), text inside a card (use `scopeCss: ".quiz-card"` to avoid sidebar). |
| `browser_type_locator` | Types into an element found by role/text/css. Optional: `nth` (0-based index), `scopeCss` (within container). | Login fields, search boxes, any input when snapshot isn't used; when several inputs match, use `nth` or `scopeCss`. |
| `browser_list_clickables` | Lists visible buttons/links in a frame. Optional: `role`, `name`, `text`, `css` (filter list to matches), `enabledOnly` (list only enabled), `scopeCss` (list only within container). | Discovering what's clickable; listing only "SUBMIT" buttons (`role: "button"`, `name: "SUBMIT"`) to see which is enabled; listing only clickables inside a quiz area (`scopeCss`). |
Customizing by situation: Use nth when multiple elements match (e.g. 4th button named "SUBMIT"). Use enabledOnly to click the first enabled match when the DOM has several disabled copies. Use scopeCss to restrict the search to a container (e.g. .quiz-card, [role="dialog"]) so you don't match the same text or role in the sidebar or another panel.
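The effect of these three options can be modelled on plain data. The sketch below is not the server's implementation: `Candidate`, `PickOptions`, and `pick` are hypothetical, and the order in which the filters combine (scope, then enabled state, then index) is an assumption made for illustration.

```typescript
// Hypothetical model of a matched element; the real tools operate on Playwright locators.
interface Candidate { text: string; enabled: boolean; container: string }

interface PickOptions { nth?: number; enabledOnly?: boolean; scope?: string }

// Assumed order: narrow by container, then by enabled state, then take the nth match (default 0).
function pick(matches: Candidate[], opts: PickOptions = {}): Candidate | undefined {
  let pool = matches;
  if (opts.scope !== undefined) pool = pool.filter((m) => m.container === opts.scope);
  if (opts.enabledOnly) pool = pool.filter((m) => m.enabled);
  return pool[opts.nth ?? 0];
}

// Three "SUBMIT" buttons: one in the sidebar, two in the quiz card, only the last enabled.
const submits: Candidate[] = [
  { text: "SUBMIT", enabled: false, container: "sidebar" },
  { text: "SUBMIT", enabled: false, container: "quiz-card" },
  { text: "SUBMIT", enabled: true, container: "quiz-card" },
];

const firstEnabled = pick(submits, { enabledOnly: true });
const secondInCard = pick(submits, { scope: "quiz-card", nth: 1 });
```

Both calls resolve to the same enabled button in the quiz card, which is why either option can work for a quiz with disabled duplicates.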
Coordinate-based interaction (escape hatch)
When snapshot and locators both fail (e.g. canvas, custom-rendered UI, or wrong frame depth), use coordinates relative to a frame.
| Tool | What it does | Use cases |
|---|---|---|
| `browser_frame_bbox` | Returns the frame's bounding box (x, y, width, height) in page coordinates. | Converting relative positions to absolute (x, y) for `browser_click_at`, or understanding frame position. |
| `browser_click_at` | Clicks at pixel (x, y) relative to the frame's viewport. Requires `frameSelector`, `x`, `y`. | Canvas or non-DOM UI when you know exact coordinates (e.g. from a screenshot). |
| `browser_click_at_rel` | Clicks at relative position (rx, ry) in [0..1] inside the frame (e.g. 0.5, 0.9 = center-bottom). Uses Playwright's mouse. | Clicking "bottom-right" or "center" of a frame when you don't have pixel coords; quick fallback for known layout. |
| `browser_click_at_rel_debug` | Returns a screenshot of the frame and the exact page coordinates and in-frame pixel where `click_at_rel(rx, ry)` would click. | Debugging: confirm that (rx, ry) lands on the right element before using `browser_click_at_rel`. |
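The arithmetic behind relative clicks is simple enough to sketch. The functions below are hypothetical, and clamping `rx`/`ry` to 0..1 is an assumption rather than documented behavior. They show how a bounding box from `browser_frame_bbox` and a relative point map to an in-frame pixel and to a page coordinate.

```typescript
interface BBox { x: number; y: number; width: number; height: number }
interface Point { x: number; y: number }

// In-frame pixel for a relative point; rx and ry are clamped to 0..1 (an assumption).
function relToFramePixel(box: BBox, rx: number, ry: number): Point {
  const clamp = (v: number): number => Math.min(1, Math.max(0, v));
  return { x: clamp(rx) * box.width, y: clamp(ry) * box.height };
}

// Page coordinate: the frame's origin plus the in-frame offset.
function relToPagePoint(box: BBox, rx: number, ry: number): Point {
  const p = relToFramePixel(box, rx, ry);
  return { x: box.x + p.x, y: box.y + p.y };
}

// A frame at (100, 50) measuring 800 x 600; (0.5, 0.9) is the center-bottom area.
const frameBox: BBox = { x: 100, y: 50, width: 800, height: 600 };
const inFrame = relToFramePixel(frameBox, 0.5, 0.9);
const onPage = relToPagePoint(frameBox, 0.5, 0.9);
```

For this box the in-frame pixel is about (400, 540) and the page point about (500, 590).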
Diagnostics (finding the right frame / layer)
When a frame shows no buttons or empty text in the snapshot, the real UI is often in a child iframe, canvas, or shadow DOM. These tools help you find it.
| Tool | What it does | Use cases |
|---|---|---|
| `browser_frame_probe` | Runs a small script inside the frame: returns url, title, readyState, counts of buttons/clickables, and a short textSample. | Quick check: "Does this frame have any DOM?" If `buttons: 0`, `clickables: 0`, `textSample: ""`, the visible UI is likely in a child iframe or canvas. |
| `browser_frame_inventory` | Lists child iframes (id, name, src, rect), canvas elements (rect), count of shadow roots, and bodyRect. | When probe says "no content": find the real content frame (e.g. ALM SCORM's `iframe#content-frame`) or confirm the UI is canvas. Then extend the frame selector chain and use locators in that frame. |
| `browser_hit_test_rel` | Uses `elementFromPoint(rx*width, ry*height)` inside the frame; returns tag, id, class, rect, pointerEvents, cursor; for iframes, src/name. | Verify what element a relative point (rx, ry) hits, e.g. "Is (0.5, 0.92) really the Next button or an overlay?" |
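The "empty frame" rule of thumb can be written as a small decision function. This is only a sketch: the `ProbeResult` fields mirror the probe output named above, but the exact response shape is an assumption, and `nextStep` is a hypothetical helper.

```typescript
// Field names follow the probe output described above; the exact shape is assumed.
interface ProbeResult { buttons: number; clickables: number; textSample: string }

type NextStep = "use-locators" | "run-frame-inventory";

function nextStep(probe: ProbeResult | null): NextStep {
  // A failed probe (modelled as null) or an empty frame suggests the UI is deeper or on a canvas.
  if (probe === null) return "run-frame-inventory";
  const empty =
    probe.buttons === 0 && probe.clickables === 0 && probe.textSample.trim() === "";
  return empty ? "run-frame-inventory" : "use-locators";
}

const emptyFrame = nextStep({ buttons: 0, clickables: 0, textSample: "" });
const liveFrame = nextStep({ buttons: 2, clickables: 5, textSample: "Learning objectives" });
```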
Utility
| Tool | What it does | Use cases |
|---|---|---|
| `browser_screenshot` | Takes a screenshot of the full page or a specific iframe (`frameSelector`). | Visual verification, debugging layout, or feeding into vision models. |
| `browser_wait` | Pauses for a given number of seconds. | Letting the page or iframe finish loading before snapshot or click. |
Use cases in practice
- Normal web automation (main page)
  Use `browser_snapshot` (or `browser_snapshot_frame` with no/minimal nesting) to get refs, then `browser_click`, `browser_type`, `browser_hover`, `browser_select_option` with those refs. Optional `browser_screenshot` for verification.
- Login flows
  Often on the main page: `browser_type_locator` and `browser_click_locator` with `role`/`name` or `text` (e.g. email → Continue → password → Log in). No snapshot required.
- Single iframe, same-origin
  `browser_snapshot_frame` with `frameSelector: "iframe#id"` → get refs → `browser_click` / `browser_type` with the same `frameSelector`.
- Cross-origin or "empty" iframe
  Snapshot may be empty or refs may not work. Use locators: `browser_list_clickables` with `frameSelector` to see what's there, then `browser_click_locator` and `browser_type_locator` with the same `frameSelector` and role/text/css.
- ALM / SCORM (nested iframes)
  The visible lesson UI is often in a third-level iframe. If `browser_frame_probe` on `iframe#pplayer_iframe >> iframe#modulePlayerIframe` shows `clickables: 0`, run `browser_frame_inventory` on that chain; it will list child iframes (e.g. `iframe#content-frame`). Extend the chain to `... >> iframe#content-frame` and use `browser_list_clickables` and `browser_click_locator` (e.g. `role: "button"`, `name: "Next"`) there. One-line takeaway: when the frame has no DOM content, use frame_inventory to find the real content iframe, then add it to the chain.
- Canvas or custom-rendered UI
  If frame_inventory shows a large canvas and no useful iframe, or locators don't match: use `browser_click_at_rel_debug` to see where (rx, ry) lands, then `browser_click_at_rel` with adjusted (rx, ry), or `browser_frame_bbox` + `browser_click_at` with computed (x, y).
- Quizzes / multiple identical buttons
  When several "SUBMIT" or "Next" buttons exist and only one is enabled, use `browser_click_locator` with `enabledOnly: true` (and same `role`/`name`) so the first enabled match is clicked. Or use `browser_list_clickables` with `role: "button"`, `name: "SUBMIT"` (and optionally `enabledOnly: true`) to see indices, then click with `nth`. Use `scopeCss` (e.g. `.quiz-card`) to restrict to the current question card and avoid matching the sidebar.
- Debugging "click does nothing"
  Check: (1) Correct frame? → `browser_frame_probe` and `browser_frame_inventory`. (2) Right element? → `browser_list_clickables` in that frame; `browser_hit_test_rel` to see what's under (rx, ry). (3) Right coordinates? → `browser_click_at_rel_debug`. (4) Multiple matches? → use `nth` or `enabledOnly`.
End-to-end: IBM SkillsBuild / ALM course
You can run the full course flow with only Browserose MCP tools: from the IBM page through login, learning plan, launching the activity, and completing lessons and quizzes.
1. Start from the IBM page and log in
- Navigate to the course or plan URL (e.g. `https://skills.yourlearning.ibm.com/activity/PLAN-...`).
- If redirected to login: `browser_click_locator` with `text: "Log in with ibm"` (or equivalent). On the IBM login page:
  - `browser_type_locator` with `role: "textbox"`, `name: "IBMid"`, and the email.
  - `browser_click_locator` with `role: "button"`, `name: "Continue"`.
  - `browser_type_locator` with `role: "textbox"`, `name: "Password"`, and the password.
  - `browser_click_locator` with `role: "button"`, `name: "Log in"`.
- Use `browser_wait` and `browser_screenshot` as needed to confirm the next page.
2. Open the learning plan and the module
- From the plan page: `browser_click_locator` with `text: "Microcredential 1: Data Classification"` (or the right section).
- Then `browser_click_locator` with `text: "Classifying and Sourcing Data"` (or the target module).
- Launch the activity: `browser_click_locator` with `role: "button"`, `name: "Go to activity"`.
- Wait for the player: Clicking Go to activity may open the SCORM player in a new tab. The MCP attaches to new tabs automatically (popup handler), so the next tool calls then run in the player tab. Use `browser_wait` (e.g. 5–8 seconds), then `browser_frame_probe` or `browser_frame_inventory` on `iframe#pplayer_iframe` (and then `iframe#pplayer_iframe >> iframe#modulePlayerIframe` if needed) until the content frame is present. The visible lesson UI is in the content frame:
  `iframe#pplayer_iframe >> iframe#modulePlayerIframe >> iframe#content-frame`
3. Content frame: lessons and “Continue”
- Continue / Next (content): `browser_click_locator` with `frameSelector: "iframe#pplayer_iframe >> iframe#modulePlayerIframe >> iframe#content-frame"`, `css: "button.continue-btn"`.
- Use `browser_list_clickables` with that `frameSelector` to discover buttons (e.g. “Next”, “Continue”). Use `browser_screenshot` with that `frameSelector` when you need to see what’s on screen.
4. Practice quiz: two-pass strategy
The fastest way to pass is: (1) first pass—answer arbitrarily, submit, read the correct answer from feedback, memorize it; (2) TAKE AGAIN; (3) second pass—answer with the memorized correct options and submit.
- Start or restart quiz: `browser_click_locator` in the content frame with `text: "START QUIZ"` or `text: "TAKE AGAIN"`.
- Select an option: If locators like `#qmc-X-label` are covered by overlays, use `browser_click_at_rel` in the content frame with e.g. `rx: 0.5`, and `ry` roughly: first option ~0.48–0.52, second ~0.56–0.6, third ~0.64–0.68 (tune if layout differs). Or use `browser_hit_test_rel` / `browser_click_at_rel_debug` to confirm.
- Submit: `browser_click_locator` in the content frame with `role: "button"`, `name: "SUBMIT"`, `enabledOnly: true` (so the active SUBMIT is clicked).
- Read feedback: After submit, use `browser_frame_probe` on the content frame and read `textSample` (or use `browser_screenshot`) to get “Correct answer: …” and store it per question index (Q1, Q2, …).
- Next question: `browser_click_at_rel` in the content frame with `rx: 0.65`, `ry: 0.85` (NEXT button area). Repeat until the quiz is done (no more SUBMIT or you see completion).
- Second pass: Click TAKE AGAIN, then for each question click the option that matches the stored correct answer (by `ry` band or by locator if the correct option text is clickable), SUBMIT, then click (0.65, 0.85) for NEXT until the quiz is complete.
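The bookkeeping for the first pass can be sketched as follows. The helper names and the `Map` store are hypothetical; the "Correct answer:" wording is taken from the steps above and may differ between courses, so the pattern would need adjusting to the actual feedback text.

```typescript
// Hypothetical store: question index (1-based) -> correct answer text.
const correctAnswers = new Map<number, string>();

// Extracts the text after "Correct answer:" up to the end of that line.
function parseCorrectAnswer(textSample: string): string | undefined {
  const match = /Correct answer:\s*(.+)/i.exec(textSample);
  return match?.[1]?.trim();
}

// Records the answer for a question; returns false when no feedback line was found.
function recordFeedback(questionIndex: number, textSample: string): boolean {
  const answer = parseCorrectAnswer(textSample);
  if (answer === undefined) return false;
  correctAnswers.set(questionIndex, answer);
  return true;
}

// Sample textSample values (made up for illustration).
const storedFirst = recordFeedback(1, "Incorrect. Correct answer: Structured data\nNEXT");
const storedSecond = recordFeedback(2, "No feedback on this screen");
```

On the second pass, the stored text for each question index can be matched against the option labels, or mapped back to an `ry` band when the labels are not clickable.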
5. After the quiz and finishing the module
- When the quiz is complete, use `browser_click_locator` in the content frame with `css: "button.continue-btn"` (or equivalent) to continue to the next lesson or close the module.
- Repeat the same pattern for further lessons and quizzes until the module/course is marked complete.
Summary: All steps—login, plan navigation, “Go to activity”, waiting for the player, lesson Continue, quiz (two-pass with feedback reading and TAKE AGAIN), and completion—can be done with the existing tools (navigate, click_locator, type_locator, list_clickables, frame_probe, frame_inventory, click_at_rel, screenshot, wait). No manual tab switching or external tools are required.
Using iframes (including cross-origin / ALM SCORM)
- Navigate to a page that contains an iframe (e.g. ALM course page).
- Call `browser_snapshot` with `includeFrames: true` to get the main page plus same-origin iframes, or call `browser_snapshot_frame` with `frameSelector: "iframe"` (or `iframe#id`) to get only that frame's tree.
- Use the returned refs with `browser_click`, `browser_type`, etc., and pass the same `frameSelector`. For nested iframes use a chained selector with `>>`, e.g. `iframe#pplayer_iframe >> iframe#modulePlayerIframe`.
Example: click "Next" inside the first iframe:
- `browser_snapshot_frame` with `frameSelector: "iframe"` → get ref for the "Next" button (e.g. `f1e2`).
- `browser_click` with `ref: "f1e2"`, `frameSelector: "iframe"`.
Escape hatch (cross-origin / SCORM): For frames where snapshot fails, use Playwright locators directly (no AX/DOMSnapshot):
- `browser_list_clickables` with `frameSelector: "iframe#pplayer_iframe >> iframe#modulePlayerIframe"` → lists buttons/links with text and enabled/disabled.
- ALM SCORM: The visible lesson UI (e.g. "Next", "Learning objectives") lives in a third-level iframe. Use `browser_frame_inventory` on `iframe#pplayer_iframe >> iframe#modulePlayerIframe` to see child iframes; then chain to the content frame: `iframe#pplayer_iframe >> iframe#modulePlayerIframe >> iframe#content-frame`. Use that selector with `browser_list_clickables` and `browser_click_locator` (e.g. `role: "button"`, `name: "Next"`).
- `browser_click_locator` with the same `frameSelector` and `role: "button"`, `name: "Next"` (or `text: "Next"`) → clicks the element. Works because Playwright targets the frame's context directly.
Cross-origin / SCORM (AX tree empty): The server also uses a 3-tier snapshot for frames:
- Tier A: CDP `Accessibility.getFullAXTree` (refs with `backendDOMNodeId`; click via box model).
- Tier B: If AX is empty, CDP `DOMSnapshot.captureSnapshot` (refs with viewport coordinates; click via `Input.dispatchMouseEvent`).
- Tier C: Use `browser_screenshot` with `frameSelector`, then `browser_click_at` with the same `frameSelector` and `(x, y)` to click by coordinates (e.g. canvas or when both AX and DOM snapshot fail).
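The tiered fallback can be pictured as "try each source in order and stop at the first one that yields nodes". The sketch below is a generic illustration of that idea, not the server's code: `SnapshotTier` and `firstNonEmpty` are hypothetical, and the capture functions are stand-ins for the CDP calls named above.

```typescript
// Hypothetical description of one tier: a label plus a function that returns its nodes.
interface SnapshotTier<T> { name: string; capture: () => T[] }

// Returns the first tier that produced any nodes; later tiers are not invoked.
function firstNonEmpty<T>(tiers: SnapshotTier<T>[]): { name: string; nodes: T[] } | undefined {
  for (const tier of tiers) {
    const nodes = tier.capture();
    if (nodes.length > 0) return { name: tier.name, nodes };
  }
  // Nothing usable: the caller falls back to screenshot + browser_click_at (Tier C).
  return undefined;
}

// Stand-in captures: the AX tree is empty, the DOM snapshot has one node.
const calls: string[] = [];
const result = firstNonEmpty<string>([
  { name: "A", capture: () => { calls.push("A"); return []; } },
  { name: "B", capture: () => { calls.push("B"); return ["node-1"]; } },
]);
```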
License and author
- License: This project is open source. Use and modify it freely.
- Developer: ETTALBI OMAR