Selenium MCP Server
Exposes Selenium WebDriver as an MCP server, enabling AI agents and LLMs to control real browsers for automation tasks like navigation, element interaction, and screenshot capture.
Model Context Protocol (MCP) server for Selenium WebDriver that enables AI agents and LLMs to control real browsers for automation.
This project exposes Selenium WebDriver as an MCP (Model Context Protocol) server, allowing AI agents to control a real browser through structured tools.
It enables LLMs and autonomous agents to perform tasks like:
- Opening browsers
- Navigating websites
- Discovering UI elements
- Clicking buttons and links
- Typing into inputs
- Extracting page text
- Taking screenshots
- More capabilities (in progress)
This makes it possible to build AI-powered browser automation systems and autonomous QA agents.
Table of Contents
- Why This Project Exists
- Architecture
- Features
- Installation
- Running the Server
- MCP Server Version
- Available MCP Tools
- Browser Session Flow
- Example Agent Workflow
- System Prompt for AI Agents
- Prompt Customization
- Logging
- Configure Your MCP Client
- Requirements
- Use Cases
- Contributing
- License
- Author
WHY THIS PROJECT EXISTS
Modern AI agents need a way to interact with real applications.
While traditional automation tools like Selenium exist, they are not directly usable by LLM agents.
This project bridges that gap by exposing Selenium functionality through MCP tools so that agents can:
- Understand web pages
- Discover UI elements
- Perform actions
- Validate results
ARCHITECTURE
flowchart TD
A[LLM Agent] --> B[MCP Protocol]
B --> C[Selenium MCP Server]
C --> D[Browser Tools]
C --> E[Navigation Tools]
C --> F[Interaction Tools]
C --> G[Element Tools]
C --> H[Debug Tools]
D --> I[Selenium WebDriver]
E --> I
F --> I
G --> I
H --> I
I --> J[Browser]
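The following minimal Python sketch shows the layering in practice: a server in this shape routes an MCP tool call by name to a handler registered under one of the diagram's categories. The `tool` decorator, the `TOOLS` registry, and `call_tool` are hypothetical illustrations, not selenium-mcp's actual code. The handlers are stubs that stand in for Selenium WebDriver calls.

```python
from typing import Callable, Dict, Tuple

Handler = Callable[..., str]

# Hypothetical registry: tool name -> (diagram category, handler).
TOOLS: Dict[str, Tuple[str, Handler]] = {}


def tool(category: str) -> Callable[[Handler], Handler]:
    """Decorator that registers a function as an MCP tool in a category."""
    def register(fn: Handler) -> Handler:
        TOOLS[fn.__name__] = (category, fn)
        return fn
    return register


@tool("Browser Tools")
def open_browser() -> str:
    return "browser opened"  # stub: a real server would start a WebDriver


@tool("Navigation Tools")
def open_url(url: str) -> str:
    return f"navigated to {url}"  # stub: a real server would call driver.get(url)


def call_tool(name: str, **arguments: object) -> str:
    """Route an incoming tool call to its registered handler."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    _category, handler = TOOLS[name]
    return handler(**arguments)
```

Keeping the MCP-facing dispatch separate from the WebDriver-facing handlers is what lets each tool group evolve independently.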
FEATURES
- MCP-compatible Selenium automation server
- Browser session management
- Navigation controls
- UI element discovery
- Accessibility-aware interaction
- Screenshot capture
- Page text extraction
- Headless browser support
- Multi-tab browser management (open, switch, close, track active tab)
- Improved interactive element detection for modern UI frameworks (React, Angular, dynamic DOM)
INSTALLATION
Install the package from PyPI:
pip install selenium-mcp
RUNNING THE SERVER
Start the MCP server
You can start the Selenium MCP server using different transport modes depending on your use case.
Default (STDIO)
selenium-mcp run
- Uses stdio transport
- Best for local agent integrations
- No network exposure
HTTP Mode (Recommended)
selenium-mcp run --transport http --host 127.0.0.1 --port 3345
Starts server at: http://127.0.0.1:3345
MCP endpoint: http://127.0.0.1:3345/mcp
Best for:
- API integrations
- Postman / curl testing
- production-style usage
SSE Mode (Streaming)
selenium-mcp run --transport sse --host 127.0.0.1 --port 3345
SSE endpoint: http://127.0.0.1:3345/sse
Best for:
- streaming-based agents
- real-time interactions
Note: SSE endpoints are streaming and may not show output directly in the browser.
Expose Server on Network:
selenium-mcp run --transport http --host 0.0.0.0 --port 3345
Makes server accessible from:
- other devices on the same network
- Docker / VM environments
Notes:
- Default port: 3336 (used when --port is omitted)
- Supported transports: stdio (default), http, sse
- The port must be within the range 1–65535
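These rules fit in a small helper that validates the options and derives the URL a client should connect to. This is a hypothetical sketch, not part of the selenium-mcp CLI. It assumes the default port 3336 also applies to the http and sse transports when --port is omitted. It uses the /mcp and /sse paths listed earlier.

```python
from typing import Optional

# Values documented in this README.
SUPPORTED_TRANSPORTS = ("stdio", "http", "sse")
DEFAULT_PORT = 3336
ENDPOINT_PATHS = {"http": "/mcp", "sse": "/sse"}


def endpoint_url(transport: str = "stdio", host: str = "127.0.0.1",
                 port: int = DEFAULT_PORT) -> Optional[str]:
    """Validate server options and return the client endpoint (None for stdio)."""
    if transport not in SUPPORTED_TRANSPORTS:
        raise ValueError(f"unsupported transport: {transport!r}")
    if not 1 <= port <= 65535:
        raise ValueError(f"port must be within 1-65535, got {port}")
    if transport == "stdio":
        return None  # stdio talks over the process pipes, not the network
    return f"http://{host}:{port}{ENDPOINT_PATHS[transport]}"
```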
MCP SERVER VERSION
To check the installed version of the Selenium MCP server, run:
selenium-mcp version
AVAILABLE MCP TOOLS
To list the tools supported by the MCP server, run:
selenium-mcp tools
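Over the wire, MCP clients discover and invoke these tools with JSON-RPC 2.0 messages:

- `tools/list` returns each tool's name, description, and input schema.
- `tools/call` runs one tool with named arguments.

The sketch below only builds the request bodies. A real client must first complete the MCP `initialize` handshake over the chosen transport. The argument name `url` in the example is an assumption; check the `inputSchema` returned by `tools/list` for the actual parameter names.

```python
import json
from typing import Any, Dict, Optional


def jsonrpc_request(request_id: int, method: str,
                    params: Optional[Dict[str, Any]] = None) -> str:
    """Serialize a JSON-RPC 2.0 request of the kind MCP clients send."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


def tools_list_request(request_id: int) -> str:
    # Asks the server for every tool's name, description and inputSchema.
    return jsonrpc_request(request_id, "tools/list")


def tools_call_request(request_id: int, name: str, arguments: Dict[str, Any]) -> str:
    # Invokes one tool; `arguments` must match that tool's inputSchema.
    return jsonrpc_request(request_id, "tools/call",
                           {"name": name, "arguments": arguments})


# Example: navigate the current session ("url" is an assumed argument name).
print(tools_call_request(2, "open_url", {"url": "https://example.com"}))
```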
BROWSER CONTROL
- open_browser – Launch a new browser session
- close_browser – Close the browser session
- maximize_browser – Maximize the browser window
- fullscreen_browser – Switch the browser to fullscreen
NAVIGATION
- open_url – Navigate to a specific URL
- navigate_back – Navigate back in browser history
- navigate_forward – Navigate forward in browser history
- refresh_page – Reload the page
- wait_for_page – Wait for the page to load
- get_page_title – Get the current page title
TAB MANAGEMENT
- get_tabs – Retrieve all open tabs in the current session
- switch_tab – Switch to a specific tab by index
- open_new_tab – Open a new tab and optionally navigate to a URL
- close_tab – Close a specific tab by index
- get_current_tab – Retrieve the currently active tab
- name_tab – Assign a custom name to a tab for easier identification
These tools allow agents to manage multiple tabs within a single browser session.
ELEMENT DISCOVERY
- get_interactive_elements – Discover visible interactive elements on the page
- get_accessibility_tree – Retrieve a simplified accessibility tree for the page
These tools allow agents to understand the UI structure before interacting with it.
Notes
- Element detection is optimized for modern web applications (React, Angular, dynamic UI frameworks).
- Elements are identified using interaction signals such as roles, click handlers, and focusability.
- Only visible and meaningful elements are returned to reduce noise.
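To illustrate the signals listed above, the following hypothetical heuristic keeps an element only if it is visible and shows at least one interaction signal:

- a native control tag
- an interactive ARIA role
- a non-negative tabindex
- a click handler

It then numbers the surviving elements so tools can refer to them by index. The sketch works on plain Python objects rather than a live DOM and is not selenium-mcp's actual detection logic.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Subset of ARIA widget roles commonly treated as interactive.
INTERACTIVE_ROLES = {"button", "link", "textbox", "checkbox", "radio",
                     "combobox", "menuitem", "tab", "switch", "option"}
NATIVE_CONTROLS = {"a", "button", "input", "select", "textarea"}


@dataclass
class Candidate:
    tag: str
    attributes: Dict[str, str] = field(default_factory=dict)
    visible: bool = True
    has_click_handler: bool = False


def is_interactive(el: Candidate) -> bool:
    """Hypothetical heuristic: visible AND at least one interaction signal."""
    if not el.visible:
        return False
    if el.tag in NATIVE_CONTROLS or el.attributes.get("role") in INTERACTIVE_ROLES:
        return True
    try:
        if int(el.attributes.get("tabindex", "")) >= 0:
            return True  # reachable with the keyboard, i.e. focusable
    except ValueError:
        pass  # missing or non-numeric tabindex
    return el.has_click_handler


def index_elements(candidates: List[Candidate]) -> Dict[int, Candidate]:
    """Number the kept elements so tools can refer to them by index."""
    return dict(enumerate(c for c in candidates if is_interactive(c)))
```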
INTERACTION TOOLS
- click_element – Click an element by index
- type_into_element – Enter text into an input field
Elements must first be discovered with get_interactive_elements; the index passed to click_element or type_into_element refers to that result.
PAGE ANALYSIS
get_page_text – Extract visible text from the page
Useful for:
- validation
- reasoning
- information extraction
VISUAL DEBUGGING
take_screenshot – Capture a screenshot of the current browser window
Screenshot Storage Location
When screenshots are captured, they are automatically saved in a hidden folder inside your home directory.
macOS / Linux
Screenshots are stored at:
~/.selenium-mcp/screenshot
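Example full path on macOS:
/Users/<your-username>/.selenium-mcp/screenshot
On Linux, the equivalent is /home/<your-username>/.selenium-mcp/screenshot.
On macOS, you can open the folder from Terminal:
open ~/.selenium-mcp/screenshot
On Linux, use:
xdg-open ~/.selenium-mcp/screenshot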
Windows
Screenshots are stored at:
C:\Users\<your-username>\.selenium-mcp\screenshot
Example:
C:\Users\John\.selenium-mcp\screenshot
You can open it from File Explorer by entering the following in the address bar:
%USERPROFILE%\.selenium-mcp\screenshot
Custom Screenshot Directory (Optional)
You can override the default screenshot location by setting the SELENIUM_MCP_SCREENSHOT_DIR environment variable.
macOS / Linux
export SELENIUM_MCP_SCREENSHOT_DIR=~/my-screenshots
Windows (PowerShell)
$env:SELENIUM_MCP_SCREENSHOT_DIR="C:\my-screenshots"
All screenshots will then be saved to the specified directory.
Notes
- The folder is created automatically the first time a screenshot is taken.
- On macOS and Linux, the .selenium-mcp directory is hidden by default because its name starts with a dot (.).
- You can safely delete screenshots at any time.
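The lookup order described in this section can be sketched as follows: the environment variable first, then the default hidden folder, created on first use. `screenshot_dir` is a hypothetical helper for scripts that need to find screenshots. It is not an API exported by selenium-mcp.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def screenshot_dir(env: Optional[Mapping[str, str]] = None,
                   create: bool = True) -> Path:
    """Resolve the screenshot folder: env override first, then the default."""
    env = os.environ if env is None else env
    override = env.get("SELENIUM_MCP_SCREENSHOT_DIR")
    if override:
        path = Path(override).expanduser()  # allows values like ~/my-screenshots
    else:
        path = Path.home() / ".selenium-mcp" / "screenshot"
    if create:
        path.mkdir(parents=True, exist_ok=True)  # created on first use
    return path
```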
BROWSER SESSION FLOW
Each browser session is identified by a session_id.
Typical workflow for agents:
- open_browser
- open_url
- wait_for_page
- get_interactive_elements
- (optional) get_tabs / switch_tab if multiple tabs are present
- click_element or type_into_element
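Element indexes come from the most recent discovery call, so a planner can check a sequence of tool names before executing it. The sketch below flags two problems:

- calls made before open_browser
- clicks or typing without a fresh get_interactive_elements

Treating navigation and tab tools as invalidating earlier indexes is an assumption. A click that happens to navigate also invalidates them, and a static check like this cannot detect that.

```python
from typing import List, Tuple

# Assumption: these tools replace or switch the page, so indexes from an
# earlier get_interactive_elements call can no longer be trusted.
INVALIDATES_INDEXES = {"open_url", "navigate_back", "navigate_forward",
                       "refresh_page", "open_new_tab", "switch_tab", "close_tab"}
NEEDS_INDEXES = {"click_element", "type_into_element"}


def check_plan(steps: List[str]) -> List[Tuple[int, str]]:
    """Return (position, problem) pairs for a planned sequence of tool names."""
    problems: List[Tuple[int, str]] = []
    browser_open = False
    indexes_fresh = False
    for i, name in enumerate(steps):
        if name == "open_browser":
            browser_open, indexes_fresh = True, False
            continue
        if not browser_open:
            problems.append((i, f"{name} called before open_browser"))
        if name in NEEDS_INDEXES and not indexes_fresh:
            problems.append((i, f"{name} without a fresh get_interactive_elements"))
        if name == "get_interactive_elements":
            indexes_fresh = True
        elif name in INVALIDATES_INDEXES:
            indexes_fresh = False
        elif name == "close_browser":
            browser_open, indexes_fresh = False, False
    return problems
```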
MULTI-TAB WORKFLOW
Agents can work with multiple tabs within the same browser session.
Example workflow:
- open_browser
- open_url
- open_new_tab("https://example.com")
- get_tabs
- switch_tab(index)
- perform actions
- close_tab(index)
Notes
- Each tab is tracked using an internal index.
- The active tab is automatically managed and updated.
- All actions are performed on the currently active tab.
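The index-based tab tracking described above can be modeled with a small class. `TabTracker` is hypothetical. It assumes that a newly opened tab becomes active and that closing a tab shifts later tabs' indexes down by one. selenium-mcp may number tabs differently.

```python
from typing import Dict, List, Optional


class TabTracker:
    """Hypothetical model of index-based tab tracking with an active tab."""

    def __init__(self) -> None:
        self.handles: List[str] = []       # window handles in open order
        self.names: Dict[str, str] = {}    # handle -> custom name
        self.active: Optional[int] = None  # index of the active tab

    def open(self, handle: str) -> int:
        self.handles.append(handle)
        self.active = len(self.handles) - 1  # assumed: new tab becomes active
        return self.active

    def switch(self, index: int) -> str:
        if not 0 <= index < len(self.handles):
            raise IndexError(f"no tab at index {index}")
        self.active = index
        return self.handles[index]

    def set_name(self, index: int, label: str) -> None:
        self.names[self.handles[index]] = label

    def close(self, index: int) -> None:
        handle = self.handles.pop(index)  # later tabs shift down by one
        self.names.pop(handle, None)
        if not self.handles:
            self.active = None
        elif self.active is not None and self.active >= index:
            # Keep the active index pointing at a surviving tab.
            self.active = max(self.active - 1, 0)
```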
EXAMPLE AGENT WORKFLOW
Example task:
- Open a Chrome browser.
- Navigate to google.com.
- Type the text "Selenium MCP" into the search box.
- Click the search button.
Agent steps:
open_browser
open_url("https://google.com")
wait_for_page
get_interactive_elements
type_into_element(<search box index>, "Selenium MCP")
click_element(<search button index>)
wait_for_page
get_page_text
SYSTEM PROMPT FOR AI AGENTS
This repository includes a production-grade system prompt designed specifically for browser automation agents that interact with this Selenium MCP server.
The prompt contains detailed operational guidelines that instruct the AI agent on how to:
- initialize and control the browser
- discover and interact with UI elements
- analyze page structure using the accessibility tree
- avoid hallucinating element indexes
- handle navigation and page reloads
- recover from stale elements
- follow a deterministic execution loop (PLAN → ACT → OBSERVE → UPDATE PLAN)
- enforce safety limits on tool usage
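The execution loop and tool-usage limit named in the list above can be expressed as a small driver function. In this sketch, two callables are injected:

- `plan_next` stands for the model deciding the next tool call.
- `execute` stands for the MCP `tools/call` round trip.

The default cap of 25 tool calls is an arbitrary assumption, not a value taken from prompts/system_prompt.md.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Action:
    tool: str
    arguments: Dict[str, Any]


def run_agent(plan_next: Callable[[List[str]], Optional[Action]],
              execute: Callable[[Action], str],
              max_steps: int = 25) -> List[str]:
    """PLAN -> ACT -> OBSERVE -> UPDATE PLAN, with a hard cap on tool calls.

    plan_next receives all observations so far and returns the next Action,
    or None when the task is complete.
    """
    observations: List[str] = []
    for _ in range(max_steps):
        action = plan_next(observations)  # PLAN / UPDATE PLAN
        if action is None:
            return observations
        result = execute(action)  # ACT
        observations.append(f"{action.tool}: {result}")  # OBSERVE
    raise RuntimeError(f"safety limit reached after {max_steps} tool calls")
```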
Prompt location
prompts/system_prompt.md
How to use
When you build an AI agent that interacts with this MCP server, provide this prompt as the model's system prompt.
Why this prompt
Browser automation agents can easily make incorrect decisions if not guided properly. This system prompt provides strict operational rules and guardrails that help the agent:
- use MCP tools correctly
- avoid incorrect element interactions
- minimize hallucinations
- perform reliable browser automation tasks
Using this prompt helps make AI-driven browser automation more stable, accurate, and reliable.
Recommendation
It is strongly recommended that all AI agents interacting with this Selenium MCP server use this system prompt to ensure consistent and reliable behavior.
PROMPT CUSTOMIZATION
You may modify or extend the system prompt depending on your use case. However, it is recommended to preserve the core operational rules related to:
- MCP tool usage
- element discovery
- navigation handling
- safety limits
LOGGING
All application logs are stored in a user-specific directory:
~/.selenium-mcp/logs/
This directory is automatically created when the server starts.
Log file
Logs are written to:
~/.selenium-mcp/logs/selenium_mcp.log
Features:
- Daily log file rotation
- Automatic cleanup of older log files
- Logs written to both console and file
- Persistent logs independent of the project directory
Logs are stored in the user's home directory so they remain available even if the package is installed globally via pip. This makes it easier to debug issues and monitor MCP server activity across different projects.
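A minimal Python sketch of a logging setup with these properties uses the standard library's `TimedRotatingFileHandler`. Rotating at midnight produces daily files, and `backupCount` prunes older ones. This is one possible way to implement the behavior described above, not selenium-mcp's source. The seven-day retention is illustrative. The format string produces lines shaped like the example entry that follows.

```python
import logging
from logging.handlers import TimedRotatingFileHandler
from pathlib import Path
from typing import Optional

# Produces e.g. "2026-03-15 19:00:07,444 [INFO] [selenium-mcp] message".
LOG_FORMAT = "%(asctime)s [%(levelname)s] [%(name)s] %(message)s"


def setup_logging(log_dir: Optional[Path] = None, keep_days: int = 7) -> logging.Logger:
    """Log to the console and to a file that rotates daily."""
    log_dir = log_dir or Path.home() / ".selenium-mcp" / "logs"
    log_dir.mkdir(parents=True, exist_ok=True)  # create the directory if missing

    logger = logging.getLogger("selenium-mcp")
    if logger.handlers:  # already configured; avoid duplicate output
        return logger
    logger.setLevel(logging.INFO)
    formatter = logging.Formatter(LOG_FORMAT)

    # Roll over at midnight; keep only `keep_days` rotated files.
    file_handler = TimedRotatingFileHandler(
        log_dir / "selenium_mcp.log", when="midnight",
        backupCount=keep_days, encoding="utf-8")
    console_handler = logging.StreamHandler()  # stderr by default
    for handler in (file_handler, console_handler):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

`logging.StreamHandler()` writes to stderr by default. This keeps stdout free for the stdio transport's protocol messages.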
Example Log Entry
2026-03-15 19:00:07,444 [INFO] [selenium-mcp] Initializing Selenium MCP Server...
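Every entry shares this layout (timestamp with milliseconds, level, logger name, message), so log files can be filtered programmatically. The regular expression below is inferred from this single example line and is an assumption. Lines that do not match, such as traceback continuations, are returned as None.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<level>[A-Z]+)\] \[(?P<logger>[^\]]+)\] (?P<message>.*)$")


class LogEntry(NamedTuple):
    timestamp: datetime
    level: str
    logger: str
    message: str


def parse_log_line(line: str) -> Optional[LogEntry]:
    """Parse one log line; return None if it does not match the layout."""
    m = LOG_LINE.match(line.rstrip("\n"))
    if m is None:
        return None  # e.g. continuation lines of a traceback
    # %f accepts the 3-digit milliseconds and pads them to microseconds.
    ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")
    return LogEntry(ts, m["level"], m["logger"], m["message"])
```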
macOS / Linux
Logs are stored in:
/Users/<username>/.selenium-mcp/logs/ (macOS)
/home/<username>/.selenium-mcp/logs/ (Linux)
Example (macOS):
/Users/john/.selenium-mcp/logs/selenium_mcp.log
To browse the directory from a terminal:
cd ~/.selenium-mcp/logs
ls
View logs:
cat selenium_mcp.log
or
tail -f selenium_mcp.log
Windows
Logs are stored in:
C:\Users\<username>\.selenium-mcp\logs\
Example:
C:\Users\John\.selenium-mcp\logs\selenium_mcp.log
Open it in File Explorer by entering the following in the address bar:
%USERPROFILE%\.selenium-mcp\logs\
Or from Command Prompt:
cd %USERPROFILE%\.selenium-mcp\logs
dir
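Windows has no built-in `tail` command. As a cross-platform alternative, this hypothetical helper (not part of selenium-mcp) returns the last lines of the log file using only the Python standard library.

```python
from collections import deque
from pathlib import Path
from typing import List, Optional

DEFAULT_LOG = Path.home() / ".selenium-mcp" / "logs" / "selenium_mcp.log"


def last_lines(path: Optional[Path] = None, n: int = 20) -> List[str]:
    """Return the last n lines of a log file, similar to `tail -n`."""
    with open(path or DEFAULT_LOG, encoding="utf-8", errors="replace") as f:
        # A bounded deque discards older lines while the file is streamed.
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]
```

For example, `print("\n".join(last_lines(n=50)))` shows the 50 most recent lines of the default log file.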
CONFIGURE YOUR MCP CLIENT
Add the Selenium MCP server to your MCP client configuration.
Example STDIO mode:
{
"mcpServers": {
"selenium-mcp": {
"command": "selenium-mcp"
}
}
}
This tells the MCP client how to start the Selenium MCP server using stdio mode.
Example HTTP mode:
{
"mcpServers": {
"selenium-mcp": {
"command": "selenium-mcp",
"args": ["run", "--transport", "http", "--host", "127.0.0.1", "--port", "3345"]
}
}
}
- Runs MCP server over HTTP
- Endpoint: http://127.0.0.1:3345/mcp
Example SSE mode:
{
"mcpServers": {
"selenium-mcp": {
"command": "selenium-mcp",
"args": ["run", "--transport", "sse", "--host", "127.0.0.1", "--port", "3345"]
}
}
}
- Runs MCP server with streaming (SSE) transport
- Useful for real-time agent interactions
- Endpoint: http://127.0.0.1:3345/sse
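If you manage client configuration from a script, you can generate the entries above and merge them into an existing file without disturbing other servers. `selenium_mcp_entry` and `add_to_config` are hypothetical helpers that reproduce the three documented entry shapes.

```python
import json
from typing import Any, Dict, Optional


def selenium_mcp_entry(transport: str = "stdio", host: str = "127.0.0.1",
                       port: int = 3345, command: str = "selenium-mcp") -> Dict[str, Any]:
    """Build the "selenium-mcp" server entry for a documented transport."""
    if transport == "stdio":
        return {"command": command}
    if transport not in ("http", "sse"):
        raise ValueError(f"unsupported transport: {transport!r}")
    return {"command": command,
            "args": ["run", "--transport", transport,
                     "--host", host, "--port", str(port)]}


def add_to_config(config_text: Optional[str], **entry_options: Any) -> str:
    """Insert or replace the selenium-mcp entry, keeping other servers intact."""
    config = json.loads(config_text) if config_text else {}
    servers = config.setdefault("mcpServers", {})
    servers["selenium-mcp"] = selenium_mcp_entry(**entry_options)
    return json.dumps(config, indent=2)
```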
Client Examples
Claude Desktop
Config file location:
macOS
~/Library/Application Support/Claude/claude_desktop_config.json
Windows
%APPDATA%\Claude\claude_desktop_config.json
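For scripts that edit this file, you can resolve the two documented locations as shown. `claude_desktop_config_path` is a hypothetical helper. It raises an error on other platforms because only the macOS and Windows locations are listed here.

```python
import os
import sys
from pathlib import Path


def claude_desktop_config_path(platform: str = sys.platform) -> Path:
    """Return the documented Claude Desktop config path for this OS."""
    if platform == "darwin":  # macOS
        return (Path.home() / "Library" / "Application Support" / "Claude"
                / "claude_desktop_config.json")
    if platform == "win32":  # Windows: %APPDATA%\Claude\...
        return Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json"
    raise NotImplementedError(f"no documented location for {platform!r}")
```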
STDIO – Works for Claude Desktop
Add the following to the config file:
{
"mcpServers": {
"selenium-mcp": {
"command": "selenium-mcp"
}
}
}
Restart Claude Desktop after updating the configuration.
- Uses stdio transport
- Works out of the box with Claude Desktop
- No additional configuration required
Troubleshooting
If you encounter issues while setting up or running Selenium MCP, try the following solutions.
selenium-mcp: command not found
This usually means the CLI command is not available in your system PATH.
First verify the package is installed:
pip show selenium-mcp
Locate the installed command.
macOS / Linux
which selenium-mcp
Example output:
/Users/<username>/.local/bin/selenium-mcp
If the command is found, update your MCP client configuration to use the full path:
{
"mcpServers": {
"selenium-mcp": {
"command": "/Users/<username>/.local/bin/selenium-mcp"
}
}
}
Windows
Run:
where selenium-mcp
Example output:
C:\Users\<username>\AppData\Roaming\Python\Python311\Scripts\selenium-mcp.exe
Update your MCP client configuration:
{
"mcpServers": {
"selenium-mcp": {
"command": "C:\\Users\\<username>\\AppData\\Roaming\\Python\\Python311\\Scripts\\selenium-mcp.exe"
}
}
}
Note: Windows paths in JSON require double backslashes (\\).
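This lookup can also be scripted. Python's `shutil.which` performs the same PATH search as `which` and `where`. `json.dumps` escapes backslashes automatically, so Windows paths come out with the required double backslashes. `config_with_full_path` is a hypothetical helper, not a selenium-mcp command.

```python
import json
import shutil
from typing import Optional


def config_with_full_path(executable: str = "selenium-mcp") -> Optional[str]:
    """Locate the CLI on PATH and emit an MCP client config using its full path.

    Returns None when the command cannot be found.
    """
    full_path = shutil.which(executable)
    if full_path is None:
        return None
    config = {"mcpServers": {"selenium-mcp": {"command": full_path}}}
    # json.dumps writes each backslash in a Windows path as \\ automatically.
    return json.dumps(config, indent=2)
```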
REQUIREMENTS
- Python 3.10+
- A web browser supported by Selenium WebDriver (for example, Chrome)
USE CASES
This project can be used to build:
- AI test automation agents
- Autonomous QA assistants
- LLM-powered browser copilots
- Self-healing test frameworks
- AI web scraping agents
- Intelligent UI testing systems
CONTRIBUTING
Contributions are welcome.
Steps:
- Fork the repository
- Create a feature branch
- Submit a pull request
LICENSE
MIT License
AUTHOR
Prashant Nayak
🔗 LinkedIn: https://www.linkedin.com/in/prashantjnayak
Built to help the QA and AI automation community build intelligent browser automation systems.
SUPPORT THE PROJECT
If this project helps you:
- Star the repository
- Share it with the QA community