WaterCrawl MCP
Provides AI systems with web crawling, scraping, and search capabilities through WaterCrawl's API, enabling content extraction, site mapping, and web search with customizable options.
A Model Context Protocol (MCP) server for WaterCrawl, built with FastMCP. This package provides AI systems with web crawling, scraping, and search capabilities through a standardized interface.
Quick Start with npx (No Installation)
Run WaterCrawl MCP directly with npx, without installing it:
npx @watercrawl/mcp --api-key YOUR_API_KEY
Using with AI Assistants
Codeium/Windsurf
Add the following to your Codeium or Windsurf MCP configuration to run the package through npx, with no installation required:
{
"mcpServers": {
"watercrawl": {
"command": "npx",
"args": [
"@watercrawl/mcp",
"--api-key",
"YOUR_API_KEY",
"--base-url",
"https://app.watercrawl.dev"
]
}
}
}
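If the configuration is generated by a script rather than written by hand, the entry above could be produced along these lines. This is only a sketch: `buildWatercrawlEntry` and `McpServerEntry` are hypothetical names that are not part of the package, and the output simply mirrors the JSON shown above.

```typescript
// Hypothetical helper (not part of @watercrawl/mcp): builds the
// "watercrawl" entry of the mcpServers block shown above.
interface McpServerEntry {
  command: string;
  args: string[];
}

// Default taken from the command-line options documented in this README.
const DEFAULT_BASE_URL = "https://app.watercrawl.dev";

function buildWatercrawlEntry(
  apiKey: string,
  baseUrl: string = DEFAULT_BASE_URL
): McpServerEntry {
  // The API key is documented as required, so reject an empty value early.
  if (apiKey.trim() === "") {
    throw new Error("apiKey is required");
  }
  return {
    command: "npx",
    args: ["@watercrawl/mcp", "--api-key", apiKey, "--base-url", baseUrl],
  };
}

// Print a complete configuration object for pasting into the editor settings.
const generatedConfig = {
  mcpServers: { watercrawl: buildWatercrawlEntry("YOUR_API_KEY") },
};
console.log(JSON.stringify(generatedConfig, null, 2));
```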
Claude Desktop
Run WaterCrawl MCP in SSE mode:
npx @watercrawl/mcp sse --port 3000 --endpoint /sse --api-key YOUR_API_KEY
Then configure Claude Desktop to connect to your SSE server.
Command-line Options
- -b, --base-url <url>: WaterCrawl API base URL (default: https://app.watercrawl.dev)
- -k, --api-key <key>: Required, your WaterCrawl API key
- -h, --help: Display help information
- -V, --version: Display version information
SSE mode additional options:
- -p, --port <number>: Port for the SSE server (default: 3000)
- -e, --endpoint <path>: SSE endpoint path (default: /sse)
Development and Contribution
Project Structure
wc-mcp/
├── src/ # Source code
│ ├── cli/ # Command-line interface
│ ├── config/ # Configuration management
│ ├── mcp/ # MCP implementation
│ ├── services/ # WaterCrawl API services
│ └── tools/ # MCP tools implementation
├── tests/ # Test suite
├── dist/ # Compiled JavaScript
├── tsconfig.json # TypeScript configuration
├── package.json # npm package configuration
└── README.md # This file
Setup for Development
- Clone the repository and install dependencies:
git clone https://github.com/watercrawl/watercrawl-mcp
cd watercrawl-mcp
npm install
- Build the project:
npm run build
- Link the package for local development (run from the repository root, which registers the local build as a global symlink):
npm link
Contribution Guidelines
- Fork the repository
- Create a feature branch (git checkout -b feature/your-feature)
- Commit your changes (git commit -m 'Add your feature')
- Push to the branch (git push origin feature/your-feature)
- Open a Pull Request
Installation (Alternative to npx)
Global Installation
npm install -g @watercrawl/mcp
Local Installation
npm install @watercrawl/mcp
Configuration
Configure WaterCrawl MCP using environment variables or command-line parameters.
Environment Variables
Create a .env file or set environment variables:
WATERCRAWL_BASE_URL=https://app.watercrawl.dev
WATERCRAWL_API_KEY=YOUR_API_KEY
SSE_PORT=3000 # Optional, for SSE mode
SSE_ENDPOINT=/sse # Optional, for SSE mode
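The following sketch shows one way the variables above might be combined with command-line values. It is not the package's actual configuration code: `resolveSettings`, `WatercrawlSettings`, and the rule that explicit overrides win over environment variables are all assumptions made for illustration. The defaults are the ones listed under Command-line Options.

```typescript
// Hypothetical settings shape; field names are illustrative only.
interface WatercrawlSettings {
  baseUrl: string;
  apiKey: string;
  ssePort: number;
  sseEndpoint: string;
}

type Env = Record<string, string | undefined>;

// Assumption: values passed explicitly (for example from CLI flags)
// take precedence over environment variables, which take precedence
// over the documented defaults.
function resolveSettings(
  env: Env,
  overrides: Partial<WatercrawlSettings> = {}
): WatercrawlSettings {
  const apiKey = overrides.apiKey ?? env.WATERCRAWL_API_KEY;
  if (!apiKey) {
    throw new Error("WATERCRAWL_API_KEY (or --api-key) is required");
  }

  // Environment variables are strings, so the port must be parsed.
  const ssePort = overrides.ssePort ?? Number(env.SSE_PORT ?? "3000");
  if (!Number.isInteger(ssePort) || ssePort < 1 || ssePort > 65535) {
    throw new Error("SSE_PORT must be an integer between 1 and 65535");
  }

  return {
    baseUrl:
      overrides.baseUrl ??
      env.WATERCRAWL_BASE_URL ??
      "https://app.watercrawl.dev",
    apiKey,
    ssePort,
    sseEndpoint: overrides.sseEndpoint ?? env.SSE_ENDPOINT ?? "/sse",
  };
}

// Example with a literal environment object; in Node.js, process.env
// could be passed instead.
const settings = resolveSettings({ WATERCRAWL_API_KEY: "YOUR_API_KEY" });
console.log(settings);
```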
Available Tools
The WaterCrawl MCP server provides the following tools:
1. scrape-url
Scrape content from a URL with customizable options.
{
"url": "https://example.com",
"pageOptions": {
"exclude_tags": ["script", "style"],
"include_tags": ["p", "h1", "h2"],
"wait_time": 1000,
"only_main_content": true,
"include_html": false,
"include_links": true,
"timeout": 15000,
"accept_cookies_selector": ".cookies-accept-button",
"locale": "en-US",
"extra_headers": {
"User-Agent": "Custom User Agent"
},
"actions": [
{"type": "screenshot"},
{"type": "pdf"}
]
},
"sync": true,
"download": true
}
2. search
Search the web using WaterCrawl.
{
"query": "artificial intelligence latest developments",
"searchOptions": {
"language": "en",
"country": "us",
"time_range": "recent",
"search_type": "web",
"depth": "deep"
},
"resultLimit": 5,
"sync": true,
"download": true
}
3. download-sitemap
Download a sitemap from a crawl request in different formats.
{
"crawlRequestId": "uuid-of-crawl-request",
"format": "json" // or "graph" or "markdown"
}
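A client that builds this payload in code may want to check the format before sending it. The sketch below is illustrative only: `isSitemapFormat` and `buildSitemapArgs` are hypothetical helpers, and the three accepted values are taken from the example above.

```typescript
// The three formats named in the example above.
const SITEMAP_FORMATS = ["json", "graph", "markdown"] as const;
type SitemapFormat = (typeof SITEMAP_FORMATS)[number];

// Hypothetical type guard: narrows an arbitrary string to a known format.
function isSitemapFormat(value: string): value is SitemapFormat {
  return (SITEMAP_FORMATS as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical builder for the download-sitemap arguments.
function buildSitemapArgs(
  crawlRequestId: string,
  format: string
): { crawlRequestId: string; format: SitemapFormat } {
  if (!isSitemapFormat(format)) {
    throw new Error(
      `Unsupported format "${format}"; expected one of ${SITEMAP_FORMATS.join(", ")}`
    );
  }
  return { crawlRequestId, format };
}

console.log(buildSitemapArgs("uuid-of-crawl-request", "markdown"));
```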
4. manage-crawl
Manage crawl requests: list, get details, stop, or download results.
{
"action": "list", // or "get", "stop", "download"
"crawlRequestId": "uuid-of-crawl-request", // for get, stop, and download actions
"page": 1,
"pageSize": 10
}
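The comments in the example indicate that crawlRequestId applies to the get, stop, and download actions. A caller could check for it before invoking the tool, roughly as sketched here. `ManageCrawlArgs` and `validateManageCrawl` are hypothetical names, and treating the ID as mandatory for those actions is an assumption drawn from the example, not a documented rule.

```typescript
type CrawlAction = "list" | "get" | "stop" | "download";

// Hypothetical argument shape, mirroring the example payload above.
interface ManageCrawlArgs {
  action: CrawlAction;
  crawlRequestId?: string;
  page?: number;
  pageSize?: number;
}

// Returns a list of problems; an empty list means the arguments look usable.
function validateManageCrawl(args: ManageCrawlArgs): string[] {
  const problems: string[] = [];
  // Assumption: every action except "list" targets a single crawl request.
  if (args.action !== "list" && !args.crawlRequestId) {
    problems.push(`crawlRequestId is required for action "${args.action}"`);
  }
  return problems;
}

console.log(validateManageCrawl({ action: "list", page: 1, pageSize: 10 }));
console.log(validateManageCrawl({ action: "stop" }));
```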
5. manage-search
Manage search requests: list, get details, or stop running searches.
{
"action": "list", // or "get", "stop"
"searchRequestId": "uuid-of-search-request", // for get and stop actions
"page": 1,
"pageSize": 10,
"download": true
}
6. monitor-request
Monitor a crawl or search request in real-time, with timeout control.
{
"type": "crawl", // or "search"
"requestId": "uuid-of-request",
"timeout": 30, // in seconds
"download": true
}
License
ISC