Web-curl MCP Server
A powerful tool for fetching and extracting text content from web pages and APIs, supporting web scraping, REST API requests, and Google Custom Search integration.
Google Custom Search API
The Google Custom Search API has a free tier with usage limits (for example, 100 queries per day; additional queries require payment). For full details on quotas, pricing, and restrictions, see the official documentation.
Web-curl

Developed by Rayss
🚀 Open Source Project
🛠️ Built with Node.js & TypeScript (Node.js v18+ required)
🎬 Demo Video
<video src="demo/demo.mp4" controls width="600"></video>
<details> <summary>Click to watch the demo directly in your browser</summary>
</details>
📚 Table of Contents
- Overview
- Features
- Architecture
- Installation
- Usage
- Configuration
- Examples
- Troubleshooting
- Tips & Best Practices
- Contributing & Issues
- License & Attribution
<a name="overview"></a>
📝 Overview
Web-curl is a powerful tool for fetching and extracting text content from web pages and APIs. Use it as a standalone CLI or as an MCP (Model Context Protocol) server. Web-curl leverages Puppeteer for robust web scraping and supports advanced features such as resource blocking, custom headers, authentication, and Google Custom Search.
<a name="features"></a>
✨ Features
- 🔎 Retrieve text content from any website.
- 🚫 Block unnecessary resources (images, stylesheets, fonts) for faster loading.
- ⏱️ Set navigation timeouts and content extraction limits.
- 💾 Output results to stdout or save to a file.
- 🖥️ Use as a CLI tool or as an MCP server.
- 🌐 Make REST API requests with custom methods, headers, and bodies.
- 🔍 Integrate Google Custom Search (requires API key and CX).
- 🤖 Smart command parsing (auto-detects URLs and search queries).
- 🛡️ Detailed error logging and robust error handling.
<a name="architecture"></a>
🏗️ Architecture
- CLI & MCP Server: `src/index.ts`
  Implements both the CLI entry point and the MCP server, exposing tools like `fetch_webpage`, `fetch_api`, `google_search`, and `smart_command`.
- Web Scraping: Uses Puppeteer for headless browsing, resource blocking, and content extraction.
- REST Client: `src/rest-client.ts`
  Provides a flexible HTTP client for API requests, used by both CLI and MCP tools.
- Configuration: Managed via CLI options, environment variables, and tool arguments.
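The resource-blocking step can be sketched with Puppeteer-style request interception. This is a minimal illustration, not the code in `src/index.ts`: `InterceptableRequest` is a hypothetical stand-in for the subset of Puppeteer's `HTTPRequest` used here (`resourceType()`, `abort()`, `continue()`), and the default blocked set is assumed from the feature list (images, stylesheets, fonts).

```typescript
// Hypothetical stand-in for the subset of Puppeteer's HTTPRequest used here.
interface InterceptableRequest {
  resourceType(): string;
  abort(): Promise<void>;
  continue(): Promise<void>;
}

// Resource types blocked by default (an assumption based on the feature list).
const DEFAULT_BLOCKED: ReadonlySet<string> = new Set(["image", "stylesheet", "font"]);

// Abort requests for blocked resource types and let everything else through.
// Returns true when the request was blocked.
function filterRequest(
  request: InterceptableRequest,
  blocked: ReadonlySet<string> = DEFAULT_BLOCKED,
): boolean {
  if (blocked.has(request.resourceType())) {
    // Ignore failures here, e.g. when the request was already handled.
    request.abort().catch(() => undefined);
    return true;
  }
  request.continue().catch(() => undefined);
  return false;
}
```

With Puppeteer, such a handler would typically be attached after `await page.setRequestInterception(true)` via `page.on("request", (req) => filterRequest(req))`.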
<a name="installation"></a>
⚙️ MCP Server Configuration Example
To integrate web-curl as an MCP server, add the following configuration to your mcp_settings.json:
{
  "mcpServers": {
    "web-curl": {
      "command": "node",
      "args": [
        "build/index.js"
      ],
      "disabled": false,
      "alwaysAllow": [
        "fetch_webpage",
        "fetch_api",
        "google_search",
        "smart_command"
      ],
      "env": {
        "APIKEY_GOOGLE_SEARCH": "YOUR_GOOGLE_API_KEY",
        "CX_GOOGLE_SEARCH": "YOUR_CX_ID"
      }
    }
  }
}
🔑 How to Obtain Google API Key and CX
- Get a Google API Key:
  - Go to Google Cloud Console.
  - Create/select a project, then go to APIs & Services > Credentials.
  - Click Create Credentials > API key and copy it.
- Get a Custom Search Engine (CX) ID:
  - Go to Google Custom Search Engine.
  - Create/select a search engine, then copy the Search engine ID (CX).
- Enable Custom Search API:
  - In Google Cloud Console, go to APIs & Services > Library.
  - Search for Custom Search API and enable it.
Replace YOUR_GOOGLE_API_KEY and YOUR_CX_ID in the config above with your own values.
<a name="installation"></a>
🛠️ Installation
# Clone the repository
git clone <repository-url>
cd web-curl
# Install dependencies
npm install
# Build the project
npm run build
### Puppeteer installation notes
- **Windows:** Just run `npm install`.
- **Linux:** You must install extra dependencies for Chromium. Run:
```bash
sudo apt-get install -y \
ca-certificates fonts-liberation libappindicator3-1 libasound2 libatk-bridge2.0-0 \
libatk1.0-0 libcups2 libdbus-1-3 libdrm2 libgbm1 libnspr4 libnss3 \
libx11-xcb1 libxcomposite1 libxdamage1 libxrandr2 xdg-utils
```
For more details, see the Puppeteer troubleshooting guide.
---
<a name="usage"></a>
## 🚀 Usage
### CLI Usage
The CLI supports fetching and extracting text content from web pages.
```bash
# Basic usage
node build/index.js https://example.com
# With options
node build/index.js --timeout 30000 --no-block-resources https://example.com
# Save output to a file
node build/index.js -o result.json https://example.com
```
Command Line Options
- `--timeout <ms>`: Set navigation timeout (default: 60000)
- `--no-block-resources`: Disable blocking of images, stylesheets, and fonts
- `-o <file>`: Output result to specified file
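For illustration, the options above could be parsed with Node's built-in `parseArgs` from `node:util`. This is a sketch under assumptions, not the project's actual parser: the `CliOptions` shape and the long name `--output` for `-o` are hypothetical, and only the flags and the 60000 ms default come from the list above.

```typescript
import { parseArgs } from "node:util";

// Hypothetical result shape; the real CLI may structure its options differently.
interface CliOptions {
  url: string | undefined;
  timeout: number;
  blockResources: boolean;
  output: string | undefined;
}

// Parse the documented flags from an argument list (e.g. process.argv.slice(2)).
function parseCli(argv: string[]): CliOptions {
  const { values, positionals } = parseArgs({
    args: argv,
    options: {
      timeout: { type: "string" },
      "no-block-resources": { type: "boolean" },
      // "output" as the long form of -o is an assumption.
      output: { type: "string", short: "o" },
    },
    allowPositionals: true,
  });
  return {
    url: positionals[0],
    // Fall back to the documented default of 60000 ms.
    timeout: values.timeout !== undefined ? Number(values.timeout) : 60000,
    blockResources: !values["no-block-resources"],
    output: values.output,
  };
}
```

Calling `parseCli(process.argv.slice(2))` would then yield the URL and options for a single run.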
MCP Server Usage
Web-curl can be run as an MCP server for integration with Roo Code or other MCP-compatible platforms.
Exposed Tools
- fetch_webpage: Retrieve text content from a web page
- fetch_api: Make REST API requests
- google_search: Search the web using Google Custom Search API
- smart_command: Accepts natural language commands and auto-routes to the appropriate tool
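As a rough illustration of the auto-routing idea behind `smart_command`, the sketch below sends input containing a URL to `fetch_webpage` and everything else to `google_search`. The `Route` type, the `routeCommand` name, and the URL regex are assumptions for this example; the real routing rules live in `src/index.ts` and may be more elaborate.

```typescript
// Hypothetical routing result: which tool to call and with what argument.
type Route =
  | { tool: "fetch_webpage"; url: string }
  | { tool: "google_search"; query: string };

// Route a free-form command: the first http(s) URL wins, otherwise search.
function routeCommand(command: string): Route {
  const match = command.match(/https?:\/\/\S+/i);
  if (match) {
    return { tool: "fetch_webpage", url: match[0] };
  }
  return { tool: "google_search", query: command.trim() };
}
```

A production version would likely also strip trailing punctuation from the matched URL and recognize API-style requests for `fetch_api`.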
Running as MCP Server
npm run start
The server communicates via stdio and exposes tools as defined in src/index.ts.
MCP Tool Example (fetch_webpage)
{
  "name": "fetch_webpage",
  "arguments": {
    "url": "https://example.com",
    "blockResources": true,
    "timeout": 60000,
    "maxLength": 10000
  }
}
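MCP clients normally build the wire message for you, but it can help to see what a tool call like the one above looks like on stdio. Per the MCP specification, tool invocations are JSON-RPC 2.0 requests with the method `tools/call`, and the stdio transport sends one JSON message per line. The sketch below is illustrative only: `ToolCall` and `frameToolCall` are hypothetical names, and a real session also requires the initialization handshake first.

```typescript
// Hypothetical shape matching the tool example above.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Wrap a tool call in a JSON-RPC 2.0 request and frame it for stdio:
// one JSON message per line, terminated by a newline.
function frameToolCall(id: number, call: ToolCall): string {
  const request = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: call,
  };
  // JSON.stringify without indentation never emits raw newlines.
  return JSON.stringify(request) + "\n";
}
```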
Google Search Integration
Set the following environment variables for Google Custom Search:
- `APIKEY_GOOGLE_SEARCH`: Your Google API key
- `CX_GOOGLE_SEARCH`: Your Custom Search Engine ID
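To show how these two values are typically used, here is a minimal sketch that builds a request URL for the Custom Search JSON API (`https://www.googleapis.com/customsearch/v1` with the `key`, `cx`, `q`, and `num` parameters). The function name `buildSearchUrl` is hypothetical, and this is not necessarily how `google_search` is implemented in this project.

```typescript
// Build a Custom Search JSON API request URL from the API key and CX ID.
function buildSearchUrl(apiKey: string, cx: string, query: string, num = 10): string {
  const url = new URL("https://www.googleapis.com/customsearch/v1");
  url.searchParams.set("key", apiKey);
  url.searchParams.set("cx", cx);
  url.searchParams.set("q", query);
  // The API accepts 1 to 10 results per request, so clamp the value.
  const clamped = Math.min(10, Math.max(1, Math.trunc(num)));
  url.searchParams.set("num", String(clamped));
  return url.toString();
}
```

In practice the first two arguments would come from `process.env.APIKEY_GOOGLE_SEARCH` and `process.env.CX_GOOGLE_SEARCH`, with a clear error if either is missing.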
<a name="configuration"></a>
🧩 Configuration
- Resource Blocking: Block images, stylesheets, and fonts for faster scraping.
- Timeouts: Set navigation and API request timeouts.
- Custom Headers: Pass custom HTTP headers for advanced scenarios.
- Authentication: Supports HTTP Basic Auth via username/password.
- Environment Variables: Used for Google Search API integration.
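As an example of how custom headers and Basic Auth fit together, the sketch below merges user-supplied headers with an `Authorization: Basic ...` header built from a username and password (the standard base64 encoding of `username:password`). `buildHeaders` and its parameter shape are hypothetical; check the tool schemas in `src/index.ts` for the actual argument names.

```typescript
// Merge custom headers with an HTTP Basic Authorization header.
function buildHeaders(
  custom: Record<string, string> = {},
  auth?: { username: string; password: string },
): Record<string, string> {
  const headers: Record<string, string> = { ...custom };
  if (auth) {
    // Basic auth is "Basic " + base64("username:password").
    const token = Buffer.from(`${auth.username}:${auth.password}`, "utf8").toString("base64");
    headers["Authorization"] = `Basic ${token}`;
  }
  return headers;
}
```

Because Basic Auth only encodes the credentials and does not encrypt them, it should be used over HTTPS.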
<a name="examples"></a>
💡 Examples
<details> <summary>Fetch Webpage Content</summary>
{
  "name": "fetch_webpage",
  "arguments": {
    "url": "https://en.wikipedia.org/wiki/Web_scraping",
    "blockResources": true,
    "maxLength": 5000
  }
}
</details>
<details> <summary>Make a REST API Request</summary>
{
  "name": "fetch_api",
  "arguments": {
    "url": "https://api.github.com/repos/nodejs/node",
    "method": "GET",
    "headers": {
      "Accept": "application/vnd.github.v3+json"
    }
  }
}
</details>
<details> <summary>Google Search</summary>
{
  "name": "google_search",
  "arguments": {
    "query": "web scraping best practices",
    "num": 5
  }
}
</details>
<a name="troubleshooting"></a>
🛠️ Troubleshooting
- Timeout Errors: Increase the `timeout` parameter if requests are timing out.
- Blocked Content: If content is missing, try disabling resource blocking or adjusting `resourceTypesToBlock`.
- Google Search Fails: Ensure `APIKEY_GOOGLE_SEARCH` and `CX_GOOGLE_SEARCH` are set in your environment.
- Binary/Unknown Content: Non-text responses are base64-encoded.
- Error Logs: Check the `logs/error-log.txt` file for detailed error messages.
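The base64 behavior for non-text responses can be pictured as follows. This is a sketch under assumptions: the README does not say how text is detected, so the Content-Type rule in `isTextContent` and the `{ encoding, data }` result shape are hypothetical.

```typescript
// Assumed rule: treat text/*, JSON, and XML media types as text.
function isTextContent(contentType: string): boolean {
  const type = contentType.split(";")[0].trim().toLowerCase();
  return (
    type.startsWith("text/") ||
    type === "application/json" ||
    type.endsWith("+json") ||
    type === "application/xml" ||
    type.endsWith("+xml")
  );
}

// Return text bodies as UTF-8 strings and everything else as base64.
function encodeBody(
  contentType: string,
  body: Uint8Array,
): { encoding: "utf8" | "base64"; data: string } {
  const buffer = Buffer.from(body);
  return isTextContent(contentType)
    ? { encoding: "utf8", data: buffer.toString("utf8") }
    : { encoding: "base64", data: buffer.toString("base64") };
}
```

A consumer can recover the original bytes of a base64 result with `Buffer.from(data, "base64")`.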
<a name="tips--best-practices"></a>
🧠 Tips & Best Practices
<details> <summary>Click for advanced tips</summary>
- Use resource blocking for faster and lighter scraping unless you need images or styles.
- For large pages, use `maxLength` and `startIndex` to paginate content extraction.
- Always validate your tool arguments to avoid errors.
- Secure your API keys and sensitive data using environment variables.
- Review the MCP tool schemas in `src/index.ts` for all available options.
</details>
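The pagination tip above can be sketched as a simple windowing function. This assumes `startIndex` is a character offset into the extracted text and `maxLength` is the window size; `sliceContent`, `Chunk`, and `nextIndex` are hypothetical names for this example, and the real tool may count or report positions differently.

```typescript
// One page of extracted text plus where the next page would start.
interface Chunk {
  text: string;
  nextIndex: number | null; // null when the end of the content was reached
}

// Return up to maxLength characters starting at startIndex.
function sliceContent(content: string, startIndex = 0, maxLength = 5000): Chunk {
  const start = Math.max(0, startIndex);
  const end = Math.min(content.length, start + maxLength);
  return {
    text: content.slice(start, end),
    nextIndex: end < content.length ? end : null,
  };
}
```

A client would call `fetch_webpage` repeatedly, passing the previous `nextIndex` as `startIndex`, until no further content remains.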
<a name="contributing--issues"></a>
🤝 Contributing & Issues
Contributions are welcome! If you want to contribute, fork this repository and submit a pull request.
If you find any issues or have suggestions, please open an issue on the repository page.
<a name="license--attribution"></a>
📄 License & Attribution
This project was developed by Rayss.
For questions, improvements, or contributions, please contact the author or open an issue in the repository.
Note: Google Search API is free with usage limits. For details, see: Google Custom Search API Overview