opencode-docs
Scrapes, stores, and searches documentation locally, enabling AI assistants to access and query documentation via MCP.
An MCP (Model Context Protocol) server for scraping, storing, and searching documentation locally. Built for use with OpenCode, Claude Desktop, and other MCP-compatible AI coding assistants.
Features
- Smart Scraping: Extracts the main content from documentation pages and filters out noise such as navigation, breadcrumbs, and edit links
- Playwright Support: Optional browser-based scraping for JavaScript-rendered sites (React, Vue, Next.js, etc.)
- Recursive Crawling: Automatically discovers and follows internal links to build complete documentation sets
- Full-Text Search: Fast search across all stored documentation using FlexSearch
- OpenAPI/Swagger Import: Import API documentation from OpenAPI specs or Swagger UI pages
- Metadata Extraction: Captures descriptions, keywords, authors, and last-modified dates
- Update Detection: Re-scrape existing docs and see what changed
Table of Contents
- Installation
- Quick Start
- Configuration
- Available Tools
- Usage Examples
- Migrating to Another Device
- Docker
- Troubleshooting
- Development
- Changelog
Installation
Prerequisites
- Node.js 18+ (check with node --version)
- npm or pnpm
- Git (for cloning)
Step 1: Clone the Repository
git clone https://github.com/salmenkhelifi1/opencode-docs.git
cd opencode-docs
Step 2: Install Dependencies
npm install
Step 3: Build the Project
npm run build
Step 4 (Optional): Install Playwright for JS-rendered Sites
If you need to scrape JavaScript-heavy sites (React, Vue, Next.js docs, etc.):
# Install Playwright
npm install playwright
# Install Chromium browser
npx playwright install chromium
Verify Installation
# Test that the server starts
node dist/index.js
# You should see:
# [opencode-docs] Docs directory: /home/username/.config/opencode/docs
# [opencode-docs] MCP server started (v1.1.0)
# Press Ctrl+C to stop
Quick Start
After installation, add some documentation:
# Start your AI assistant (OpenCode, Claude Desktop, etc.)
# Then use these commands:
# Add Next.js documentation (recursive crawl)
docs_add_url url="https://nextjs.org/docs" recursive=true maxPages=30
# Add Express.js documentation
docs_add_url url="https://expressjs.com/en/starter/installing.html" recursive=true maxPages=30
# Search your docs
docs_search query="middleware"
# List all sources
docs_list
Configuration
OpenCode Setup
Step 1: Find Your Config File
The OpenCode config file is located at:
- Linux/macOS: ~/.config/opencode/opencode.json
- Windows: %APPDATA%\opencode\opencode.json
Step 2: Add the MCP Server
Add the docs MCP server to your config:
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "docs": {
      "type": "local",
      "command": ["node", "/full/path/to/opencode-docs/dist/index.js"],
      "enabled": true
    }
  }
}
Important: Replace /full/path/to/opencode-docs with the actual path where you cloned the repository.
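If you would rather generate the entry than edit the path by hand, the following is a minimal sketch that builds the same structure from a clone directory. The function name `buildOpenCodeEntry` and the example directory are assumptions for illustration; only the JSON shape comes from the config shown above.

```typescript
// Shape of the "docs" entry from the config above.
interface LocalMcpEntry {
  type: "local";
  command: string[];
  enabled: boolean;
}

// Build the entry from the directory where the repository was cloned.
// `repoDir` is a hypothetical input; trailing slashes are trimmed so the
// resulting path never contains "//".
function buildOpenCodeEntry(repoDir: string): LocalMcpEntry {
  const trimmed = repoDir.replace(/[\\\/]+$/, "");
  return {
    type: "local",
    command: ["node", trimmed + "/dist/index.js"],
    enabled: true,
  };
}

// Example: the clone lives in /home/username/opencode-docs/
const openCodeConfig = {
  $schema: "https://opencode.ai/config.json",
  mcp: { docs: buildOpenCodeEntry("/home/username/opencode-docs/") },
};

console.log(JSON.stringify(openCodeConfig, null, 2));
```

The printed JSON can be pasted into opencode.json. The path must be absolute because the assistant may start the server from a different working directory.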
Step 3: Restart OpenCode
Restart OpenCode to load the new MCP server. You should see the docs tools available.
Full OpenCode Config Example
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "docs": {
      "type": "local",
      "command": ["node", "/home/username/opencode-docs/dist/index.js"],
      "enabled": true
    }
  }
}
Claude Desktop Setup
Step 1: Find Your Config File
The Claude Desktop config file is located at:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
- Linux: ~/.config/Claude/claude_desktop_config.json
Step 2: Add the MCP Server
{
  "mcpServers": {
    "docs": {
      "command": "node",
      "args": ["/full/path/to/opencode-docs/dist/index.js"]
    }
  }
}
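If your Claude Desktop config already lists other MCP servers, the docs entry has to be merged in rather than pasted over the file. The sketch below shows one way to do that merge on the parsed JSON. The names `addDocsServer`, `existingConfig`, and the `other` server are hypothetical; only the `mcpServers.docs` shape comes from the snippet above.

```typescript
interface StdioServer {
  command: string;
  args: string[];
}

interface ClaudeDesktopConfig {
  mcpServers?: Record<string, StdioServer>;
  [key: string]: unknown; // any other settings are passed through untouched
}

// Return a copy of the config with a "docs" server added (or replaced).
// The input object is not modified.
function addDocsServer(existing: ClaudeDesktopConfig, entryPoint: string): ClaudeDesktopConfig {
  return {
    ...existing,
    mcpServers: {
      ...(existing.mcpServers ?? {}),
      docs: { command: "node", args: [entryPoint] },
    },
  };
}

// Hypothetical existing config with one unrelated server.
const existingConfig: ClaudeDesktopConfig = {
  mcpServers: { other: { command: "node", args: ["/path/to/other-server.js"] } },
};

const mergedConfig = addDocsServer(existingConfig, "/full/path/to/opencode-docs/dist/index.js");
console.log(JSON.stringify(mergedConfig, null, 2));
```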
Step 3: Restart Claude Desktop
Quit and restart Claude Desktop. The docs tools should now be available.
VS Code with Continue Extension
Add to your Continue config (.continue/config.json):
{
  "experimental": {
    "modelContextProtocolServers": [
      {
        "transport": {
          "type": "stdio",
          "command": "node",
          "args": ["/full/path/to/opencode-docs/dist/index.js"]
        }
      }
    ]
  }
}
Available Tools
docs_list
List all available documentation sources stored locally.
docs_list
Output: Shows all sources with their IDs, page counts, and descriptions.
docs_search
Search across all local documentation.
docs_search query="authentication"
docs_search query="routing" sourceId="expressjs"
docs_search query="hooks" limit=10
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | Search query |
| sourceId | string | No | Limit search to specific source |
| limit | number | No | Max results (default: 10) |
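The key=value lines in this README are presumably shorthand: an MCP client invokes a tool by sending a JSON-RPC `tools/call` request whose `params` carry the tool `name` and an `arguments` object. The sketch below converts the shorthand into that request shape so you can see how the parameters in the table map onto it. The parser `parseShorthand` is a hypothetical helper written for this illustration and is not part of opencode-docs.

```typescript
type ToolArgs = Record<string, string | number | boolean>;

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: ToolArgs };
}

// Parse README shorthand such as: docs_search query="hooks" limit=10
function parseShorthand(line: string, id = 1): ToolCallRequest {
  const trimmed = line.trim();
  const firstSpace = trimmed.search(/\s/);
  const name = firstSpace === -1 ? trimmed : trimmed.slice(0, firstSpace);
  const rest = firstSpace === -1 ? "" : trimmed.slice(firstSpace);

  const args: ToolArgs = {};
  // Matches key="quoted value" or key=bareValue
  const pattern = /(\w+)=(?:"([^"]*)"|(\S+))/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(rest)) !== null) {
    const key = match[1] ?? "";
    const quoted = match[2] as string | undefined;
    const bare = match[3] ?? "";
    if (quoted !== undefined) {
      args[key] = quoted; // quoted values stay strings
    } else if (bare === "true" || bare === "false") {
      args[key] = bare === "true"; // booleans
    } else if (/^-?\d+(\.\d+)?$/.test(bare)) {
      args[key] = Number(bare); // numbers
    } else {
      args[key] = bare;
    }
  }
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

console.log(JSON.stringify(parseShorthand('docs_search query="routing" sourceId="expressjs" limit=5'), null, 2));
```

In practice the assistant builds this request for you; the sketch is only meant to show that `limit` travels as a number and `query` as a string.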
docs_read
Read a specific documentation page or list all pages in a source.
# List all pages in a source
docs_read sourceId="nextjs"
# Read a specific page
docs_read sourceId="nextjs" pagePath="docs-app-getting-started.md"
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| sourceId | string | Yes | Source ID |
| pagePath | string | No | Page path (omit to list all pages) |
docs_add_url
Add documentation from a URL with optional recursive crawling.
# Single page
docs_add_url url="https://nextjs.org/docs"
# Recursive crawl (follows links)
docs_add_url url="https://nextjs.org/docs" recursive=true maxPages=50 maxDepth=3
# For JavaScript-rendered sites
docs_add_url url="https://react.dev/learn" usePlaywright=true recursive=true
# With URL filter pattern
docs_add_url url="https://docs.example.com" recursive=true urlPattern="/api/"
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | required | The URL to scrape |
| sourceId | string | auto | Custom source ID |
| name | string | auto | Display name for the source |
| description | string | auto | Description for the source |
| usePlaywright | boolean | false | Use Playwright for JS-rendered pages |
| recursive | boolean | false | Recursively crawl linked pages |
| maxPages | number | 20 | Max pages to crawl (recursive mode) |
| maxDepth | number | 2 | Max link depth (recursive mode) |
| urlPattern | string | - | Regex pattern to filter URLs |
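To get a feel for how maxPages, maxDepth, and urlPattern interact, here is a minimal sketch of a breadth-first crawl plan over an in-memory link graph. It is not the project's crawler: `planCrawl` and `linksOf` are hypothetical names, and the sketch assumes that the start URL counts as depth 0, that the start URL is always visited, and that urlPattern is applied to discovered links only. The real tool may differ on each of these points.

```typescript
interface CrawlOptions {
  maxPages: number; // 20 by default according to the table above
  maxDepth: number; // 2 by default according to the table above
  urlPattern?: string;
}

// `linksOf` stands in for "fetch the page and extract its internal links".
function planCrawl(
  startUrl: string,
  linksOf: (url: string) => string[],
  options: CrawlOptions,
): string[] {
  const filter = options.urlPattern ? new RegExp(options.urlPattern) : null;
  const visited: string[] = [];
  const seen: Record<string, boolean> = {};
  seen[startUrl] = true;

  let frontier: string[] = [startUrl];
  for (let depth = 0; depth <= options.maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const url of frontier) {
      if (visited.length >= options.maxPages) return visited; // page budget used up
      visited.push(url);
      for (const link of linksOf(url)) {
        if (seen[link]) continue; // already queued or visited
        if (filter && !filter.test(link)) continue; // rejected by urlPattern
        seen[link] = true;
        next.push(link);
      }
    }
    frontier = next; // move one link level deeper
  }
  return visited;
}

// Hypothetical site: root -> /api/a -> /api/b -> /api/c, plus a /blog page.
const siteLinks: Record<string, string[]> = {
  "https://docs.example.com": ["https://docs.example.com/api/a", "https://docs.example.com/blog"],
  "https://docs.example.com/api/a": ["https://docs.example.com/api/b"],
  "https://docs.example.com/api/b": ["https://docs.example.com/api/c"],
};

const plannedPages = planCrawl(
  "https://docs.example.com",
  (url) => siteLinks[url] ?? [],
  { maxPages: 20, maxDepth: 2, urlPattern: "/api/" },
);
console.log(plannedPages);
```

Under these assumptions the /blog page is skipped by the pattern and /api/c is never reached because it sits at depth 3.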
docs_add_sitemap
Crawl an entire documentation site from its sitemap.xml.
docs_add_sitemap sitemapUrl="https://docs.example.com/sitemap.xml"
docs_add_sitemap sitemapUrl="https://docs.example.com/sitemap.xml" maxPages=100 urlPattern="/docs/"
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| sitemapUrl | string | required | Sitemap URL |
| sourceId | string | auto | Custom source ID |
| name | string | auto | Display name |
| maxPages | number | 50 | Max pages to crawl |
| urlPattern | string | - | Regex to filter URLs |
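A sitemap.xml lists page URLs in `<loc>` elements, so choosing what to crawl comes down to extracting those URLs, applying urlPattern, and stopping at maxPages. The sketch below illustrates that selection on a sample document. `selectSitemapUrls` is a hypothetical helper, and the regex-based extraction is a simplification: it does not handle CDATA sections or follow nested sitemap index files, and the real tool may parse sitemaps differently.

```typescript
// Pick up to `maxPages` URLs from sitemap XML, optionally filtered by a regex.
function selectSitemapUrls(xml: string, maxPages = 50, urlPattern?: string): string[] {
  const filter = urlPattern ? new RegExp(urlPattern) : null;
  const locPattern = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  const urls: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = locPattern.exec(xml)) !== null && urls.length < maxPages) {
    // Sitemaps escape "&" as "&amp;" inside URLs.
    const url = (match[1] ?? "").replace(/&amp;/g, "&");
    if (filter && !filter.test(url)) continue;
    urls.push(url);
  }
  return urls;
}

// Hypothetical sitemap with two docs pages and one blog post.
const sampleSitemap = [
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
  "  <url><loc>https://docs.example.com/docs/intro</loc></url>",
  "  <url><loc>https://docs.example.com/blog/news</loc></url>",
  "  <url><loc>https://docs.example.com/docs/api</loc></url>",
  "</urlset>",
].join("\n");

console.log(selectSitemapUrls(sampleSitemap, 100, "/docs/"));
```

With urlPattern="/docs/" the blog URL is dropped, which mirrors the second docs_add_sitemap example above.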
docs_add_openapi
Import an OpenAPI/Swagger specification from a direct JSON URL.
docs_add_openapi url="https://api.example.com/openapi.json" sourceId="my-api"
docs_add_swagger
Import documentation from a Swagger UI page (auto-detects spec URL).
docs_add_swagger url="https://api.example.com/swagger"
docs_update
Update/refresh existing documentation by re-scraping.
# Update entire source
docs_update sourceId="nextjs"
# Update single page
docs_update sourceId="nextjs" pagePath="docs.md"
# With Playwright
docs_update sourceId="react" usePlaywright=true
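The README does not describe how docs_update decides what changed, so the following is only a rough sketch of one possible approach: compare the stored pages with the freshly scraped ones and report added, removed, changed, and unchanged paths. The types `Snapshot` and `ChangeReport` and the function `diffSnapshots` are hypothetical names introduced for this illustration.

```typescript
type Snapshot = Record<string, string>; // page path -> Markdown content

interface ChangeReport {
  added: string[];
  removed: string[];
  changed: string[];
  unchanged: string[];
}

function hasPage(pages: Snapshot, path: string): boolean {
  return Object.prototype.hasOwnProperty.call(pages, path);
}

// Compare the previously stored pages with a fresh scrape.
function diffSnapshots(oldPages: Snapshot, newPages: Snapshot): ChangeReport {
  const report: ChangeReport = { added: [], removed: [], changed: [], unchanged: [] };
  for (const path of Object.keys(newPages)) {
    if (!hasPage(oldPages, path)) report.added.push(path);
    else if (oldPages[path] !== newPages[path]) report.changed.push(path);
    else report.unchanged.push(path);
  }
  for (const path of Object.keys(oldPages)) {
    if (!hasPage(newPages, path)) report.removed.push(path);
  }
  return report;
}

// Hypothetical before/after state of a source.
const storedPages: Snapshot = { "docs.md": "# Docs v1", "old.md": "# Old" };
const scrapedPages: Snapshot = { "docs.md": "# Docs v2", "new.md": "# New" };
console.log(diffSnapshots(storedPages, scrapedPages));
```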
docs_preview
Preview scraped content without saving.
docs_preview url="https://example.com/docs"
docs_preview url="https://example.com/docs" showLinks=true usePlaywright=true
docs_auth
Manage authentication credentials for API documentation.
# Add bearer token
docs_auth action="add" host="api.example.com" type="bearer" token="your-token"
# Add basic auth
docs_auth action="add" host="api.example.com" type="basic" username="user" password="pass"
# List all credentials
docs_auth action="list"
# Remove credentials
docs_auth action="remove" host="api.example.com"
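The two credential types correspond to the standard HTTP Authorization schemes. As a minimal sketch, assuming the server attaches the saved credential as an Authorization header on requests to the matching host, the header values would look like this. The `Credential` type and `authorizationHeader` function are hypothetical, not the project's API.

```typescript
type Credential =
  | { type: "bearer"; token: string }
  | { type: "basic"; username: string; password: string };

// Build the Authorization header value for a saved credential.
function authorizationHeader(cred: Credential): string {
  if (cred.type === "bearer") {
    return "Bearer " + cred.token;
  }
  // Basic auth is base64("username:password").
  // Note: btoa only accepts Latin-1 characters.
  const pair = cred.username + ":" + cred.password;
  return "Basic " + btoa(pair);
}

console.log(authorizationHeader({ type: "bearer", token: "your-token" }));
console.log(authorizationHeader({ type: "basic", username: "user", password: "pass" }));
```

Basic auth is only an encoding, not encryption, so credentials.json should be treated as a secret either way.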
docs_remove
Remove a documentation source and all its pages.
docs_remove sourceId="old-docs" confirm=true
Usage Examples
Add Popular Documentation Sites
# Next.js (React framework)
docs_add_url url="https://nextjs.org/docs" recursive=true maxPages=50 sourceId="nextjs" name="Next.js"
# Express.js (Node.js web framework)
docs_add_url url="https://expressjs.com/en/starter/installing.html" recursive=true maxPages=30 sourceId="expressjs" name="Express.js"
# Node.js API Documentation
docs_add_url url="https://nodejs.org/docs/latest/api/" recursive=true maxPages=40 sourceId="nodejs" name="Node.js"
# n8n (Workflow Automation)
docs_add_url url="https://docs.n8n.io/" recursive=true maxPages=30 sourceId="n8n" name="n8n"
# React (needs Playwright for JS rendering)
docs_add_url url="https://react.dev/learn" usePlaywright=true recursive=true maxPages=30 sourceId="react" name="React"
# Vue.js
docs_add_url url="https://vuejs.org/guide/introduction.html" recursive=true maxPages=30 sourceId="vuejs" name="Vue.js"
# Tailwind CSS
docs_add_url url="https://tailwindcss.com/docs/installation" recursive=true maxPages=50 sourceId="tailwind" name="Tailwind CSS"
Search Examples
# Search all documentation
docs_search query="authentication"
# Search specific source
docs_search query="middleware" sourceId="expressjs"
# Search with limit
docs_search query="hooks" limit=5
# Search for error handling
docs_search query="error handling"
Import API Documentation
# From OpenAPI JSON
docs_add_openapi url="https://petstore.swagger.io/v2/swagger.json" sourceId="petstore"
# From Swagger UI page
docs_add_swagger url="https://api.example.com/swagger-ui"
# With authentication
docs_auth action="add" host="api.mycompany.com" type="bearer" token="my-api-key"
docs_add_openapi url="https://api.mycompany.com/openapi.json" sourceId="internal-api"
Migrating to Another Device
Option 1: Copy Documentation (Recommended)
Copy the entire docs directory to your new device:
# On old device - compress docs
cd ~/.config/opencode
tar -czvf docs-backup.tar.gz docs/
# Transfer docs-backup.tar.gz to new device
# On new device - extract docs
mkdir -p ~/.config/opencode
cd ~/.config/opencode
tar -xzvf docs-backup.tar.gz
Option 2: Re-scrape Documentation
On the new device, after installation:
# Re-add all your documentation sources
docs_add_url url="https://nextjs.org/docs" recursive=true maxPages=50
docs_add_url url="https://expressjs.com/en/starter/installing.html" recursive=true maxPages=30
# ... etc
Full Migration Checklist
- Clone the repository on the new device:
git clone https://github.com/salmenkhelifi1/opencode-docs.git
cd opencode-docs
npm install
npm run build
- Copy configuration (optional, for credentials):
# Copy credentials file if you have API auth saved
scp old-device:~/.config/opencode/docs/credentials.json ~/.config/opencode/docs/
- Copy documentation or re-scrape:
# Copy existing docs
scp -r old-device:~/.config/opencode/docs ~/.config/opencode/
# OR re-scrape (see examples above)
- Configure your AI assistant (OpenCode, Claude Desktop, etc.)
- Test:
docs_list
docs_search query="test"
Storage Location
Documentation is stored in ~/.config/opencode/docs/:
~/.config/opencode/docs/
├── manifest.json        # Index of all sources and pages
├── credentials.json     # Saved API credentials (if any)
├── nextjs/              # Source directory
│   ├── docs.md
│   ├── docs-app-getting-started.md
│   └── ...
├── expressjs/
│   └── ...
└── nodejs/
    └── ...
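The page file names in the tree look like URL paths with the slashes replaced by hyphens. The sketch below reproduces that pattern as a guess inferred from the two example names (docs.md and docs-app-getting-started.md); the actual naming rules in the storage code may differ. The function `pageFileName` and the `index` fallback for a bare domain are hypothetical.

```typescript
// Guess a page file name from its URL, e.g.
// https://nextjs.org/docs/app/getting-started -> docs-app-getting-started.md
function pageFileName(pageUrl: string): string {
  const pathname = pageUrl
    .replace(/^[a-z][a-z0-9+.-]*:\/\/[^\/?#]*/i, "") // drop scheme and host
    .replace(/[?#].*$/, ""); // drop query string and fragment
  const slug = pathname
    .split("/")
    .filter((segment) => segment.length > 0)
    .join("-")
    .replace(/\.html?$/i, "") // drop a trailing .html or .htm
    .replace(/[^A-Za-z0-9._-]+/g, "-"); // keep the name file-system friendly
  return (slug || "index") + ".md";
}

// Path relative to the docs directory: <sourceId>/<page file>
function pagePathFor(sourceId: string, pageUrl: string): string {
  return sourceId + "/" + pageFileName(pageUrl);
}

console.log(pagePathFor("nextjs", "https://nextjs.org/docs"));
console.log(pagePathFor("nextjs", "https://nextjs.org/docs/app/getting-started"));
```

To see the real page paths for a source, run docs_read with only sourceId, which lists them.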
Supported Documentation Sites
The scraper includes optimized selectors for:
| Framework | Notes |
|---|---|
| Docusaurus | React docs, many OSS projects |
| Nextra | Next.js docs |
| GitBook | Many startups use this |
| ReadTheDocs | Python projects |
| VuePress/VitePress | Vue.js ecosystem |
| MkDocs | Material for MkDocs |
| Generic HTML | Works with most sites |
For JavaScript-heavy sites, enable Playwright with usePlaywright=true.
Docker
Build the Image
docker build -t opencode-docs .
Run with Volume Mount
# -i keeps stdin open, which the server needs to communicate over stdio
docker run -i -v ~/.config/opencode/docs:/root/.config/opencode/docs opencode-docs
Docker Compose
version: '3.8'
services:
  opencode-docs:
    build: .
    volumes:
      - ~/.config/opencode/docs:/root/.config/opencode/docs
    stdin_open: true
    tty: true
Troubleshooting
Common Issues
"Playwright is not installed"
npm install playwright
npx playwright install chromium
"Failed to fetch URL: 403 Forbidden"
Some sites block scrapers. Try:
- Using Playwright: usePlaywright=true
- Adding a delay between requests (automatic in recursive mode)
"No content extracted"
The site might use JavaScript rendering. Try:
docs_add_url url="..." usePlaywright=true
"Command not found: docs_list"
The MCP server isn't configured. Check:
- The path in your config is correct
- The project is built (npm run build)
- Restart your AI assistant
Docs directory not found
Create it manually:
mkdir -p ~/.config/opencode/docs
Debug Mode
Run the server directly to see logs:
node /path/to/opencode-docs/dist/index.js
Development
Run in Development Mode
npm run dev
Watch Mode
npm run watch
Clean Build
npm run clean && npm run build
Project Structure
opencode-docs/
├── src/
│   ├── index.ts              # MCP server entry point
│   ├── types.ts              # TypeScript types
│   ├── services/
│   │   ├── scraper.ts        # HTML to Markdown conversion
│   │   ├── crawler.ts        # Sitemap and recursive crawling
│   │   ├── storage.ts        # File system management
│   │   ├── search.ts         # FlexSearch integration
│   │   └── credentials.ts    # Auth credential management
│   └── tools/
│       ├── docs-add-url.ts
│       ├── docs-add-sitemap.ts
│       ├── docs-search.ts
│       └── ... (other tools)
├── dist/                     # Compiled JavaScript
├── package.json
├── tsconfig.json
└── README.md
License
MIT
Contributing
Contributions are welcome! Please open an issue or submit a pull request.
Changelog
v1.1.0
- Added Playwright support for JS-rendered pages
- Added recursive crawling with link discovery
- Added docs_update tool for refreshing documentation
- Added docs_preview tool for testing scrapes
- Enhanced content selectors for Docusaurus, Nextra, GitBook, etc.
- Smart content detection with text density scoring
- Improved noise filtering (removes nav, breadcrumbs, edit links)
- Metadata extraction (description, keywords, author, lastModified)
- Title deduplication
v1.0.0
- Initial release
- Basic scraping with cheerio
- Sitemap crawling
- OpenAPI/Swagger import
- Full-text search