LocalDocs MCP
Creates a local database of indexed technical documentation from web crawls and local files, enabling AI agents to efficiently search and retrieve documentation through MCP tools.
A Model Context Protocol (MCP) server that creates a local database of indexed and optimized technical documentation. It enables AI agents to efficiently query, search, and retrieve documentation from both web sources and local files through MCP tools.
Features
- Web Crawling: Automatically crawl and index documentation websites
- Local File Indexing: Process local markdown documentation
- AI-Powered Processing: Optional AI enhancement for metadata extraction and example generation
- Smart Search: Fuzzy search and semantic retrieval capabilities
- Efficient Storage: Folder-based markdown storage with frontmatter metadata
- MCP Integration: Full MCP protocol support for AI agent interaction
- Async Architecture: Fast, concurrent processing throughout
Installation
# Install from source
git clone https://github.com/dylan-gluck/localdocs-mcp
cd localdocs-mcp
uv sync
# Run directly with uvx (coming soon)
# uvx localdocs-mcp
Quick Start
1. Initialize a Documentation Collection
# Crawl web documentation
localdocs init react --crawl https://react.dev/learn --depth 2
# Index local files
localdocs init myproject --local ~/Documents/myproject/docs
# With AI processing (requires OpenAI API key)
localdocs init vue --crawl https://vuejs.org/guide/ --ai
2. Search Documentation
# Search across all collections
localdocs search "useState hook"
# Search specific collection
localdocs search "component props" --collection react
# List all collections
localdocs list
3. Configure MCP Client
Add the following to your Claude Desktop configuration (~/Library/Application Support/Claude/claude_desktop_config.json). The OPENAI_API_KEY entry is optional and only needed for AI processing:
{
  "mcpServers": {
    "localdocs": {
      "command": "uvx",
      "args": ["localdocs-mcp", "serve"],
      "env": {
        "OPENAI_API_KEY": "${OPENAI_API_KEY}"
      }
    }
  }
}
MCP Tools
The server exposes the following tools to AI agents:
| Tool | Description | Parameters |
|---|---|---|
| search_docs | Search across all documentation | query, collection?, limit? |
| list_collections | List available collections | - |
| get_document | Get specific document by ID | doc_id |
| list_examples | List code examples | collection?, language? |
| fuzzy_find | Fuzzy search documents | pattern, collection? |
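As a rough illustration of how a client might invoke one of these tools, the sketch below builds the JSON-RPC 2.0 `tools/call` request that MCP uses for tool invocation. The tool and argument names come from the table above; the helper name `build_tool_call`, the example values, and the assumption that `limit` is an integer are illustrative only and not taken from the LocalDocs source.

```python
import json
from itertools import count

# Monotonic request ids; JSON-RPC needs an id to pair responses with requests.
_request_ids = count(1)


def build_tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message.

    `build_tool_call` is a hypothetical helper for illustration.
    """
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# Example: search the "react" collection (argument values are assumptions).
request = build_tool_call(
    "search_docs",
    {"query": "useState hook", "collection": "react", "limit": 5},
)
print(request)
```

In practice an MCP client library handles this framing; the sketch only shows what travels over the stdio transport.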
CLI Commands
Collection Management
# Initialize new collection
localdocs init <name> --crawl <url> [--depth N] [--ai]
localdocs init <name> --local <path> [--ai]
# List collections
localdocs list
# Update existing collection
localdocs update <name>
# Delete collection
localdocs delete <name>
Document Operations
# Search documents
localdocs search <query> [--collection NAME] [--limit N]
# Show specific document
localdocs show <doc-id>
# Get statistics
localdocs stats [--collection NAME]
MCP Server
# Start MCP server (stdio transport)
localdocs serve
# Start with HTTP transport (coming soon)
localdocs serve --port 8080
Configuration
LocalDocs stores configuration in ~/.localdocs-mcp/config.yaml:
storage_path: ~/.localdocs-mcp
default_collection: main
crawl_defaults:
  depth: 2
  word_count_threshold: 50
  excluded_tags: [nav, footer, header]
  cache_enabled: true
processing:
  chunk_size: 2000
  overlap: 200
  generate_examples: true
baml:
  model: gpt-4o-mini
  temperature: 0.3
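To make the chunk_size and overlap settings concrete, here is a minimal sliding-window sketch. It assumes both values are measured in characters, which the README does not state; LocalDocs may well split on tokens or on markdown structure instead. The function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into windows of `chunk_size` characters.

    Consecutive windows share `overlap` characters. Assumption: sizes are
    in characters; the real implementation may use a different unit.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Small example with tiny sizes so the overlap is visible.
print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```

With the defaults shown in the configuration, each window advances by 1800 characters, so the last 200 characters of one chunk reappear at the start of the next.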
Development
# Install dependencies
uv sync
# Run tests
uv run pytest tests/
# Run specific test file
uv run pytest tests/test_storage.py -v
# Lint and format code
uvx ruff check .
uvx ruff format .
# Type checking
uv run mypy localdocs
Architecture
LocalDocs follows a modular architecture:
- CLI Layer: Typer-based command interface
- Processing Layer: Web crawling (Crawl4ai) and document processing
- Storage Layer: File-based storage with markdown and frontmatter
- MCP Layer: FastMCP server implementation
- AI Layer: Optional BAML integration for enhanced processing
Storage Format
Documents are stored as markdown files with YAML frontmatter:
---
id: "uuid-here"
collection: "react"
source_url: "https://react.dev/learn/thinking-in-react"
title: "Thinking in React"
chunk: 1
total_chunks: 3
tags: ["react", "component", "state"]
created: 2025-09-04
examples_generated: true
---
# Thinking in React (Part 1/3)
[Document content here]
## Generated Examples
[Example code blocks]
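A stored document in this format can be separated into metadata and body with a few lines of standard-library Python. This is a simplified sketch, not the project's parser: `split_frontmatter` is a hypothetical name, and it keeps every value as a raw string (so tags stays an unparsed list literal). A full YAML parser would be needed for typed values.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Return (metadata, body) for a markdown document with `---` frontmatter.

    Simplification: only flat `key: value` lines are handled, and values
    are returned as strings with surrounding double quotes removed.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block present

    # Find the closing delimiter after the opening one.
    end = next(
        (i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"),
        None,
    )
    if end is None:
        return {}, text  # unterminated frontmatter; treat as plain body

    metadata = {}
    for line in lines[1:end]:
        # Split on the first colon only, so URLs in values stay intact.
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip().strip('"')

    body = "\n".join(lines[end + 1:])
    return metadata, body


doc = '''---
id: "uuid-here"
collection: "react"
source_url: "https://react.dev/learn/thinking-in-react"
chunk: 1
---
# Thinking in React (Part 1/3)
'''
meta, body = split_frontmatter(doc)
print(meta["collection"], meta["source_url"])
```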
Environment Variables
- OPENAI_API_KEY: Required for AI-powered processing features
- ANTHROPIC_API_KEY: Alternative AI provider for processing
- LOCALDOCS_PATH: Override default storage path
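The LOCALDOCS_PATH override could be resolved along these lines. This is a sketch under assumptions: the fallback ~/.localdocs-mcp is the storage_path shown in the configuration section, `resolve_storage_path` is a hypothetical name, and how the real tool ranks the environment variable against config.yaml is not documented here.

```python
import os
from pathlib import Path

# Default taken from the storage_path value in the sample config.
DEFAULT_STORAGE = "~/.localdocs-mcp"


def resolve_storage_path(env=None) -> Path:
    """Pick the storage directory: LOCALDOCS_PATH if set, else the default.

    `env` defaults to os.environ; passing a dict makes the function easy
    to test. Precedence relative to config.yaml is an open assumption.
    """
    env = os.environ if env is None else env
    raw = env.get("LOCALDOCS_PATH") or DEFAULT_STORAGE
    return Path(raw).expanduser()


print(resolve_storage_path())
```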
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
MIT License - see LICENSE file for details
Roadmap
- [ ] Vector embeddings for semantic search
- [ ] Support for more file types (PDF, docx)
- [ ] HTTP transport option for MCP
- [ ] Incremental indexing
- [ ] Web UI for document browsing
- [ ] Custom BAML prompts
- [ ] Multi-language code detection improvements
Acknowledgments
Built with: