Docs Vector MCP
Enables AI agents to semantically search GitHub repository documentation by automatically fetching, vectorizing, and indexing content into an Upstash Vector database. It provides a standard MCP interface for agents to retrieve relevant documentation snippets through natural language queries.
Vectorize GitHub tool documentation and provide an MCP (Model Context Protocol) interface for AI Agents.
Features
- Auto-fetch from GitHub - Automatically crawls and extracts documentation from GitHub repositories
- Vector Embeddings - Uses OpenAI embeddings to store documentation in a vector database
- Semantic Search - Find relevant documentation using natural language queries
- MCP Protocol - Standard Model Context Protocol interface for AI Agents
- Modern Web UI - Built with Next.js 15 + TailwindCSS
Architecture
┌─────────────┐     ┌────────────┐     ┌──────────────┐     ┌───────────┐
│ GitHub Repo │ ──> │ Crawl Docs │ ──> │ Split Chunks │ ──> │ Embedding │
└─────────────┘     └────────────┘     └──────────────┘     └─────┬─────┘
                                                                  │
                                                                  ▼
                              ┌──────────┐  ── Query ──>   ┌─────────────┐
                              │ AI Agent │                 │  Vector DB  │
                              └────┬─────┘  <── Result ──  │  (Upstash)  │
                                   │                       └─────────────┘
                              ┌────┴────┐
                              │ MCP API │
                              └─────────┘
Tech Stack
- Framework: Next.js 15 + TypeScript + TailwindCSS
- Vector Database: Upstash Vector (serverless, well suited to Cloudflare deployment)
- Embeddings: OpenAI text-embedding-3-small
- GitHub API: Octokit
- MCP: @modelcontextprotocol/sdk
Environment Variables
Create a .env.local file:
# GitHub (optional but recommended for higher rate limits)
GITHUB_TOKEN=your_github_token
# OpenAI
OPENAI_API_KEY=your_openai_api_key
# Upstash Vector
UPSTASH_VECTOR_REST_URL=your_upstash_vector_url
UPSTASH_VECTOR_REST_TOKEN=your_upstash_vector_token
Getting Started
Install dependencies
npm install
Run development server
npm run dev
Open http://localhost:3000 in your browser.
CLI Usage
Index a GitHub repository
npx tsx cli/index.ts index <owner> <repo> [branch]
Example:
npx tsx cli/index.ts index openai openai-python main
Search indexed documentation
npx tsx cli/index.ts search "how to use embeddings"
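Upstash Vector performs the actual ranking server-side, but conceptually the search step embeds the query and scores stored chunks by cosine similarity. A minimal TypeScript sketch of that scoring (illustrative only; these function names are not from this repo):

```typescript
// Cosine similarity between two embedding vectors: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored chunks by similarity to the query embedding, highest first.
function topK(
  query: number[],
  chunks: { text: string; embedding: number[] }[],
  k = 5
): { text: string; score: number }[] {
  return chunks
    .map((c) => ({ text: c.text, score: cosineSimilarity(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

With `text-embedding-3-small` the vectors have 1536 dimensions, but the math is the same at any size.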
Show statistics
npx tsx cli/index.ts stats
Clear all indexed documents
npx tsx cli/index.ts clear
Start MCP server (for AI Agent connection)
npx tsx cli/index.ts mcp
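The subcommands above can be dispatched with a small `process.argv` parser. A hypothetical sketch (the repo's actual `cli/index.ts` may be structured differently):

```typescript
type Command =
  | { kind: "index"; owner: string; repo: string; branch: string }
  | { kind: "search"; query: string }
  | { kind: "stats" }
  | { kind: "clear" }
  | { kind: "mcp" };

// Parse argv with the runtime and script path already stripped off.
function parseCommand(argv: string[]): Command {
  const [cmd, ...rest] = argv;
  switch (cmd) {
    case "index": {
      const [owner, repo, branch = "main"] = rest; // branch defaults to main
      if (!owner || !repo) throw new Error("usage: index <owner> <repo> [branch]");
      return { kind: "index", owner, repo, branch };
    }
    case "search": {
      if (rest.length === 0) throw new Error("usage: search <query>");
      return { kind: "search", query: rest.join(" ") };
    }
    case "stats":
      return { kind: "stats" };
    case "clear":
      return { kind: "clear" };
    case "mcp":
      return { kind: "mcp" };
    default:
      throw new Error(`unknown command: ${cmd}`);
  }
}
```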
MCP Integration
Add this configuration to any AI Agent that supports MCP:
{
  "mcpServers": {
    "docs-vector": {
      "command": "node",
      "args": [
        "path/to/docs-vector-mcp/dist/cli/index.js",
        "mcp"
      ],
      "env": {
        "OPENAI_API_KEY": "<your-openai-api-key>",
        "UPSTASH_VECTOR_REST_URL": "<your-upstash-url>",
        "UPSTASH_VECTOR_REST_TOKEN": "<your-upstash-token>"
      }
    }
  }
}
Available MCP Tools
- search_docs - Search documentation semantically
  - Parameters:
    - query (string): The search query
    - limit (number, optional): Maximum number of results (1-20, default 5)
- get_stats - Get statistics about stored documentation
  - No parameters
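The `search_docs` parameter rules can be expressed as a small validator. A sketch under the assumption that out-of-range limits are clamped rather than rejected (the real server may instead declare these constraints via zod or JSON Schema through the MCP SDK):

```typescript
interface SearchDocsParams {
  query: string;
  limit: number; // 1-20, defaults to 5
}

// Validate and normalize raw tool arguments for search_docs.
function parseSearchDocsParams(raw: Record<string, unknown>): SearchDocsParams {
  const query = raw.query;
  if (typeof query !== "string" || query.trim() === "") {
    throw new Error("query must be a non-empty string");
  }
  let limit = 5; // documented default
  const rawLimit = raw.limit;
  if (rawLimit !== undefined) {
    if (typeof rawLimit !== "number" || !Number.isInteger(rawLimit)) {
      throw new Error("limit must be an integer");
    }
    limit = Math.min(20, Math.max(1, rawLimit)); // clamp to the 1-20 range
  }
  return { query, limit };
}
```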
Deployment
Cloudflare Pages
This project is optimized for Cloudflare Pages deployment:
- Push your code to GitHub
- Connect your repository to Cloudflare Pages
- Set build command: npm install && npx next build
- Set output directory: .next
- Add all environment variables in the Cloudflare dashboard
- Deploy!
CI/CD with GitHub Actions
A sample workflow is included in .github/workflows/deploy.yml that automatically deploys to Cloudflare Pages on every push to the main branch.
Project Structure
docs-vector-mcp/
├── app/                    # Next.js app router
│   ├── api/                # API routes
│   │   ├── index/          # Indexing endpoint
│   │   ├── search/         # Search endpoint
│   │   └── stats/          # Stats endpoint
│   ├── globals.css         # Global styles
│   ├── layout.tsx          # Root layout
│   └── page.tsx            # Home page
├── components/             # React components
│   ├── IndexForm.tsx       # Repository indexing form
│   └── SearchForm.tsx      # Search form
├── lib/                    # Core libraries
│   ├── github.ts           # GitHub fetcher
│   ├── text-processor.ts   # Text chunking
│   ├── embedding.ts        # Embedding generator
│   ├── vector-store.ts     # Vector storage
│   ├── mcp-server.ts       # MCP server
│   └── docs-service.ts     # Service orchestrator
├── cli/                    # CLI entry
│   └── index.ts            # CLI main
├── .github/
│   └── workflows/          # GitHub Actions
├── next.config.ts          # Next.js config
├── tailwind.config.ts      # Tailwind config
└── package.json            # Dependencies
How It Works
- Add Repository: You input a GitHub repository that contains tool documentation
- Crawling: The system fetches all documentation files (.md, .mdx, .rst, .txt, etc.) from the repo
- Processing: Text is cleaned and split into overlapping chunks
- Embedding: OpenAI generates vector embeddings for each chunk
- Storage: Vectors are stored in Upstash Vector database
- Search: When an AI Agent asks a question, the query is embedded and similar documents are retrieved
- Response: Relevant documentation snippets are returned to the AI Agent for answering
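Step 3, splitting text into overlapping chunks, can be sketched as follows (chunk size and overlap here are illustrative assumptions, not the repo's actual defaults):

```typescript
// Split text into fixed-size chunks with overlap, so sentences that fall
// on a chunk boundary still appear intact in at least one chunk.
function splitIntoChunks(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (overlap >= chunkSize) throw new Error("overlap must be smaller than chunkSize");
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // final chunk reached the end
  }
  return chunks;
}
```

The overlap trades some index size for recall: each boundary region is embedded twice, so a query matching it can hit either neighboring chunk.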
License
MIT