Docs Vector MCP

Enables AI agents to semantically search GitHub repository documentation by automatically fetching, vectorizing, and indexing content into an Upstash Vector database. It provides a standard MCP interface for agents to retrieve relevant documentation snippets through natural language queries.

Vectorizes GitHub tool documentation and exposes it to AI agents through an MCP (Model Context Protocol) interface.

Features

  • 🔄 Auto-fetch from GitHub - Automatically crawls and extracts documentation from GitHub repositories
  • 🧠 Vector Embeddings - Uses OpenAI embeddings to store documentation in a vector database
  • 🔍 Semantic Search - Find relevant documentation using natural language queries
  • 🔌 MCP Protocol - Standard Model Context Protocol interface for AI Agents
  • 🎨 Modern Web UI - Built with Next.js 15 + TailwindCSS

Architecture

┌─────────────┐    ┌──────────────┐    ┌──────────────┐    ┌────────────┐
│ GitHub Repo │ →  │  Crawl Docs  │ →  │ Split Chunks │ →  │  Embedding │
└─────────────┘    └──────────────┘    └──────────────┘    └────────────┘
                          ↓
                    ┌──────────────┐
                    │ Vector DB    │ ←  Query  ┌──────────┐
                    │  (Upstash)   │ →  Result │ AI Agent │
                    └──────────────┘           └──────────┘
                          ↑
                     ┌───────────┐
                     │  MCP API  │
                     └───────────┘

Tech Stack

  • Framework: Next.js 15 + TypeScript + TailwindCSS
  • Vector Database: Upstash Vector (serverless; a good fit for Cloudflare deployment)
  • Embeddings: OpenAI text-embedding-3-small
  • GitHub API: Octokit
  • MCP: @modelcontextprotocol/sdk

Environment Variables

Create a .env.local file:

# GitHub (optional but recommended for higher rate limits)
GITHUB_TOKEN=your_github_token

# OpenAI
OPENAI_API_KEY=your_openai_api_key

# Upstash Vector
UPSTASH_VECTOR_RESTAR_URL=your_upstash_vector_url
UPSTASH_VECTOR_RESTAR_TOKEN=your_upstash_vector_token
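
A missing variable usually surfaces only as a failed API call, so it can help to check the environment once at startup. The following is a minimal sketch, not code from this project: `findMissingEnvVars` and `EnvSource` are hypothetical names, and the variable names are copied verbatim from the listing above. GITHUB_TOKEN is left out because it is optional.

```typescript
// Required variables, spelled exactly as in the .env.local listing.
const REQUIRED_ENV_VARS = [
  "OPENAI_API_KEY",
  "UPSTASH_VECTOR_RESTAR_URL",
  "UPSTASH_VECTOR_RESTAR_TOKEN",
] as const;

// Any key/value source shaped like process.env.
type EnvSource = Record<string, string | undefined>;

// Hypothetical helper: returns the names of required variables
// that are unset or contain only whitespace.
function findMissingEnvVars(env: EnvSource): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}

// Possible use at startup (commented out so the sketch has no side effects):
// const missing = findMissingEnvVars(process.env);
// if (missing.length > 0) {
//   throw new Error(`Missing environment variables: ${missing.join(", ")}`);
// }
```

Passing the environment in as an argument, rather than reading process.env inside the helper, keeps the check easy to exercise with a plain object.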

Getting Started

Install dependencies

npm install

Run development server

npm run dev

Open http://localhost:3000 in your browser.

CLI Usage

Index a GitHub repository

npx tsx cli/index.ts index <owner> <repo> [branch]

Example:

npx tsx cli/index.ts index openai openai-python main

Search indexed documentation

npx tsx cli/index.ts search "how to use embeddings"

Show statistics

npx tsx cli/index.ts stats

Clear all indexed documents

npx tsx cli/index.ts clear

Start MCP server (for AI Agent connection)

npx tsx cli/index.ts mcp

MCP Integration

Add this configuration to any AI Agent that supports MCP:

{
  "mcpServers": {
    "docs-vector": {
      "command": "node",
      "args": [
        "path/to/docs-vector-mcp/dist/cli/index.js",
        "mcp"
      ],
      "env": {
        "OPENAI_API_KEY": "<your-openai-api-key>",
        "UPSTASH_VECTOR_RESTAR_URL": "<your-upstash-url>",
        "UPSTASH_VECTOR_RESTAR_TOKEN": "<your-upstash-token>"
      }
    }
  }
}

Available MCP Tools

  1. search_docs - Search documentation semantically
    • Parameters:
      • query (string): The search query
      • limit (number, optional): Maximum number of results (1-20, default 5)
  2. get_stats - Get statistics about stored documentation
    • No parameters
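
The parameter rules for search_docs can be illustrated with a small validation sketch. This is an assumption about how the arguments might be handled, not the server's actual code: `normalizeSearchArgs` and `SearchDocsArgs` are hypothetical names, and clamping an out-of-range limit (rather than rejecting it) is a design choice made here for illustration only.

```typescript
// Shape of the arguments after validation.
interface SearchDocsArgs {
  query: string;
  limit: number;
}

// Bounds and default taken from the parameter list above.
const DEFAULT_LIMIT = 5;
const MIN_LIMIT = 1;
const MAX_LIMIT = 20;

// Hypothetical helper: checks raw tool arguments and fills in the default limit.
function normalizeSearchArgs(raw: { query?: unknown; limit?: unknown }): SearchDocsArgs {
  if (typeof raw.query !== "string" || raw.query.trim() === "") {
    throw new Error("query must be a non-empty string");
  }
  let limit = DEFAULT_LIMIT;
  if (raw.limit !== undefined) {
    if (typeof raw.limit !== "number" || !Number.isFinite(raw.limit)) {
      throw new Error("limit must be a finite number");
    }
    // Assumption: out-of-range values are clamped into 1-20 instead of rejected.
    limit = Math.min(MAX_LIMIT, Math.max(MIN_LIMIT, Math.trunc(raw.limit)));
  }
  return { query: raw.query.trim(), limit };
}
```

With these rules, a call that omits limit falls back to 5, and a request for 50 results is reduced to 20.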

Deployment

Cloudflare Pages

This project is optimized for Cloudflare Pages deployment:

  1. Push your code to GitHub
  2. Connect your repository to Cloudflare Pages
  3. Set build command: npm install && npx next build
  4. Set output directory: .next
  5. Add all environment variables in Cloudflare dashboard
  6. Deploy!

CI/CD with GitHub Actions

A sample workflow in .github/workflows/deploy.yml deploys to Cloudflare Pages automatically on every push to the main branch.

Project Structure

docs-vector-mcp/
├── app/                   # Next.js app router
│   ├── api/               # API routes
│   │   ├── index/         # Indexing endpoint
│   │   ├── search/        # Search endpoint
│   │   └── stats/         # Stats endpoint
│   ├── globals.css        # Global styles
│   ├── layout.tsx         # Root layout
│   └── page.tsx           # Home page
├── components/            # React components
│   ├── IndexForm.tsx      # Repository indexing form
│   └── SearchForm.tsx     # Search form
├── lib/                   # Core libraries
│   ├── github.ts          # GitHub fetcher
│   ├── text-processor.ts  # Text chunking
│   ├── embedding.ts       # Embedding generator
│   ├── vector-store.ts    # Vector storage
│   ├── mcp-server.ts      # MCP server
│   └── docs-service.ts    # Service orchestrator
├── cli/                   # CLI entry
│   └── index.ts           # CLI main
├── .github/
│   └── workflows/         # GitHub Actions
├── next.config.ts         # Next.js config
├── tailwind.config.ts     # Tailwind config
└── package.json           # Dependencies

How It Works

  1. Add Repository: You input a GitHub repository that contains tool documentation
  2. Crawling: The system fetches all documentation files (.md, .mdx, .rst, .txt, etc.) from the repo
  3. Processing: Text is cleaned and split into overlapping chunks
  4. Embedding: OpenAI generates vector embeddings for each chunk
  5. Storage: Vectors are stored in Upstash Vector database
  6. Search: When an AI Agent asks a question, the query is embedded and similar documents are retrieved
  7. Response: Relevant documentation snippets are returned to the AI Agent for answering
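
Steps 3 and 6 can be sketched in a few lines. The code below is an illustration under stated assumptions, not the project's implementation: `chunkText`, `cosineSimilarity`, and `rankBySimilarity` are hypothetical names, chunking is done on raw characters (the real text processor may split on other boundaries), and the in-memory ranking is a toy stand-in for the similarity query that the vector database performs.

```typescript
// Step 3 (sketch): split text into fixed-size character windows,
// where each window repeats the last `overlap` characters of the previous one.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}

// Cosine similarity between two vectors of equal length; 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

interface StoredChunk {
  id: string;
  vector: number[];
}

// Step 6 (sketch): return the ids of the `limit` stored chunks closest to the query vector.
function rankBySimilarity(query: number[], stored: StoredChunk[], limit: number): string[] {
  return stored
    .map((chunk) => ({ id: chunk.id, score: cosineSimilarity(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit)
    .map((entry) => entry.id);
}
```

For example, `chunkText("abcdefghij", 4, 2)` yields the four windows "abcd", "cdef", "efgh", and "ghij". The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, provided it is no longer than the overlap.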

License

MIT
