# RAG Server with MCP Integration

A Retrieval-Augmented Generation (RAG) system with Google AI Studio integration, featuring both a REST API and a Model Context Protocol (MCP) server.
## Features

### Core Capabilities

- **Document Storage**: Upload and store text (`.txt`) and Markdown (`.md`) documents
- **Hierarchical Chunking**: Structure-aware chunking for markdown that preserves document hierarchy
- **Vector Search**: Efficient similarity search using the Qdrant vector database
- **Google AI Integration**: Uses Google AI Studio for embeddings (`text-embedding-004`) and generation (`gemini-1.5-flash`)
- **REST API**: FastAPI-based REST API with automatic OpenAPI documentation
- **MCP Server**: Model Context Protocol server for seamless integration with Claude and other MCP clients
- **OpenAI-Compatible API**: Supports OpenAI-compatible chat completions for web UI integration
- **Code Indexing**: Index and search source code repositories with semantic understanding
- **Smart Query Routing**: Automatic query classification and routing to appropriate retrieval methods

### Advanced Features

- **Tag-Based Organization**: Organize documents with multiple tags for easy categorization
- **Section-Aware Retrieval**: Query specific sections of documentation (e.g., "Installation > Prerequisites")
- **Markdown Structure Preservation**: Automatic extraction of heading hierarchy with breadcrumb paths (see the sketch after this list)
- **Context-Enhanced Answers**: The LLM receives section context for more accurate responses
- **Flexible Filtering**: Filter documents by tags and/or section paths during queries
- **Document Structure API**: Explore table of contents and section organization
- **GitHub Integration**: Parse and extract content from GitHub URLs
- **Reference Following**: Automatically follow documentation references for comprehensive answers
- **Multi-Mode Retrieval**: Choose between standard, enhanced, and smart query modes
- **Rate Limiting**: Built-in rate limiting for API endpoints
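To make the breadcrumb idea behind structure preservation concrete, here is a minimal illustrative sketch. It is not the project's `hierarchical_chunker`, just a toy reduction of the same idea:

```python
import re

def breadcrumb_chunks(markdown: str) -> list[dict]:
    """Toy chunker: split markdown on headings and attach the heading
    breadcrumb (e.g. "Installation > Prerequisites") to each chunk."""
    path: list[str] = []       # current heading stack, one entry per level
    chunks: list[dict] = []
    current: list[str] = []

    def flush() -> None:
        text = "".join(current).strip()
        if text:
            chunks.append({"section_path": " > ".join(path), "text": text})
        current.clear()

    for line in markdown.splitlines(keepends=True):
        m = re.match(r"^(#{1,6})\s+(.+)", line)
        if m:
            flush()
            level = len(m.group(1))
            # drop deeper levels, then push this heading onto the stack
            path[:] = path[: level - 1] + [m.group(2).strip()]
        else:
            current.append(line)
    flush()
    return chunks

doc = "# Installation\n## Prerequisites\nPython 3.13 or higher.\n## Setup\nRun pip install -e .\n"
for chunk in breadcrumb_chunks(doc):
    print(chunk["section_path"], "->", chunk["text"])
# Installation > Prerequisites -> Python 3.13 or higher.
# Installation > Setup -> Run pip install -e .
```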
## Project Structure

```
mcp-rag-docs/
    config/
        __init__.py
        settings.py               # Configuration and settings
    rag_server/
        __init__.py
        models.py                 # Pydantic models for API
        openai_api.py             # OpenAI-compatible API endpoints
        openai_models.py          # OpenAI API models
        rag_system.py             # Core RAG system logic
        server.py                 # FastAPI server
        smart_query.py            # Smart query routing
    mcp_server/
        __init__.py
        server.py                 # MCP server implementation
    utils/
        __init__.py
        code_indexer.py           # Source code indexing
        code_index_store.py       # Code index storage
        document_processor.py     # Document processing
        embeddings.py             # Google AI embeddings
        frontmatter_parser.py     # YAML frontmatter parsing
        github_parser.py          # GitHub URL parsing
        google_api_client.py      # Google AI API client
        hierarchical_chunker.py   # Hierarchical document chunking
        markdown_parser.py        # Markdown parsing
        query_classifier.py       # Query type classification
        rate_limit_store.py       # Rate limiting
        reference_extractor.py    # Extract doc references
        retrieval_router.py       # Multi-mode retrieval routing
        source_extractor.py       # Extract source code snippets
        text_chunker.py           # Text chunking utility
        vector_store.py           # Qdrant vector store wrapper
    build_code_index.py           # Build code index from repository
    check_github_urls.py          # Validate GitHub URLs
    check_status.py               # System status checker
    example_usage.py              # Example usage scripts
    ingest_docs.py                # Document ingestion utility
    main.py                       # Main entry point
    .env.example                  # Example environment variables
    docker-compose.yml            # Docker setup for Qdrant
    pyproject.toml                # Project dependencies
```
## Installation

### Prerequisites

- Python 3.13 or higher
- Google AI Studio API key (get one from Google AI Studio)

### Setup

1. **Clone or navigate to the project directory**

2. **Install dependencies**

   ```bash
   # Using pip
   pip install -e .

   # Or using uv (recommended)
   uv pip install -e .
   ```

3. **Configure environment variables**

   ```bash
   # Copy the example env file
   cp .env.example .env
   ```

   Then edit `.env` and add your Google API key:

   ```
   GOOGLE_API_KEY=your_api_key_here
   ```

4. **Start Qdrant (optional, using Docker)**

   ```bash
   docker-compose up -d
   ```
## Usage

### Running the FastAPI Server

Start the REST API server:

```bash
python -m rag_server.server
```

The server starts at http://localhost:8000. Visit http://localhost:8000/docs for interactive API documentation.
### API Endpoints

**Core Endpoints:**

- `POST /documents` - Upload a document
- `POST /query` - Query the RAG system (standard mode)
- `POST /query-enhanced` - Query with automatic reference following
- `POST /smart-query` - Smart query with automatic routing
- `GET /documents` - List all documents
- `DELETE /documents/{doc_id}` - Delete a document
- `GET /stats` - Get system statistics
- `GET /health` - Health check
- `GET /tags` - List all available tags
- `GET /documents/{doc_id}/sections` - Get document structure

**OpenAI-Compatible Endpoints:**

- `POST /v1/chat/completions` - OpenAI-compatible chat completions
- `GET /v1/models` - List available models
### Example Usage with curl

```bash
# Upload a document
curl -X POST "http://localhost:8000/documents" \
  -F "file=@example.txt"

# Upload with tags
curl -X POST "http://localhost:8000/documents" \
  -F "file=@dagster-docs.md" \
  -F "tags=dagster,python,orchestration"

# Query the RAG system
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the main topic of the documents?", "top_k": 5}'

# Smart query with automatic routing
curl -X POST "http://localhost:8000/smart-query" \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I create a Dagster asset?"}'

# OpenAI-compatible chat completion
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "rag-smart",
    "messages": [{"role": "user", "content": "What is an asset in Dagster?"}],
    "stream": false
  }'

# List documents
curl "http://localhost:8000/documents"

# Get statistics
curl "http://localhost:8000/stats"
```
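The same endpoints can be called from Python. A minimal sketch using `httpx` (already listed among this project's test dependencies); request fields are taken from the endpoint descriptions above:

```python
import httpx

BASE = "http://localhost:8000"

# Upload a document with tags (multipart file + comma-separated tags)
with open("dagster-docs.md", "rb") as f:
    resp = httpx.post(
        f"{BASE}/documents",
        files={"file": f},
        data={"tags": "dagster,python,orchestration"},
    )
print(resp.json())

# Ask a question, restricted to dagster-tagged documents
resp = httpx.post(
    f"{BASE}/query",
    json={"question": "How do I create a pipeline?", "tags": ["dagster"], "top_k": 5},
)
print(resp.json())
```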
### Running the MCP Server

The MCP server allows integration with Claude and other MCP-compatible clients:

```bash
python -m mcp_server.server
```
### MCP Tools Available

- `query_rag` - Query the RAG system with a question
- `query_rag_enhanced` - Query with automatic reference following
- `smart_query` - Smart query with automatic routing and classification
- `add_document` - Add a document to the RAG system
- `list_documents` - List all stored documents
- `delete_document` - Delete a document by ID
- `get_rag_stats` - Get system statistics
- `get_tags` - List all available tags
- `get_document_structure` - Get a document's table of contents
### Using with Claude Desktop

Add to your Claude Desktop configuration (`claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "rag": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/mcp-rag-docs",
        "run",
        "python",
        "-m",
        "mcp_server.server"
      ]
    }
  }
}
```
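Claude Desktop launches the server outside your shell, so it may not see variables exported in your profile. If the server cannot find your API key, one option is to pass it through the config's `env` block (a variant of the config above, with a placeholder value):

```json
{
  "mcpServers": {
    "rag": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/mcp-rag-docs",
        "run",
        "python",
        "-m",
        "mcp_server.server"
      ],
      "env": {
        "GOOGLE_API_KEY": "your_api_key_here"
      }
    }
  }
}
```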
See `QUICK_START.md` for a quick setup guide.
## Configuration

All configuration is managed through environment variables (defined in `.env`):

| Variable | Description | Default |
|---|---|---|
| `GOOGLE_API_KEY` | Google AI Studio API key | (required) |
| `CHUNK_SIZE` | Size of text chunks in characters | 1000 |
| `CHUNK_OVERLAP` | Overlap between chunks in characters | 200 |
| `TOP_K_RESULTS` | Number of chunks to retrieve | 5 |
| `QDRANT_PATH` | Path to Qdrant storage | ./qdrant_storage |
| `QDRANT_COLLECTION_NAME` | Qdrant collection name | documents |
| `FASTAPI_HOST` | FastAPI server host | 0.0.0.0 |
| `FASTAPI_PORT` | FastAPI server port | 8000 |
| `EMBEDDING_MODEL` | Google embedding model | text-embedding-004 |
| `LLM_MODEL` | Google LLM model | gemini-1.5-flash |
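Putting it together, here is a complete `.env` that spells out every default from the table above; only `GOOGLE_API_KEY` is strictly required:

```
GOOGLE_API_KEY=your_api_key_here
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
TOP_K_RESULTS=5
QDRANT_PATH=./qdrant_storage
QDRANT_COLLECTION_NAME=documents
FASTAPI_HOST=0.0.0.0
FASTAPI_PORT=8000
EMBEDDING_MODEL=text-embedding-004
LLM_MODEL=gemini-1.5-flash
```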
## Architecture

### Document Processing Pipeline

1. **Upload** - The user uploads a `.txt` or `.md` file
2. **Processing** - The document is read and metadata (including frontmatter) is extracted
3. **Chunking** - Text is split using hierarchical chunking for markdown, or standard chunking for plain text
4. **Embedding** - Each chunk is converted to a vector using Google AI embeddings
5. **Storage** - Vectors and metadata are stored in Qdrant
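For intuition, here is a stripped-down, illustrative version of that ingestion flow using the `google-generativeai` and `qdrant-client` packages directly. This is not the project's internal code; chunking is reduced to a naive character split with a 200-character overlap:

```python
import google.generativeai as genai
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

genai.configure(api_key="your_api_key_here")
client = QdrantClient(path="./qdrant_storage")

# text-embedding-004 produces 768-dimensional vectors
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)

text = open("example.txt").read()
# 1000-char chunks with a 200-char overlap (step of 800)
chunks = [text[i : i + 1000] for i in range(0, len(text), 800)]

points = []
for i, chunk in enumerate(chunks):
    emb = genai.embed_content(
        model="models/text-embedding-004",
        content=chunk,
        task_type="retrieval_document",
    )
    points.append(PointStruct(id=i, vector=emb["embedding"], payload={"text": chunk}))

client.upsert(collection_name="documents", points=points)
```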
### Query Pipeline

#### Standard Query

1. **Query** - The user submits a question
2. **Embedding** - The question is converted to a vector
3. **Retrieval** - Similar chunks are retrieved from Qdrant
4. **Generation** - The retrieved context is provided to the Google AI Studio model
5. **Response** - The answer is generated and returned with sources
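The corresponding illustrative query flow, again a sketch under the same assumptions rather than the project's `rag_system.py`:

```python
import google.generativeai as genai
from qdrant_client import QdrantClient

genai.configure(api_key="your_api_key_here")
client = QdrantClient(path="./qdrant_storage")

question = "What is the main topic of the documents?"

# Steps 1-2: embed the question (retrieval_query task type for search queries)
q_emb = genai.embed_content(
    model="models/text-embedding-004",
    content=question,
    task_type="retrieval_query",
)

# Step 3: retrieve the most similar chunks from Qdrant
hits = client.search(
    collection_name="documents",
    query_vector=q_emb["embedding"],
    limit=5,
)
context = "\n\n".join(hit.payload["text"] for hit in hits)

# Steps 4-5: generate an answer grounded in the retrieved context
model = genai.GenerativeModel("gemini-1.5-flash")
answer = model.generate_content(
    f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"
)
print(answer.text)
```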
#### Smart Query

1. **Classification** - The query is classified (documentation, code, conceptual, etc.)
2. **Routing** - The best retrieval strategy is selected automatically
3. **Multi-Source** - Documentation search, code search, and direct answers may be combined
4. **Synthesis** - A comprehensive answer is generated from the combined sources
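The real `query_classifier` is more sophisticated, but a keyword-based toy version conveys the routing idea. The categories come from the list above; the keyword heuristics are invented for illustration:

```python
def classify_query(question: str) -> str:
    """Toy classifier: route a query to a retrieval strategy by keywords."""
    q = question.lower()
    if any(k in q for k in ("function", "class", "implement", "source code")):
        return "code"            # search the code index
    if any(k in q for k in ("how do i", "install", "configure", "example")):
        return "documentation"   # search the docs collection
    return "conceptual"          # answer from broad context

print(classify_query("How do I create a Dagster asset?"))        # documentation
print(classify_query("Which class builds the embeddings?"))      # code
print(classify_query("What is retrieval-augmented generation?")) # conceptual
```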
## Code Indexing

The system can index source code repositories:

```bash
# Build the code index
python build_code_index.py /path/to/repo

# Then query the code through the API or MCP server
```

Code is indexed with:

- Class and function definitions
- Docstrings and comments
- File structure and imports
- Semantic embeddings for natural language queries
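Once a repository has been indexed, natural-language code questions can go through the smart query endpoint, which routes them to code search. A small sketch with `httpx` (the question text is just an example):

```python
import httpx

resp = httpx.post(
    "http://localhost:8000/smart-query",
    json={"question": "Which class wraps the Qdrant vector store?"},
)
print(resp.json())
```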
## Development

### Running Tests

```bash
# Install test dependencies
pip install pytest pytest-asyncio httpx

# Run all tests
pytest

# Run specific test files
pytest test_openai_api.py
pytest test_mcp_integration.py
```
### Code Style

The project follows Python best practices, with type hints and docstrings throughout.
## Troubleshooting

### Common Issues

**Issue:** `GOOGLE_API_KEY not found`

- **Solution:** Ensure you've created a `.env` file and added your Google API key.

**Issue:** `Unsupported file type`

- **Solution:** Only `.txt` and `.md` files are supported. Convert other formats first.

**Issue:** `Collection already exists` error

- **Solution:** Delete the `qdrant_storage/` directory to reset the database.

**Issue:** MCP server not connecting

- **Solution:** Check that the path in your MCP config is correct and that the `.env` file is in the project root.
## Advanced Usage

### Tag-Based Organization

Organize your documents with tags for easy categorization and filtering:

```bash
# Upload a document with tags
curl -X POST "http://localhost:8000/documents" \
  -F "file=@dagster-docs.md" \
  -F "tags=dagster,python,orchestration"

# List all available tags
curl "http://localhost:8000/tags"

# Query only dagster-related documents
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I create a pipeline?", "tags": ["dagster"]}'

# List documents filtered by tags
curl "http://localhost:8000/documents?tags=dagster,python"
```
### Hierarchical Document Structure

For markdown documents, the system automatically preserves heading hierarchy:

```bash
# Get a document's structure (table of contents)
curl "http://localhost:8000/documents/{doc_id}/sections"

# Query a specific section
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What are the prerequisites?", "section_path": "Installation > Prerequisites"}'
```
### Section-Aware Queries

The system includes section context when generating answers. For example, given a markdown document with this structure:

```markdown
# Installation
## Prerequisites
### Python Version
## Setup Steps
```

when you query about "Python version requirements", the system will:

1. Retrieve relevant chunks from "Installation > Prerequisites > Python Version"
2. Include the section path in the context sent to the LLM
3. Cite sources with full section paths
### Smart Query Modes

The system supports three query modes:

- **Standard** (`/query`) - Basic vector search and retrieval
- **Enhanced** (`/query-enhanced`) - Follows documentation references automatically
- **Smart** (`/smart-query`) - Automatic classification and routing

Use the OpenAI-compatible API to access the different modes:

```bash
# Standard mode
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "rag-standard", "messages": [{"role": "user", "content": "What is Dagster?"}]}'

# Enhanced mode with reference following
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "rag-enhanced", "messages": [{"role": "user", "content": "What is Dagster?"}]}'

# Smart mode with automatic routing
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "rag-smart", "messages": [{"role": "user", "content": "What is Dagster?"}]}'
```
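Because the endpoint is OpenAI-compatible, the official `openai` Python package (v1+) also works once pointed at the local server. A minimal sketch; the placeholder API key assumes the local server does not enforce one, so adjust if your deployment does:

```python
from openai import OpenAI

# Point the official client at the local RAG server
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="rag-smart",
    messages=[{"role": "user", "content": "What is Dagster?"}],
)
print(resp.choices[0].message.content)
```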
### MCP Tools

The MCP server provides enhanced tools for Claude and other MCP clients.

`query_rag` - Query with optional tag and section filtering:

```json
{
  "question": "How do I deploy?",
  "tags": ["dagster"],
  "section_path": "Deployment"
}
```

`smart_query` - Smart query with automatic routing:

```json
{
  "question": "What is an asset and how do I use it?"
}
```

`add_document` - Upload with tags:

```json
{
  "file_path": "/path/to/doc.md",
  "tags": ["dagster", "docs"]
}
```

`get_tags` - List all tags.

`get_document_structure` - Get a document's table of contents:

```json
{
  "doc_id": "abc123"
}
```
## API Reference

### Enhanced Endpoints

**POST /documents**

- Body: `file` (multipart), `tags` (comma-separated string)
- Response: Document info with tags and chunk count

**POST /query**

- Body: `{"question": "...", "tags": [...], "section_path": "..."}`
- Response: Answer with section-aware sources

**POST /smart-query**

- Body: `{"question": "..."}`
- Response: Smart answer with automatic routing and classification

**GET /tags**

- Response: `{"tags": [...], "total": N}`

**GET /documents/{doc_id}/sections**

- Response: Document structure with section hierarchy

**GET /documents?tags=tag1,tag2**

- Query filtered by tags
- Response: List of matching documents

**POST /v1/chat/completions**

- OpenAI-compatible chat completion endpoint
- Supported models: `rag-standard`, `rag-enhanced`, `rag-smart`
- Supports streaming with `stream: true`

**GET /v1/models**

- List available RAG models
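Since `stream: true` is supported, here is a minimal streaming sketch using the same `openai` client setup as above (same placeholder-key assumption):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="rag-smart",
    messages=[{"role": "user", "content": "What is an asset in Dagster?"}],
    stream=True,
)
for chunk in stream:
    # Some stream events carry no choices or content; guard before printing
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```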
## Additional Documentation

- `QUICK_START.md` - Quick setup guide for MCP integration
- `MCP_SETUP.md` - Detailed MCP server setup
- `OPENAI_API_GUIDE.md` - OpenAI-compatible API documentation
- `QUERY_ROUTING_GUIDE.md` - Smart query routing guide
- `MULTI_MODE_RETRIEVAL_GUIDE.md` - Multi-mode retrieval documentation
- `CODE_INDEX_GUIDE.md` - Code indexing and search guide
- `RATE_LIMITING.md` - Rate limiting configuration
- `TEST_COVERAGE.md` - Test coverage and testing guide
## License

MIT License

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Acknowledgments

- Google AI Studio for embeddings and LLM capabilities
- Qdrant for the vector database
- FastAPI for the REST API framework
- Anthropic's MCP for the Model Context Protocol