RAG Server with MCP Integration

A Retrieval-Augmented Generation (RAG) system with Google AI Studio integration, featuring both a REST API and Model Context Protocol (MCP) server.

Features

Core Capabilities

  • Document Storage: Upload and store text (.txt) and Markdown (.md) documents
  • Hierarchical Chunking: Structure-aware chunking for markdown that preserves document hierarchy
  • Vector Search: Efficient similarity search using Qdrant vector database
  • Google AI Integration: Uses Google AI Studio for embeddings (text-embedding-004) and generation (gemini-1.5-flash)
  • REST API: FastAPI-based REST API with automatic OpenAPI documentation
  • MCP Server: Model Context Protocol server for seamless integration with Claude and other MCP clients
  • OpenAI-Compatible API: Supports OpenAI-compatible chat completions for web UI integration
  • Code Indexing: Index and search source code repositories with semantic understanding
  • Smart Query Routing: Automatic query classification and routing to appropriate retrieval methods

Advanced Features

  • Tag-Based Organization: Organize documents with multiple tags for easy categorization
  • Section-Aware Retrieval: Query specific sections of documentation (e.g., "Installation > Prerequisites")
  • Markdown Structure Preservation: Automatic extraction of heading hierarchy with breadcrumb paths
  • Context-Enhanced Answers: LLM receives section context for more accurate responses
  • Flexible Filtering: Filter documents by tags and/or section paths during queries
  • Document Structure API: Explore table of contents and section organization
  • GitHub Integration: Parse and extract content from GitHub URLs
  • Reference Following: Automatically follow documentation references for comprehensive answers
  • Multi-Mode Retrieval: Choose among the standard, enhanced, and smart query modes
  • Rate Limiting: Built-in rate limiting for API endpoints

Project Structure

mcp-rag-docs/
   config/
      __init__.py
      settings.py                # Configuration and settings
   rag_server/
      __init__.py
      models.py                  # Pydantic models for API
      openai_api.py              # OpenAI-compatible API endpoints
      openai_models.py           # OpenAI API models
      rag_system.py              # Core RAG system logic
      server.py                  # FastAPI server
      smart_query.py             # Smart query routing
   mcp_server/
      __init__.py
      server.py                  # MCP server implementation
   utils/
      __init__.py
      code_indexer.py            # Source code indexing
      code_index_store.py        # Code index storage
      document_processor.py      # Document processing
      embeddings.py              # Google AI embeddings
      frontmatter_parser.py      # YAML frontmatter parsing
      github_parser.py           # GitHub URL parsing
      google_api_client.py       # Google AI API client
      hierarchical_chunker.py    # Hierarchical document chunking
      markdown_parser.py         # Markdown parsing
      query_classifier.py        # Query type classification
      rate_limit_store.py        # Rate limiting
      reference_extractor.py     # Extract doc references
      retrieval_router.py        # Multi-mode retrieval routing
      source_extractor.py        # Extract source code snippets
      text_chunker.py            # Text chunking utility
      vector_store.py            # Qdrant vector store wrapper
   build_code_index.py          # Build code index from repository
   check_github_urls.py         # Validate GitHub URLs
   check_status.py              # System status checker
   example_usage.py             # Example usage scripts
   ingest_docs.py               # Document ingestion utility
   main.py                      # Main entry point
   .env.example                 # Example environment variables
   docker-compose.yml           # Docker setup for Qdrant
   pyproject.toml               # Project dependencies

Installation

Prerequisites

  • Python 3.13 or higher
  • Google AI Studio API key (get one at https://aistudio.google.com/)

Setup

  1. Clone or navigate to the project directory

  2. Install dependencies

# Using pip
pip install -e .

# Or using uv (recommended)
uv pip install -e .

  3. Configure environment variables

# Copy the example env file
cp .env.example .env

# Edit .env and add your Google API key
GOOGLE_API_KEY=your_api_key_here

  4. Start Qdrant (optional, via Docker)

docker-compose up -d

Usage

Running the FastAPI Server

Start the REST API server:

python -m rag_server.server

The server will start at http://localhost:8000. Visit http://localhost:8000/docs for interactive API documentation.

API Endpoints

Core Endpoints:

  • POST /documents - Upload a document
  • POST /query - Query the RAG system (standard mode)
  • POST /query-enhanced - Query with automatic reference following
  • POST /smart-query - Smart query with automatic routing
  • GET /documents - List all documents
  • DELETE /documents/{doc_id} - Delete a document
  • GET /stats - Get system statistics
  • GET /health - Health check
  • GET /tags - List all available tags
  • GET /documents/{doc_id}/sections - Get document structure

OpenAI-Compatible Endpoints:

  • POST /v1/chat/completions - OpenAI-compatible chat completions
  • GET /v1/models - List available models

Example Usage with curl

# Upload a document
curl -X POST "http://localhost:8000/documents" \
  -F "file=@example.txt"

# Upload with tags
curl -X POST "http://localhost:8000/documents" \
  -F "file=@dagster-docs.md" \
  -F "tags=dagster,python,orchestration"

# Query the RAG system
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the main topic of the documents?", "top_k": 5}'

# Smart query with automatic routing
curl -X POST "http://localhost:8000/smart-query" \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I create a Dagster asset?"}'

# OpenAI-compatible chat completion
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "rag-smart",
    "messages": [{"role": "user", "content": "What is an asset in Dagster?"}],
    "stream": false
  }'

# List documents
curl "http://localhost:8000/documents"

# Get statistics
curl "http://localhost:8000/stats"

Running the MCP Server

The MCP server allows integration with Claude and other MCP-compatible clients.

python -m mcp_server.server

MCP Tools Available

  1. query_rag - Query the RAG system with a question
  2. query_rag_enhanced - Query with automatic reference following
  3. smart_query - Smart query with automatic routing and classification
  4. add_document - Add a document to the RAG system
  5. list_documents - List all stored documents
  6. delete_document - Delete a document by ID
  7. get_rag_stats - Get system statistics
  8. get_tags - List all available tags
  9. get_document_structure - Get document table of contents
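
These tools can also be exercised programmatically with the MCP Python SDK over stdio. A minimal sketch, assuming the mcp package is installed and the command below starts the server from the project root:

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch the MCP server as a subprocess and talk to it over stdio
server_params = StdioServerParameters(
    command="python",
    args=["-m", "mcp_server.server"],
)

async def main():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Invoke the query_rag tool with a question
            result = await session.call_tool(
                "query_rag", {"question": "How do I create a Dagster asset?"}
            )
            print(result.content)

asyncio.run(main())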

Using with Claude Desktop

Add to your Claude Desktop configuration (claude_desktop_config.json):

{
  "mcpServers": {
    "rag": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/mcp-rag-docs",
        "run",
        "python",
        "-m",
        "mcp_server.server"
      ]
    }
  }
}

See QUICK_START.md for a quick setup guide.

Configuration

All configuration is managed through environment variables (defined in .env):

Variable                 Description                           Default
GOOGLE_API_KEY           Google AI Studio API key              (required)
CHUNK_SIZE               Size of text chunks in characters     1000
CHUNK_OVERLAP            Overlap between chunks (characters)   200
TOP_K_RESULTS            Number of chunks to retrieve          5
QDRANT_PATH              Path to Qdrant storage                ./qdrant_storage
QDRANT_COLLECTION_NAME   Qdrant collection name                documents
FASTAPI_HOST             FastAPI server host                   0.0.0.0
FASTAPI_PORT             FastAPI server port                   8000
EMBEDDING_MODEL          Google embedding model                text-embedding-004
LLM_MODEL                Google LLM model                      gemini-1.5-flash
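
A typical .env might look like the following; only GOOGLE_API_KEY is required, and the other values shown are just the defaults from the table above:

GOOGLE_API_KEY=your_api_key_here
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
TOP_K_RESULTS=5
QDRANT_PATH=./qdrant_storage
QDRANT_COLLECTION_NAME=documents
EMBEDDING_MODEL=text-embedding-004
LLM_MODEL=gemini-1.5-flash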

Architecture

Document Processing Pipeline

  1. Upload - User uploads a .txt or .md file
  2. Processing - Document is read and metadata extracted (including frontmatter)
  3. Chunking - Text is split using hierarchical chunking for markdown or standard chunking for text
  4. Embedding - Each chunk is converted to a vector using Google AI embeddings
  5. Storage - Vectors and metadata are stored in Qdrant
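
To make the pipeline concrete, here is a minimal sketch of the chunk, embed, and store steps written directly against the google-generativeai and qdrant-client libraries. It uses a naive fixed-size splitter instead of the project's chunkers, so treat it as an illustration rather than the server's actual code:

import google.generativeai as genai
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

genai.configure(api_key="your_api_key_here")
client = QdrantClient(path="./qdrant_storage")

# Create the collection once; text-embedding-004 vectors are 768-dimensional
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)

# Naive fixed-size chunking with 200-character overlap; the real server
# uses hierarchical, structure-aware chunking for markdown
text = open("example.txt").read()
chunks = [text[i:i + 1000] for i in range(0, len(text), 800)]

points = []
for i, chunk in enumerate(chunks):
    emb = genai.embed_content(
        model="models/text-embedding-004",
        content=chunk,
        task_type="retrieval_document",
    )["embedding"]
    points.append(PointStruct(id=i, vector=emb, payload={"text": chunk}))

client.upsert(collection_name="documents", points=points)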

Query Pipeline

Standard Query

  1. Query - User submits a question
  2. Embedding - Question is converted to a vector
  3. Retrieval - Similar chunks are retrieved from Qdrant
  4. Generation - Context is provided to Google AI Studio model
  5. Response - Answer is generated and returned with sources
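
A matching sketch of the standard query path, again using the libraries directly rather than the server's own modules: embed the question, search Qdrant, and hand the retrieved chunks to Gemini as context.

import google.generativeai as genai
from qdrant_client import QdrantClient

genai.configure(api_key="your_api_key_here")
client = QdrantClient(path="./qdrant_storage")

question = "What is the main topic of the documents?"

# Embed the question (note the retrieval_query task type)
q_emb = genai.embed_content(
    model="models/text-embedding-004",
    content=question,
    task_type="retrieval_query",
)["embedding"]

# Retrieve the top-k most similar chunks
hits = client.search(collection_name="documents", query_vector=q_emb, limit=5)
context = "\n\n".join(hit.payload["text"] for hit in hits)

# Generate an answer grounded in the retrieved context
model = genai.GenerativeModel("gemini-1.5-flash")
answer = model.generate_content(
    f"Answer using only this context:\n{context}\n\nQuestion: {question}"
)
print(answer.text)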

Smart Query

  1. Classification - Query is classified (documentation, code, conceptual, etc.)
  2. Routing - Automatically selects best retrieval strategy
  3. Multi-Source - May combine documentation search, code search, and direct answers
  4. Synthesis - Generates comprehensive answer from multiple sources
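
The real classification and routing logic lives in utils/query_classifier.py and utils/retrieval_router.py; purely to illustrate the shape of the idea, a toy classify-then-route step might look like this (the categories, keywords, and strategy names are hypothetical):

def classify_query(question: str) -> str:
    """Toy heuristic classifier; the real system is more sophisticated."""
    q = question.lower()
    if any(kw in q for kw in ("function", "class", "source", "implementation")):
        return "code"
    if any(kw in q for kw in ("how do i", "install", "configure", "setup")):
        return "documentation"
    return "conceptual"

def route(question: str) -> str:
    kind = classify_query(question)
    # Hypothetical strategy names, standing in for the real retrieval paths
    strategies = {
        "code": "search the code index",
        "documentation": "search the documentation vectors",
        "conceptual": "answer directly, citing retrieved context",
    }
    return strategies[kind]

print(route("How do I create a Dagster asset?"))  # -> documentation path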

Code Indexing

The system can index source code repositories:

# Build code index
python build_code_index.py /path/to/repo

# Query code through the API or MCP server

Code is indexed with:

  • Class and function definitions
  • Docstrings and comments
  • File structure and imports
  • Semantic embeddings for natural language queries
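
Concretely, a single indexed entry might carry a payload roughly like the following (the field names are illustrative guesses, not the project's exact schema):

# Hypothetical payload for one indexed function; illustrative only
code_chunk = {
    "file": "src/assets/daily_report.py",
    "symbol": "build_daily_report",
    "kind": "function",
    "docstring": "Build the daily report asset from upstream tables.",
    "imports": ["dagster", "pandas"],
    "snippet": "def build_daily_report(context):\n    ...",
}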

Development

Running Tests

# Install test dependencies
pip install pytest pytest-asyncio httpx

# Run tests
pytest

# Run specific test files
pytest test_openai_api.py
pytest test_mcp_integration.py

Code Style

The project follows Python best practices with type hints and docstrings.

Troubleshooting

Common Issues

Issue: GOOGLE_API_KEY not found

  • Solution: Ensure you've created a .env file and added your Google API key

Issue: Unsupported file type

  • Solution: Only .txt and .md files are supported. Convert other formats first.

Issue: Collection already exists error

  • Solution: Delete the qdrant_storage/ directory to reset the database

Issue: MCP server not connecting

  • Solution: Check that the path in your MCP config is correct and the .env file is in the project root

Advanced Usage

Tag-Based Organization

Organize your documents with tags for easy categorization and filtering:

# Upload document with tags
curl -X POST "http://localhost:8000/documents" \
  -F "file=@dagster-docs.md" \
  -F "tags=dagster,python,orchestration"

# List all available tags
curl "http://localhost:8000/tags"

# Query only dagster-related documents
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I create a pipeline?", "tags": ["dagster"]}'

# List documents filtered by tags
curl "http://localhost:8000/documents?tags=dagster,python"

Hierarchical Document Structure

For markdown documents, the system automatically preserves heading hierarchy:

# Get document structure (table of contents)
curl "http://localhost:8000/documents/{doc_id}/sections"

# Query specific section
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What are the prerequisites?", "section_path": "Installation > Prerequisites"}'

Section-Aware Queries

The system includes section context when generating answers:

# Example: Markdown document structure
# Installation
#   Prerequisites
#     Python Version
#   Setup Steps

# When you query about "Python version requirements"
# The system will:
# 1. Retrieve relevant chunks from "Installation > Prerequisites > Python Version"
# 2. Include section path in context sent to LLM
# 3. Cite sources with full section paths

Smart Query Modes

The system supports three query modes:

  1. Standard (/query) - Basic vector search and retrieval
  2. Enhanced (/query-enhanced) - Follows documentation references automatically
  3. Smart (/smart-query) - Automatic classification and routing

Use the OpenAI-compatible API to access different modes:

# Standard mode
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "rag-standard", "messages": [{"role": "user", "content": "What is Dagster?"}]}'

# Enhanced mode with reference following
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "rag-enhanced", "messages": [{"role": "user", "content": "What is Dagster?"}]}'

# Smart mode with automatic routing
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "rag-smart", "messages": [{"role": "user", "content": "What is Dagster?"}]}'

MCP Tools

The MCP server provides enhanced tools for Claude and other MCP clients:

query_rag - Query with optional tags and section filtering

{
  "question": "How do I deploy?",
  "tags": ["dagster"],
  "section_path": "Deployment"
}

smart_query - Smart query with automatic routing

{
  "question": "What is an asset and how do I use it?"
}

add_document - Upload with tags

{
  "file_path": "/path/to/doc.md",
  "tags": ["dagster", "docs"]
}

get_tags - List all tags

get_document_structure - Get table of contents

{
  "doc_id": "abc123"
}

API Reference

Enhanced Endpoints

POST /documents

  • Body: file (multipart), tags (comma-separated string)
  • Response: Document info with tags and chunk count

POST /query

  • Body: {"question": "...", "tags": [...], "section_path": "..."}
  • Response: Answer with section-aware sources

POST /smart-query

  • Body: {"question": "..."}
  • Response: Smart answer with automatic routing and classification

GET /tags

  • Response: {"tags": [...], "total": N}

GET /documents/{doc_id}/sections

  • Response: Document structure with section hierarchy

GET /documents?tags=tag1,tag2

  • Query filtered by tags
  • Response: List of matching documents

POST /v1/chat/completions

  • OpenAI-compatible chat completion endpoint
  • Supports models: rag-standard, rag-enhanced, rag-smart
  • Supports streaming with stream: true

GET /v1/models

  • List available RAG models

License

MIT License

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Acknowledgments

  • Google AI Studio for embeddings and LLM capabilities
  • Qdrant for vector database
  • FastAPI for the REST API framework
  • Anthropic MCP for the Model Context Protocol
