# RAG Server with MCP Integration

A Retrieval-Augmented Generation (RAG) system with Google AI Studio integration, featuring both a REST API and a Model Context Protocol (MCP) server.
## Features

### Core Capabilities

- **Document Storage**: Upload and store text (`.txt`) and Markdown (`.md`) documents
- **Hierarchical Chunking**: Structure-aware chunking for markdown that preserves document hierarchy
- **Vector Search**: Efficient similarity search using the Qdrant vector database
- **Google AI Integration**: Uses Google AI Studio for embeddings (`text-embedding-004`) and generation (`gemini-1.5-flash`)
- **REST API**: FastAPI-based REST API with automatic OpenAPI documentation
- **MCP Server**: Model Context Protocol server for seamless integration with Claude and other MCP clients
- **OpenAI-Compatible API**: Supports OpenAI-compatible chat completions for web UI integration
- **Code Indexing**: Index and search source code repositories with semantic understanding
- **Smart Query Routing**: Automatic query classification and routing to appropriate retrieval methods

### Advanced Features

- **Tag-Based Organization**: Organize documents with multiple tags for easy categorization
- **Section-Aware Retrieval**: Query specific sections of documentation (e.g., "Installation > Prerequisites")
- **Markdown Structure Preservation**: Automatic extraction of heading hierarchy with breadcrumb paths (see the sketch after this list)
- **Context-Enhanced Answers**: The LLM receives section context for more accurate responses
- **Flexible Filtering**: Filter documents by tags and/or section paths during queries
- **Document Structure API**: Explore table of contents and section organization
- **GitHub Integration**: Parse and extract content from GitHub URLs
- **Reference Following**: Automatically follow documentation references for comprehensive answers
- **Multi-Mode Retrieval**: Choose between standard, enhanced, and smart query modes
- **Rate Limiting**: Built-in rate limiting for API endpoints
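To make the breadcrumb idea behind structure preservation concrete, here is a minimal illustrative sketch. It is not the project's `hierarchical_chunker`, just a toy reduction of the same idea:

```python
import re

def breadcrumb_chunks(markdown: str) -> list[dict]:
    """Toy chunker: split markdown on headings and attach the heading
    breadcrumb (e.g. "Installation > Prerequisites") to each chunk."""
    path: list[str] = []       # current heading stack, one entry per level
    chunks: list[dict] = []
    current: list[str] = []

    def flush() -> None:
        text = "".join(current).strip()
        if text:
            chunks.append({"section_path": " > ".join(path), "text": text})
        current.clear()

    for line in markdown.splitlines(keepends=True):
        m = re.match(r"^(#{1,6})\s+(.+)", line)
        if m:
            flush()
            level = len(m.group(1))
            # drop deeper levels, then push this heading onto the stack
            path[:] = path[: level - 1] + [m.group(2).strip()]
        else:
            current.append(line)
    flush()
    return chunks

doc = "# Installation\n## Prerequisites\nPython 3.13 or higher.\n## Setup\nRun pip install -e .\n"
for chunk in breadcrumb_chunks(doc):
    print(chunk["section_path"], "->", chunk["text"])
# Installation > Prerequisites -> Python 3.13 or higher.
# Installation > Setup -> Run pip install -e .
```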
## Project Structure

```
mcp-rag-docs/
    config/
        __init__.py
        settings.py               # Configuration and settings
    rag_server/
        __init__.py
        models.py                 # Pydantic models for API
        openai_api.py             # OpenAI-compatible API endpoints
        openai_models.py          # OpenAI API models
        rag_system.py             # Core RAG system logic
        server.py                 # FastAPI server
        smart_query.py            # Smart query routing
    mcp_server/
        __init__.py
        server.py                 # MCP server implementation
    utils/
        __init__.py
        code_indexer.py           # Source code indexing
        code_index_store.py       # Code index storage
        document_processor.py     # Document processing
        embeddings.py             # Google AI embeddings
        frontmatter_parser.py     # YAML frontmatter parsing
        github_parser.py          # GitHub URL parsing
        google_api_client.py      # Google AI API client
        hierarchical_chunker.py   # Hierarchical document chunking
        markdown_parser.py        # Markdown parsing
        query_classifier.py       # Query type classification
        rate_limit_store.py       # Rate limiting
        reference_extractor.py    # Extract doc references
        retrieval_router.py       # Multi-mode retrieval routing
        source_extractor.py       # Extract source code snippets
        text_chunker.py           # Text chunking utility
        vector_store.py           # Qdrant vector store wrapper
    build_code_index.py           # Build code index from repository
    check_github_urls.py          # Validate GitHub URLs
    check_status.py               # System status checker
    example_usage.py              # Example usage scripts
    ingest_docs.py                # Document ingestion utility
    main.py                       # Main entry point
    .env.example                  # Example environment variables
    docker-compose.yml            # Docker setup for Qdrant
    pyproject.toml                # Project dependencies
```
## Installation

### Prerequisites

- Python 3.13 or higher
- Google AI Studio API key (get one from Google AI Studio)

### Setup

1. **Clone or navigate to the project directory**

2. **Install dependencies**

   ```bash
   # Using pip
   pip install -e .

   # Or using uv (recommended)
   uv pip install -e .
   ```

3. **Configure environment variables**

   ```bash
   # Copy the example env file
   cp .env.example .env
   ```

   Then edit `.env` and add your Google API key:

   ```
   GOOGLE_API_KEY=your_api_key_here
   ```

4. **Start Qdrant (optional, using Docker)**

   ```bash
   docker-compose up -d
   ```
## Usage

### Running the FastAPI Server

Start the REST API server:

```bash
python -m rag_server.server
```

The server starts at http://localhost:8000. Visit http://localhost:8000/docs for interactive API documentation.
### API Endpoints

**Core Endpoints:**

- `POST /documents` - Upload a document
- `POST /query` - Query the RAG system (standard mode)
- `POST /query-enhanced` - Query with automatic reference following
- `POST /smart-query` - Smart query with automatic routing
- `GET /documents` - List all documents
- `DELETE /documents/{doc_id}` - Delete a document
- `GET /stats` - Get system statistics
- `GET /health` - Health check
- `GET /tags` - List all available tags
- `GET /documents/{doc_id}/sections` - Get document structure

**OpenAI-Compatible Endpoints:**

- `POST /v1/chat/completions` - OpenAI-compatible chat completions
- `GET /v1/models` - List available models
### Example Usage with curl

```bash
# Upload a document
curl -X POST "http://localhost:8000/documents" \
  -F "file=@example.txt"

# Upload with tags
curl -X POST "http://localhost:8000/documents" \
  -F "file=@dagster-docs.md" \
  -F "tags=dagster,python,orchestration"

# Query the RAG system
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the main topic of the documents?", "top_k": 5}'

# Smart query with automatic routing
curl -X POST "http://localhost:8000/smart-query" \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I create a Dagster asset?"}'

# OpenAI-compatible chat completion
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "rag-smart",
    "messages": [{"role": "user", "content": "What is an asset in Dagster?"}],
    "stream": false
  }'

# List documents
curl "http://localhost:8000/documents"

# Get statistics
curl "http://localhost:8000/stats"
```
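The same endpoints can be called from Python. A minimal sketch using `httpx` (already listed among this project's test dependencies); request fields are taken from the endpoint descriptions above:

```python
import httpx

BASE = "http://localhost:8000"

# Upload a document with tags (multipart file + comma-separated tags)
with open("dagster-docs.md", "rb") as f:
    resp = httpx.post(
        f"{BASE}/documents",
        files={"file": f},
        data={"tags": "dagster,python,orchestration"},
    )
print(resp.json())

# Ask a question, restricted to dagster-tagged documents
resp = httpx.post(
    f"{BASE}/query",
    json={"question": "How do I create a pipeline?", "tags": ["dagster"], "top_k": 5},
)
print(resp.json())
```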
### Running the MCP Server

The MCP server allows integration with Claude and other MCP-compatible clients:

```bash
python -m mcp_server.server
```
### MCP Tools Available

- `query_rag` - Query the RAG system with a question
- `query_rag_enhanced` - Query with automatic reference following
- `smart_query` - Smart query with automatic routing and classification
- `add_document` - Add a document to the RAG system
- `list_documents` - List all stored documents
- `delete_document` - Delete a document by ID
- `get_rag_stats` - Get system statistics
- `get_tags` - List all available tags
- `get_document_structure` - Get a document's table of contents
### Using with Claude Desktop

Add to your Claude Desktop configuration (`claude_desktop_config.json`):

```json
{
  "mcpServers": {
    "rag": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/mcp-rag-docs",
        "run",
        "python",
        "-m",
        "mcp_server.server"
      ]
    }
  }
}
```
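Claude Desktop launches the server outside your shell, so it may not see variables exported in your profile. If the server cannot find your API key, one option is to pass it through the config's `env` block (a variant of the config above, with a placeholder value):

```json
{
  "mcpServers": {
    "rag": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/mcp-rag-docs",
        "run",
        "python",
        "-m",
        "mcp_server.server"
      ],
      "env": {
        "GOOGLE_API_KEY": "your_api_key_here"
      }
    }
  }
}
```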
See `QUICK_START.md` for a quick setup guide.
## Configuration

All configuration is managed through environment variables (defined in `.env`):

| Variable | Description | Default |
|---|---|---|
| `GOOGLE_API_KEY` | Google AI Studio API key | (required) |
| `CHUNK_SIZE` | Size of text chunks in characters | 1000 |
| `CHUNK_OVERLAP` | Overlap between chunks in characters | 200 |
| `TOP_K_RESULTS` | Number of chunks to retrieve | 5 |
| `QDRANT_PATH` | Path to Qdrant storage | ./qdrant_storage |
| `QDRANT_COLLECTION_NAME` | Qdrant collection name | documents |
| `FASTAPI_HOST` | FastAPI server host | 0.0.0.0 |
| `FASTAPI_PORT` | FastAPI server port | 8000 |
| `EMBEDDING_MODEL` | Google embedding model | text-embedding-004 |
| `LLM_MODEL` | Google LLM model | gemini-1.5-flash |
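Putting it together, here is a complete `.env` that spells out every default from the table above; only `GOOGLE_API_KEY` is strictly required:

```
GOOGLE_API_KEY=your_api_key_here
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
TOP_K_RESULTS=5
QDRANT_PATH=./qdrant_storage
QDRANT_COLLECTION_NAME=documents
FASTAPI_HOST=0.0.0.0
FASTAPI_PORT=8000
EMBEDDING_MODEL=text-embedding-004
LLM_MODEL=gemini-1.5-flash
```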
## Architecture

### Document Processing Pipeline

1. **Upload** - The user uploads a `.txt` or `.md` file
2. **Processing** - The document is read and metadata (including frontmatter) is extracted
3. **Chunking** - Text is split using hierarchical chunking for markdown, or standard chunking for plain text
4. **Embedding** - Each chunk is converted to a vector using Google AI embeddings
5. **Storage** - Vectors and metadata are stored in Qdrant
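For intuition, here is a stripped-down, illustrative version of that ingestion flow using the `google-generativeai` and `qdrant-client` packages directly. This is not the project's internal code; chunking is reduced to a naive character split with a 200-character overlap:

```python
import google.generativeai as genai
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

genai.configure(api_key="your_api_key_here")
client = QdrantClient(path="./qdrant_storage")

# text-embedding-004 produces 768-dimensional vectors
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=768, distance=Distance.COSINE),
)

text = open("example.txt").read()
# 1000-char chunks with a 200-char overlap (step of 800)
chunks = [text[i : i + 1000] for i in range(0, len(text), 800)]

points = []
for i, chunk in enumerate(chunks):
    emb = genai.embed_content(
        model="models/text-embedding-004",
        content=chunk,
        task_type="retrieval_document",
    )
    points.append(PointStruct(id=i, vector=emb["embedding"], payload={"text": chunk}))

client.upsert(collection_name="documents", points=points)
```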
### Query Pipeline

#### Standard Query

1. **Query** - The user submits a question
2. **Embedding** - The question is converted to a vector
3. **Retrieval** - Similar chunks are retrieved from Qdrant
4. **Generation** - The retrieved context is provided to the Google AI Studio model
5. **Response** - The answer is generated and returned with sources
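The corresponding illustrative query flow, again a sketch under the same assumptions rather than the project's `rag_system.py`:

```python
import google.generativeai as genai
from qdrant_client import QdrantClient

genai.configure(api_key="your_api_key_here")
client = QdrantClient(path="./qdrant_storage")

question = "What is the main topic of the documents?"

# Steps 1-2: embed the question (retrieval_query task type for search queries)
q_emb = genai.embed_content(
    model="models/text-embedding-004",
    content=question,
    task_type="retrieval_query",
)

# Step 3: retrieve the most similar chunks from Qdrant
hits = client.search(
    collection_name="documents",
    query_vector=q_emb["embedding"],
    limit=5,
)
context = "\n\n".join(hit.payload["text"] for hit in hits)

# Steps 4-5: generate an answer grounded in the retrieved context
model = genai.GenerativeModel("gemini-1.5-flash")
answer = model.generate_content(
    f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"
)
print(answer.text)
```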
#### Smart Query

1. **Classification** - The query is classified (documentation, code, conceptual, etc.)
2. **Routing** - The best retrieval strategy is selected automatically
3. **Multi-Source** - Documentation search, code search, and direct answers may be combined
4. **Synthesis** - A comprehensive answer is generated from the combined sources
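The real `query_classifier` is more sophisticated, but a keyword-based toy version conveys the routing idea. The categories come from the list above; the keyword heuristics are invented for illustration:

```python
def classify_query(question: str) -> str:
    """Toy classifier: route a query to a retrieval strategy by keywords."""
    q = question.lower()
    if any(k in q for k in ("function", "class", "implement", "source code")):
        return "code"            # search the code index
    if any(k in q for k in ("how do i", "install", "configure", "example")):
        return "documentation"   # search the docs collection
    return "conceptual"          # answer from broad context

print(classify_query("How do I create a Dagster asset?"))        # documentation
print(classify_query("Which class builds the embeddings?"))      # code
print(classify_query("What is retrieval-augmented generation?")) # conceptual
```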
## Code Indexing

The system can index source code repositories:

```bash
# Build the code index
python build_code_index.py /path/to/repo

# Then query the code through the API or MCP server
```

Code is indexed with:

- Class and function definitions
- Docstrings and comments
- File structure and imports
- Semantic embeddings for natural language queries
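Once a repository has been indexed, natural-language code questions can go through the smart query endpoint, which routes them to code search. A small sketch with `httpx` (the question text is just an example):

```python
import httpx

resp = httpx.post(
    "http://localhost:8000/smart-query",
    json={"question": "Which class wraps the Qdrant vector store?"},
)
print(resp.json())
```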
## Development

### Running Tests

```bash
# Install test dependencies
pip install pytest pytest-asyncio httpx

# Run all tests
pytest

# Run specific test files
pytest test_openai_api.py
pytest test_mcp_integration.py
```
### Code Style

The project follows Python best practices, with type hints and docstrings throughout.
## Troubleshooting

### Common Issues

**Issue:** `GOOGLE_API_KEY not found`

- **Solution:** Ensure you've created a `.env` file and added your Google API key.

**Issue:** `Unsupported file type`

- **Solution:** Only `.txt` and `.md` files are supported. Convert other formats first.

**Issue:** `Collection already exists` error

- **Solution:** Delete the `qdrant_storage/` directory to reset the database.

**Issue:** MCP server not connecting

- **Solution:** Check that the path in your MCP config is correct and that the `.env` file is in the project root.
## Advanced Usage

### Tag-Based Organization

Organize your documents with tags for easy categorization and filtering:

```bash
# Upload a document with tags
curl -X POST "http://localhost:8000/documents" \
  -F "file=@dagster-docs.md" \
  -F "tags=dagster,python,orchestration"

# List all available tags
curl "http://localhost:8000/tags"

# Query only dagster-related documents
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "How do I create a pipeline?", "tags": ["dagster"]}'

# List documents filtered by tags
curl "http://localhost:8000/documents?tags=dagster,python"
```
### Hierarchical Document Structure

For markdown documents, the system automatically preserves heading hierarchy:

```bash
# Get a document's structure (table of contents)
curl "http://localhost:8000/documents/{doc_id}/sections"

# Query a specific section
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What are the prerequisites?", "section_path": "Installation > Prerequisites"}'
```
### Section-Aware Queries

The system includes section context when generating answers. For example, given a markdown document with this structure:

```markdown
# Installation
## Prerequisites
### Python Version
## Setup Steps
```

when you query about "Python version requirements", the system will:

1. Retrieve relevant chunks from "Installation > Prerequisites > Python Version"
2. Include the section path in the context sent to the LLM
3. Cite sources with full section paths
### Smart Query Modes

The system supports three query modes:

- **Standard** (`/query`) - Basic vector search and retrieval
- **Enhanced** (`/query-enhanced`) - Follows documentation references automatically
- **Smart** (`/smart-query`) - Automatic classification and routing

Use the OpenAI-compatible API to access the different modes:

```bash
# Standard mode
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "rag-standard", "messages": [{"role": "user", "content": "What is Dagster?"}]}'

# Enhanced mode with reference following
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "rag-enhanced", "messages": [{"role": "user", "content": "What is Dagster?"}]}'

# Smart mode with automatic routing
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "rag-smart", "messages": [{"role": "user", "content": "What is Dagster?"}]}'
```
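Because the endpoint is OpenAI-compatible, the official `openai` Python package (v1+) also works once pointed at the local server. A minimal sketch; the placeholder API key assumes the local server does not enforce one, so adjust if your deployment does:

```python
from openai import OpenAI

# Point the official client at the local RAG server
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="rag-smart",
    messages=[{"role": "user", "content": "What is Dagster?"}],
)
print(resp.choices[0].message.content)
```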
### MCP Tools

The MCP server provides enhanced tools for Claude and other MCP clients.

`query_rag` - Query with optional tag and section filtering:

```json
{
  "question": "How do I deploy?",
  "tags": ["dagster"],
  "section_path": "Deployment"
}
```

`smart_query` - Smart query with automatic routing:

```json
{
  "question": "What is an asset and how do I use it?"
}
```

`add_document` - Upload with tags:

```json
{
  "file_path": "/path/to/doc.md",
  "tags": ["dagster", "docs"]
}
```

`get_tags` - List all tags.

`get_document_structure` - Get a document's table of contents:

```json
{
  "doc_id": "abc123"
}
```
## API Reference

### Enhanced Endpoints

**POST /documents**

- Body: `file` (multipart), `tags` (comma-separated string)
- Response: Document info with tags and chunk count

**POST /query**

- Body: `{"question": "...", "tags": [...], "section_path": "..."}`
- Response: Answer with section-aware sources

**POST /smart-query**

- Body: `{"question": "..."}`
- Response: Smart answer with automatic routing and classification

**GET /tags**

- Response: `{"tags": [...], "total": N}`

**GET /documents/{doc_id}/sections**

- Response: Document structure with section hierarchy

**GET /documents?tags=tag1,tag2**

- Query filtered by tags
- Response: List of matching documents

**POST /v1/chat/completions**

- OpenAI-compatible chat completion endpoint
- Supported models: `rag-standard`, `rag-enhanced`, `rag-smart`
- Supports streaming with `stream: true`

**GET /v1/models**

- List available RAG models
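Since `stream: true` is supported, here is a minimal streaming sketch using the same `openai` client setup as above (same placeholder-key assumption):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="rag-smart",
    messages=[{"role": "user", "content": "What is an asset in Dagster?"}],
    stream=True,
)
for chunk in stream:
    # Some stream events carry no choices or content; guard before printing
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```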
## Additional Documentation

- `QUICK_START.md` - Quick setup guide for MCP integration
- `MCP_SETUP.md` - Detailed MCP server setup
- `OPENAI_API_GUIDE.md` - OpenAI-compatible API documentation
- `QUERY_ROUTING_GUIDE.md` - Smart query routing guide
- `MULTI_MODE_RETRIEVAL_GUIDE.md` - Multi-mode retrieval documentation
- `CODE_INDEX_GUIDE.md` - Code indexing and search guide
- `RATE_LIMITING.md` - Rate limiting configuration
- `TEST_COVERAGE.md` - Test coverage and testing guide
## License

MIT License

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Acknowledgments

- Google AI Studio for embeddings and LLM capabilities
- Qdrant for the vector database
- FastAPI for the REST API framework
- Anthropic's MCP for the Model Context Protocol