RAG MCP Server
Provides tools for ingesting documents into a local vector database and retrieving relevant information via semantic search, enabling retrieval-augmented generation for MCP clients.
A Retrieval Augmented Generation (RAG) MCP server built with FastMCP (https://github.com/jlowin/fastmcp) and ChromaDB (https://docs.trychroma.com/docs/overview/getting-started). It provides MCP (Model Context Protocol) tools for ingesting documents into a local vector database and retrieving relevant information based on queries.
Features
Tools
- query_documents: Search for relevant documents using semantic similarity
- list_ingested_files: View all files currently stored in the database
- reingest_data_directory: Reingest all files from the data directory (useful to reindex contents when new files are added)
- get_rag_status: Get comprehensive system information including server status, database configuration, data directory status, and environment variables
Resources
- None currently available
Prompts
- rag_analysis_prompt: Generate structured prompts for analyzing documents on specific topics
Quick Start
1. Installation
# Install dependencies
pip install -r requirements.txt
# Or install manually
pip install fastmcp chromadb sentence-transformers
2. Run the Server
# Start the MCP server
python rag_server.py
# Or use FastMCP CLI for development with inspector
fastmcp dev rag_server.py
3. Test the Server
# Run the test suite
python test_rag_server.py
Directory Configuration
The server supports flexible configuration for both data and database directories through environment variables:
Data Directory Configuration:
Priority Order:
- LLAMA_RAG_DATA_DIR environment variable (highest priority)
- ./data in current working directory (workspace-relative)
- Error: If neither is found, the server will log an error and skip auto-ingestion
Important: Unlike the database directory, the data directory requires explicit configuration. If no data directory is found, the server will:
- Log a clear error message with setup instructions
- Skip auto-ingestion (server will still start successfully)
- Require manual configuration before documents can be ingested
Database Directory Configuration:
Priority Order:
- LLAMA_RAG_DB_DIR environment variable (highest priority)
- ~/.local/share/rag-server (XDG Base Directory standard)
- ./chroma relative to current working directory (fallback)
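The two priority orders above can be sketched as a pair of small functions. This is an illustrative sketch, not the code in rag_server.py: the function names resolve_data_dir and resolve_db_dir are hypothetical, and because the text does not say when the ./chroma fallback applies, the sketch assumes it is used only when the XDG path cannot be created.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_data_dir(env: Mapping[str, str], cwd: Path) -> Optional[Path]:
    """Pick the data directory; None means nothing is configured."""
    configured = env.get("LLAMA_RAG_DATA_DIR")
    if configured:
        # Environment variable has the highest priority.
        return Path(configured).expanduser()
    candidate = cwd / "data"
    if candidate.is_dir():
        # Workspace-relative ./data directory.
        return candidate
    # Caller is expected to log an error and skip auto-ingestion.
    return None


def resolve_db_dir(env: Mapping[str, str], home: Path, cwd: Path) -> Path:
    """Pick the database directory; always returns some path."""
    configured = env.get("LLAMA_RAG_DB_DIR")
    if configured:
        return Path(configured).expanduser()
    xdg_dir = home / ".local" / "share" / "rag-server"
    try:
        xdg_dir.mkdir(parents=True, exist_ok=True)
        return xdg_dir
    except OSError:
        # Assumed fallback condition: the XDG location is not usable.
        return cwd / "chroma"
```

In a real process these would be called with os.environ, Path.home(), and Path.cwd(); passing them in as arguments keeps the lookup easy to test with temporary directories.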
Usage Examples:
# Using environment variable (recommended)
export LLAMA_RAG_DATA_DIR=/path/to/your/documents
python rag_server.py
# Using current directory data folder
mkdir data
cp your_documents/* data/
python rag_server.py
# Error case - no configuration
# Server starts but logs: "No data directory found. Please either..."
python rag_server.py
# Use custom database directory only
LLAMA_RAG_DB_DIR=/path/to/your/database python rag_server.py
# Use both custom directories
LLAMA_RAG_DATA_DIR=~/Documents/rag-data LLAMA_RAG_DB_DIR=~/Documents/rag-db python rag_server.py
Testing:
# Test with temporary directories
LLAMA_RAG_DATA_DIR=/tmp/test_data LLAMA_RAG_DB_DIR=/tmp/test_db python rag_server.py
For detailed configuration options, see DATA_DIRECTORY_CONFIG.md.
Usage Examples
Ingesting Documents
# The server will chunk your document automatically
result = ingest_file(
    file_path="sample_document.txt",
    chunk_size=1000,  # Characters per chunk
    overlap=200       # Overlap between chunks
)
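To make the chunk_size and overlap parameters concrete, here is a minimal sketch of character-based chunking with a sliding window. It is an assumption about how such chunking typically works, not the server's actual implementation; the helper name chunk_text is hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    stride = chunk_size - overlap  # how far the window advances each step
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += stride
    return chunks
```

With the values shown above, each new chunk starts 800 characters after the previous one, so the last 200 characters of one chunk are repeated at the start of the next.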
Querying Documents
# Search for relevant information
results = query_documents(
    query="What is machine learning?",
    n_results=5,
    include_metadata=True
)
Checking System Status
# Get current system information
status = get_rag_status()
# Returns: {"status": "active", "total_documents": 42, ...}
Architecture
Components
- FastMCP Server: High-level MCP server framework (https://github.com/jlowin/fastmcp)
- ChromaDB: Local vector database for document storage (https://docs.trychroma.com/docs/overview/getting-started)
- Sentence Transformers: Embedding model for semantic search
Data Flow
Text File → Chunking → Embeddings → ChromaDB → Query → Relevant Chunks
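The retrieval half of this flow can be illustrated with a toy, standard-library-only sketch. It deliberately swaps in word counts for the Sentence Transformers embeddings and a plain list for ChromaDB, so it only shows the shape of the idea (embed, score by cosine similarity, return the top matches). All function names here are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_chunks(query: str, chunks: List[str], n_results: int = 5) -> List[Tuple[float, str]]:
    """Score every chunk against the query and keep the best n_results."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n_results]
```

A real embedding model places paraphrases close together even when they share no words, which this word-count stand-in cannot do; the ranking step is the part that carries over.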
File Structure
mcp-rag/
├── rag_server.py # Main MCP server implementation
├── requirements.txt # Python dependencies
├── test_rag_server.py # Test suite
├── sample_document.txt # Example document for testing
├── README.md # This file
└── chroma_db/ # ChromaDB persistent storage (created automatically)
Configuration
Environment Variables
The server uses sensible defaults, but you can customize:
- Database Location: Modify persist_directory in rag_server.py
- Collection Name: Change rag_documents to your preferred name
- Chunk Settings: Adjust default chunk_size and overlap parameters
ChromaDB Settings
# Persistent storage configuration
import chromadb
from chromadb.config import Settings

chroma_client = chromadb.PersistentClient(
    path="./chroma_db",
    settings=Settings(
        anonymized_telemetry=False,
        allow_reset=True
    )
)
Integration with MCP Clients
Claude Desktop
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "rag-server": {
      "command": "python",
      "args": ["/path/to/your/rag_server.py"],
      "cwd": "/path/to/your/mcp-rag"
    }
  }
}
Cursor IDE
Add to your MCP configuration:
{
  "mcpServers": {
    "rag-server": {
      "command": "python",
      "args": ["rag_server.py"],
      "cwd": "/path/to/mcp-rag"
    }
  }
}
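If a client configuration already lists other servers, the rag-server entry has to be merged in rather than pasted over the file. The following is a small sketch of that merge on an in-memory dictionary; the helper name add_rag_server is hypothetical, and reading or writing the actual config file is left to the caller.

```python
import copy
from typing import Any, Dict


def add_rag_server(
    config: Dict[str, Any],
    script_path: str,
    cwd: str,
    name: str = "rag-server",
) -> Dict[str, Any]:
    """Return a copy of config with the server entry added under "mcpServers"."""
    updated = copy.deepcopy(config)  # leave the caller's dictionary untouched
    servers = updated.setdefault("mcpServers", {})
    servers[name] = {
        "command": "python",
        "args": [script_path],
        "cwd": cwd,
    }
    return updated
```

The result can be serialized with json.dumps(updated, indent=2) and written back to the client's configuration file.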
Development
Testing with MCP Inspector
FastMCP includes a built-in web interface for testing:
# Install with CLI tools
pip install "fastmcp[cli]"
# Run with inspector
fastmcp dev rag_server.py
# Open browser to http://127.0.0.1:6274
Adding New Tools
@mcp.tool
def your_new_tool(param: str) -> str:
    """
    Description of your tool.

    Args:
        param: Description of parameter

    Returns:
        Description of return value
    """
    # Your implementation here
    return "result"
Adding Resources
@mcp.resource("your://resource-uri")
def your_resource() -> dict:
    """
    Description of your resource.
    """
    return {"data": "value"}
Troubleshooting
Common Issues
- Import Errors
  pip install --upgrade fastmcp chromadb
- ChromaDB Permission Issues
  # Ensure write permissions for chroma_db directory
  chmod -R 755 ./chroma_db
- Memory Issues with Large Files
  - Reduce chunk_size parameter
  - Process files in smaller batches
  - Monitor system memory usage
- Slow Query Performance
  - Reduce n_results parameter
  - Consider using more specific queries
  - Check ChromaDB index status
Logging
The server includes comprehensive logging:
import logging
logging.basicConfig(level=logging.DEBUG) # Enable debug logging
Performance Considerations
Optimization Tips
- Chunk Size: Balance between context and performance (500-2000 characters)
- Overlap: Prevent context loss at chunk boundaries (10-20% of chunk size)
- Query Results: Limit n_results to avoid overwhelming responses (3-10 results)
- File Size: Consider splitting very large files before ingestion
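To see how chunk size and overlap trade off against the number of chunks stored, the count can be estimated up front. This sketch assumes a sliding window that advances by chunk_size minus overlap characters per step, which the text does not confirm; the function name is hypothetical.

```python
def estimate_chunk_count(text_length: int, chunk_size: int = 1000, overlap: int = 200) -> int:
    """Estimate how many chunks a document of text_length characters produces."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if text_length <= 0:
        return 0
    if text_length <= chunk_size:
        return 1  # the whole document fits in one chunk
    stride = chunk_size - overlap
    # Integer ceiling of (text_length - overlap) / stride.
    return (text_length - overlap + stride - 1) // stride
```

Under that assumption, raising the overlap shrinks the stride and therefore increases the number of chunks (and embeddings) for the same file.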
Scaling
For production use:
- Consider ChromaDB's client-server mode
- Implement batch processing for large document sets
- Add caching for frequently accessed documents
- Monitor disk space for the vector database
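Batch processing, mentioned in the list above, can be as simple as grouping file paths or chunks into fixed-size lists before handing each group to the ingestion step. A minimal sketch follows; the helper name batched_items is hypothetical and the ingestion call itself is left out.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched_items(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # input exhausted
        yield batch
```

Because it consumes the input lazily, only one batch is held in memory at a time, which helps when the document set is large.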
Contributing
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
License
This project is open source. Feel free to use, modify, and distribute according to your needs.
References
- Model Context Protocol Documentation: https://modelcontextprotocol.io/llms-full.txt
- FastMCP Framework: https://github.com/jlowin/fastmcp
- ChromaDB Documentation: https://docs.trychroma.com/docs/overview/getting-started
Built with ❤️ using FastMCP and ChromaDB