RAG MCP Server

Provides tools for ingesting documents into a local vector database and retrieving relevant information via semantic search, enabling retrieval-augmented generation for MCP clients.

A Retrieval-Augmented Generation (RAG) MCP server built with FastMCP (https://github.com/jlowin/fastmcp) and ChromaDB (https://docs.trychroma.com/docs/overview/getting-started). It provides MCP (Model Context Protocol) tools for ingesting documents into a local vector database and retrieving relevant information based on queries.

Features

🔧 Tools

  • query_documents: Search for relevant documents using semantic similarity
  • list_ingested_files: View all files currently stored in the database
  • reingest_data_directory: Reingest all files from the data directory (useful to reindex contents when new files are added)
  • get_rag_status: Get comprehensive system information including server status, database configuration, data directory status, and environment variables

📊 Resources

  • None currently available

💬 Prompts

  • rag_analysis_prompt: Generate structured prompts for analyzing documents on specific topics

Quick Start

1. Installation

# Install dependencies
pip install -r requirements.txt

# Or install manually
pip install fastmcp chromadb sentence-transformers

2. Run the Server

# Start the MCP server
python rag_server.py

# Or use FastMCP CLI for development with inspector
fastmcp dev rag_server.py

3. Test the Server

# Run the test suite
python test_rag_server.py

Directory Configuration

The server supports flexible configuration for both data and database directories through environment variables:

Data Directory Configuration:

Priority Order:

  1. LLAMA_RAG_DATA_DIR environment variable (highest priority)
  2. ./data in current working directory (workspace-relative)
  3. Error: If neither is found, the server will log an error and skip auto-ingestion

Important: Unlike the database directory, the data directory requires explicit configuration. If no data directory is found, the server will:

  • Log a clear error message with setup instructions
  • Skip auto-ingestion (server will still start successfully)
  • Require manual configuration before documents can be ingested
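
The data directory lookup described above can be sketched as follows. This is a minimal illustration, not the server's actual code: the function name resolve_data_dir is hypothetical, and it assumes the value of LLAMA_RAG_DATA_DIR is used as given without checking that the path exists.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_data_dir(env: Mapping[str, str], cwd: Path) -> Optional[Path]:
    """Return the directory to ingest from, or None if nothing is configured.

    Hypothetical helper that mirrors the documented priority order.
    """
    # 1. The environment variable has the highest priority.
    configured = env.get("LLAMA_RAG_DATA_DIR")
    if configured:
        return Path(configured).expanduser()

    # 2. Otherwise use ./data under the working directory, if it exists.
    candidate = cwd / "data"
    if candidate.is_dir():
        return candidate

    # 3. Nothing found: the caller logs an error and skips auto-ingestion.
    return None
```

A caller would pass os.environ and Path.cwd(), and treat a None result as the "no data directory found" case rather than as a fatal error.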

Database Directory Configuration:

Priority Order:

  1. LLAMA_RAG_DB_DIR environment variable (highest priority)
  2. ~/.local/share/rag-server (XDG Base Directory standard)
  3. ./chroma relative to current working directory (fallback)
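
A comparable sketch for the database directory is shown here. The README does not say when the ./chroma fallback applies, so this example assumes it is used only when the XDG location cannot be created. The function name resolve_db_dir is hypothetical.

```python
from pathlib import Path
from typing import Mapping


def resolve_db_dir(env: Mapping[str, str], home: Path, cwd: Path) -> Path:
    """Pick a database directory following the documented priority order.

    Hypothetical helper; the fallback condition is an assumption.
    """
    # 1. The environment variable has the highest priority.
    configured = env.get("LLAMA_RAG_DB_DIR")
    if configured:
        return Path(configured).expanduser()

    # 2. XDG-style default under the user's home directory.
    xdg_default = home / ".local" / "share" / "rag-server"
    try:
        xdg_default.mkdir(parents=True, exist_ok=True)
        return xdg_default
    except OSError:
        # 3. Assumed fallback: a directory relative to the working directory.
        return cwd / "chroma"
```

Unlike the data directory, this lookup always returns a path, which matches the note that only the data directory requires explicit configuration.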

Usage Examples:

# Using environment variable (recommended)
export LLAMA_RAG_DATA_DIR=/path/to/your/documents
python rag_server.py

# Using current directory data folder
mkdir data
cp your_documents/* data/
python rag_server.py

# Error case - no configuration
# Server starts but logs: "No data directory found. Please either..."
python rag_server.py

# Use custom database directory only
LLAMA_RAG_DB_DIR=/path/to/your/database python rag_server.py

# Use both custom directories
LLAMA_RAG_DATA_DIR=~/Documents/rag-data LLAMA_RAG_DB_DIR=~/Documents/rag-db python rag_server.py

Testing:

# Test with temporary directories
LLAMA_RAG_DATA_DIR=/tmp/test_data LLAMA_RAG_DB_DIR=/tmp/test_db python rag_server.py

For detailed configuration options, see DATA_DIRECTORY_CONFIG.md.

Usage Examples

Ingesting Documents

# The server will chunk your document automatically
result = ingest_file(
    file_path="sample_document.txt",
    chunk_size=1000,  # Characters per chunk
    overlap=200       # Overlap between chunks
)
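
To make the chunk_size and overlap parameters concrete, here is a minimal sketch of character-based chunking with overlap. It assumes a plain sliding window; the server's real chunking logic may differ (for example, it might split on sentence boundaries), and chunk_text is a hypothetical name.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into windows of chunk_size characters.

    Consecutive windows share `overlap` characters. Illustrative only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With chunk_size=4 and overlap=1, the string "abcdefghij" becomes "abcd", "defg", "ghij": each chunk repeats the last character of the previous one.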

Querying Documents

# Search for relevant information
results = query_documents(
    query="What is machine learning?",
    n_results=5,
    include_metadata=True
)

Checking System Status

# Get current system information
status = get_rag_status()
# Returns: {"status": "active", "total_documents": 42, ...}

Architecture

Components

  1. FastMCP Server: High-level MCP server framework
  2. ChromaDB: Local vector database for document storage
  3. Sentence Transformers: Embedding model for semantic search

Data Flow

Text File → Chunking → Embeddings → ChromaDB → Query → Relevant Chunks
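
The flow above can be illustrated end to end with a toy, standard-library-only sketch. To stay self-contained it swaps in a bag-of-words vector for the Sentence Transformers embedding and an in-memory list for ChromaDB, so it shows the shape of the pipeline rather than the server's implementation. All names here are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def embed(text: str) -> Vector:
    """Toy embedding: L2-normalised word counts (stand-in for a real model)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(weight * b.get(tok, 0.0) for tok, weight in a.items())


def query(store: List[Tuple[str, Vector]], question: str, n_results: int = 2) -> List[str]:
    """Return the n_results chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:n_results]]


# Chunks -> embeddings -> store (an in-memory list instead of ChromaDB)
chunks = [
    "Machine learning trains models from data.",
    "ChromaDB stores embeddings on disk.",
    "Bread needs flour, water, and yeast.",
]
store = [(chunk, embed(chunk)) for chunk in chunks]

# Query -> relevant chunks
top = query(store, "What is machine learning?", n_results=1)
print(top)
```

A real embedding model would also match paraphrases that share no words with the query, which is the main reason to use one instead of word counts.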

File Structure

mcp-rag/
├── rag_server.py           # Main MCP server implementation
├── requirements.txt        # Python dependencies
├── test_rag_server.py      # Test suite
├── sample_document.txt     # Example document for testing
├── README.md               # This file
└── chroma_db/              # ChromaDB persistent storage (created automatically)

Configuration

Environment Variables

The server uses sensible defaults, but you can customize:

  • Database Location: Set LLAMA_RAG_DB_DIR (see Directory Configuration), or modify persist_directory in rag_server.py
  • Collection Name: Change rag_documents in rag_server.py to your preferred name
  • Chunk Settings: Adjust the default chunk_size and overlap parameters

ChromaDB Settings

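# Persistent storage configuration
import chromadb
from chromadb.config import Settings

chroma_client = chromadb.PersistentClient(
    path="./chroma_db",
    settings=Settings(
        anonymized_telemetry=False,  # Disable anonymous usage telemetry
        allow_reset=True             # Allow chroma_client.reset() to wipe the database
    )
)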

Integration with MCP Clients

Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "rag-server": {
      "command": "python",
      "args": ["/path/to/your/rag_server.py"],
      "cwd": "/path/to/your/mcp-rag"
    }
  }
}
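
A desktop client may not inherit environment variables from your shell, so it can help to pass the directory settings through the "env" key of the server entry. The following sketch generates such an entry. The helper name build_config is hypothetical and the paths are placeholders; check the result against your client's documentation before using it.

```python
import json
from typing import Dict, Optional


def build_config(project_dir: str,
                 data_dir: Optional[str] = None,
                 db_dir: Optional[str] = None) -> Dict:
    """Build an mcpServers entry for the RAG server, with optional env vars."""
    entry: Dict = {
        "command": "python",
        "args": [f"{project_dir}/rag_server.py"],
        "cwd": project_dir,
    }
    env: Dict[str, str] = {}
    if data_dir:
        env["LLAMA_RAG_DATA_DIR"] = data_dir
    if db_dir:
        env["LLAMA_RAG_DB_DIR"] = db_dir
    if env:
        entry["env"] = env  # only emitted when at least one variable is set
    return {"mcpServers": {"rag-server": entry}}


config = build_config("/path/to/your/mcp-rag", data_dir="/path/to/your/documents")
print(json.dumps(config, indent=2))
```

The printed JSON can be merged into claude_desktop_config.json by hand.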

Cursor IDE

Add to your MCP configuration:

{
  "mcpServers": {
    "rag-server": {
      "command": "python",
      "args": ["rag_server.py"],
      "cwd": "/path/to/mcp-rag"
    }
  }
}

Development

Testing with MCP Inspector

FastMCP includes a built-in web interface for testing:

# Install with CLI tools
pip install "fastmcp[cli]"

# Run with inspector
fastmcp dev rag_server.py

# Open browser to http://127.0.0.1:6274

Adding New Tools

@mcp.tool
def your_new_tool(param: str) -> str:
    """
    Description of your tool.
    
    Args:
        param: Description of parameter
    
    Returns:
        Description of return value
    """
    # Your implementation here
    return "result"

Adding Resources

@mcp.resource("your://resource-uri")
def your_resource() -> dict:
    """
    Description of your resource.
    """
    return {"data": "value"}

Troubleshooting

Common Issues

  1. Import Errors

    pip install --upgrade fastmcp chromadb
    
  2. ChromaDB Permission Issues

    # Ensure write permissions for chroma_db directory
    chmod -R 755 ./chroma_db
    
  3. Memory Issues with Large Files

    • Reduce the chunk_size parameter
    • Process files in smaller batches
    • Monitor system memory usage

  4. Slow Query Performance

    • Reduce the n_results parameter
    • Consider using more specific queries
    • Check ChromaDB index status

Logging

The server logs through Python's standard logging module. To see debug-level output, raise the log level:

import logging
logging.basicConfig(level=logging.DEBUG)  # Enable debug logging

Performance Considerations

Optimization Tips

  1. Chunk Size: Balance between context and performance (500-2000 characters)
  2. Overlap: Prevent context loss at chunk boundaries (10-20% of chunk size)
  3. Query Results: Limit n_results to avoid overwhelming responses (3-10 results)
  4. File Size: Consider splitting very large files before ingestion
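
To see how chunk size and overlap affect the number of chunks (and therefore the number of embeddings to compute and store), here is a small worked calculation. It assumes a fixed-size sliding window over characters, which may not match the server's chunker exactly. The function name is hypothetical.

```python
import math


def estimate_chunk_count(text_length: int, chunk_size: int = 1000, overlap: int = 200) -> int:
    """Estimate how many chunks a sliding window produces for a given length."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    if text_length <= 0:
        return 0
    if text_length <= chunk_size:
        return 1
    step = chunk_size - overlap  # new characters covered by each extra chunk
    return 1 + math.ceil((text_length - chunk_size) / step)


# A 10,000-character document with 1,000-character chunks:
no_overlap = estimate_chunk_count(10_000, chunk_size=1000, overlap=0)      # 10
with_overlap = estimate_chunk_count(10_000, chunk_size=1000, overlap=200)  # 13
print(no_overlap, with_overlap)
```

Under these assumptions, a 20% overlap raises the chunk count from 10 to 13, which is the storage and embedding cost paid for keeping context across chunk boundaries.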

Scaling

For production use:

  • Consider ChromaDB's client-server mode
  • Implement batch processing for large document sets
  • Add caching for frequently accessed documents
  • Monitor disk space for the vector database
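
Batch processing can be as simple as grouping chunks before handing them to the database. The helper below is a generic sketch with a hypothetical name; it does not touch ChromaDB itself.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # input exhausted
        yield batch


for batch in batched(["chunk-a", "chunk-b", "chunk-c"], batch_size=2):
    print(len(batch), batch)
```

Each batch could then be written with a single collection.add(...) call, which keeps peak memory bounded by the batch size instead of the size of the whole document set.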

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

License

This project is open source. Feel free to use, modify, and distribute according to your needs.

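References

  1. FastMCP: https://github.com/jlowin/fastmcp
  2. ChromaDB, Getting Started: https://docs.trychroma.com/docs/overview/getting-started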

Built with ❤️ using FastMCP and ChromaDB
