RAG MCP Server

Provides tools for ingesting documents into a local vector database and retrieving relevant information via semantic search, enabling retrieval-augmented generation for MCP clients.

A Retrieval-Augmented Generation (RAG) MCP server built with FastMCP [1] and ChromaDB [2] that provides MCP (Model Context Protocol) tools for ingesting documents into a local vector database and retrieving relevant information based on queries.

Features

🔧 Tools

query_documents: Search for relevant documents using semantic similarity
list_ingested_files: View all files currently stored in the database
reingest_data_directory: Reingest all files from the data directory (useful to reindex contents when new files are added)
get_rag_status: Get comprehensive system information including server status, database configuration, data directory status, and environment variables

📊 Resources

None currently available

💬 Prompts

rag_analysis_prompt: Generate structured prompts for analyzing documents on specific topics

Quick Start

1. Installation

# Install dependencies
pip install -r requirements.txt

# Or install manually
pip install fastmcp chromadb sentence-transformers

2. Run the Server

# Start the MCP server
python rag_server.py

# Or use FastMCP CLI for development with inspector
fastmcp dev rag_server.py

3. Test the Server

# Run the test suite
python test_rag_server.py

Directory Configuration

The server supports flexible configuration for both data and database directories through environment variables:

Data Directory Configuration:

Priority Order:

LLAMA_RAG_DATA_DIR environment variable (highest priority)
./data in current working directory (workspace-relative)
Error: If neither is found, the server will log an error and skip auto-ingestion

Important: Unlike the database directory, the data directory requires explicit configuration. If no data directory is found, the server will:

Log a clear error message with setup instructions
Skip auto-ingestion (server will still start successfully)
Require manual configuration before documents can be ingested
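
The lookup order above can be sketched as follows. This is a minimal illustration, not the server's actual code: the function name resolve_data_dir and its parameters are hypothetical, and only the LLAMA_RAG_DATA_DIR variable and the ./data fallback come from the description above.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_data_dir(env: Mapping[str, str], cwd: Path) -> Optional[Path]:
    """Return the data directory, or None if none is configured (hypothetical helper)."""
    # 1. An explicit environment variable has the highest priority.
    configured = env.get("LLAMA_RAG_DATA_DIR")
    if configured:
        return Path(configured).expanduser()
    # 2. Otherwise fall back to ./data in the current working directory, if present.
    candidate = cwd / "data"
    if candidate.is_dir():
        return candidate
    # 3. Nothing found: the caller is expected to log an error and skip auto-ingestion.
    return None
```

A caller would pass os.environ and Path.cwd(), and treat a None result as the "no data directory" case in which the server still starts but ingests nothing.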

Database Directory Configuration:

Priority Order:

LLAMA_RAG_DB_DIR environment variable (highest priority)
~/.local/share/rag-server (XDG Base Directory standard)
./chroma relative to current working directory (fallback)
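
A possible reading of this priority order is sketched below. The helper name resolve_db_dir is hypothetical, and the condition for falling back to ./chroma (the XDG directory cannot be created) is an assumption, since the README does not say when the fallback applies.

```python
from pathlib import Path
from typing import Mapping


def resolve_db_dir(env: Mapping[str, str], home: Path, cwd: Path) -> Path:
    """Pick the database directory using the documented priority order (hypothetical helper)."""
    # 1. An explicit environment variable has the highest priority.
    configured = env.get("LLAMA_RAG_DB_DIR")
    if configured:
        return Path(configured).expanduser()
    # 2. XDG-style default under the user's home directory.
    xdg_dir = home / ".local" / "share" / "rag-server"
    try:
        xdg_dir.mkdir(parents=True, exist_ok=True)
        return xdg_dir
    except OSError:
        # 3. Assumed fallback trigger: the home location is not creatable or writable.
        return cwd / "chroma"
```

Unlike the data directory, this lookup always yields a path, which matches the statement that the database directory needs no explicit configuration.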

Usage Examples:

# Using environment variable (recommended)
export LLAMA_RAG_DATA_DIR=/path/to/your/documents
python rag_server.py

# Using current directory data folder
mkdir data
cp your_documents/* data/
python rag_server.py

# Error case - no configuration
# Server starts but logs: "No data directory found. Please either..."
python rag_server.py

# Use custom database directory only
LLAMA_RAG_DB_DIR=/path/to/your/database python rag_server.py

# Use both custom directories
LLAMA_RAG_DATA_DIR=~/Documents/rag-data LLAMA_RAG_DB_DIR=~/Documents/rag-db python rag_server.py

Testing:

# Test with temporary directories
LLAMA_RAG_DATA_DIR=/tmp/test_data LLAMA_RAG_DB_DIR=/tmp/test_db python rag_server.py

For detailed configuration options, see DATA_DIRECTORY_CONFIG.md.

Usage Examples

Ingesting Documents

# The server will chunk your document automatically
result = ingest_file(
    file_path="sample_document.txt",
    chunk_size=1000,  # Characters per chunk
    overlap=200       # Overlap between chunks
)
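
To make the chunk_size and overlap parameters concrete, here is a minimal sketch of character-based chunking with overlap. It assumes a simple sliding window; the function name chunk_text is hypothetical and the server's real chunking logic may differ (for example, by splitting on sentence boundaries).

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With the defaults shown above, each chunk starts 800 characters after the previous one, so the last 200 characters of one chunk reappear at the start of the next.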

Querying Documents

# Search for relevant information
results = query_documents(
    query="What is machine learning?",
    n_results=5,
    include_metadata=True
)

Checking System Status

# Get current system information
status = get_rag_status()
# Returns: {"status": "active", "total_documents": 42, ...}

Architecture

Components

FastMCP Server: High-level MCP server framework [1]
ChromaDB: Local vector database for document storage [2]
Sentence Transformers: Embedding model for semantic search

Data Flow

Text File → Chunking → Embeddings → ChromaDB → Query → Relevant Chunks
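
The flow above can be illustrated with a dependency-free toy. Note the substitutions: a word-count vector stands in for the Sentence Transformers embedding, and an in-memory list stands in for ChromaDB. All names here (embed, cosine, ToyVectorStore) are hypothetical and exist only to show the shape of the pipeline.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, int]:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b.get(word, 0) for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory stand-in for the vector database."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, Dict[str, int]]] = []

    def add(self, chunks: List[str]) -> None:
        # Chunking happens upstream; each chunk is stored with its embedding.
        self._items.extend((chunk, embed(chunk)) for chunk in chunks)

    def query(self, question: str, n_results: int = 5) -> List[str]:
        # Embed the query, rank stored chunks by similarity, return the top matches.
        q = embed(question)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:n_results]]
```

A real embedding model matches on meaning rather than shared words, which is what makes the search semantic; the toy only reproduces the store-then-rank structure.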

File Structure

mcp-rag/
├── rag_server.py           # Main MCP server implementation
├── requirements.txt        # Python dependencies
├── test_rag_server.py     # Test suite
├── sample_document.txt    # Example document for testing
├── README.md              # This file
└── chroma_db/             # ChromaDB persistent storage (created automatically)

Configuration

Environment Variables

The server uses sensible defaults, but you can customize:

Database Location: Modify persist_directory in rag_server.py
Collection Name: Change rag_documents to your preferred name
Chunk Settings: Adjust default chunk_size and overlap parameters

ChromaDB Settings

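# Persistent storage configuration
import chromadb
from chromadb.config import Settings

chroma_client = chromadb.PersistentClient(
    path="./chroma_db",
    settings=Settings(
        anonymized_telemetry=False,  # Disable anonymous usage telemetry
        allow_reset=True             # Permit chroma_client.reset() to wipe the database
    )
)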

Integration with MCP Clients

Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "rag-server": {
      "command": "python",
      "args": ["/path/to/your/rag_server.py"],
      "cwd": "/path/to/your/mcp-rag"
    }
  }
}

Cursor IDE

Add to your MCP configuration:

{
  "mcpServers": {
    "rag-server": {
      "command": "python",
      "args": ["rag_server.py"],
      "cwd": "/path/to/mcp-rag"
    }
  }
}
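
Because the client launches the server process itself, variables exported in your shell may not reach it. The sketch below generates a config entry that passes LLAMA_RAG_DATA_DIR and LLAMA_RAG_DB_DIR through an env block. It assumes your client honors an env key inside the server entry (check the client's documentation); the function name build_client_config and all paths are placeholders.

```python
import json
from typing import Any, Dict


def build_client_config(script_path: str, working_dir: str,
                        data_dir: str, db_dir: str) -> Dict[str, Any]:
    """Build an mcpServers entry that also sets the server's directories."""
    return {
        "mcpServers": {
            "rag-server": {
                "command": "python",
                "args": [script_path],
                "cwd": working_dir,
                # Assumption: the client forwards `env` to the server process.
                "env": {
                    "LLAMA_RAG_DATA_DIR": data_dir,
                    "LLAMA_RAG_DB_DIR": db_dir,
                },
            }
        }
    }


if __name__ == "__main__":
    config = build_client_config(
        script_path="/path/to/your/rag_server.py",
        working_dir="/path/to/your/mcp-rag",
        data_dir="/path/to/your/documents",
        db_dir="/path/to/your/database",
    )
    # Paste the printed JSON into the client's MCP configuration file.
    print(json.dumps(config, indent=2))
```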

Development

Testing with MCP Inspector

FastMCP includes a built-in web interface for testing:

# Install with CLI tools
pip install "fastmcp[cli]"

# Run with inspector
fastmcp dev rag_server.py

# Open browser to http://127.0.0.1:6274

Adding New Tools

# Assumes `mcp` is the FastMCP instance already defined in rag_server.py
@mcp.tool
def your_new_tool(param: str) -> str:
    """
    Description of your tool.

    Args:
        param: Description of parameter

    Returns:
        Description of return value
    """
    # Your implementation here
    return "result"

Adding Resources

@mcp.resource("your://resource-uri")
def your_resource() -> dict:
    """
    Description of your resource.
    """
    return {"data": "value"}

Troubleshooting

Common Issues

Import Errors

```
pip install --upgrade fastmcp chromadb
```

ChromaDB Permission Issues

# Ensure write permissions for chroma_db directory
chmod -R 755 ./chroma_db

Memory Issues with Large Files

- Reduce chunk_size parameter
- Process files in smaller batches
- Monitor system memory usage

Slow Query Performance

- Reduce n_results parameter
- Consider using more specific queries
- Check ChromaDB index status

Logging

The server includes comprehensive logging:

import logging
logging.basicConfig(level=logging.DEBUG)  # Enable debug logging

Performance Considerations

Optimization Tips

Chunk Size: Balance between context and performance (500-2000 characters)
Overlap: Prevent context loss at chunk boundaries (10-20% of chunk size)
Query Results: Limit n_results to avoid overwhelming responses (3-10 results)
File Size: Consider splitting very large files before ingestion
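
To see how chunk size and overlap trade off against the number of chunks stored, the estimate below may help. It assumes a plain sliding window over characters, which may not match the server's chunker exactly; estimate_chunk_count is a hypothetical helper.

```python
def estimate_chunk_count(text_length: int, chunk_size: int, overlap: int) -> int:
    """Estimate how many chunks a sliding window produces for a text of the given length."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    if text_length <= 0:
        return 0
    if text_length <= chunk_size:
        return 1
    step = chunk_size - overlap  # characters of new text covered by each extra chunk
    # One full first chunk, then ceil(remaining / step) further chunks.
    return (text_length - chunk_size + step - 1) // step + 1
```

For a 10,000-character file with chunk_size=1000, this gives 10 chunks with no overlap and 13 chunks with overlap=200, so a 20% overlap costs roughly 30% more embeddings and storage in that case.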

Scaling

For production use:

Consider ChromaDB's client-server mode
Implement batch processing for large document sets
Add caching for frequently accessed documents
Monitor disk space for the vector database
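
As one way to approach batch processing, the generic helper below groups any sequence of items (file paths, chunks) into fixed-size batches so that each batch can be embedded and written in one step. The name batched and the batch size are illustrative choices, not part of the server.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Emit the final, possibly shorter, batch.
    if batch:
        yield batch
```

Because it is a generator, only one batch is held in memory at a time, which also addresses the memory concerns raised under Troubleshooting.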

Contributing

Fork the repository
Create a feature branch
Add tests for new functionality
Ensure all tests pass
Submit a pull request

License

This project is open source. Feel free to use, modify, and distribute according to your needs.

References

[0] Model Context Protocol Documentation: https://modelcontextprotocol.io/llms-full.txt
[1] FastMCP Framework: https://github.com/jlowin/fastmcp
[2] ChromaDB Documentation: https://docs.trychroma.com/docs/overview/getting-started

Built with ❤️ using FastMCP and ChromaDB
