# MCP RAG System

A comprehensive Retrieval-Augmented Generation (RAG) system built on the Model Context Protocol (MCP) for uploading, processing, and semantically searching PDF documents, using vector embeddings and a FAISS index for context-aware question answering.
## Features

### 🔧 Tools

- `upload_pdf`: Upload and process PDF files with automatic text extraction and chunking
- `search_documents`: Semantic search across all uploaded documents using vector embeddings
- `list_documents`: View all uploaded documents and their metadata
- `delete_document`: Remove documents and their associated chunks from the system
- `get_rag_stats`: Get comprehensive statistics about the RAG system

### 📦 Resources

- `rag://documents`: List all documents in the system
- `rag://document/{document_id}`: Get the full content of a specific document
- `rag://stats`: Get system statistics

### 💬 Prompts

- `rag_query_prompt`: Generate prompts for RAG-based question answering
- `document_summary_prompt`: Create document summarization prompts
- `search_suggestions_prompt`: Generate better search query suggestions
## Installation

1. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Download required models: the system automatically downloads the sentence-transformers model on first use.
## Usage

### Starting the Server

```bash
python mcp_server.py
```

The server starts on http://localhost:8000 with SSE (Server-Sent Events) transport.

### Using the Client

#### Demo Mode

```bash
python mcp_client.py
# Choose option 1 for demo mode
```

#### Interactive Mode

```bash
python mcp_client.py
# Choose option 2 for interactive mode
```

Available commands in interactive mode:

- `upload` - Upload a PDF file
- `search` - Search documents with a query
- `list` - List all uploaded documents
- `stats` - Show system statistics
- `quit` - Exit the client
## Example Workflow

1. Upload a PDF:

   ```python
   # Via tool call
   result = await session.call_tool("upload_pdf", arguments={
       "file_path": "/path/to/document.pdf",
       "document_name": "My Research Paper"
   })
   ```

2. Search documents:

   ```python
   # Via tool call
   result = await session.call_tool("search_documents", arguments={
       "query": "machine learning applications",
       "top_k": 5
   })
   ```

3. Use the RAG prompt:

   ```python
   # Get search results first, then use them in the prompt
   prompt = await session.get_prompt("rag_query_prompt", arguments={
       "query": "What are the key findings?",
       "context_chunks": search_results_text
   })
   ```
## System Architecture

### Document Processing Pipeline

1. **PDF Upload** → text extraction using PyMuPDF/PyPDF2
2. **Text Chunking** → split into overlapping chunks (1000 characters, 200-character overlap)
3. **Embedding Generation** → create vector embeddings using SentenceTransformers
4. **Storage** → store in a FAISS index with metadata
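The chunking step can be sketched as follows. This is a simplified, character-based sliding window matching the `_create_text_chunks` signature shown under Configuration; the server's actual boundary handling may differ:

```python
# Simplified sketch of the chunking step (the real implementation is
# _create_text_chunks in mcp_server.py and may handle edges differently).
def create_text_chunks(text: str, chunk_size: int = 1000, overlap: int = 200):
    """Split text into overlapping chunks of at most chunk_size characters."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping `overlap` chars of context
    return chunks
```

The 200-character overlap ensures that a sentence split at a chunk boundary still appears whole in at least one chunk, which improves retrieval recall.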
### Storage Structure

```
rag_storage/
├── documents/        # Original extracted text
├── chunks/           # Individual text chunks
├── embeddings/       # NumPy arrays of embeddings
├── faiss_index.bin   # FAISS vector index
└── metadata.json     # Document and chunk metadata
```
### Vector Search

- **Model**: `all-MiniLM-L6-v2` (384-dimensional embeddings)
- **Index**: FAISS `IndexFlatIP` (inner-product similarity)
- **Search**: cosine similarity (inner product over normalized embeddings) for semantic matching
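An inner-product index yields cosine similarity because, once embeddings are normalized to unit length, the inner product of two vectors equals the cosine of the angle between them. A NumPy-only sketch of the ranking step (a hypothetical helper for illustration, not the server's code):

```python
import numpy as np

def cosine_top_k(query_vec: np.ndarray, chunk_vecs: np.ndarray, top_k: int = 5):
    """Rank chunk vectors against a query by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)                           # unit-length query
    m = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)  # unit-length chunks
    scores = m @ q                       # inner products of unit vectors = cosines
    order = np.argsort(-scores)[:top_k]  # indices of the most similar chunks first
    return order, scores[order]
```

`IndexFlatIP` performs the same `m @ q` computation; storing pre-normalized embeddings is what turns its raw inner products into cosine scores.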
## Configuration

### Chunk Settings

Modify in `mcp_server.py`:

```python
def _create_text_chunks(text: str, chunk_size: int = 1000, overlap: int = 200):
```

### Embedding Model

Change the model in `RAGSystem.__init__()`:

```python
self.embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
```

### Storage Location

Set a custom storage directory:

```python
rag_system = RAGSystem(storage_dir="custom_rag_storage")
```
## API Reference

### Tools

#### upload_pdf

- Parameters: `file_path` (str), `document_name` (optional str)
- Returns: document ID, chunk count, success status

#### search_documents

- Parameters: `query` (str), `top_k` (optional int, default 5)
- Returns: ranked list of relevant chunks with scores

#### list_documents

- Parameters: none
- Returns: list of all documents with metadata

#### delete_document

- Parameters: `document_id` (str)
- Returns: success status and confirmation message

#### get_rag_stats

- Parameters: none
- Returns: system statistics (documents, chunks, storage size)

### Resources

#### rag://documents

Returns a formatted list of all documents in the system.

#### rag://document/{document_id}

Returns the full text content of the specified document with a metadata header.

#### rag://stats

Returns formatted system statistics.

### Prompts

#### rag_query_prompt

- Parameters: `query` (str), `context_chunks` (str)
- Returns: structured prompt for RAG-based question answering

#### document_summary_prompt

- Parameters: `document_content` (str)
- Returns: prompt for document summarization

#### search_suggestions_prompt

- Parameters: `query` (str), `available_documents` (str)
- Returns: prompt for generating better search queries
## Performance Considerations

### Memory Usage

- Embeddings: ~1.5 KB per chunk (384 float32 values)
- FAISS index: scales linearly with the number of chunks
- Text storage: depends on document size and chunking
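The ~1.5 KB figure follows directly from the embedding shape: 384 float32 values at 4 bytes each. A quick back-of-envelope helper (illustrative only):

```python
EMBEDDING_DIM = 384      # all-MiniLM-L6-v2 output dimension
BYTES_PER_FLOAT32 = 4

def embedding_storage_bytes(num_chunks: int) -> int:
    """Raw embedding storage; a flat FAISS index adds little overhead beyond this."""
    return num_chunks * EMBEDDING_DIM * BYTES_PER_FLOAT32

print(embedding_storage_bytes(1))        # 1536 bytes, i.e. ~1.5 KB per chunk
print(embedding_storage_bytes(100_000))  # 153600000 bytes, i.e. ~154 MB for 100k chunks
```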
### Search Speed

- FAISS `IndexFlatIP`: O(n) search time (exact, brute-force)
- For large collections, consider `IndexIVFFlat` or `IndexHNSW` (approximate, sub-linear)
### Optimization Tips

- Batch uploads for multiple documents
- Adjust chunk size based on document type
- Use a GPU with `faiss-gpu` for large datasets
- Implement caching for frequent queries
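The caching tip could be implemented, for example, with `functools.lru_cache` around the search call. This is a hypothetical wrapper: `run_search` below is a stand-in for the real embedding-plus-FAISS step, and the cache must be cleared whenever documents are uploaded or deleted:

```python
from functools import lru_cache

calls = {"n": 0}  # counter, only to demonstrate the cache below

def run_search(query: str, top_k: int):
    """Stand-in for the real embedding + FAISS search step."""
    calls["n"] += 1
    return [f"chunk matching {query!r}"][:top_k]

@lru_cache(maxsize=256)
def cached_search(query: str, top_k: int = 5):
    # Results are cached per (query, top_k); call cached_search.cache_clear()
    # after any upload or delete so stale results are dropped.
    return tuple(run_search(query, top_k))

cached_search("machine learning applications")
cached_search("machine learning applications")  # served from cache
print(calls["n"])  # 1 — the underlying search ran only once
```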
## Troubleshooting

### Common Issues

1. **PDF text extraction fails**
   - Ensure the PDF is not password-protected
   - Try different PDF files to isolate the issue
   - Check the PyMuPDF and PyPDF2 installation

2. **Memory errors with large documents**
   - Reduce the chunk size
   - Process documents in batches
   - Monitor system memory usage

3. **Search returns no results**
   - Verify documents were uploaded successfully
   - Check how similar the query is to the document content
   - Try broader search terms

4. **Server connection issues**
   - Ensure the server is running on the correct port
   - Check firewall settings
   - Verify the MCP client configuration

### Debug Mode

Enable detailed logging by modifying the server:

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```
## Contributing

1. Fork the repository
2. Create a feature branch
3. Add tests for new functionality
4. Submit a pull request
## License

This project is licensed under the MIT License.