Claude RAG Pipeline with MCP Integration
Retrieval Augmented Generation (RAG) system featuring conversational memory, hybrid knowledge modes, and Model Context Protocol (MCP) integration for document search functionality within Claude Desktop.
Features
- Complete RAG Pipeline: Document processing, vector embeddings, semantic search, and LLM-powered responses
- Conversational Memory: Full ChatGPT-style conversations with context preservation across exchanges
- Hybrid Knowledge Mode: Ability to switch between document-based responses and general knowledge
- Semantic Chunking: Intelligent document segmentation that preserves meaning and context
- MCP Integration: Native Claude Desktop access to your documents
- Multi-format Support: PDF, Word, Markdown, and plain text documents
- Vector Database: ChromaDB for efficient semantic search
- Web Interface: Streamlit app for document management and chat
Architecture
Documents → Semantic Processing → Vector Embeddings → ChromaDB → Retrieval → Claude API → Response
                                                         ↓
                                                    MCP Protocol
                                                         ↓
                                                   Claude Desktop
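One query pass through this pipeline looks roughly like the following minimal sketch (the prompt format, model choice, and collection name are illustrative assumptions, not the repo's exact code):
import anthropic
import chromadb
from openai import OpenAI

openai_client = OpenAI()        # reads OPENAI_API_KEY from the environment
claude = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
collection = chromadb.PersistentClient(path="chroma_db").get_or_create_collection("documents")

def answer(question):
    # 1. Embed the question
    emb = openai_client.embeddings.create(
        model="text-embedding-ada-002", input=question
    ).data[0].embedding
    # 2. Retrieve the most similar chunks from ChromaDB
    hits = collection.query(query_embeddings=[emb], n_results=5)
    context = "\n\n".join(hits["documents"][0])
    # 3. Generate a response grounded in the retrieved context
    msg = claude.messages.create(
        model="claude-3-5-haiku-latest",
        max_tokens=600,
        messages=[{"role": "user",
                   "content": "Context:\n" + context + "\n\nQuestion: " + question}],
    )
    return msg.content[0].text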
Tech Stack
- LLM: Claude 3.5 (Anthropic API)
- Embeddings: OpenAI text-embedding-ada-002 or local sentence-transformers
- Vector Database: ChromaDB
- Web Framework: Streamlit
- Document Processing: PyPDF2, python-docx
- Integration Protocol: Model Context Protocol (MCP)
Quick Start
Prerequisites
- Python 3.8+
- OpenAI API key
- Anthropic API key
- Claude Desktop app (for MCP integration)
Installation
- Clone the repository:
git clone https://github.com/yourusername/claude-rag-mcp-pipeline.git
cd claude-rag-mcp-pipeline
- Create virtual environment:
python3 -m venv rag_env
source rag_env/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Configure environment variables:
cp .env.example .env
# Edit .env with your API keys
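The template holds the two keys the project needs (variable names match the MCP configuration shown later):
# .env
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...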
Basic Usage
- Create documents folder:
mkdir documents
- Add documents to the documents/ folder (PDF, Word, Markdown, or text files)
- Start the Streamlit app:
streamlit run app.py
- Ingest documents using the sidebar interface
- Chat with your documents using the conversational interface
MCP Integration (Advanced)
Connect your RAG system to Claude Desktop for native document access:
- Configure Claude Desktop (create config file if it doesn't exist):
mkdir -p "$HOME/Library/Application Support/Claude"
touch "$HOME/Library/Application Support/Claude/claude_desktop_config.json"
Edit the configuration file:
// ~/Library/Application Support/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "personal-documents": {
      "command": "/path/to/your/project/rag_env/bin/python",
      "args": ["/path/to/your/project/mcp_server.py"],
      "env": {
        "OPENAI_API_KEY": "your_key",
        "ANTHROPIC_API_KEY": "your_key"
      }
    }
  }
}
- Restart Claude Desktop - it launches the MCP server itself using the command in the config above. To confirm the server starts cleanly, you can also run it manually:
python mcp_server.py
- Use Claude Desktop - Chats will access your documents when relevant (e.g. try prompting "Can you search my documents for details regarding ...?"). Ensure "personal-documents" is enabled under "Search and tools".
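For reference, the shape of the tool that mcp_server.py exposes might look like the sketch below, written against the official MCP Python SDK's FastMCP helper; the repo's actual server code and the result["answer"] return shape are assumptions:
from mcp.server.fastmcp import FastMCP
from src.rag_system import ConversationalRAGSystem

mcp = FastMCP("personal-documents")
rag = ConversationalRAGSystem(embedding_provider="openai")

@mcp.tool()
def search_documents(query: str, n_results: int = 5) -> str:
    """Semantic search over the local document collection."""
    result = rag.query(query, n_results=n_results)
    return result["answer"]  # assumed key; adjust to the repo's return shape

if __name__ == "__main__":
    mcp.run()  # stdio transport, which Claude Desktop expects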
Project Structure
claude-rag-mcp-pipeline/
├── src/
│ ├── document_processor.py # Document processing and semantic chunking
│ ├── embeddings.py # Embedding generation (OpenAI/local)
│ ├── vector_store.py # ChromaDB interface
│ ├── llm_service.py # Claude API integration
│ └── rag_system.py # Main RAG orchestration
├── documents/ # Your documents go here
├── chroma_db/ # Vector database (auto-created)
├── app.py # Streamlit web interface
├── mcp_server.py # MCP protocol server
├── requirements.txt # Python dependencies
├── .env.example # Environment variables template
└── README.md
Key Components
Document Processing
- Multi-format support: Handles PDF, Word, Markdown, and text files
- Semantic chunking: Preserves document structure and meaning
- Metadata preservation: Tracks source files and chunk locations
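For a feel of what chunking does, here is a deliberately simplified, paragraph-preserving packer; the repo's semantic_chunk_llm is more sophisticated, and the size parameters mirror the Configuration section below:
def paragraph_chunk(text, max_chunk_size=800, min_chunk_size=100):
    """Greedy chunker: split on blank lines, then pack whole paragraphs
    into chunks that stay under max_chunk_size characters."""
    chunks = []
    current = ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) > max_chunk_size:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    tail = current.strip()
    if tail and (not chunks or len(tail) >= min_chunk_size):
        chunks.append(tail)
    elif tail:
        chunks[-1] += "\n\n" + tail  # fold a too-small tail into the previous chunk
    return chunks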
Vector Search
- Semantic similarity: Finds relevant content by meaning, not keywords
- Configurable retrieval: Adjustable number of context chunks
- Source attribution: Clear tracking of information sources
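Source attribution works by storing metadata next to each chunk and reading it back at query time. A sketch of the underlying ChromaDB calls (the metadata field names are assumptions, and the repo supplies its own embeddings rather than relying on Chroma's built-in embedder):
import chromadb

client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_or_create_collection("documents")

# Ingest: upsert chunks with their source metadata; with no explicit
# embeddings given, Chroma falls back to its built-in local embedder.
collection.upsert(
    ids=["report.pdf-0", "report.pdf-1"],
    documents=[
        "Quarterly revenue grew 12% year over year.",
        "Headcount remained flat at 42 employees.",
    ],
    metadatas=[
        {"source": "report.pdf", "chunk_index": 0},
        {"source": "report.pdf", "chunk_index": 1},
    ],
)

# Query: retrieve the closest chunks and cite their sources
hits = collection.query(query_texts=["How did revenue change?"], n_results=2)
for doc, meta in zip(hits["documents"][0], hits["metadatas"][0]):
    print(meta["source"], "chunk", meta["chunk_index"], "->", doc)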
Conversational Interface
- Memory persistence: Maintains conversation context across exchanges
- Hybrid responses: Combines document knowledge with general AI capabilities
- Source citation: References specific documents in responses
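Under the hood, conversational memory amounts to replaying prior turns in the messages array on every call. A minimal sketch with the Anthropic SDK (not the repo's exact code):
import anthropic

client = anthropic.Anthropic()
history = []  # list of {"role", "content"} dicts, persisted across exchanges

def chat(user_text):
    history.append({"role": "user", "content": user_text})
    reply = client.messages.create(
        model="claude-3-5-haiku-latest",
        max_tokens=600,
        messages=history,  # the full prior conversation gives Claude its context
    )
    text = reply.content[0].text
    history.append({"role": "assistant", "content": text})
    return text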
MCP Integration
- Native Claude access: Use Claude Desktop with your document knowledge
- Automatic tool detection: Claude recognizes when to search your documents
- Secure local processing: Documents never leave your machine
Configuration
Embedding Options
Switch between OpenAI embeddings and local models:
# In src/rag_system.py
rag = ConversationalRAGSystem(embedding_provider="openai") # or "local"
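The local option uses sentence-transformers, which runs on your machine at no API cost; the underlying call looks like this (the specific model name is an assumption):
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small, CPU-friendly; 384-dim vectors
vectors = model.encode(["first chunk of text", "second chunk of text"])
print(vectors.shape)  # (2, 384)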
Chunking Parameters
Adjust semantic chunking behavior:
# In src/document_processor.py
chunks = self.semantic_chunk_llm(text, max_chunk_size=800, min_chunk_size=100)
Response Tuning
Modify retrieval and response generation:
# Number of chunks to retrieve
result = rag.query(question, n_results=5)
# Claude response length
response = llm_service.generate_response(query, chunks, max_tokens=600)
Claude Model Selection
Change the Claude model version in src/llm_service.py:
# In LLMService class methods, update the model parameter (check console.anthropic.com for currently available models):
model="claude-3-5-haiku-latest" # Fast, cost-effective
model="claude-3-5-sonnet-latest" # Higher quality reasoning
model="claude-3-opus-latest" # Most capable
model="claude-4-sonnet-latest" # If available
Use Cases
- Personal Knowledge Base: Make your documents searchable and conversational
- Research Assistant: Query across multiple documents simultaneously
- Document Analysis: Extract insights from large document collections
- Enterprise RAG: Foundation for company-wide knowledge systems
Technical Details
Transformer Architecture Understanding
This system demonstrates practical implementation of:
- Vector embeddings for semantic representation
- Similarity scoring for retrieval, conceptually analogous to attention weighting
- Multi-step reasoning through conversation context
- Hybrid AI architectures combining retrieval and generation
API Costs
Typical monthly usage:
- OpenAI embeddings: $2-10
- Claude API calls: $5-25
- Total: $7-35/month for moderate usage
Cost reduction:
- Use local embeddings (sentence-transformers) for free embedding generation
- Adjust response length limits
- Optimize chunk retrieval counts
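As a rough back-of-envelope for the embedding line above (the rate is an assumption based on ada-002's historical pricing of about $0.10 per million tokens; check current price lists):
# Back-of-envelope embedding cost; the $0.10 / 1M-token rate is an
# assumption from ada-002's historical pricing.
tokens_per_page = 500            # rough average for prose
pages_ingested = 2_000           # a sizeable personal collection
rate_per_million = 0.10
cost = pages_ingested * tokens_per_page / 1_000_000 * rate_per_million
print("One-time ingest cost: $%.2f" % cost)  # ~$0.10; Claude calls dominate ongoing cost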
Production Considerations
Current Implementation Scope
This system is designed for personal, single-user use. The core RAG functionality, MCP integration, and conversational AI components, however, carry over directly to enterprise-scale deployments.
Enterprise Production Deployment Requirements
To deploy this system in a true production enterprise environment, the following additions would be needed:
Authentication & Authorization:
- Multi-user authentication system (SSO integration)
- Role-based access controls (RBAC)
- Document-level permissions and access policies (see the retrieval-layer sketch after these lists)
- API key rotation and secure credential management
Infrastructure & Scalability:
- Container orchestration (Kubernetes deployment)
- Production-grade vector database (Pinecone, Weaviate, or managed ChromaDB)
- Load balancing and horizontal scaling
- Database clustering and replication
- CDN integration for document serving
Monitoring & Operations:
- Application performance monitoring (APM)
- Logging aggregation and analysis
- Health checks and alerting systems
- Usage analytics and cost tracking
- Backup and disaster recovery procedures
Security Hardening:
- Input validation and sanitization
- Rate limiting and DDoS protection
- Network security (VPC, firewalls, encryption in transit)
- Data encryption at rest
- Security audit trails and compliance logging
Enterprise Integration:
- Integration with existing identity providers
- Corporate data governance policies
- Compliance with data retention requirements
- Integration with enterprise monitoring/alerting systems
- Multi-tenancy support with resource isolation
Cost Management:
- Usage-based billing and chargeback systems
- Cost optimization and budget controls
- API usage monitoring and alerts
- Resource utilization optimization
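As a concrete example of the document-level permissions item above, access control can be enforced at the retrieval layer by filtering on chunk metadata; the "access_group" field here is hypothetical:
import chromadb

client = chromadb.PersistentClient(path="chroma_db")
collection = client.get_or_create_collection("documents")

def query_for_user(query_embedding, user_group, n_results=5):
    # Return only chunks whose (hypothetical) access_group metadata
    # matches the caller's group.
    return collection.query(
        query_embeddings=[query_embedding],
        n_results=n_results,
        where={"access_group": user_group},
    )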
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Submit a pull request
License
MIT License - see LICENSE file for details.
Acknowledgments
Built with:
- Anthropic Claude for LLM capabilities
- OpenAI for embedding models
- ChromaDB for vector storage
- Streamlit for web interface
- Model Context Protocol for Claude Desktop integration