qdrant-mcp-hybrid
Enables multi-tenant RAG with Qdrant, LM Studio embeddings and summaries, and advanced document processing for semantic search across isolated client collections.
Qdrant MCP Hybrid - Ultimate RAG System
The most advanced TypeScript MCP server for Qdrant with multi-client isolation, LM Studio integration, and enterprise-grade document processing
What is This?
This is the ultimate evolution of RAG (Retrieval-Augmented Generation) systems, combining the best practices from:
- lance-mcp architecture & document processing
- sqlite-vss-mcp performance optimizations & concurrency
- delorenj/mcp-qdrant-memory TypeScript foundation & MCP integration
Result: A production-ready, multi-tenant RAG system with client isolation, advanced seeding, and LM Studio integration.
Key Features
Multi-Client Architecture
- Complete isolation between clients - perfect for agencies, consultants, or organizations managing multiple projects
- Separate collections for each client: {client}_catalog + {client}_chunks
- Privacy-first design for sensitive documents
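The naming convention above can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the helper name `collectionsFor` and the allowed character set for client names are assumptions.

```typescript
// Hypothetical helper: derives the two per-client collection names
// following the {client}_catalog / {client}_chunks convention.
interface ClientCollections {
  catalog: string;
  chunks: string;
}

function collectionsFor(client: string): ClientCollections {
  // Assumption: client names use lowercase letters, digits, and underscores,
  // so they map predictably onto collection names.
  if (!/^[a-z0-9_]+$/.test(client)) {
    throw new Error(`Invalid client name: ${client}`);
  }
  return { catalog: `${client}_catalog`, chunks: `${client}_chunks` };
}
```

For example, `collectionsFor("work")` yields `work_catalog` and `work_chunks`, which matches the collection layout shown later in this README.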
LM Studio Integration
- BGE-M3 embeddings (1024 dimensions) for semantic search
- Qwen3-8B summaries for document overviews
- Zero cloud dependency - everything runs locally for maximum privacy
Advanced Document Processing
- SHA256 deduplication - never process the same document twice (90%+ time savings on updates)
- Multi-format support - PDF, Markdown, TXT, DOCX
- Incremental updates - only process changed files
- Batch processing - efficient API usage with p-limit concurrency control
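A rough sketch of how SHA256 deduplication can drive incremental updates: hash each file's content and skip it when the hash matches the one recorded at the last seeding. The `HashIndex` map and the function names are hypothetical; where the project actually stores hashes is not stated here.

```typescript
import { createHash } from "node:crypto";

// Returns the hex-encoded SHA-256 digest of a document's content.
function sha256Hex(content: string | Uint8Array): string {
  return createHash("sha256").update(content).digest("hex");
}

// Hypothetical in-memory index: source path -> hash recorded at the last seeding.
type HashIndex = Map<string, string>;

// True when the file is new or its content changed since the last run.
function needsProcessing(
  index: HashIndex,
  source: string,
  content: string | Uint8Array
): boolean {
  return index.get(source) !== sha256Hex(content);
}
```

Unchanged files produce the same digest, so they can be skipped before any summary or embedding call is made.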
Enterprise Search
- Semantic catalog search - find documents by meaning, not just keywords
- Granular chunk search - search within specific documents
- Cross-client search - find information across all clients
- Rich metadata - source tracking, chunk indexing, similarity scores
Quick Install via NPM
Global Installation (Recommended)
# Install globally for easy project setup
npm install -g claude-qdrant-mcp
# Create new project
mkdir my-rag-project
cd my-rag-project
qdrant-setup
# Or use the interactive setup
npm run setup
Local Project Installation
# Install in existing project
npm install claude-qdrant-mcp
# Run interactive setup
npx qdrant-setup
What the Auto-Setup Does
- Dependency Check - Verifies Node.js, Qdrant, and LM Studio
- Environment Config - Interactive .env file creation
- Claude Desktop Integration - Automatic MCP server configuration
- Sample Documents - Creates test files for immediate use
- Connection Testing - Validates all services are working
One-Command Install & Test
# Complete setup and test in one go
npm install -g claude-qdrant-mcp && \
mkdir my-rag && cd my-rag && \
qdrant-setup && \
npm run test-connection
Available Commands
After installation, you have access to:
# Interactive setup wizard
qdrant-setup
# Test all connections
npm run test-connection
# Seed documents
npm run seed -- --client work --filesdir ./documents
# Start MCP server
npm start
# Development mode
npm run watch
Table of Contents
- What is This?
- Key Features
- Quick Install via NPM
- Manual Installation & Setup
- LM Studio Setup
- Usage Examples
- Architecture Deep Dive
- Performance & Scalability
- Troubleshooting
- Development
- Migration from Other Systems
- Privacy & Security
- Roadmap
- Documentation
- Contributing
- License
- Acknowledgments
- Support
Manual Installation & Setup
Prerequisites
- Node.js 18+
- LM Studio running locally with BGE-M3 + Qwen3 models
- Qdrant server (local Docker or Qdrant Cloud)
Quick Start
# Clone the repository
git clone https://github.com/marlian/claude-qdrant-mcp.git
cd claude-qdrant-mcp
# Install dependencies
npm install
# Setup environment
cp .env.example .env
# Edit .env with your configuration
# Build the project
npm run build
# Test with help
npm run seed -- --help
Environment Configuration
Create a .env file with your settings:
# Qdrant Configuration
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=your-api-key-if-using-cloud
# LM Studio Configuration
LM_STUDIO_URL=http://127.0.0.1:1235
EMBEDDING_MODEL=text-embedding-finetuned-bge-m3
EMBEDDING_DIM=1024
LLM_MODEL=qwen/qwen3-8b
# Multi-Client Setup (customize with your client names)
CLIENT_COLLECTIONS=client_a,client_b,personal,work,research
# Performance Tuning
CONCURRENCY=5
BATCH_SIZE=10
CHUNK_SIZE=500
CHUNK_OVERLAP=10
DEBUG=false
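As an illustration of how these variables might be read and validated at startup, here is a minimal sketch. The `RagConfig` shape, the function names, and the fallback values (copied from the example above) are assumptions, not the project's actual config module.

```typescript
// Hypothetical config shape mirroring the variables in the .env example.
interface RagConfig {
  qdrantUrl: string;
  lmStudioUrl: string;
  embeddingDim: number;
  clients: string[];
  concurrency: number;
  batchSize: number;
  chunkSize: number;
  chunkOverlap: number;
  debug: boolean;
}

type Env = Record<string, string | undefined>;

// Parses a non-negative integer setting, using the fallback when unset.
function intSetting(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 0) {
    throw new Error(`${key} must be a non-negative integer, got "${raw}"`);
  }
  return value;
}

function loadConfig(env: Env): RagConfig {
  return {
    qdrantUrl: env["QDRANT_URL"] ?? "http://localhost:6333",
    lmStudioUrl: env["LM_STUDIO_URL"] ?? "http://127.0.0.1:1235",
    embeddingDim: intSetting(env, "EMBEDDING_DIM", 1024),
    // CLIENT_COLLECTIONS is a comma-separated list; blank entries are dropped.
    clients: (env["CLIENT_COLLECTIONS"] ?? "")
      .split(",")
      .map((name) => name.trim())
      .filter((name) => name.length > 0),
    concurrency: intSetting(env, "CONCURRENCY", 5),
    batchSize: intSetting(env, "BATCH_SIZE", 10),
    chunkSize: intSetting(env, "CHUNK_SIZE", 500),
    chunkOverlap: intSetting(env, "CHUNK_OVERLAP", 10),
    debug: env["DEBUG"] === "true",
  };
}
```

In a Node.js process this would typically be called as `loadConfig(process.env)` after the .env file has been loaded. Failing fast on a malformed number avoids discovering the problem halfway through a seeding run.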
LM Studio Setup
Required Models
- BGE-M3 Embedding Model
  - Download from LM Studio model library
  - Model name: text-embedding-finetuned-bge-m3
  - Purpose: Generate 1024-dim embeddings for semantic search
- Qwen3-8B Chat Model
  - Download from LM Studio model library
  - Model name: qwen/qwen3-8b
  - Purpose: Generate document summaries
LM Studio Configuration
- Start LM Studio
- Load both models
- Start the server on port 1235, the port used throughout this guide (LM Studio's own default is 1234)
- Verify connection:
curl http://127.0.0.1:1235/v1/models
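Once the server responds, the two models are reached through LM Studio's OpenAI-compatible endpoints (`/v1/embeddings` and `/v1/chat/completions`). The sketch below shows what such requests could look like. The function names and the summary prompt are illustrative assumptions, and the project's real client code may differ.

```typescript
// Request body for POST {LM_STUDIO_URL}/v1/embeddings (OpenAI-compatible shape).
function embeddingRequest(texts: string[], model = "text-embedding-finetuned-bge-m3") {
  return { model, input: texts };
}

// Request body for POST {LM_STUDIO_URL}/v1/chat/completions.
// The prompt wording is illustrative, not the project's actual prompt.
function summaryRequest(documentText: string, model = "qwen/qwen3-8b") {
  return {
    model,
    messages: [
      { role: "system", content: "Summarize the document in a short overview." },
      { role: "user", content: documentText },
    ],
  };
}

// Throws if any vector does not have the expected dimension (EMBEDDING_DIM).
function assertDimensions(vectors: number[][], expectedDim = 1024): void {
  vectors.forEach((vector, i) => {
    if (vector.length !== expectedDim) {
      throw new Error(`Vector ${i} has ${vector.length} dimensions, expected ${expectedDim}`);
    }
  });
}

// Sends texts to LM Studio and returns one vector per input text.
async function embed(baseUrl: string, texts: string[]): Promise<number[][]> {
  const res = await fetch(`${baseUrl}/v1/embeddings`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(embeddingRequest(texts)),
  });
  if (!res.ok) throw new Error(`LM Studio returned HTTP ${res.status}`);
  const json = (await res.json()) as { data: { embedding: number[] }[] };
  const vectors = json.data.map((item) => item.embedding);
  assertDimensions(vectors);
  return vectors;
}
```

Checking the vector length against `EMBEDDING_DIM` catches the case where a different embedding model is loaded than the one the Qdrant collections were created for.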
Usage Examples
Document Seeding
# Seed documents for specific client
npm run seed -- --client work --filesdir /path/to/work/documents
# Force overwrite existing data (full reprocessing)
npm run seed -- --client personal --filesdir /path/to/personal/docs --overwrite
# Validate documents without seeding
npm run seed -- --client research --filesdir /path/to/research/docs --validate-only
# Debug mode for troubleshooting
npm run seed -- --client client_a --filesdir /path/to/docs --debug
MCP Server Usage
# Run the MCP server
npm start
# Or in development mode with watch
npm run watch
Claude Desktop Integration
Add to your claude_desktop_config.json:
{
"mcpServers": {
"qdrant-rag": {
"command": "node",
"args": ["/absolute/path/to/claude-qdrant-mcp/dist/index.js"],
"env": {
"QDRANT_URL": "http://localhost:6333",
"QDRANT_API_KEY": "your-api-key-if-needed",
"CLIENT_COLLECTIONS": "work,personal,research"
}
}
}
}
Available MCP Tools
collection_info
Get status of all collections and clients.
// No parameters needed
collection_info()
// Returns: Collection stats, client list, system status
catalog_search
Search document summaries for a specific client.
{
"query": "quarterly business strategy",
"client": "work",
"limit": 10
}
chunks_search
Search document chunks with optional source filtering.
{
"query": "machine learning implementation",
"client": "research",
"source": "/path/to/specific/document.md", // optional
"limit": 5
}
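To make the optional `source` filter concrete, here is a sketch of how a `chunks_search` call could be translated into a request for Qdrant's REST search endpoint (`POST /collections/{collection}/points/search`). The payload field name `source`, the default limit, and the helper name are assumptions; the query text is assumed to have been embedded already.

```typescript
// Mirrors the chunks_search tool input shown above.
interface ChunkSearchArgs {
  query: string;
  client: string;
  source?: string;
  limit?: number;
}

interface QdrantSearchBody {
  vector: number[];
  limit: number;
  with_payload: boolean;
  filter?: { must: { key: string; match: { value: string } }[] };
}

// Builds the target collection name and the search request body.
// `queryVector` is the embedding of args.query, computed separately.
function chunkSearchRequest(
  args: ChunkSearchArgs,
  queryVector: number[]
): { collection: string; body: QdrantSearchBody } {
  const body: QdrantSearchBody = {
    vector: queryVector,
    limit: args.limit ?? 10, // assumed default
    with_payload: true,
  };
  if (args.source !== undefined) {
    // Assumption: each chunk stores its origin path in a "source" payload field.
    body.filter = { must: [{ key: "source", match: { value: args.source } }] };
  }
  return { collection: `${args.client}_chunks`, body };
}
```

Because the collection name is derived from `client`, a search for one client never touches another client's collections. The `source` filter narrows results further to a single document.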
all_chunks_search
Search across all clients and collections.
{
"query": "project management best practices",
"limit": 20
}
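One plausible way to implement a cross-client search is to query each client's chunk collection and merge the results by similarity score. The sketch below covers only the merge step; the `Hit` shape and function name are hypothetical. It assumes all collections use the same embedding model and distance metric, so that scores are comparable.

```typescript
// Hypothetical search hit, tagged with the client it came from.
interface Hit {
  client: string;
  source: string;
  score: number;
  text: string;
}

// Merges per-client result lists into one ranking, highest score first.
function mergeHits(perClient: Hit[][], limit: number): Hit[] {
  const all = ([] as Hit[]).concat(...perClient);
  return all.sort((a, b) => b.score - a.score).slice(0, limit);
}
```

Keeping the `client` field on each hit lets the caller see which tenant a result belongs to even after the lists are combined.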
Architecture Deep Dive
Collection Structure
Qdrant Collections:
├── work_catalog # Document summaries for work
├── work_chunks # Document chunks for work
├── personal_catalog # Document summaries for personal
├── personal_chunks # Document chunks for personal
├── research_catalog # Document summaries for research
├── research_chunks # Document chunks for research
└── ... (per client)
Data Flow Pipeline
Documents → Hash Check → Content Extract → LM Summary →
Chunk Split → BGE-M3 Embed → Batch Process → Qdrant Store → MCP Search
Document Processing Pipeline
- Directory Scan - Find all supported documents (.pdf, .md, .txt, .docx)
- Hash Validation - SHA256 deduplication (skip unchanged files)
- Content Processing - Extract text using appropriate parsers
- Summary Generation - LM Studio Qwen3 creates document overviews
- Chunk Creation - Split documents with configurable overlap
- Batch Embedding - BGE-M3 vectorization in efficient batches
- Qdrant Storage - Dual collection storage (catalog + chunks)
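The chunk creation step can be illustrated with a simple fixed-size splitter. This sketch assumes `CHUNK_SIZE` and `CHUNK_OVERLAP` are measured in characters, which the README does not state. The project may split on tokens or sentences instead.

```typescript
// Splits text into chunks of at most `chunkSize` characters, where each chunk
// starts with the last `overlap` characters of the previous one.
function splitIntoChunks(text: string, chunkSize = 500, overlap = 10): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be at least 0 and smaller than chunkSize");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this chunk reached the end
  }
  return chunks;
}
```

With `chunkSize = 4` and `overlap = 1`, the string `"abcdefghij"` is split into `"abcd"`, `"defg"`, and `"ghij"`. The repeated boundary character shows how overlap preserves context across chunk edges.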
Performance & Scalability
Optimizations Applied
- Concurrency Control - p-limit prevents API overload
- Batch Processing - Multiple embeddings per API call
- Smart Caching - SHA256 prevents duplicate processing
- Memory Efficient - Streaming document processing
- Error Recovery - Graceful handling of failures
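The first two optimizations can be sketched without any dependencies. `toBatches` groups texts so that several embeddings go into one API call, and `mapWithLimit` is a small stand-in for the p-limit package that caps the number of requests in flight. Both function names are hypothetical.

```typescript
// Groups items into arrays of at most `size` elements (compare BATCH_SIZE).
function toBatches<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("size must be positive");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Runs `worker` over all items with at most `limit` calls in flight at once
// (compare CONCURRENCY). Results keep the order of the input items.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>
): Promise<R[]> {
  if (limit <= 0) throw new Error("limit must be positive");
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function run(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }
  const runners = Array.from({ length: Math.min(limit, items.length) }, () => run());
  await Promise.all(runners);
  return results;
}
```

A seeding run could combine them as `mapWithLimit(toBatches(chunks, 10), 5, embedBatch)`, where `embedBatch` is whatever function sends one batch to LM Studio.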
Performance Benchmarks
| Metric | Performance | Notes |
|---|---|---|
| Documents/minute | 50-100 | Depends on document size and LM Studio performance |
| Memory usage | 100-500MB | During processing, minimal at rest |
| Search latency | <200ms | Average semantic search response time |
| Concurrency | 5 parallel | Configurable based on system resources |
| Hash optimization | 90%+ savings | On incremental updates |
Scalability Features
- Multi-client isolation - No data leakage between clients
- Horizontal scaling - Add more Qdrant nodes as needed
- Local-first - No external API dependencies or costs
- Incremental processing - Only process changed documents
Troubleshooting
Common Issues
"LM Studio connection failed"
# Check LM Studio is running
curl http://127.0.0.1:1235/v1/models
# Verify models are loaded
# BGE-M3 for embeddings, Qwen3 for summaries
"Qdrant connection failed"
# Check Qdrant server (local)
curl http://localhost:6333/collections
# Check Qdrant Cloud with API key
curl -H "api-key: YOUR_KEY" https://your-cluster.qdrant.io/collections
"No documents found"
# Check file path exists and contains supported formats
ls -la /path/to/documents
# Verify supported file types (.pdf, .md, .txt, .docx)
find /path/to/documents -name "*.md" -o -name "*.pdf" -o -name "*.txt" -o -name "*.docx"
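The same extension check can be expressed in TypeScript, which helps when debugging why a file is skipped. This is a sketch only: the helper name is hypothetical, and case-insensitive matching is an assumption about how the seeder treats names like `REPORT.PDF`.

```typescript
// File extensions listed as supported in this README.
const SUPPORTED_EXTENSIONS = [".pdf", ".md", ".txt", ".docx"];

// True when the file name ends with one of the supported extensions.
function isSupported(fileName: string): boolean {
  const lower = fileName.toLowerCase();
  return SUPPORTED_EXTENSIONS.some((ext) => lower.endsWith(ext));
}
```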
Debug Mode
Enable comprehensive logging:
export DEBUG=true
npm run seed -- --client test --filesdir ./sample-docs --debug
Development
Project Structure
src/
├── config.ts # Enhanced configuration system
├── types.ts # RAG document types & interfaces
├── index.ts # MCP server & tool handlers
├── seed.ts # Ultimate document processing engine
├── persistence/
│   └── qdrant.ts # Multi-collection Qdrant client
└── validation.ts # Input validation & safety
Building & Testing
# Development build
npm run build
# Watch mode for development
npm run watch
# Test processing without modifying database
npm run seed -- --validate-only --client test --filesdir ./test-docs
Adding New Clients
- Update CLIENT_COLLECTIONS in .env
- Run seed command with new client name
- Collections are created automatically
Migration from Other Systems
From lance-mcp
- Collections replace single database files
- Enhanced config replaces hardcoded settings
- Multi-client replaces single-tenant approach
- Cloud sync replaces local-only storage
From sqlite-vss-mcp
- Qdrant replaces SQLite + VSS for better performance
- TypeScript replaces Python implementation
- MCP integration replaces custom API
From original mcp-qdrant-memory
- RAG document model replaces knowledge graph entities
- LM Studio replaces OpenAI for cost-free local processing
- Multi-collection replaces single collection architecture
Privacy & Security
- Local-first processing - Documents never leave your machine
- Client isolation - Complete data separation between clients
- No external APIs - LM Studio runs entirely offline
- Hash-based deduplication - Secure content fingerprinting
- Configurable storage - Use local Qdrant or secure cloud instances
Roadmap
Planned Features
- Web UI for collection management and search
- Additional embedding models (support for other local models)
- Advanced chunking strategies (semantic splitting)
- Hybrid search (combine vector + keyword search)
- Export/import collections for backup and sharing
Integration Possibilities
- Obsidian plugin for direct vault integration
- API server mode for external applications
- Batch processing for large document sets
- Real-time file watching for automatic updates
Extended Documentation
Looking for deeper details, integrations or low-level references?
Check out the full documentation under /docs:
- Claude Project Instructions - AI agent behavior and search workflows
- Claude Desktop Integration - Setup guide for local LM Studio
- Advanced Configuration - Power user setup and tuning
- MCP Tools Reference - Tool descriptions, parameters, and examples
Key Resources
- Setup guides for LM Studio, Qdrant, and Claude Desktop integration
- Performance benchmarks and optimization tips
- Troubleshooting guides for common issues
- API reference for all MCP tools
- Best practices for multi-client setups
Contributing
This project combines the best ideas from multiple RAG implementations. Contributions welcome for:
- Performance optimizations
- Additional document formats
- Enhanced search capabilities
- New embedding models support
- UI/dashboard development
- Documentation improvements
Development Setup
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Submit a pull request with detailed description
License
MIT License - Use freely for personal and commercial projects.
Acknowledgments
Built upon the excellent work of:
- lance-mcp - Document processing architecture inspiration
- sqlite-vss-mcp - Performance optimization patterns
- delorenj/mcp-qdrant-memory - TypeScript MCP foundation
- Qdrant - Vector search engine
- LM Studio - Local LLM hosting platform
- BGE-M3 - Multilingual embedding model
- Qwen3 - Document summarization model
Support
- GitHub Issues - Bug reports and feature requests
- GitHub Discussions - Questions and community support
- Documentation - Comprehensive guides and references
For detailed API documentation, see MCP Tools Reference. For advanced setup, see Advanced Configuration.
The most advanced TypeScript RAG system with enterprise-grade features, multi-client isolation, and local-first privacy.