qdrant-mcp-hybrid

Enables multi-tenant RAG with Qdrant, LM Studio embeddings and summaries, and advanced document processing for semantic search across isolated client collections.


🚀 Qdrant MCP Hybrid - Ultimate RAG System

The most advanced TypeScript MCP server for Qdrant with multi-client isolation, LM Studio integration, and enterprise-grade document processing

🌟 What is This?

This is the ultimate evolution of RAG (Retrieval-Augmented Generation) systems, combining the best practices from:

  • lance-mcp architecture & document processing
  • sqlite-vss-mcp performance optimizations & concurrency
  • delorenj/mcp-qdrant-memory TypeScript foundation & MCP integration

Result: A production-ready, multi-tenant RAG system with client isolation, advanced seeding, and LM Studio integration.

⚡ Key Features

🏢 Multi-Client Architecture

  • Complete isolation between clients - perfect for agencies, consultants, or organizations managing multiple projects
  • Separate collections for each client: {client}_catalog + {client}_chunks
  • Privacy-first design for sensitive documents
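
The naming convention above can be sketched as a small helper. This is only an illustration: `collectionsFor` and the allowed-character rule are assumptions, not part of the project's documented API.

```typescript
// Hypothetical helper: derives the two per-client collection names
// from the {client}_catalog / {client}_chunks convention.
interface ClientCollections {
  catalog: string;
  chunks: string;
}

function collectionsFor(client: string): ClientCollections {
  const name = client.trim();
  // Assumed rule: only letters, digits and underscores, so that one
  // client's name can never collide with another client's collections.
  if (!/^[A-Za-z0-9_]+$/.test(name)) {
    throw new Error(`Invalid client name: "${client}"`);
  }
  return { catalog: `${name}_catalog`, chunks: `${name}_chunks` };
}

console.log(collectionsFor("work"));
```

Deriving both names in one place keeps every search and seed operation scoped to a single client's pair of collections.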

🧠 LM Studio Integration

  • BGE-M3 embeddings (1024 dimensions) for semantic search
  • Qwen3-8B summaries for document overviews
  • Zero cloud dependency - everything runs locally for maximum privacy

🚀 Advanced Document Processing

  • SHA256 deduplication - never process the same document twice (90%+ time savings on updates)
  • Multi-format support - PDF, Markdown, TXT, DOCX
  • Incremental updates - only process changed files
  • Batch processing - efficient API usage with p-limit concurrency control
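
A minimal sketch of hash-based deduplication, assuming the hash of each file is remembered per source path between runs. The in-memory `HashIndex` and the function names are hypothetical; where the real system persists its hashes is not stated here.

```typescript
import { createHash } from "node:crypto";

// SHA-256 hex digest of a document's raw content.
function sha256Hex(content: string | Uint8Array): string {
  return createHash("sha256").update(content).digest("hex");
}

// Hypothetical store: source path -> hash recorded on the last seeding run.
type HashIndex = Map<string, string>;

// Returns true when the document is new or changed, and records its hash.
function needsProcessing(
  index: HashIndex,
  source: string,
  content: string | Uint8Array
): boolean {
  const hash = sha256Hex(content);
  if (index.get(source) === hash) {
    return false; // unchanged since the last run: skip summary and embedding
  }
  index.set(source, hash);
  return true;
}

const seen: HashIndex = new Map();
console.log(needsProcessing(seen, "notes.md", "first version"));  // new file
console.log(needsProcessing(seen, "notes.md", "first version"));  // unchanged
console.log(needsProcessing(seen, "notes.md", "second version")); // edited
```

Skipping unchanged files avoids the two expensive steps, summarization and embedding, which is where the time savings on incremental updates would come from.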

🔍 Enterprise Search

  • Semantic catalog search - find documents by meaning, not just keywords
  • Granular chunk search - search within specific documents
  • Cross-client search - find information across all clients
  • Rich metadata - source tracking, chunk indexing, similarity scores

🚀 Quick Install via NPM

Global Installation (Recommended)

# Install globally for easy project setup
npm install -g claude-qdrant-mcp

# Create new project
mkdir my-rag-project
cd my-rag-project
qdrant-setup

# Or use the interactive setup
npm run setup

Local Project Installation

# Install in existing project
npm install claude-qdrant-mcp

# Run interactive setup
npx qdrant-setup

What the Auto-Setup Does

✅ Dependency Check - Verifies Node.js, Qdrant, and LM Studio
✅ Environment Config - Interactive .env file creation
✅ Claude Desktop Integration - Automatic MCP server configuration
✅ Sample Documents - Creates test files for immediate use
✅ Connection Testing - Validates all services are working

One-Command Install & Test

# Complete setup and test in one go
npm install -g claude-qdrant-mcp && \
mkdir my-rag && cd my-rag && \
qdrant-setup && \
npm run test-connection

Available Commands

After installation, you have access to:

# Interactive setup wizard
qdrant-setup

# Test all connections
npm run test-connection

# Seed documents
npm run seed -- --client work --filesdir ./documents

# Start MCP server
npm start

# Development mode
npm run watch

🛠️ Manual Installation & Setup

Prerequisites

  • Node.js 18+
  • LM Studio running locally with BGE-M3 + Qwen3 models
  • Qdrant server (local Docker or Qdrant Cloud)

Quick Start

# Clone the repository
git clone https://github.com/marlian/claude-qdrant-mcp.git
cd claude-qdrant-mcp

# Install dependencies
npm install

# Setup environment
cp .env.example .env
# Edit .env with your configuration

# Build the project
npm run build

# Test with help
npm run seed -- --help

Environment Configuration

Create a .env file with your settings:

# Qdrant Configuration
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=your-api-key-if-using-cloud

# LM Studio Configuration  
LM_STUDIO_URL=http://127.0.0.1:1235
EMBEDDING_MODEL=text-embedding-finetuned-bge-m3
EMBEDDING_DIM=1024
LLM_MODEL=qwen/qwen3-8b

# Multi-Client Setup (customize with your client names)
CLIENT_COLLECTIONS=client_a,client_b,personal,work,research

# Performance Tuning
CONCURRENCY=5
BATCH_SIZE=10
CHUNK_SIZE=500
CHUNK_OVERLAP=10
DEBUG=false
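
As a rough sketch, these variables could be read into a typed configuration object as follows. The `RagConfig` shape, the `loadConfig` name, and the choice of the sample values as defaults are assumptions for illustration; the project's own `config.ts` may differ.

```typescript
interface RagConfig {
  qdrantUrl: string;
  qdrantApiKey: string | undefined;
  lmStudioUrl: string;
  embeddingModel: string;
  embeddingDim: number;
  llmModel: string;
  clients: string[];
  concurrency: number;
  batchSize: number;
  chunkSize: number;
  chunkOverlap: number;
  debug: boolean;
}

type Env = Record<string, string | undefined>;

// Parses a non-negative integer setting, falling back when it is unset.
function intFrom(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < 0) {
    throw new Error(`${key} must be a non-negative integer, got "${raw}"`);
  }
  return value;
}

// Defaults mirror the sample .env above (assumed, not confirmed defaults).
function loadConfig(env: Env): RagConfig {
  const config: RagConfig = {
    qdrantUrl: env["QDRANT_URL"] ?? "http://localhost:6333",
    qdrantApiKey: env["QDRANT_API_KEY"],
    lmStudioUrl: env["LM_STUDIO_URL"] ?? "http://127.0.0.1:1235",
    embeddingModel: env["EMBEDDING_MODEL"] ?? "text-embedding-finetuned-bge-m3",
    embeddingDim: intFrom(env, "EMBEDDING_DIM", 1024),
    llmModel: env["LLM_MODEL"] ?? "qwen/qwen3-8b",
    clients: (env["CLIENT_COLLECTIONS"] ?? "")
      .split(",")
      .map((name) => name.trim())
      .filter((name) => name.length > 0),
    concurrency: intFrom(env, "CONCURRENCY", 5),
    batchSize: intFrom(env, "BATCH_SIZE", 10),
    chunkSize: intFrom(env, "CHUNK_SIZE", 500),
    chunkOverlap: intFrom(env, "CHUNK_OVERLAP", 10),
    debug: (env["DEBUG"] ?? "false").toLowerCase() === "true",
  };
  if (config.chunkOverlap >= config.chunkSize) {
    throw new Error("CHUNK_OVERLAP must be smaller than CHUNK_SIZE");
  }
  return config;
}

console.log(loadConfig({ CLIENT_COLLECTIONS: "work, personal" }).clients);
```

In a Node.js process the function would typically be called as `loadConfig(process.env)`. Validating at startup surfaces a mistyped value before any document is processed.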

🚀 LM Studio Setup

Required Models

  1. BGE-M3 Embedding Model

    • Download from LM Studio model library
    • Model name: text-embedding-finetuned-bge-m3
    • Purpose: Generate 1024-dim embeddings for semantic search
  2. Qwen3-8B Chat Model

    • Download from LM Studio model library
    • Model name: qwen/qwen3-8b
    • Purpose: Generate document summaries

LM Studio Configuration

  1. Start LM Studio
  2. Load both models
  3. Start the server on the port set in LM_STUDIO_URL (1235 in this guide; LM Studio's own default is 1234)
  4. Verify connection: curl http://127.0.0.1:1235/v1/models
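
Once the models endpoint responds, embeddings can be requested through LM Studio's OpenAI-compatible `/v1/embeddings` route. The sketch below only builds the request body and validates the response; the function names are hypothetical, and the actual network call is left as a comment.

```typescript
// Request body for an OpenAI-compatible POST {LM_STUDIO_URL}/v1/embeddings call.
interface EmbeddingRequest {
  model: string;
  input: string[];
}

function buildEmbeddingRequest(model: string, texts: string[]): EmbeddingRequest {
  if (texts.length === 0) throw new Error("At least one text is required");
  return { model, input: texts };
}

// Validates the response shape and returns the vectors in input order.
function parseEmbeddingResponse(json: unknown, expectedDim: number): number[][] {
  if (typeof json !== "object" || json === null) {
    throw new Error("Unexpected response: not an object");
  }
  const data = (json as { data?: unknown }).data;
  if (!Array.isArray(data)) {
    throw new Error("Unexpected response: missing data array");
  }
  const items = data as { index: number; embedding: number[] }[];
  return [...items]
    .sort((a, b) => a.index - b.index)
    .map((item) => {
      if (!Array.isArray(item.embedding) || item.embedding.length !== expectedDim) {
        throw new Error(`Expected a ${expectedDim}-dim embedding at index ${item.index}`);
      }
      return item.embedding;
    });
}

// Possible usage (requires a running LM Studio server):
// const res = await fetch("http://127.0.0.1:1235/v1/embeddings", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(
//     buildEmbeddingRequest("text-embedding-finetuned-bge-m3", ["hello world"])
//   ),
// });
// const vectors = parseEmbeddingResponse(await res.json(), 1024);
```

Checking the vector length against EMBEDDING_DIM catches the case where a different embedding model is loaded than the one the collections were created for.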

📊 Usage Examples

Document Seeding

# Seed documents for specific client
npm run seed -- --client work --filesdir /path/to/work/documents

# Force overwrite existing data (full reprocessing)
npm run seed -- --client personal --filesdir /path/to/personal/docs --overwrite

# Validate documents without seeding  
npm run seed -- --client research --filesdir /path/to/research/docs --validate-only

# Debug mode for troubleshooting
npm run seed -- --client client_a --filesdir /path/to/docs --debug

MCP Server Usage

# Run the MCP server
npm start

# Or in development mode with watch
npm run watch

Claude Desktop Integration

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "qdrant-rag": {
      "command": "node",
      "args": ["/absolute/path/to/claude-qdrant-mcp/dist/index.js"],
      "env": {
        "QDRANT_URL": "http://localhost:6333",
        "QDRANT_API_KEY": "your-api-key-if-needed",
        "CLIENT_COLLECTIONS": "work,personal,research"
      }
    }
  }
}

🔧 Available MCP Tools

collection_info

Get status of all collections and clients.

// No parameters needed
collection_info()
// Returns: Collection stats, client list, system status

catalog_search

Search document summaries for a specific client.

{
  "query": "quarterly business strategy",
  "client": "work", 
  "limit": 10
}

chunks_search

Search document chunks with optional source filtering.

{
  "query": "machine learning implementation",
  "client": "research",
  "source": "/path/to/specific/document.md",
  "limit": 5
}
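
One way the optional source parameter could map onto a Qdrant query is as a payload filter on the client's chunks collection. This is a sketch under assumptions: the payload field name `source`, the default limit of 10, and the helper name are hypothetical. The body follows the shape of Qdrant's REST search endpoint (`POST /collections/{name}/points/search`).

```typescript
interface ChunksSearchParams {
  query: string;
  client: string;
  source?: string;
  limit?: number;
}

interface QdrantSearchBody {
  vector: number[];
  limit: number;
  with_payload: boolean;
  filter?: { must: { key: string; match: { value: string } }[] };
}

// Builds the target collection and search body for a chunks_search call.
// queryVector is the embedding of params.query, computed elsewhere.
function buildChunksSearch(
  params: ChunksSearchParams,
  queryVector: number[]
): { collection: string; body: QdrantSearchBody } {
  const body: QdrantSearchBody = {
    vector: queryVector,
    limit: params.limit ?? 10, // assumed default
    with_payload: true,
  };
  if (params.source !== undefined) {
    // Restrict hits to chunks whose (assumed) "source" payload field matches.
    body.filter = { must: [{ key: "source", match: { value: params.source } }] };
  }
  return { collection: `${params.client}_chunks`, body };
}

const request = buildChunksSearch(
  { query: "machine learning implementation", client: "research", limit: 5 },
  [0.1, 0.2, 0.3]
);
console.log(request.collection, JSON.stringify(request.body));
```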

all_chunks_search

Search across all clients and collections.

{
  "query": "project management best practices",
  "limit": 20
}
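
Cross-client search can be pictured as running the same query against every client's chunks collection and merging the hits by score. The sketch below shows only the merge step; the `Hit` shape and `mergeHits` name are assumptions.

```typescript
// Hypothetical hit shape: one search result from one client's collection.
interface Hit {
  client: string;
  source: string;
  score: number;
  text: string;
}

// Merges per-client result lists into one list, best score first.
function mergeHits(perClient: Hit[][], limit: number): Hit[] {
  const all = ([] as Hit[]).concat(...perClient);
  return all.sort((a, b) => b.score - a.score).slice(0, limit);
}

const merged = mergeHits(
  [
    [{ client: "work", source: "plan.md", score: 0.81, text: "..." }],
    [
      { client: "research", source: "paper.pdf", score: 0.92, text: "..." },
      { client: "research", source: "notes.txt", score: 0.4, text: "..." },
    ],
  ],
  2
);
console.log(merged.map((hit) => `${hit.client}:${hit.source}`));
```

Scores from different collections are only comparable when every collection uses the same embedding model and distance metric, which holds if all clients are seeded with the same configuration.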

🏗️ Architecture Deep Dive

Collection Structure

Qdrant Collections:
├── work_catalog           # Document summaries for work
├── work_chunks            # Document chunks for work
├── personal_catalog       # Document summaries for personal
├── personal_chunks        # Document chunks for personal
├── research_catalog       # Document summaries for research
├── research_chunks        # Document chunks for research
└── ... (per client)

Data Flow Pipeline

Documents → Hash Check → Content Extract → LM Summary → 
Chunk Split → BGE-M3 Embed → Batch Process → Qdrant Store → MCP Search

Document Processing Pipeline

  1. Directory Scan - Find all supported documents (.pdf, .md, .txt, .docx)
  2. Hash Validation - SHA256 deduplication (skip unchanged files)
  3. Content Processing - Extract text using appropriate parsers
  4. Summary Generation - LM Studio Qwen3 creates document overviews
  5. Chunk Creation - Split documents with configurable overlap
  6. Batch Embedding - BGE-M3 vectorization in efficient batches
  7. Qdrant Storage - Dual collection storage (catalog + chunks)
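
Step 5 can be illustrated with a fixed-size splitter. This is a sketch that assumes CHUNK_SIZE and CHUNK_OVERLAP are measured in characters; the real splitter may count tokens or respect sentence boundaries instead.

```typescript
// Splits text into chunks of at most chunkSize characters, where each chunk
// repeats the last `overlap` characters of the previous one.
function chunkText(text: string, chunkSize: number, overlap: number): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be in the range [0, chunkSize)");
  }
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached the end
  }
  return chunks;
}

console.log(chunkText("abcdefghij", 4, 1)); // [ 'abcd', 'defg', 'ghij' ]
```

The overlap keeps a sentence that straddles a chunk boundary at least partly visible in both neighbouring chunks, at the cost of a few more vectors to embed and store.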

🎯 Performance & Scalability

Optimizations Applied

  • Concurrency Control - p-limit prevents API overload
  • Batch Processing - Multiple embeddings per API call
  • Smart Caching - SHA256 prevents duplicate processing
  • Memory Efficient - Streaming document processing
  • Error Recovery - Graceful handling of failures
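
The first two optimizations can be sketched without dependencies. The project itself uses p-limit for concurrency control; `mapWithLimit` below is a hand-written stand-in showing the same idea, and both function names are hypothetical.

```typescript
// Groups items into batches of at most batchSize (one embedding call each).
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (batchSize <= 0) throw new Error("batchSize must be positive");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Runs worker over items with at most `limit` calls in flight at once,
// returning results in the same order as the input.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  worker: (item: T) => Promise<R>
): Promise<R[]> {
  if (limit <= 0) throw new Error("limit must be positive");
  const results = new Array<R>(items.length);
  let next = 0;
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++; // claim the next index before awaiting
      results[i] = await worker(items[i] as T);
    }
  }
  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}

// Possible usage: embed 10 chunks per request, 5 requests at a time.
// const vectors = await mapWithLimit(toBatches(chunks, 10), 5, embedBatch);
console.log(toBatches([1, 2, 3, 4, 5], 2));
```

With CONCURRENCY=5 and BATCH_SIZE=10 this would keep at most five embedding requests of ten chunks each in flight, which bounds the load on the local LM Studio server.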

Performance Benchmarks

| Metric | Performance | Notes |
| --- | --- | --- |
| Documents/minute | 50-100 | Depends on document size and LM Studio performance |
| Memory usage | 100-500MB | During processing, minimal at rest |
| Search latency | <200ms | Average semantic search response time |
| Concurrency | 5 parallel | Configurable based on system resources |
| Hash optimization | 90%+ savings | On incremental updates |

Scalability Features

  • Multi-client isolation - No data leakage between clients
  • Horizontal scaling - Add more Qdrant nodes as needed
  • Local-first - No external API dependencies or costs
  • Incremental processing - Only process changed documents

🔍 Troubleshooting

Common Issues

āŒ "LM Studio connection failed"

# Check LM Studio is running
curl http://127.0.0.1:1235/v1/models

# Verify models are loaded
# BGE-M3 for embeddings, Qwen3 for summaries

āŒ "Qdrant connection failed"

# Check Qdrant server (local)
curl http://localhost:6333/collections

# Check Qdrant Cloud with API key
curl -H "api-key: YOUR_KEY" https://your-cluster.qdrant.io/collections

āŒ "No documents found"

# Check file path exists and contains supported formats
ls -la /path/to/documents

# Verify supported file types (.pdf, .md, .txt, .docx)
find /path/to/documents -name "*.md" -o -name "*.pdf" -o -name "*.txt" -o -name "*.docx"

Debug Mode

Enable comprehensive logging:

export DEBUG=true
npm run seed -- --client test --filesdir ./sample-docs --debug

🚀 Development

Project Structure

src/
├── config.ts          # Enhanced configuration system
├── types.ts           # RAG document types & interfaces
├── index.ts           # MCP server & tool handlers
├── seed.ts            # Ultimate document processing engine
├── persistence/
│   └── qdrant.ts      # Multi-collection Qdrant client
└── validation.ts      # Input validation & safety

Building & Testing

# Development build
npm run build

# Watch mode for development
npm run watch

# Test processing without modifying database
npm run seed -- --validate-only --client test --filesdir ./test-docs

Adding New Clients

  1. Update CLIENT_COLLECTIONS in .env
  2. Run seed command with new client name
  3. Collections are created automatically

📈 Migration from Other Systems

From lance-mcp

  • Collections replace single database files
  • Enhanced config replaces hardcoded settings
  • Multi-client replaces single-tenant approach
  • Optional Qdrant Cloud storage replaces local-only storage

From sqlite-vss-mcp

  • Qdrant replaces SQLite + VSS for better performance
  • TypeScript replaces Python implementation
  • MCP integration replaces custom API

From original mcp-qdrant-memory

  • RAG document model replaces knowledge graph entities
  • LM Studio replaces OpenAI for cost-free local processing
  • Multi-collection replaces single collection architecture

🔐 Privacy & Security

  • Local-first processing - Embedding and summarization run on your machine; with a local Qdrant instance, documents never leave it
  • Client isolation - Complete data separation between clients
  • No external APIs - LM Studio runs entirely offline
  • Hash-based deduplication - Secure content fingerprinting
  • Configurable storage - Use local Qdrant or secure cloud instances

🛣️ Roadmap

Planned Features

  • Web UI for collection management and search
  • Additional embedding models (support for other local models)
  • Advanced chunking strategies (semantic splitting)
  • Hybrid search (combine vector + keyword search)
  • Export/import collections for backup and sharing

Integration Possibilities

  • Obsidian plugin for direct vault integration
  • API server mode for external applications
  • Batch processing for large document sets
  • Real-time file watching for automatic updates

📚 Extended Documentation

Looking for deeper details, integrations or low-level references?
Check out the full documentation under /docs:

Key Resources

  • Setup guides for LM Studio, Qdrant, and Claude Desktop integration
  • Performance benchmarks and optimization tips
  • Troubleshooting guides for common issues
  • API reference for all MCP tools
  • Best practices for multi-client setups

🤝 Contributing

This project combines the best ideas from multiple RAG implementations. Contributions welcome for:

  • Performance optimizations
  • Additional document formats
  • Enhanced search capabilities
  • New embedding models support
  • UI/dashboard development
  • Documentation improvements

Development Setup

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes with tests
  4. Submit a pull request with detailed description

📄 License

MIT License - Use freely for personal and commercial projects.

🙏 Acknowledgments

Built upon the excellent work of:

  • lance-mcp - Document processing architecture inspiration
  • sqlite-vss-mcp - Performance optimization patterns
  • delorenj/mcp-qdrant-memory - TypeScript MCP foundation
  • Qdrant - Vector search engine
  • LM Studio - Local LLM hosting platform
  • BGE-M3 - Multilingual embedding model
  • Qwen3 - Document summarization model

📞 Support

For detailed API documentation, see MCP Tools Reference. For advanced setup, see Advanced Configuration.


🎯 The most advanced TypeScript RAG system with enterprise-grade features, multi-client isolation, and local-first privacy.
