ollqd

ollqd

Enables indexing and semantic search of codebases and documents via MCP, using Ollama embeddings and Qdrant vector store.

Category
Visit Server

README

Ollqd — MCP Client-Server RAG System

Local-first RAG system that indexes codebases and documents into Qdrant using Ollama embeddings. Exposes everything through MCP (Model Context Protocol) so AI assistants can search your code via tool-calling.

Architecture

┌──────────────────────────────────────────────────────────────────┐
│  User Interface                                                  │
│  ┌─────────────┐  ┌─────────────────────────────────────────┐   │
│  │ ollqd-chat   │  │ Claude Desktop / any MCP host           │   │
│  │ (CLI / REPL) │  │ (connects to ollqd-server directly)     │   │
│  └──────┬───────┘  └────────────────┬────────────────────────┘   │
└─────────┼──────────────────────────┼────────────────────────────┘
          │ stdio JSON-RPC           │ stdio JSON-RPC
┌─────────▼──────────────────────────▼────────────────────────────┐
│  Ollqd MCP Server (FastMCP)                                     │
│  ┌───────────────┐ ┌─────────────────┐ ┌─────────────────────┐  │
│  │index_codebase │ │index_documents  │ │semantic_search      │  │
│  │index docs     │ │markdown/text/rst│ │embed query → Qdrant │  │
│  └───────┬───────┘ └────────┬────────┘ └──────────┬──────────┘  │
│  ┌───────┴──────┐  ┌───────┴────────┐             │             │
│  │list_collections│ │delete_collection│             │             │
│  └──────────────┘  └────────────────┘             │             │
└─────────┬──────────────────────────────────────────┼────────────┘
          │ /api/embed                               │
┌─────────▼──────────┐                    ┌──────────▼─────────┐
│  Ollama             │                    │  Qdrant             │
│  nomic-embed-text   │                    │  cosine similarity  │
│  + chat models      │                    │  payload indexes    │
└────────────────────┘                    └────────────────────┘

How it works

  1. Discovery — Walks the codebase, filters by language (40+ extensions), skips lock files / build artifacts / vendor dirs.

  2. Code-aware chunking — Splits files at natural code boundaries (function defs, class declarations, impl blocks) rather than blindly cutting at token limits. Overlapping windows preserve context.

  3. Embedding — Sends chunks to Ollama's /api/embed in batches. Each chunk is prefixed with file path + language + line range for better semantic grounding.

  4. Storage — Upserts into Qdrant with full metadata payload. Payload indexes on file_path, language, and content_hash enable filtered search and incremental re-indexing.

  5. RAG loop — The client sends user queries to Ollama with MCP tools attached. Ollama decides when to call semantic_search, gets results from the server, and synthesizes a final answer with code citations.

Setup

Prerequisites

  • Ollama running locally with an embedding model pulled
  • Qdrant running (Docker recommended)
  • Python 3.10+
# Pull the embedding model
ollama pull nomic-embed-text

# Pull a chat model (any that supports tool-calling)
ollama pull qwen2.5:14b

# Start Qdrant (and optionally Ollama via Docker)
docker compose up -d

Install

# With uv (recommended)
uv venv && source .venv/bin/activate
uv pip install -e ".[client,dev]"

# Or with pip
pip install -e ".[client,dev]"

Usage

Start the MCP server (standalone)

ollqd-server

The server communicates over stdio using JSON-RPC (MCP protocol). It's meant to be launched by MCP clients, not used directly.

Interactive RAG chat

# Interactive REPL — ask questions about your codebase
ollqd-chat --interactive

# Single query
ollqd-chat "how does the auth middleware work?"

# Use a different chat model
ollqd-chat --interactive --model llama3.1

# Debug mode
ollqd-chat -v "find the database connection setup"

REPL commands:

  • :quit / :q — exit
  • :model <name> — switch chat model on the fly

Use with Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "ollqd": {
      "command": "ollqd-server",
      "args": []
    }
  }
}

Then in Claude Desktop, ask things like:

  • "Index my project at /path/to/codebase"
  • "Search for how authentication is implemented"
  • "What error handling patterns are used?"
  • "List all indexed collections"

MCP Tools

Tool Description
index_codebase Walk + chunk + embed + upsert code files from a directory
index_documents Chunk + embed + upsert document files (markdown, text, rst)
semantic_search Embed a natural language query and search Qdrant
list_collections List all Qdrant collections with point counts
delete_collection Drop a collection (requires confirm=true)

Configuration

Environment variables

Variable Default Description
OLLAMA_URL http://localhost:11434 Ollama base URL
QDRANT_URL http://localhost:6333 Qdrant REST URL
OLLAMA_CHAT_MODEL qwen2.5:14b Chat model for RAG
OLLAMA_EMBED_MODEL nomic-embed-text Embedding model
OLLAMA_TIMEOUT_S 120 Request timeout (seconds)
CHUNK_SIZE 512 Approximate tokens per chunk
CHUNK_OVERLAP 64 Overlap tokens between chunks
MAX_TOOL_ROUNDS 6 Max tool-calling rounds per query

ollqd.toml

[ollama]
host = "http://localhost:11434"
chat_model = "qwen2.5:14b"
embed_model = "nomic-embed-text"
timeout = 120

[qdrant]
host = "http://localhost:6333"
default_collection = "codebase"

[indexing]
chunk_size = 512
chunk_overlap = 64
max_file_size_kb = 512

[server]
name = "ollqd-rag-server"
transport = "stdio"

[client]
max_tool_rounds = 6

Project structure

src/ollqd/
├── __init__.py
├── config.py          # AppConfig dataclass + env var overrides
├── errors.py          # Exception hierarchy
├── models.py          # FileInfo, Chunk, SearchResult, IndexingStats
├── chunking.py        # Code-aware + document chunking
├── discovery.py       # File discovery (40+ languages)
├── embedder.py        # OllamaEmbedder wrapping /api/embed
├── vectorstore.py     # QdrantManager (upsert, search, incremental)
├── server/
│   └── main.py        # FastMCP server with 5 tools
└── client/
    ├── mcp_bridge.py  # MCP session over stdio
    ├── ollama_agent.py # Ollama chat with tool-calling
    ├── rag_loop.py    # RAG loop runner
    └── main.py        # CLI entry point

Supported languages

Python, Go, JavaScript, TypeScript, Rust, Java, Kotlin, Scala, C, C++, C#, Ruby, PHP, Swift, Lua, Shell, SQL, R, HTML, CSS, SCSS, YAML, TOML, JSON, Markdown, reStructuredText, Terraform, HCL, Dockerfile, Protobuf, GraphQL.

Embedding models

Any Ollama model that supports /api/embed works. Recommended:

Model Dimensions Notes
nomic-embed-text 768 Good balance of quality and speed (default)
mxbai-embed-large 1024 Higher quality, slower
all-minilm 384 Fast, smaller footprint
snowflake-arctic-embed 1024 Strong code understanding

Design decisions

Why MCP? — The Model Context Protocol lets any compatible AI assistant (Claude Desktop, custom clients, IDE extensions) use ollqd's indexing and search tools without custom integration code.

Why not tree-sitter for chunking? — Tree-sitter gives perfect AST-based splits but adds a heavy dependency per language. The heuristic boundary detection covers ~90% of cases with zero extra setup.

Why deterministic point IDs?md5(file_path::chunk_N) means re-indexing the same file overwrites existing points instead of creating duplicates. This makes incremental mode reliable.

Why prefix chunks with metadata? — Embedding models produce better vectors when given context. "File: auth/middleware.go | Language: go | Lines 45-82" followed by the code produces more semantically meaningful vectors.

Legacy scripts

The standalone scripts from v0.1 are still available:

# Bulk index (standalone, no MCP)
python codebase_indexer.py /path/to/project --collection myproject

# Search (standalone, no MCP)
python codebase_search.py "auth middleware" --interactive

See DESIGN.md for the full architecture document with diagrams, security analysis (STRIDE), and detailed API reference.

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
E2B

E2B

Using MCP to run code via e2b.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured