VecGrep
Semantic code search MCP server that reduces token usage by ~95% by returning top relevant code chunks instead of full files.
Cursor-style semantic code search as an MCP plugin for Claude Code.
Instead of grepping 50 files and sending 30,000 tokens to Claude, VecGrep returns the top 8 semantically relevant code chunks (~1,600 tokens). That's a ~95% token reduction for codebase queries.
Benchmarks
Measured on the VecGrep codebase itself (5 source files, ~26k tokens raw).
Token usage per query
| Mode | Avg tokens returned | vs raw read | Savings |
|---|---|---|---|
| Raw file read (baseline) | 26,009 | — | — |
| search_code (top_k=8) | ~3,007 | 11.6% | 88% |
| hybrid_search (top_k=8) | ~3,324 | 12.8% | 87% |
| search_graph (limit=8) | ~47 | 0.2% | >99% |
search_graph returns structured node metadata only (name, kind, file, line range) — no source code — so it's ultra-cheap for structural questions ("where is X defined?", "what calls Y?").
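The percentages in the table follow from the token counts alone. A quick check, using only the figures reported above (the helper name `savings` is illustrative, not part of VecGrep):

```python
# Token counts taken from the benchmark table above.
RAW_TOKENS = 26_009

def savings(returned: int, raw: int = RAW_TOKENS) -> tuple[float, float]:
    """Return (share of the raw read in %, savings in %), rounded to one decimal."""
    share = 100 * returned / raw
    return round(share, 1), round(100 - share, 1)

results = {
    "search_code": savings(3_007),
    "hybrid_search": savings(3_324),
    "search_graph": savings(47),
}
for mode, (share, saved) in results.items():
    print(f"{mode}: {share}% of raw read, {saved}% saved")
```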
Query latency (median, 5 runs)
| Mode | Latency |
|---|---|
| search_graph | ~3ms |
| hybrid_search | ~76ms |
| search_code | ~83ms |
search_graph is ~30× faster than vector search — pure in-memory graph traversal, no embedding model call.
Result correctness (structural queries)
For name-based structural queries, pure vector search can rank documentation (CHANGELOG, README) above source code. The graph index fixes this:
| Query | search_code #1 | hybrid_search #1 |
|---|---|---|
| "VectorStore search method" | [WRONG] CHANGELOG.md | [OK] store.py |
| "GraphStore build" | [WRONG] CHANGELOG.md | [OK] server.py |
| "embedding provider factory" | [OK] embedder.py | [OK] embedder.py |
| "AST chunking tree-sitter" | [OK] chunker.py | [OK] chunker.py |
The graph score (graph_score: 1.00) overrides a misleading vector match whenever the query directly names a known symbol.
Rule of thumb: use search_code for semantic/behaviour queries, search_graph for structural/navigation queries, and hybrid_search when you need both.
How it works
- Chunk: Parses source files with tree-sitter to extract semantic units (functions, classes, methods)
- Embed: Encodes each chunk using the configured embedding provider:
  - Local (default): all-MiniLM-L6-v2-code-search-512 via fastembed ONNX (~100ms startup, no API key) or PyTorch, with auto device detection (Apple Silicon, CUDA, CPU)
  - Cloud (BYOK): OpenAI, Voyage AI, or Google Gemini via your own API key (higher-quality embeddings, optional)
- Store: Saves embeddings + metadata in LanceDB under ~/.vecgrep/<project_hash>/; vector dimensions adapt automatically to the chosen provider
- Search: ANN index (IVF-PQ) for fast approximate search on large codebases
Incremental re-indexing via mtime/size checks skips unchanged files.
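The mtime/size check can be pictured roughly as follows. This is a minimal sketch, not VecGrep's actual code: the function names and the in-memory `seen` dictionary are assumptions made for illustration (a real index would persist the signatures alongside the chunks).

```python
import tempfile
from pathlib import Path

def file_signature(path: Path) -> tuple[int, int]:
    """Cheap change fingerprint: (mtime in nanoseconds, size in bytes)."""
    st = path.stat()
    return st.st_mtime_ns, st.st_size

def files_to_reindex(paths: list[Path], seen: dict[str, tuple[int, int]]) -> list[Path]:
    """Return files whose signature differs from the stored one, updating `seen`."""
    changed = []
    for p in paths:
        sig = file_signature(p)
        if seen.get(str(p)) != sig:
            changed.append(p)
            seen[str(p)] = sig
    return changed

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "a.py"
    src.write_text("x = 1\n")
    seen: dict[str, tuple[int, int]] = {}
    first = files_to_reindex([src], seen)    # new file: needs indexing
    second = files_to_reindex([src], seen)   # unchanged: skipped
    src.write_text("x = 1\ny = 2\n")         # content (and size) changes
    third = files_to_reindex([src], seen)    # changed: re-indexed
```

Comparing metadata avoids reading or hashing file contents, at the cost of missing an edit that preserves both size and modification time.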
Installation
Requires Python 3.12 and uv.
Note: Python 3.12 is required because tree-sitter-languages does not yet have wheels for Python 3.13+.
pip install vecgrep # standard pip
uv tool install --python 3.12 vecgrep # uv tool (recommended)
Claude Code integration
Run once — works for every project:
claude mcp add --scope user vecgrep -- vecgrep
This installs VecGrep as a persistent binary and registers it in your user config (~/.claude.json) so it's available globally across all projects. Starts instantly — no download delay on Claude Code launch.
Usage with Claude
You don't trigger VecGrep manually - Claude decides when to call the tools based on what you ask.
| What you say to Claude | Tool invoked |
|---|---|
| "Index my project at /Users/me/myapp" | index_codebase |
| "How does authentication work in this codebase?" | search_code |
| "Find where database connections are set up" | search_code |
| "How many files are indexed?" | get_index_status |
| "Build a knowledge graph of my project" | index_graph |
| "What calls the VectorStore.search method?" | search_graph + graph_neighbors |
| "Find code structurally related to authentication" | hybrid_search |
Typical first-time flow:
You: "Search for how payments are handled in /Users/me/myapp"
Claude: [calls index_codebase automatically since no index exists]
Claude: [calls search_code with your query]
Claude: "Here's how payments work — in src/payments.py:42..."
After the first index, subsequent searches skip unchanged files automatically — no re-indexing needed unless your code changes.
Tools
index_codebase(path, force=False, watch=False, provider=None)
Index a project directory. Skips unchanged files on subsequent calls.
index_codebase("/path/to/myproject")
# → "Indexed 142 file(s), 1847 chunk(s) added (0 file(s) skipped, unchanged)"
# Use OpenAI embeddings instead of local
index_codebase("/path/to/myproject", provider="openai")
Provider lock: once a project is indexed with a provider, re-indexing with a different provider requires force=True (this rebuilds the vector table with the new embedding dimensions).
Note: watch=True is only supported with the local provider — live sync with cloud providers would incur unbounded API costs.
search_code(query, path, top_k=8)
Semantic search. Auto-indexes if no index exists.
search_code("how does user authentication work", "/path/to/myproject")
Returns formatted snippets with file paths, line numbers, and similarity scores:
[1] src/auth.py:45-72 (score: 0.87)
def authenticate_user(token: str) -> User:
...
[2] src/middleware.py:12-28 (score: 0.81)
...
get_index_status(path)
Check index statistics, including the embedding provider used.
Index status for: /path/to/myproject
Files indexed: 142
Total chunks: 1847
Last indexed: 2026-02-22T07:20:31+00:00
Index size: 28.4 MB
Provider: local
Model: isuruwijesiri/all-MiniLM-L6-v2-code-search-512
Dimensions: 384
index_graph(path, force=False)
Build a structural knowledge graph from the codebase using tree-sitter AST extraction. No LLM required — extracts files, functions, classes, and methods as nodes; contains, calls, imports, and inherits as directed edges. Independent of the vector index.
index_graph("/path/to/myproject")
# → "Graph built: 496 nodes, 1251 edges, 35 files processed."
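To make the node and edge vocabulary concrete, here is a small sketch of AST-based extraction. index_graph relies on tree-sitter; this example swaps in Python's standard-library `ast` module so it runs without extra dependencies, and it only handles Python source. The function name `extract_graph` and the tuple layouts are illustrative assumptions, not VecGrep's internal schema.

```python
import ast

SOURCE = '''
class Store:
    def search(self, q):
        return rank(q)

def rank(q):
    return sorted(q)
'''

def extract_graph(source: str, filename: str = "example.py"):
    """Collect (kind, name, start_line, end_line) nodes and (src, relation, dst) edges."""
    tree = ast.parse(source)
    nodes, edges = [], []

    def visit(node, parent):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
                if isinstance(child, ast.ClassDef):
                    kind = "class"
                elif isinstance(node, ast.ClassDef):
                    kind = "method"
                else:
                    kind = "function"
                # Qualify nested definitions with their parent's name.
                name = child.name if parent == filename else f"{parent}.{child.name}"
                nodes.append((kind, name, child.lineno, child.end_lineno))
                edges.append((parent, "contains", name))
                visit(child, name)
            else:
                # Record direct calls to plain names, e.g. rank(q).
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    edges.append((parent, "calls", child.func.id))
                visit(child, parent)

    visit(tree, filename)
    return nodes, edges

nodes, edges = extract_graph(SOURCE)
```

Import and inheritance edges are omitted here to keep the sketch short; they would be collected from `ast.Import`/`ast.ImportFrom` nodes and `ClassDef.bases` in the same traversal.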
search_graph(query, path, limit=20)
Keyword search over node labels (function names, class names, file names). Returns structural nodes with source location and connectivity degree. Ultra-cheap: ~47 tokens average, ~3ms latency.
search_graph("VectorStore", "/path/to/myproject")
# → [1] CLASS VectorStore (score: 1.00, degree: 39)
# src/vecgrep/store.py:49-352
graph_neighbors(node_id, path, depth=1)
Return the structural neighbourhood of any node — callers, callees, imports, contained methods, and inheritance edges. Use search_graph first to find the node ID.
graph_neighbors("VectorStore", "/path/to/myproject", depth=1)
# → Callers (18): _get_store, migrate_project, test fixtures...
# Contains (18): search, add_chunks, replace_file_chunks...
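The depth parameter can be read as a hop limit in a breadth-first traversal over directed edges. The sketch below illustrates that idea under stated assumptions: the edge list is made up, and the `neighbors` function and the `_by` suffix for incoming edges are hypothetical names, not VecGrep's output format.

```python
from collections import deque

# Hypothetical edge list in (source, relation, target) form.
EDGES = [
    ("server.py", "contains", "_get_store"),
    ("_get_store", "calls", "VectorStore"),
    ("VectorStore", "contains", "VectorStore.search"),
    ("VectorStore", "contains", "VectorStore.add_chunks"),
    ("VectorStore.search", "calls", "embed"),
]

def neighbors(node_id: str, edges, depth: int = 1) -> dict[str, list[str]]:
    """Group nodes reachable within `depth` hops by the relation that reached them."""
    found: dict[str, list[str]] = {}
    seen = {node_id}
    queue = deque([(node_id, 0)])
    while queue:
        current, dist = queue.popleft()
        if dist == depth:
            continue  # hop limit reached; do not expand further
        for src, rel, dst in edges:
            if src == current:
                label, other = rel, dst            # outgoing edge
            elif dst == current:
                label, other = f"{rel}_by", src    # incoming edge
            else:
                continue
            if other not in seen:
                seen.add(other)
                found.setdefault(label, []).append(other)
                queue.append((other, dist + 1))
    return found

one_hop = neighbors("VectorStore", EDGES, depth=1)
two_hops = neighbors("VectorStore", EDGES, depth=2)
```

With depth=1 only direct callers and contained methods appear; depth=2 also pulls in what those neighbours touch, which grows quickly on well-connected nodes.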
hybrid_search(query, path, top_k=8, alpha=0.6, min_score=0.0)
Vector similarity search re-ranked by graph proximity. Final score = alpha * vector_score + (1 - alpha) * graph_score. Fixes cases where documentation ranks above source code on pure embedding similarity.
hybrid_search("VectorStore search method", "/path/to/myproject", alpha=0.6)
# → [1] src/vecgrep/store.py:292-320 (blended: 0.70, vec: 0.49, graph: 1.00)
Requires both index_codebase and index_graph to have been run. Degrades gracefully to pure vector search if the graph index is absent.
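The blending formula is easy to check by hand. The sketch below applies it to two made-up candidates to show how a strong graph match can outrank a higher vector score; the candidate scores and the `blended_score` helper are illustrative assumptions, and only the formula itself comes from the description above.

```python
def blended_score(vector_score: float, graph_score: float, alpha: float = 0.6) -> float:
    """alpha * vector_score + (1 - alpha) * graph_score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return alpha * vector_score + (1 - alpha) * graph_score

# Hypothetical candidates: (label, vector_score, graph_score).
candidates = [
    ("CHANGELOG.md", 0.55, 0.0),  # wins on vector similarity alone
    ("store.py", 0.49, 1.0),      # query names a symbol defined here
]
ranked = sorted(candidates, key=lambda c: blended_score(c[1], c[2]), reverse=True)

for label, vec, graph in ranked:
    print(f"{label}: blended={blended_score(vec, graph):.3f}")
```

With alpha=0.6, the rounded inputs from the example above (vec 0.49, graph 1.00) give about 0.69, close to the reported 0.70. Setting alpha=1.0 reduces the ranking to pure vector similarity.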
Configuration
VecGrep can be tuned via environment variables:
Local provider
| Variable | Default | Description |
|---|---|---|
| VECGREP_BACKEND | onnx | Local backend: onnx (fastembed, fast startup) or torch (sentence-transformers, any HF model) |
| VECGREP_MODEL | isuruwijesiri/all-MiniLM-L6-v2-code-search-512 | HuggingFace model ID (local provider only) |
Backend comparison:
| Backend | Startup | PyTorch required | Custom HF models |
|---|---|---|---|
| onnx (default) | ~100ms | No | ONNX-exported models only |
| torch | ~2–3s | Yes | Any HuggingFace model |
Cloud providers (BYOK — Bring Your Own Key)
VecGrep supports three cloud embedding providers. Each requires an API key environment variable and the corresponding optional dependency.
| Provider | Env var | Model | Dims | Install extra |
|---|---|---|---|---|
| openai | VECGREP_OPENAI_KEY | text-embedding-3-small | 1536 | vecgrep[openai] |
| voyage | VECGREP_VOYAGE_KEY | voyage-code-3 | 1024 | vecgrep[voyage] |
| gemini | VECGREP_GEMINI_KEY | gemini-embedding-exp-03-07 | 3072 | vecgrep[gemini] |
Install cloud extras:
# Single provider
uv tool install --python 3.12 'vecgrep[openai]'
pip install 'vecgrep[openai]'
# All cloud providers at once
pip install 'vecgrep[cloud]'
Use a cloud provider:
# Set your API key
export VECGREP_OPENAI_KEY=sk-...
# Index with OpenAI embeddings
index_codebase("/path/to/myproject", provider="openai")
# Or tell Claude to use it:
# "Index my project at /path/to/myproject using openai embeddings"
Switch providers (requires force re-index to rebuild the vector table):
index_codebase("/path/to/myproject", provider="voyage", force=True)
Local backend examples:
# Use a different model with the torch backend
VECGREP_BACKEND=torch VECGREP_MODEL=sentence-transformers/all-MiniLM-L6-v2 vecgrep
# Use a custom ONNX model
VECGREP_MODEL=my-org/my-onnx-model vecgrep
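For readers scripting around these variables, the local settings could be resolved along the following lines. This is a sketch only: the variable names and defaults come from the table above, while the `local_settings` helper, the case-insensitive backend match, and the validation step are assumptions rather than VecGrep's documented behaviour.

```python
import os
from collections.abc import Mapping

DEFAULT_BACKEND = "onnx"
DEFAULT_MODEL = "isuruwijesiri/all-MiniLM-L6-v2-code-search-512"

def local_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Resolve backend and model from the environment, falling back to the defaults."""
    backend = env.get("VECGREP_BACKEND", DEFAULT_BACKEND).lower()
    if backend not in {"onnx", "torch"}:
        raise ValueError(f"unsupported VECGREP_BACKEND: {backend!r}")
    return {"backend": backend, "model": env.get("VECGREP_MODEL", DEFAULT_MODEL)}

# Passing a plain dict makes the behaviour easy to inspect without touching os.environ.
defaults = local_settings({})
custom = local_settings({
    "VECGREP_BACKEND": "torch",
    "VECGREP_MODEL": "sentence-transformers/all-MiniLM-L6-v2",
})
```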
Supported languages
Python, JavaScript/TypeScript, Rust, Go, Java, C/C++, Ruby, Swift, Kotlin, C#
All other text files fall back to sliding-window line chunks.
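A sliding-window fallback of this kind might look like the sketch below. The window and overlap sizes are assumptions chosen for illustration, as is the function name; the README does not state the values VecGrep uses.

```python
def sliding_window_chunks(text: str, window: int = 40, overlap: int = 10):
    """Split text into overlapping line-based chunks.

    Returns (start_line, end_line, chunk_text) tuples with 1-based inclusive lines.
    """
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    lines = text.splitlines()
    step = window - overlap
    chunks = []
    for start in range(0, len(lines), step):
        end = min(start + window, len(lines))
        chunks.append((start + 1, end, "\n".join(lines[start:end])))
        if end == len(lines):
            break  # last window already covers the tail
    return chunks

sample = "\n".join(f"line {i}" for i in range(1, 101))
spans = [(s, e) for s, e, _ in sliding_window_chunks(sample)]
```

The overlap keeps content that straddles a window boundary retrievable from at least one chunk, at the cost of some duplicated text in the index.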
Index location
~/.vecgrep/<sha256-of-project-path>/index.db
Each project gets its own isolated index. Delete the directory to wipe the index.
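To locate or delete a project's index from a script, the directory name can be approximated as below. This is a sketch under assumptions: the README only says the directory is a SHA-256 of the project path, so how the path is normalised before hashing, and whether the full hex digest is used, are guesses. The `index_dir` helper is hypothetical.

```python
import hashlib
from pathlib import Path

def index_dir(project_path: str, root: Path = Path.home() / ".vecgrep") -> Path:
    """Map a project path to a per-project index directory via SHA-256."""
    # Assumption: the absolute, resolved path is what gets hashed.
    resolved = str(Path(project_path).expanduser().resolve())
    digest = hashlib.sha256(resolved.encode("utf-8")).hexdigest()
    return root / digest

a = index_dir("/path/to/myproject")
b = index_dir("/path/to/otherproject")
```

Hashing the path rather than using the project name keeps two checkouts with the same folder name from sharing an index.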
Acknowledgements
The embedding model used by VecGrep is all-MiniLM-L6-v2-code-search-512, a model fine-tuned specifically for semantic code search by @isuruwijesiri.
@misc{all_MiniLM_L6_v2_code_search_512,
author = {isuruwijesiri},
title = {all-MiniLM-L6-v2-code-search-512},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/isuruwijesiri/all-MiniLM-L6-v2-code-search-512}
}
Community
| Questions | Start a Q&A discussion |
| Ideas | Share an idea |
| Show & Tell | Share how you use VecGrep |
| Bugs | Open an issue |