semantic-search-mcp
Provides semantic code search over codebases using local embeddings with natural language queries. Supports hybrid search, file watching, and respects .gitignore.
Semantic Search MCP Server
An MCP server that provides semantic code search using local embeddings. Search your codebase with natural language queries like "authentication middleware" or "database connection pooling".
Features
- Hybrid search: Combines vector similarity (Jina code embeddings) with FTS5 keyword matching using Reciprocal Rank Fusion
- 165+ languages: Tree-sitter parsing for Python, TypeScript, JavaScript, Go, Rust, Java, C/C++, Ruby, PHP, and more
- Incremental indexing: File watcher automatically detects additions, modifications, and deletions
- Respects .gitignore: Honors your project's .gitignore files (including nested ones)
- Auto-initialization: The model loads and the codebase is indexed in the background on server startup
- Zero external APIs: All embeddings generated locally with FastEmbed
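Reciprocal Rank Fusion merges the vector-similarity ranking and the FTS5 keyword ranking without calibrating their raw scores against each other: each chunk earns 1 / (k + rank) from every list it appears in, and the sums are sorted. The sketch below is a minimal illustration, not the server's code; it assumes the commonly used constant k = 60, and the function name and chunk IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several best-first rankings of chunk IDs into one ranking.

    A chunk at 1-based position r in a list contributes 1 / (k + r);
    a chunk absent from a list gets nothing from that list.
    k = 60 is an assumed constant, not a documented server setting.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID so output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical results for the query "database connection pooling".
vector_hits = ["db/pool.py:Pool", "db/conn.py:connect", "cache/lru.py:LRU"]
keyword_hits = ["db/conn.py:connect", "db/pool.py:Pool", "docs/setup.py:main"]

for chunk_id, score in reciprocal_rank_fusion([vector_hits, keyword_hits]):
    print(f"{score:.4f}  {chunk_id}")
```

Chunks found by both retrievers (here the two db/ entries) rise above chunks found by only one, which is the point of the hybrid approach.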
Installation
```bash
uv tool install semantic-search-mcp
```
Or with pip:
```bash
pip install semantic-search-mcp
```
Or run directly without installing:
```bash
uvx semantic-search-mcp
```
Quick Start
Add to Claude Code
Option A: Project-level config (recommended)
After installing with uv tool install or pip install, create .mcp.json in your project root:
```json
{
  "mcpServers": {
    "semantic-search": {
      "command": "semantic-search-mcp"
    }
  }
}
```
Option B: CLI
```bash
claude mcp add semantic-search -- semantic-search-mcp
```
Option C: Without installing (ephemeral)
If you prefer not to install, use uvx to run in an ephemeral environment:
```json
{
  "mcpServers": {
    "semantic-search": {
      "command": "uvx",
      "args": ["semantic-search-mcp"]
    }
  }
}
```
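If you set up several projects, you may prefer to generate the project-level config from a script. This is a minimal sketch using only the standard library; the helpers build_server_entry and write_mcp_config are hypothetical and simply produce the same entries shown in Option A and Option C, merging into an existing .mcp.json rather than overwriting other servers.

```python
import json
from pathlib import Path


def build_server_entry(ephemeral: bool = False) -> dict:
    """Return the "semantic-search" entry for Option A (installed) or Option C (uvx)."""
    if ephemeral:
        return {"command": "uvx", "args": ["semantic-search-mcp"]}
    return {"command": "semantic-search-mcp"}


def write_mcp_config(project_root: Path, ephemeral: bool = False) -> Path:
    """Add or update the semantic-search entry in project_root/.mcp.json,
    keeping any other servers that are already configured."""
    path = project_root / ".mcp.json"
    config = json.loads(path.read_text()) if path.exists() else {}
    config.setdefault("mcpServers", {})["semantic-search"] = build_server_entry(ephemeral)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return path


# Preview the Option A config without touching the filesystem.
print(json.dumps({"mcpServers": {"semantic-search": build_server_entry()}}, indent=2))
```

Calling write_mcp_config(Path.cwd()) from the project root would then create or update the file in place.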
Use
The server initializes automatically on startup: it loads the embedding model and indexes your codebase in the background.
Available Tools
| Tool | Description |
|---|---|
| search_code | Search codebase with natural language |
| get_status | Get server state, progress, and statistics |
| pause_watcher | Pause file watching (events discarded) |
| resume_watcher | Resume file watching |
| reindex | Start full reindex (runs in background) |
| cancel_indexing | Cancel running indexing job |
| clear_index | Wipe all indexed data |
| exclude_paths | Add paths to ignore (session-only) |
| include_paths | Remove paths from exclusion list |
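Claude Code calls these tools for you, but any MCP client invokes them the same way: a JSON-RPC 2.0 request with method tools/call, the tool name, and an arguments object. The sketch below only builds that message. The argument names query and limit are assumptions, since the tools' input schemas are not documented here (a client can read each tool's inputSchema from a tools/list response), and a real session must complete the MCP initialize handshake first.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC request IDs must not repeat within a session


def tools_call_request(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a single-line JSON-RPC 2.0 message."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": next(_request_ids),
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
    )


# "query" and "limit" are assumed argument names, not documented ones.
print(tools_call_request("search_code", {"query": "authentication middleware", "limit": 5}))
print(tools_call_request("get_status", {}))
```

Over the stdio transport each message is written as one newline-terminated line, which is why the sketch serializes without indentation.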
How It Works
Indexing
On startup, the server:
- Scans your codebase for supported file types
- Parses code into semantic chunks (functions, classes, methods) using Tree-sitter
- Generates embeddings for each chunk using Jina's code embedding model
- Stores everything in a local SQLite database with vector search support
File Watching
The server monitors your codebase for changes in real time:
| Event | Action |
|---|---|
| File created | Parsed, embedded, and added to index |
| File modified | Re-indexed if content hash changed |
| File deleted | Removed from index |
Changes are debounced (default 1s) to batch rapid modifications.
What Gets Indexed
Included:
- Files with code extensions: .py, .js, .ts, .tsx, .jsx, .go, .rs, .java, .c, .cpp, .h, .rb, .php, .swift, .kt, .scala, and more
Excluded:
- Files matching .gitignore patterns (all .gitignore files in your project are respected)
- Common non-code directories: node_modules, __pycache__, .venv, build, dist, .git, vendor, etc.
- Binary files and non-code file types
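The include and exclude rules above can be sketched as a single predicate. This is a deliberately simplified, hypothetical filter: the extension and directory sets are subsets of the lists above, and ignore patterns are matched with fnmatch against the path and each path component. Real .gitignore semantics (negation with !, anchoring with a leading /, **, and nested .gitignore files applying relative to their own directory) are more involved and are not handled here.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Illustrative subsets of the lists above; the server's full lists are longer.
CODE_EXTENSIONS = {".py", ".js", ".ts", ".tsx", ".jsx", ".go", ".rs", ".java",
                   ".c", ".cpp", ".h", ".rb", ".php", ".swift", ".kt", ".scala"}
SKIP_DIRS = {"node_modules", "__pycache__", ".venv", "build", "dist", ".git", "vendor"}


def should_index(rel_path: str, ignore_patterns: list[str]) -> bool:
    """Decide whether a repo-relative POSIX path should be indexed (simplified rules)."""
    path = PurePosixPath(rel_path)
    # Skip anything inside a well-known non-code directory.
    if any(part in SKIP_DIRS for part in path.parts[:-1]):
        return False
    # Only index known code extensions (case-insensitive).
    if path.suffix.lower() not in CODE_EXTENSIONS:
        return False
    for pattern in ignore_patterns:
        pattern = pattern.rstrip("/")  # treat "dir/" like "dir" in this sketch
        if fnmatchcase(rel_path, pattern) or any(fnmatchcase(p, pattern) for p in path.parts):
            return False
    return True


patterns = ["*.min.js", "secrets/", "generated"]
for p in ["src/app.py", "web/vendor.min.js", "node_modules/lib/index.js",
          "secrets/keys.py", "README.md", "pkg/generated/api.go"]:
    print(p, should_index(p, patterns))
```

Only src/app.py passes; each of the others is rejected by a different rule.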
Configuration
Environment variables:
| Variable | Default | Description |
|---|---|---|
| SEMANTIC_SEARCH_DB_PATH | .semantic-search/index.db | Index database location |
| SEMANTIC_SEARCH_EMBEDDING_MODEL | jinaai/jina-embeddings-v2-base-code | Embedding model |
| SEMANTIC_SEARCH_MIN_SCORE | 0.3 | Minimum relevance threshold (0-1) |
| SEMANTIC_SEARCH_DEBOUNCE_MS | 1000 | File watcher debounce in milliseconds |
| SEMANTIC_SEARCH_BATCH_SIZE | 50 | Files per batch (reduce if running out of memory) |
| SEMANTIC_SEARCH_MAX_FILE_SIZE_KB | 512 | Skip files larger than this (KB) |
| SEMANTIC_SEARCH_EMBEDDING_BATCH_SIZE | 8 | Texts per embedding call (reduce if running out of memory) |
| SEMANTIC_SEARCH_EMBEDDING_THREADS | 4 | ONNX Runtime threads (higher is faster on multi-core machines) |
| SEMANTIC_SEARCH_USE_QUANTIZED | true | Use INT8 quantized model (30-40% faster) |
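A typed settings object is a convenient way to read these variables. The sketch below is hypothetical: the variable names and defaults come from the table above, but the Settings class, the parsing rules (for example, which strings count as true), and the range check on the minimum score are assumptions rather than the server's documented behavior.

```python
import os
from dataclasses import dataclass

_TRUE_STRINGS = {"1", "true", "yes", "on"}  # assumed spelling of "true"


def _env_str(name: str, default: str) -> str:
    return os.environ.get(name, default)


def _env_int(name: str, default: int) -> int:
    return int(os.environ.get(name, default))


def _env_float(name: str, default: float) -> float:
    return float(os.environ.get(name, default))


def _env_bool(name: str, default: bool) -> bool:
    raw = os.environ.get(name)
    return default if raw is None else raw.strip().lower() in _TRUE_STRINGS


@dataclass(frozen=True)
class Settings:
    db_path: str
    embedding_model: str
    min_score: float
    debounce_ms: int
    batch_size: int
    max_file_size_kb: int
    embedding_batch_size: int
    embedding_threads: int
    use_quantized: bool

    def __post_init__(self) -> None:
        if not 0.0 <= self.min_score <= 1.0:
            raise ValueError("SEMANTIC_SEARCH_MIN_SCORE must be between 0 and 1")

    @classmethod
    def from_env(cls) -> "Settings":
        # Names and defaults mirror the configuration table.
        return cls(
            db_path=_env_str("SEMANTIC_SEARCH_DB_PATH", ".semantic-search/index.db"),
            embedding_model=_env_str("SEMANTIC_SEARCH_EMBEDDING_MODEL", "jinaai/jina-embeddings-v2-base-code"),
            min_score=_env_float("SEMANTIC_SEARCH_MIN_SCORE", 0.3),
            debounce_ms=_env_int("SEMANTIC_SEARCH_DEBOUNCE_MS", 1000),
            batch_size=_env_int("SEMANTIC_SEARCH_BATCH_SIZE", 50),
            max_file_size_kb=_env_int("SEMANTIC_SEARCH_MAX_FILE_SIZE_KB", 512),
            embedding_batch_size=_env_int("SEMANTIC_SEARCH_EMBEDDING_BATCH_SIZE", 8),
            embedding_threads=_env_int("SEMANTIC_SEARCH_EMBEDDING_THREADS", 4),
            use_quantized=_env_bool("SEMANTIC_SEARCH_USE_QUANTIZED", True),
        )


print(Settings.from_env())
```

On a memory-constrained machine, the table suggests lowering SEMANTIC_SEARCH_BATCH_SIZE and SEMANTIC_SEARCH_EMBEDDING_BATCH_SIZE before starting the server.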
Performance
GPU Acceleration
GPU acceleration is auto-detected and used when available:
| Platform | Provider | Installation |
|---|---|---|
| NVIDIA | CUDA | pip install semantic-search-mcp[gpu] |
| Apple Silicon | CoreML | Automatic (M1/M2/M3) |
| AMD | ROCm | Install ROCm-enabled onnxruntime |
| Windows | DirectML | Install DirectML-enabled onnxruntime |
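ONNX Runtime runs a model on the first usable provider from a priority list (the providers argument of onnxruntime.InferenceSession), and onnxruntime.get_available_providers() reports what the installed build supports. The sketch below shows one way to turn the table above into such a list; the priority order and the choose_providers helper are assumptions, since the server's own detection logic is not documented here.

```python
# Assumed priority order; the server's own detection logic is not documented here.
PREFERRED_PROVIDERS = [
    "CUDAExecutionProvider",    # NVIDIA (CUDA)
    "ROCMExecutionProvider",    # AMD (ROCm)
    "CoreMLExecutionProvider",  # Apple Silicon (CoreML)
    "DmlExecutionProvider",     # Windows (DirectML)
]


def choose_providers(available: list[str]) -> list[str]:
    """Order the available ONNX Runtime providers by preference, always ending with CPU."""
    chosen = [p for p in PREFERRED_PROVIDERS if p in available]
    chosen.append("CPUExecutionProvider")  # CPU fallback that standard builds include
    return chosen


try:
    import onnxruntime  # optional: only needed to query the real machine

    print(choose_providers(onnxruntime.get_available_providers()))
except ImportError:
    print(choose_providers([]))  # ['CPUExecutionProvider']
```

The resulting list would be passed as providers=... when creating the inference session, so a missing GPU degrades gracefully to CPU.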
Alternative Models
For faster indexing (with quality tradeoffs), you can use a lighter model:
| Model | Dimensions | Speed | Best For |
|---|---|---|---|
| jinaai/jina-embeddings-v2-base-code | 768 | Baseline | Code search (default) |
| BAAI/bge-small-en-v1.5 | 384 | ~10x faster | General text |
| sentence-transformers/all-MiniLM-L6-v2 | 384 | ~32x faster | Speed priority |
To use an alternative model:
```bash
export SEMANTIC_SEARCH_EMBEDDING_MODEL="sentence-transformers/all-MiniLM-L6-v2"
```
Note: Changing models requires a full reindex (delete the .semantic-search/ directory).
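A reindex is required because vectors from different models are not comparable: the dimensions can differ (768 vs 384), and even two 384-dimensional models place text in unrelated embedding spaces, so a new query vector cannot be scored against old chunk vectors. One way to detect a model switch is to record the model name in the index on first use, as in this hypothetical sketch; the index_meta table and ensure_model_matches function are illustrative and not the server's schema.

```python
import sqlite3


def ensure_model_matches(conn: sqlite3.Connection, model: str) -> bool:
    """Record the embedding model on first use; afterwards report whether it still matches.

    Returns False when the index was built with a different model, meaning its
    stored vectors cannot be compared with new query embeddings.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS index_meta (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )
    row = conn.execute("SELECT value FROM index_meta WHERE key = 'model'").fetchone()
    if row is None:
        conn.execute("INSERT INTO index_meta (key, value) VALUES ('model', ?)", (model,))
        conn.commit()
        return True
    return row[0] == model


conn = sqlite3.connect(":memory:")
print(ensure_model_matches(conn, "jinaai/jina-embeddings-v2-base-code"))     # True (new index)
print(ensure_model_matches(conn, "sentence-transformers/all-MiniLM-L6-v2"))  # False: reindex needed
```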
UniXcoder (Experimental)
Microsoft UniXcoder is a code-specific model pre-trained on code, ASTs, and comments. It may capture code structure better, but it is substantially slower (about 20x slower than the default Jina model).
| Model | Dimensions | Speed | Languages |
|---|---|---|---|
| microsoft/unixcoder-base | 768 | ~20x slower | 6 (Java, Ruby, Python, PHP, JavaScript, Go) |
| microsoft/unixcoder-base-nine | 768 | ~20x slower | 9 (adds C, C++, C#) |
Installation (requires additional dependencies):
```bash
pip install "semantic-search-mcp[unixcoder]"
```
Usage:
```bash
export SEMANTIC_SEARCH_EMBEDDING_MODEL="microsoft/unixcoder-base-nine"
```
When to use UniXcoder:
- You prioritize search quality over indexing speed
- Your codebase is small to medium sized
- You have GPU acceleration (CUDA or Apple Silicon MPS)
When to avoid UniXcoder:
- Large codebases (10,000+ files) - indexing will take hours
- You need fast initial indexing
- Running on CPU without GPU acceleration
Claude Code Integration
Skills and commands are automatically installed when the MCP server first starts:
- Skills → ~/.claude/skills/ (AI auto-discovery)
- Commands → ~/.claude/commands/ (user-invocable slash commands)
To manually reinstall or update:
```bash
semantic-search-mcp-install-skills
```
Available Slash Commands
| Command | Description |
|---|---|
| `/semantic-search-search <query>` | Search codebase with natural language |
| `/semantic-search-status` | Check server status and index stats |
| `/semantic-search-reindex` | Trigger full codebase reindex |
| `/semantic-search-cancel` | Cancel running indexing job |
| `/semantic-search-clear` | Wipe all indexed data |
| `/semantic-search-pause` | Pause file watcher |
| `/semantic-search-resume` | Resume file watcher |
Requirements
- Python 3.11+
- ~700 MB of disk space for the embedding model (downloaded on first run; ~150 MB with INT8 quantization)
- ~1 GB of RAM for the embedding model
License