Fast Embedding MCP SSE
Provides fast static embedding, similarity, and search capabilities via MCP tools and an OpenAI-compatible HTTP API using a tiny 16M-parameter English embedding model.
<img width="3534" height="1625" alt="sse_v2" src="https://github.com/user-attachments/assets/38da5650-14fe-41a3-8a9b-3b1f3884a945" />
Fast Embedding MCP / SSE — Stable Static Embedding server
Serve RikkaBotan/stable-static-embedding-fast-retrieval-mrl-en-v2
over an OpenAI-compatible HTTP API and an MCP server (stdio).
The model is a ~16M-parameter English static embedding model: 512D native with Matryoshka (MRL) truncation to 256 / 128 / 64 / 32. It is fast (no attention) and tiny.
Install
This project uses uv for environment management. Install uv first if you don't have it (instructions):
# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
Then clone and sync. uv sync creates a .venv, installs the pinned
dependencies from uv.lock, and installs the project itself:
git clone https://github.com/Rikka-Botan/Fast-Embedding-MCP-SSE.git
cd Fast-Embedding-MCP-SSE
uv sync
uv picks a compatible Python (3.10+) automatically, so no manual venv
creation or activation is needed; prefix commands with uv run. The first
server run downloads the model from Hugging Face (~60 MB) and caches it.
HTTP API
uv run python -m sse_embedding.api # serves on http://0.0.0.0:8000
# or, equivalently: uv run sse-api
Configurable via SSE_API_HOST / SSE_API_PORT.
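A client script can read the same two variables to locate the server. The following is a minimal sketch, not part of this project: the helper name `api_base_url` is hypothetical, the defaults are assumed from the address shown above, and `0.0.0.0` (a bind address) is mapped to `localhost` for connecting.

```python
import os


def api_base_url(env=None):
    """Build the server's base URL from SSE_API_HOST / SSE_API_PORT.

    Hypothetical helper for client scripts. The defaults (0.0.0.0:8000)
    are assumed from the address the server reports when it starts.
    """
    env = os.environ if env is None else env
    host = env.get("SSE_API_HOST", "0.0.0.0")
    port = env.get("SSE_API_PORT", "8000")
    # 0.0.0.0 means "listen on all interfaces"; connect via localhost instead.
    if host == "0.0.0.0":
        host = "localhost"
    return f"http://{host}:{port}"


# With neither variable set, this falls back to the assumed defaults.
print(api_base_url({}))
```

Passing a plain dict instead of `os.environ` keeps the helper easy to try out without changing the real environment.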
Endpoints
| Method | Path | Purpose |
|---|---|---|
| POST | /v1/embeddings | OpenAI-compatible embeddings (supports dimensions) |
| POST | /similarity | Cosine similarity matrix between two text sets |
| POST | /search | Rank documents against a query (stateless) |
| POST | /index/add | Add documents to the in-memory index |
| POST | /index/query | Query the in-memory index |
| GET | /index/stats | Index size |
| POST | /index/clear | Empty the index |
| GET | /health | Health check |
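To make "cosine similarity matrix" and "rank documents against a query" concrete, here is a small pure-Python sketch of both operations on toy vectors. It illustrates the general technique only; the function names are hypothetical and this is not the server's actual implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(rows, cols):
    """matrix[i][j] is the cosine similarity of rows[i] and cols[j]."""
    return [[cosine(r, c) for c in cols] for r in rows]


def rank(query_vec, doc_vecs, top_k=None):
    """Return (doc_index, score) pairs, best match first."""
    scored = [(i, cosine(query_vec, d)) for i, d in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]  # top_k=None keeps every document


# Toy 2-D vectors standing in for real embeddings.
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.1]
print(rank(query, docs, top_k=2))
```

In this toy case the query points almost along the first axis, so document 0 ranks first and the diagonal document 2 second.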
OpenAI-compatible example
Works with the OpenAI SDK by pointing base_url at this server:
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
resp = client.embeddings.create(
model="RikkaBotan/stable-static-embedding-fast-retrieval-mrl-en-v2",
input=["hello world", "good morning"],
dimensions=256, # MRL truncation: 512/256/128/64/32
)
print(len(resp.data[0].embedding)) # 256
Or raw:
curl -X POST http://localhost:8000/v1/embeddings \
-H "Content-Type: application/json" \
-d '{"input": "hello world", "dimensions": 128}'
Search / index example
curl -X POST http://localhost:8000/index/add \
-H "Content-Type: application/json" \
-d '{"documents": ["The cat sat on the mat", "Paris is in France"]}'
curl -X POST http://localhost:8000/index/query \
-H "Content-Type: application/json" \
-d '{"query": "Where is Paris?", "top_k": 1}'
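The same two calls can be made from Python with only the standard library. This is a sketch under assumptions: the server is running on the default address, the helper names (`build_post`, `post_json`, `demo`) are hypothetical, and the response bodies are printed as-is because their exact shape is not documented here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumes the default host and port


def build_post(path, payload):
    """Create a JSON POST request for the given endpoint path."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path, payload):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_post(path, payload), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def demo():
    """Mirror the curl calls above; requires a running server."""
    docs = ["The cat sat on the mat", "Paris is in France"]
    print(post_json("/index/add", {"documents": docs}))
    print(post_json("/index/query", {"query": "Where is Paris?", "top_k": 1}))
```

Call `demo()` once the server is up. Keeping request construction in `build_post` separate from sending makes the payloads easy to inspect without a network connection.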
MCP server (stdio)
uv run python -m sse_embedding.mcp_server
# or, equivalently: uv run sse-mcp
Tools exposed: embed_text, similarity, search, index_add,
index_query, index_stats, index_clear.
Register with Claude Code
Requires the Claude Code CLI. If
claude is not a recognized command, you are likely using the Claude Desktop app; use the Claude Desktop config below instead.
Run from the cloned project directory:
claude mcp add sse-embedding -- uv run python -m sse_embedding.mcp_server
To make the registration work from any directory, pass the project path to uv
with --directory:
claude mcp add sse-embedding -- uv run --directory /path/to/Fast-Embedding-MCP-SSE python -m sse_embedding.mcp_server
Register with Claude Desktop
Add to claude_desktop_config.json, replacing /path/to/... with the
absolute path where you cloned this repository. uv run resolves the project's
environment from the given directory.
macOS / Linux:
{
"mcpServers": {
"sse-embedding": {
"command": "uv",
"args": ["run", "--directory", "/path/to/Fast-Embedding-MCP-SSE", "python", "-m", "sse_embedding.mcp_server"]
}
}
}
Windows:
{
"mcpServers": {
"sse-embedding": {
"command": "uv",
"args": ["run", "--directory", "C:\\path\\to\\Fast-Embedding-MCP-SSE", "python", "-m", "sse_embedding.mcp_server"]
}
}
}
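Escaping Windows backslashes by hand is easy to get wrong, so one option is to generate the entry with a short script. The sketch below rebuilds the configurations above from a project path; the function name `desktop_config` and the example path are hypothetical.

```python
import json


def desktop_config(project_dir, uv_command="uv"):
    """Return the mcpServers entry for claude_desktop_config.json.

    uv_command may be "uv" or an absolute path to the uv executable.
    """
    return {
        "mcpServers": {
            "sse-embedding": {
                "command": uv_command,
                "args": [
                    "run", "--directory", project_dir,
                    "python", "-m", "sse_embedding.mcp_server",
                ],
            }
        }
    }


# Placeholder Windows clone location; json.dumps doubles the backslashes.
print(json.dumps(desktop_config(r"C:\path\to\Fast-Embedding-MCP-SSE"), indent=2))
```

If claude_desktop_config.json already lists other servers, merge the "sse-embedding" entry into the existing "mcpServers" object rather than overwriting the file.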
If Claude Desktop reports that uv was not found, replace "command": "uv"
with the absolute path to the uv executable (which uv on macOS/Linux,
(Get-Command uv).Source in PowerShell), or point command directly at the
.venv interpreter that uv sync created
(/path/to/Fast-Embedding-MCP-SSE/.venv/bin/python, or on Windows
C:\\path\\to\\Fast-Embedding-MCP-SSE\\.venv\\Scripts\\python.exe) with
"args": ["-m", "sse_embedding.mcp_server"].
Matryoshka dimensions
Valid dim / dimensions values are 512, 256, 128, 64, and 32. Smaller
dimensions are faster to compare and cheaper to store, at the cost of a
gradual loss in quality. Truncation is applied to the full 512D vector and
the result is renormalized, so cosine similarity stays valid at any level.
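The truncate-then-renormalize step can be sketched in a few lines of plain Python. This illustrates the technique as described, not the project's code: the function name `truncate_and_renormalize` is hypothetical and the input is a synthetic vector rather than a real embedding.

```python
import math

VALID_DIMS = (512, 256, 128, 64, 32)


def truncate_and_renormalize(vec, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    if dim not in VALID_DIMS:
        raise ValueError(f"dim must be one of {VALID_DIMS}")
    if len(vec) < dim:
        raise ValueError("vector is shorter than the requested dimension")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero prefix cannot be normalized
    return [x / norm for x in head]


# Synthetic 512-D vector standing in for a real embedding.
full = [math.sin(i + 1) for i in range(512)]
small = truncate_and_renormalize(full, 64)
print(len(small), math.sqrt(sum(x * x for x in small)))
```

Because the truncated vector is rescaled to unit length, a dot product between two such vectors is again their cosine similarity.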
License
Apache-2.0