Fast Embedding MCP SSE

Provides fast static embedding, similarity, and search capabilities via MCP tools and an OpenAI-compatible HTTP API using a tiny 16M-parameter English embedding model.

Fast Embedding MCP / SSE (Stable Static Embedding) server

Serve RikkaBotan/stable-static-embedding-fast-retrieval-mrl-en-v2 over an OpenAI-compatible HTTP API and an MCP server (stdio).

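The model is a ~16M-parameter English static embedding model. Its native output is 512-dimensional, and Matryoshka Representation Learning (MRL) lets you truncate vectors to 256, 128, 64, or 32 dimensions. Because it uses static token embeddings rather than attention layers, encoding is fast and the model is tiny.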
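To make the size trade-off concrete, here is a back-of-the-envelope estimate of raw vector storage for an index. It assumes each component is stored as a float32 (4 bytes) with no index overhead; that storage format is an assumption, not something this README specifies, and the helper name is hypothetical.

```python
# Rough storage estimate for an embedding index.
# Assumption: each vector component is a float32 (4 bytes); index overhead is ignored.

BYTES_PER_FLOAT32 = 4


def index_bytes(n_docs: int, dim: int, bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Raw bytes needed to hold n_docs vectors of the given dimension."""
    return n_docs * dim * bytes_per_value


if __name__ == "__main__":
    for dim in (512, 256, 128, 64, 32):
        mib = index_bytes(1_000_000, dim) / 2**20
        print(f"{dim:>3}D x 1M docs: {mib:,.0f} MiB")
```

Under that assumption, a million 512D vectors take about 1.9 GiB, while the same million at 32D fit in roughly 122 MiB, a 16x reduction.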

Install

This project uses uv for environment management. If you don't have uv yet, install it with the official installer:

# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows (PowerShell)
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Then clone and sync. uv sync creates a .venv, installs the pinned dependencies from uv.lock, and installs the project itself:

git clone https://github.com/Rikka-Botan/Fast-Embedding-MCP-SSE.git
cd Fast-Embedding-MCP-SSE
uv sync

uv picks a compatible Python (3.10+) automatically, so there is no need to create or activate a virtual environment by hand; just prefix commands with uv run. The first server run downloads the model from Hugging Face (~60 MB) and caches it.

HTTP API

uv run python -m sse_embedding.api   # serves on http://0.0.0.0:8000
# or, equivalently:  uv run sse-api

The host and port can be changed with the SSE_API_HOST and SSE_API_PORT environment variables.
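Note that 0.0.0.0 means "listen on all interfaces"; it is not a portable address for a client to connect to. A minimal sketch of deriving a client-side base URL from the same variables, assuming they hold a plain host name and port and default to 0.0.0.0 and 8000 as shown above (the function name is hypothetical):

```python
import os


def api_base_url(env=None) -> str:
    """Build a client-side base URL from SSE_API_HOST / SSE_API_PORT.

    Assumption: defaults mirror the documented 0.0.0.0:8000. A wildcard bind
    address is not a portable destination, so it is mapped to 127.0.0.1.
    """
    env = os.environ if env is None else env
    host = env.get("SSE_API_HOST", "0.0.0.0")
    port = int(env.get("SSE_API_PORT", "8000"))
    if host in ("0.0.0.0", ""):
        host = "127.0.0.1"
    return f"http://{host}:{port}"


if __name__ == "__main__":
    print(api_base_url())
```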

Endpoints

| Method | Path | Purpose |
|---|---|---|
| POST | /v1/embeddings | OpenAI-compatible embeddings (supports dimensions) |
| POST | /similarity | Cosine similarity matrix between two text sets |
| POST | /search | Rank documents against a query (stateless) |
| POST | /index/add | Add documents to the in-memory index |
| POST | /index/query | Query the in-memory index |
| GET | /index/stats | Index size |
| POST | /index/clear | Empty the index |
| GET | /health | Health check |
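If you'd rather not add an HTTP client dependency, the standard library is enough to call these endpoints. This is a minimal sketch: the helper names are hypothetical, the only payload fields used in practice should be the ones shown in this README's examples (input, dimensions, documents, query, top_k), and it assumes every endpoint replies with JSON and that POST /index/clear accepts an empty body.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_request(base_url: str, method: str, path: str,
                  payload: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Prepare a request for one of the endpoints listed above.

    A JSON body is attached only when a payload is given, so GET /health,
    GET /index/stats and POST /index/clear can be built without one.
    """
    url = base_url.rstrip("/") + path
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def call(base_url: str, method: str, path: str,
         payload: Optional[Dict[str, Any]] = None, timeout: float = 10.0) -> Any:
    """Send the request and decode the JSON reply (requires a running server)."""
    req = build_request(base_url, method, path, payload)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, with the server running: call("http://localhost:8000", "POST", "/index/query", {"query": "Where is Paris?", "top_k": 1}).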

OpenAI-compatible example

Works with the OpenAI SDK by pointing base_url at this server:

from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
resp = client.embeddings.create(
    model="RikkaBotan/stable-static-embedding-fast-retrieval-mrl-en-v2",
    input=["hello world", "good morning"],
    dimensions=256,          # MRL truncation: 512/256/128/64/32
)
print(len(resp.data[0].embedding))   # 256
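Once the vectors are back, comparing them is ordinary cosine similarity. The sketch below computes an all-pairs matrix in plain Python, the same kind of result the /similarity endpoint is described as returning. It is an illustration, not the server's code, and it does not assume the vectors are already unit-length.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_matrix(rows: List[Vector], cols: List[Vector]) -> List[List[float]]:
    """matrix[i][j] = cosine(rows[i], cols[j])."""
    return [[cosine(r, c) for c in cols] for r in rows]
```

With the resp object from the SDK example, vecs = [d.embedding for d in resp.data] followed by similarity_matrix(vecs, vecs) gives the 2x2 matrix for "hello world" and "good morning".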

Or call the endpoint directly with curl:

curl -X POST http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": "hello world", "dimensions": 128}'
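The response follows the OpenAI embeddings format, where each item in data carries an index and an embedding. Below is a small parser that returns the vectors in input order, assuming the server mirrors that shape. The sample values are made up for illustration and are not real model output.

```python
import json
from typing import List


def vectors_from_response(body: str) -> List[List[float]]:
    """Extract embeddings from an OpenAI-style /v1/embeddings JSON body,
    ordered by each item's "index" field."""
    payload = json.loads(body)
    items = sorted(payload["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Illustrative body with items deliberately out of order.
sample = json.dumps({
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.0, 1.0]},
        {"object": "embedding", "index": 0, "embedding": [1.0, 0.0]},
    ],
    "model": "RikkaBotan/stable-static-embedding-fast-retrieval-mrl-en-v2",
})
print(vectors_from_response(sample))
```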

Search / index example

curl -X POST http://localhost:8000/index/add \
  -H "Content-Type: application/json" \
  -d '{"documents": ["The cat sat on the mat", "Paris is in France"]}'

curl -X POST http://localhost:8000/index/query \
  -H "Content-Type: application/json" \
  -d '{"query": "Where is Paris?", "top_k": 1}'
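Conceptually, an in-memory index like this stores one vector per document and ranks them by cosine similarity at query time. The class below is a hypothetical sketch of that mechanism, not the server's implementation. The embed function is a stand-in you supply (for example, a wrapper around /v1/embeddings); the toy letter-count embedding exists only so the example runs offline.

```python
import math
from typing import Callable, List, Sequence, Tuple

Embed = Callable[[str], Sequence[float]]


def _unit(v: Sequence[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


class InMemoryIndex:
    """Hypothetical sketch mirroring /index/add, /index/query, /index/stats, /index/clear."""

    def __init__(self, embed: Embed):
        self._embed = embed
        self._docs: List[str] = []
        self._vecs: List[List[float]] = []

    def add(self, documents: List[str]) -> int:
        for doc in documents:
            self._docs.append(doc)
            self._vecs.append(_unit(self._embed(doc)))  # store unit vectors
        return len(self._docs)

    def query(self, query: str, top_k: int = 5) -> List[Tuple[str, float]]:
        q = _unit(self._embed(query))
        # With unit vectors, the dot product equals cosine similarity.
        scored = [(doc, sum(a * b for a, b in zip(q, v)))
                  for doc, v in zip(self._docs, self._vecs)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]

    def stats(self) -> int:
        return len(self._docs)

    def clear(self) -> None:
        self._docs.clear()
        self._vecs.clear()


def toy_embed(text: str) -> List[float]:
    """Offline stand-in: counts of letters a-z. Not a real embedding model."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1
    return counts


index = InMemoryIndex(toy_embed)
index.add(["The cat sat on the mat", "Paris is in France"])
print(index.query("Where is Paris?", top_k=1))
```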

MCP server (stdio)

uv run python -m sse_embedding.mcp_server
# or, equivalently:  uv run sse-mcp

Tools exposed: embed_text, similarity, search, index_add, index_query, index_stats, index_clear.
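Under the hood, an MCP client talks to this server over stdio using newline-delimited JSON-RPC 2.0 messages: an initialize request, an initialized notification, then tools/list and tools/call. The sketch below only builds those frames so you can see their shape. The protocol version string and the index_query argument names (query and top_k, borrowed from the HTTP API) are assumptions; check the real input schemas with tools/list. For an actual client, use an MCP SDK rather than hand-rolled framing.

```python
import json
from typing import Any, Dict, List


def _frame(message: Dict[str, Any]) -> str:
    # The stdio transport carries one JSON-RPC message per line.
    return json.dumps(message) + "\n"


def handshake_and_call(tool: str, arguments: Dict[str, Any]) -> List[str]:
    """Frames a client would write to the server's stdin, in order."""
    return [
        _frame({
            "jsonrpc": "2.0", "id": 1, "method": "initialize",
            "params": {
                "protocolVersion": "2024-11-05",  # assumption: use a version your client supports
                "capabilities": {},
                "clientInfo": {"name": "example-client", "version": "0.1"},
            },
        }),
        _frame({"jsonrpc": "2.0", "method": "notifications/initialized"}),  # notification: no id
        _frame({"jsonrpc": "2.0", "id": 2, "method": "tools/list"}),
        _frame({
            "jsonrpc": "2.0", "id": 3, "method": "tools/call",
            "params": {"name": tool, "arguments": arguments},  # hypothetical argument names
        }),
    ]


for line in handshake_and_call("index_query", {"query": "Where is Paris?", "top_k": 1}):
    print(line, end="")
```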

Register with Claude Code

Requires the Claude Code CLI. If claude is not a recognized command, you are probably using the Claude Desktop app; use the Claude Desktop configuration below instead.

Run from the cloned project directory:

claude mcp add sse-embedding -- uv run python -m sse_embedding.mcp_server

To make the registration work from any directory, pass the project path to uv with --directory:

claude mcp add sse-embedding -- uv run --directory /path/to/Fast-Embedding-MCP-SSE python -m sse_embedding.mcp_server

Register with Claude Desktop

Add to claude_desktop_config.json, replacing /path/to/... with the absolute path where you cloned this repository. uv run resolves the project's environment from the given directory.

macOS / Linux:

{
  "mcpServers": {
    "sse-embedding": {
      "command": "uv",
      "args": ["run", "--directory", "/path/to/Fast-Embedding-MCP-SSE", "python", "-m", "sse_embedding.mcp_server"]
    }
  }
}

Windows:

{
  "mcpServers": {
    "sse-embedding": {
      "command": "uv",
      "args": ["run", "--directory", "C:\\path\\to\\Fast-Embedding-MCP-SSE", "python", "-m", "sse_embedding.mcp_server"]
    }
  }
}

If Claude Desktop reports that uv was not found, replace "command": "uv" with the absolute path to the uv executable (run which uv on macOS/Linux, or (Get-Command uv).Source in PowerShell, to find it). Alternatively, point command directly at the .venv interpreter that uv sync created (/path/to/Fast-Embedding-MCP-SSE/.venv/bin/python on macOS/Linux, C:\\path\\to\\Fast-Embedding-MCP-SSE\\.venv\\Scripts\\python.exe on Windows) and set "args": ["-m", "sse_embedding.mcp_server"].
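If you prefer to script the Claude Desktop setup, the sketch below merges the sse-embedding entry into an existing config without touching other servers. It resolves uv to an absolute path with shutil.which, which sidesteps the "uv not found" problem. The file locations in the comment are the usual defaults on macOS and Windows, not something this README states, so verify them for your install. The function names are hypothetical.

```python
import json
import shutil
from pathlib import Path
from typing import Any, Dict, Optional

# Usual locations (verify for your install):
#   macOS:   ~/Library/Application Support/Claude/claude_desktop_config.json
#   Windows: %APPDATA%\Claude\claude_desktop_config.json


def with_sse_embedding(config: Dict[str, Any], project_dir: str,
                       uv_path: Optional[str] = None) -> Dict[str, Any]:
    """Return a copy of config with the sse-embedding server registered."""
    uv = uv_path or shutil.which("uv") or "uv"  # fall back to the bare name
    merged = json.loads(json.dumps(config))  # deep copy via JSON round-trip
    servers = merged.setdefault("mcpServers", {})
    servers["sse-embedding"] = {
        "command": uv,
        "args": ["run", "--directory", project_dir,
                 "python", "-m", "sse_embedding.mcp_server"],
    }
    return merged


def update_config_file(path: Path, project_dir: str) -> None:
    """Read, merge, and write back the config (creates it if missing)."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.write_text(json.dumps(with_sse_embedding(config, project_dir), indent=2),
                    encoding="utf-8")
```

Restart Claude Desktop after updating the file so it picks up the new server.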

Matryoshka dimensions

Valid dim / dimensions values are 512, 256, 128, 64, and 32. Smaller dimensions mean smaller vectors and faster similarity computation, at the cost of a gradual drop in retrieval quality. Truncation keeps the leading components of the full 512D vector and then renormalizes the result to unit length, so cosine similarity remains valid at every size.
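The truncate-then-renormalize step is easy to reproduce client-side, for example if you store 512D vectors and later want to compare them at a smaller size. This is a minimal sketch, assuming the standard Matryoshka convention of keeping the leading components; the function name is hypothetical.

```python
import math
from typing import List, Sequence

VALID_DIMS = (512, 256, 128, 64, 32)


def mrl_truncate(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components of a 512D vector and L2-renormalize."""
    if dim not in VALID_DIMS:
        raise ValueError(f"dim must be one of {VALID_DIMS}")
    if len(vec) < dim:
        raise ValueError("vector is shorter than the requested dimension")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head
```

Renormalizing matters because the leading components alone no longer have unit length. Without it, dot products between truncated vectors would no longer equal their cosine similarity.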

License

Apache-2.0
