mcp-recall
A self-hosted MCP memory server that gives AI assistants persistent, semantic memory. Store facts, search by meaning, swap embedding models — all running locally on your own hardware.
No cloud APIs. No GPU required. No API costs.
You: "What do we know about the backup configuration?"
Claude: memory_search → finds relevant memories in milliseconds
What it does
mcp-recall stores memories as vector embeddings in PostgreSQL and makes them searchable via the Model Context Protocol. Your AI assistant can remember things across sessions, projects, and devices.
| Feature | Details |
|---|---|
| Semantic search | Find memories by meaning, not keywords |
| Swappable models | Change embedding models without losing data |
| Built-in benchmarking | Compare models against your actual data |
| Dual transport | Streamable HTTP + SSE (legacy) |
| Local embeddings | ONNX models run in-process, no external calls |
| Low resource | Runs on a Celeron J1900 with 16 GB RAM |
Quick start
1. Clone and configure
git clone https://github.com/Jensimogit/mcp-recall.git
cd mcp-recall
npm install
cp .env.example .env
# Generate a random database password (you'll never need to type it)
echo "POSTGRES_PASSWORD=$(openssl rand -base64 32)" >> .env
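If openssl is not installed, Node's built-in crypto module produces an equivalent value (32 random bytes, base64-encoded). This is a minimal sketch; gen-secret.js is a hypothetical helper, not a file in the repository.

```javascript
// gen-secret.js (hypothetical helper, not shipped with mcp-recall)
// Equivalent of `openssl rand -base64 32`: 32 random bytes, base64-encoded.
const crypto = require('node:crypto');

function randomSecret(bytes = 32) {
  return crypto.randomBytes(bytes).toString('base64');
}

// Print a ready-to-append .env line when a variable name is given, e.g.:
//   node gen-secret.js POSTGRES_PASSWORD >> .env
const varName = process.argv[2];
if (varName) {
  console.log(`${varName}=${randomSecret()}`);
}
```

The same invocation works for MCP_API_KEY in the Authentication section.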
2. Download an embedding model
Models are not included in the repository (they are up to ~560 MB each). Download one:
node scripts/download-model.js multilingual-e5-large
Verify the model files are in place:
ls models/multilingual-e5-large/
# Expected: config.json onnx/ tokenizer.json tokenizer_config.json
If the directory is empty (this is rare and depends on the cache layout), copy the files manually:
find node_modules/@xenova/transformers/.cache -name "config.json"
# Copy the directory that contains config.json + tokenizer.json:
cp -r node_modules/@xenova/transformers/.cache/Xenova/multilingual-e5-large/* models/multilingual-e5-large/
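To automate the check above, a short Node script can confirm that a model directory contains the files listed in the expected output. A sketch under assumptions: check-model.js is hypothetical, and it only checks that the files exist, not that they are valid.

```javascript
// check-model.js (hypothetical helper, not part of the repository)
// Checks for config.json, tokenizer.json, tokenizer_config.json and an
// onnx/ folder containing at least one .onnx file.
const fs = require('node:fs');
const path = require('node:path');

const REQUIRED_FILES = ['config.json', 'tokenizer.json', 'tokenizer_config.json'];

function missingModelFiles(modelDir) {
  const missing = REQUIRED_FILES.filter(
    (name) => !fs.existsSync(path.join(modelDir, name))
  );
  const onnxDir = path.join(modelDir, 'onnx');
  const hasOnnx =
    fs.existsSync(onnxDir) &&
    fs.statSync(onnxDir).isDirectory() &&
    fs.readdirSync(onnxDir).some((f) => f.endsWith('.onnx'));
  if (!hasOnnx) missing.push('onnx/*.onnx');
  return missing;
}

// Usage: node check-model.js models/multilingual-e5-large
const dir = process.argv[2];
if (dir) {
  const missing = missingModelFiles(dir);
  if (missing.length === 0) {
    console.log(`OK: ${dir} looks complete`);
  } else {
    console.error(`Missing in ${dir}: ${missing.join(', ')}`);
    process.exitCode = 1;
  }
}
```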
3. Start the server
docker compose up -d
That's it. The server starts, runs the database migration, loads the embedding model, and listens on port 3000.
# Verify it's running
curl http://localhost:3000/health
# {"status":"ok","version":"0.2.0","model":"multilingual-e5-large","memories":0,"sessions":0}
4. Seed example memories (optional)
Load some example memories to verify search works and to run benchmarks:
docker compose run --rm mcp-recall node scripts/seed-examples.js
This stores 10 memories about mcp-recall itself, and they are searchable right away. To confirm they were stored, check the health endpoint:
# The response should now include "memories":10
curl http://localhost:3000/health
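The same check can be scripted, for example after seeding or in a CI job. A minimal sketch using Node 22's built-in fetch; it relies only on the status and memories fields from the example response in step 3, and checkHealth and fetchHealth are hypothetical names.

```javascript
// Hypothetical helpers: fetch /health and check it after seeding.
// Only the "status" and "memories" fields from the README's example are used.

function checkHealth(body, expectedMemories) {
  const problems = [];
  if (body.status !== 'ok') problems.push(`status is ${JSON.stringify(body.status)}`);
  if (typeof body.memories !== 'number') {
    problems.push('memories count missing');
  } else if (body.memories < expectedMemories) {
    problems.push(`expected at least ${expectedMemories} memories, got ${body.memories}`);
  }
  return problems; // empty array means healthy
}

async function fetchHealth(baseUrl = 'http://localhost:3000') {
  const res = await fetch(new URL('/health', baseUrl)); // Node 22 global fetch
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json();
}

// Usage (with the server running):
//   fetchHealth().then((body) => console.log(checkHealth(body, 10)));
```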
5. Connect your AI assistant
Claude Code:
claude mcp add -s user --transport http mcp-recall http://localhost:3000/mcp
Claude Code (SSE transport):
claude mcp add -s user --transport sse mcp-recall http://localhost:3000/sse
Other MCP clients — point them to http://localhost:3000/mcp (Streamable HTTP) or http://localhost:3000/sse (SSE).
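For clients without a built-in MCP integration, it helps to know what goes over the wire on the Streamable HTTP endpoint: JSON-RPC 2.0 messages POSTed to /mcp, starting with an initialize request. The sketch below only builds the messages. It assumes protocol version 2025-03-26 (use whatever version your MCP SDK negotiates), and the client name is a placeholder. The tool argument names (query, content, tags) are taken from the example session in step 6.

```javascript
// Sketch of the JSON-RPC 2.0 messages an MCP client POSTs to
// http://localhost:3000/mcp over Streamable HTTP. Builds messages only.

const PROTOCOL_VERSION = '2025-03-26'; // assumption: match your SDK/server

function mcpHeaders(sessionId) {
  const headers = {
    'Content-Type': 'application/json',
    // Streamable HTTP servers may answer with JSON or an SSE stream.
    Accept: 'application/json, text/event-stream',
  };
  // The server may return a session ID on initialize; echo it afterwards.
  if (sessionId) headers['Mcp-Session-Id'] = sessionId;
  return headers;
}

function initializeRequest(id) {
  return {
    jsonrpc: '2.0',
    id,
    method: 'initialize',
    params: {
      protocolVersion: PROTOCOL_VERSION,
      capabilities: {},
      clientInfo: { name: 'example-client', version: '0.0.1' }, // placeholder
    },
  };
}

// A notification (no id), sent once the initialize response has arrived.
const initializedNotification = { jsonrpc: '2.0', method: 'notifications/initialized' };

function toolCall(id, name, args) {
  return { jsonrpc: '2.0', id, method: 'tools/call', params: { name, arguments: args } };
}

const search = toolCall(2, 'memory_search', { query: 'backup' });
const store = toolCall(3, 'memory_store', {
  content: "The deploy key is in 1Password under 'production-deploy'",
  tags: ['deployment', 'credentials'],
});
```

A real client must also parse either a JSON or an SSE response and handle errors; in practice an MCP SDK does all of this for you.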
6. Verify it works
Start a new Claude Code session and try the tools:
$ claude
❯ Use memory_stats to check the database
● mcp-recall - memory_stats (MCP)
⎿ Total memories: 10
Unique tags: 14
● The database has 10 memories with 14 unique tags.
❯ Search memories for "backup"
● mcp-recall - memory_search (MCP)(query: "backup")
⎿ [1] (79.1% match) Database backup: docker exec mcp-recall-db pg_dump -U mcp
mcp_recall > backup.sql. Restore: cat backup.sql | docker exec -i
mcp-recall-db psql -U mcp mcp_recall.
Tags: operations, backup
❯ Store a new memory: "The deploy key is in 1Password under 'production-deploy'"
● mcp-recall - memory_store (MCP)(content: "The deploy key is in 1Password under
'production-deploy'", tags: ["deployment","credentials"])
⎿ Stored memory 33c6f4e8-f0bc-435a-bb46-fd83676698dd:
The deploy key is in 1Password under 'production-deploy'
❯ Search for "deploy credentials"
● mcp-recall - memory_search (MCP)(query: "deploy credentials")
⎿ [1] (85.0% match) The deploy key is in 1Password under 'production-deploy'
Tags: deployment, credentials
Note how "deploy credentials" matches "deploy key in 1Password" with 85% similarity — that's semantic search in action, matching meaning rather than keywords.
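Semantic search ranks memories by how similar their embedding vectors are to the query's. The sketch below computes cosine similarity and formats it like the transcript, assuming "% match" is cosine similarity × 100 (equivalently, 1 minus pgvector's cosine distance). The 3-dimensional vectors are made up; real e5-large embeddings have 1024 dimensions.

```javascript
// Cosine similarity between two embedding vectors.
// Assumption: "% match" = 100 * cosine similarity = 100 * (1 - cosine distance).

function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('dimension mismatch');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function asMatchPercent(similarity) {
  return `${(similarity * 100).toFixed(1)}% match`;
}

// Toy vectors standing in for real embeddings.
const query = [0.9, 0.1, 0.3];
const memories = {
  'deploy key': [0.8, 0.2, 0.4],
  'backup command': [0.1, 0.9, 0.2],
};

// Rank memories by similarity to the query, best first.
const ranked = Object.entries(memories)
  .map(([label, vec]) => ({ label, score: cosineSimilarity(query, vec) }))
  .sort((x, y) => y.score - x.score);
```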
Architecture
┌─────────────────────────────────────────────┐
│ MCP Client (Claude Code, claude.ai, etc.) │
└─────────────────┬───────────────────────────┘
│ HTTP (Streamable HTTP or SSE)
▼
┌─────────────────────────────────────────────┐
│ mcp-recall-server (Node.js 22) │
│ ├── MCP Protocol (6 tools) │
│ ├── Express HTTP │
│ └── @xenova/transformers (ONNX, local) │
│ └── Embedding model (volume mount) │
└─────────────────┬───────────────────────────┘
│
┌─────────────────▼───────────────────────────┐
│ PostgreSQL 16 + pgvector │
│ └── HNSW index (cosine similarity) │
└─────────────────────────────────────────────┘
All components run in Docker. The embedding model runs directly in the Node.js process using ONNX Runtime — no Ollama, no Python, no separate inference server.
MCP Tools
Your AI assistant gets these tools:
| Tool | Description |
|---|---|
| memory_store | Store a new memory (auto-generates embedding) |
| memory_search | Search by semantic similarity |
| memory_update | Update content, tags, or metadata (re-embeds if content changes) |
| memory_delete | Delete a memory by ID |
| memory_list | List memories, optionally filtered by tags |
| memory_stats | Show database statistics |
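memory_update only needs a new embedding when the content itself changes; tag or metadata edits can skip the model entirely. A minimal sketch of that rule, assuming a memory is a plain object shaped like the memories table in the Database section and embed is any async function returning a vector. applyUpdate is hypothetical, not code from database.js.

```javascript
// Hypothetical sketch of memory_update semantics: merge the patch and
// call the (expensive) embedding model only if the content changed.

async function applyUpdate(memory, patch, embed) {
  const updated = {
    ...memory,
    ...(patch.content !== undefined && { content: patch.content }),
    ...(patch.tags !== undefined && { tags: patch.tags }),
    ...(patch.metadata !== undefined && { metadata: patch.metadata }),
    updated_at: new Date(),
  };
  const contentChanged = updated.content !== memory.content;
  if (contentChanged) {
    updated.embedding = await embed(updated.content); // re-embed new text
  }
  return { updated, reembedded: contentChanged };
}
```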
Embedding models
Recommended: multilingual-e5-large (1024d)
This is the default and recommended model. It's trained specifically for information retrieval (short query → long text), which is exactly how memory search works.
Available models
| Model | Dimensions | Size (quantized) | Best for |
|---|---|---|---|
| multilingual-e5-large | 1024 | ~553 MB | General use (recommended) |
| bge-m3 | 1024 | ~560 MB | Multi-granular retrieval |
| all-MiniLM-L6-v2 | 384 | ~22 MB | Minimal resources, English-only |
You can use any ONNX model compatible with @xenova/transformers: place it in models/<name>/ with the standard Hugging Face file structure.
Benchmark results
We tested three models against 201 real memories with 8 search queries:
| Model | Correct top-1 | Avg similarity | Speed |
|---|---|---|---|
| multilingual-e5-large | 8/8 (100%) | 85.0% | 0.1/s* |
| bge-m3 | 8/8 (100%) | 61.3% | 0.1/s* |
| cross-en-de-roberta | 2/8 (25%) | 35.3% | 0.5/s* |
* Embedding speed on Intel Celeron J1900. Much faster on modern CPUs.
Key finding: Models trained for information retrieval (e5, bge) dramatically outperform sentence-similarity models (roberta) for memory search, regardless of language specialization.
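The two quality columns can be reproduced from a labeled query set: for each query, check whether the top-ranked memory is the one you expected, and average the top hit's similarity. A sketch assuming one expected memory ID per query and that "Avg similarity" means the mean top-1 similarity; benchmark-models.js may compute its numbers differently.

```javascript
// Hypothetical benchmark scoring (not necessarily what benchmark-models.js does).
// Each result holds the expected memory ID and the ranked hits for one query.
// Assumption: "Avg similarity" = mean similarity of the top-ranked hit.

function scoreBenchmark(results) {
  const n = results.length;
  if (n === 0) return { top1: '0/0', avgSimilarity: 0 };
  let correct = 0;
  let similaritySum = 0;
  for (const { expectedId, hits } of results) {
    const top = hits[0];
    if (!top) continue; // no hits: a miss that adds 0 to the similarity sum
    if (top.id === expectedId) correct += 1;
    similaritySum += top.similarity;
  }
  return {
    top1: `${correct}/${n} (${Math.round((100 * correct) / n)}%)`,
    avgSimilarity: similaritySum / n,
  };
}
```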
Switching models
# Compare models against your data (read-only, no changes)
docker compose run --rm mcp-recall node scripts/benchmark-models.js multilingual-e5-large
# Switch to a different model (migrates DB, re-embeds everything)
docker compose run --rm mcp-recall node scripts/switch-model.js bge-m3
# Restart the server to use the new model
docker compose restart mcp-recall
The switch script handles everything:
- Detects dimension changes and migrates the database
- Re-embeds all memories with the new model
- Updates the .env file
- Verifies the result
Your text data is never lost. Only the vector embeddings are regenerated. Content, tags, and metadata remain untouched in PostgreSQL.
Configuration
Environment variables
| Variable | Default | Description |
|---|---|---|
| POSTGRES_PASSWORD | (required) | Database password |
| EMBEDDINGS_MODEL | multilingual-e5-large | Model directory name in ./models/ |
| MCP_PORT | 3000 | Server port |
| TRUST_PROXY | 0 | Proxy trust level (set to 1 behind nginx/Caddy) |
| MCP_API_KEY | (none) | Static Bearer token for CLI clients (Claude Code) |
| MCP_AUTH_PIN | (none) | PIN for OAuth 2.1 consent flow (claude.ai, mobile) |
| MCP_BASE_URL | (none) | Public URL of the server (required for OAuth) |
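A sketch of how a Node process might read these variables with the defaults listed above and fail fast on missing or invalid values. loadConfig and its return shape are hypothetical; the defaults and the rule that MCP_BASE_URL accompanies MCP_AUTH_PIN come from the table, the validation rules are assumptions.

```javascript
// Hypothetical config loader: reads the variables from the table above,
// applies their documented defaults and fails fast on invalid input.

function loadConfig(env = process.env) {
  if (!env.POSTGRES_PASSWORD) {
    throw new Error('POSTGRES_PASSWORD is required');
  }
  // The table marks MCP_BASE_URL as required for OAuth (i.e. when a PIN is set).
  if (env.MCP_AUTH_PIN && !env.MCP_BASE_URL) {
    throw new Error('MCP_BASE_URL is required when MCP_AUTH_PIN is set');
  }
  const port = Number(env.MCP_PORT || '3000');
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid MCP_PORT: ${env.MCP_PORT}`);
  }
  const trustProxy = Number(env.TRUST_PROXY || '0');
  if (!Number.isInteger(trustProxy) || trustProxy < 0) {
    throw new Error(`Invalid TRUST_PROXY: ${env.TRUST_PROXY}`);
  }
  return {
    postgresPassword: env.POSTGRES_PASSWORD,
    embeddingsModel: env.EMBEDDINGS_MODEL || 'multilingual-e5-large',
    port,
    trustProxy,
    apiKey: env.MCP_API_KEY || null,
    authPin: env.MCP_AUTH_PIN || null,
    baseUrl: env.MCP_BASE_URL || null,
    // Neither key nor PIN set: all requests are allowed (local-only use).
    authEnabled: Boolean(env.MCP_API_KEY || env.MCP_AUTH_PIN),
  };
}
```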
Authentication
mcp-recall supports two optional authentication methods. If neither is configured, all requests are allowed (suitable for local-only use).
API key (for Claude Code and other CLI clients):
# Generate a key and add to .env
echo "MCP_API_KEY=$(openssl rand -base64 32)" >> .env
# Configure Claude Code with the key
claude mcp add -s user --transport http \
--header "Authorization: Bearer YOUR_API_KEY" \
mcp-recall http://localhost:3000/mcp
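On the server side, a static API key check comes down to comparing the Bearer token with MCP_API_KEY. A hypothetical sketch (not the project's middleware) that compares SHA-256 digests with crypto.timingSafeEqual, so response timing does not reveal how much of the key matched:

```javascript
// Hypothetical Bearer-token check against MCP_API_KEY.
// Hashing both sides gives equal-length buffers for timingSafeEqual.
const crypto = require('node:crypto');

function bearerMatches(authorizationHeader, apiKey) {
  if (!apiKey) return false; // no key configured: nothing can match
  const match = /^Bearer\s+(\S+)$/i.exec(authorizationHeader || '');
  if (!match) return false;
  const given = crypto.createHash('sha256').update(match[1]).digest();
  const expected = crypto.createHash('sha256').update(apiKey).digest();
  return crypto.timingSafeEqual(given, expected);
}
```

Following the behaviour described above, a server would skip this check entirely when neither MCP_API_KEY nor MCP_AUTH_PIN is set.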
OAuth 2.1 with PIN (for claude.ai, mobile clients):
# Add to .env
MCP_AUTH_PIN=123456 # choose a secure PIN
MCP_BASE_URL=https://your-server.example.com # public URL
When a web client connects, it is redirected to a PIN entry page. After the correct PIN is entered, the client receives an OAuth access token (valid for 24 hours) and a refresh token (valid for 30 days). Failed PIN attempts are rate-limited with increasing delays.
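Increasing delays are commonly implemented as exponential backoff per client. A sketch assuming a per-client key (for example the client's IP address), a 1-second base delay that doubles with each consecutive failure, and a 5-minute cap; all three are assumptions, not values taken from mcp-recall.

```javascript
// Hypothetical PIN throttle: each consecutive failure doubles the lockout
// (1 s, 2 s, 4 s, ...) up to a cap; a correct PIN resets the counter.

class PinThrottle {
  constructor({ baseMs = 1000, maxMs = 5 * 60 * 1000 } = {}) {
    this.baseMs = baseMs;
    this.maxMs = maxMs;
    this.state = new Map(); // clientKey -> { failures, lockedUntil }
  }

  // Milliseconds the client must still wait (0 means it may try now).
  waitMs(clientKey, now = Date.now()) {
    const s = this.state.get(clientKey);
    return s ? Math.max(0, s.lockedUntil - now) : 0;
  }

  // Records a wrong PIN and returns the new lockout duration.
  recordFailure(clientKey, now = Date.now()) {
    const s = this.state.get(clientKey) ?? { failures: 0, lockedUntil: 0 };
    s.failures += 1;
    const delay = Math.min(this.baseMs * 2 ** (s.failures - 1), this.maxMs);
    s.lockedUntil = now + delay;
    this.state.set(clientKey, s);
    return delay;
  }

  recordSuccess(clientKey) {
    this.state.delete(clientKey);
  }
}
```

A PIN handler would call waitMs first and reject the request (for example with HTTP 429) while it is non-zero, then call recordFailure or recordSuccess after checking the PIN. This sketch never evicts entries, so a long-running server would also want to expire old ones.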
Behind a reverse proxy
If you run mcp-recall behind a reverse proxy (nginx, Caddy, Traefik):
- Set TRUST_PROXY=1 in .env
- Proxy to http://localhost:3000
- For Streamable HTTP: proxy POST/GET/DELETE /mcp
- For SSE: proxy GET /sse and POST /messages
Resource usage
Measured on an Intel Celeron J1900 (4 cores @ 2.0 GHz) with 16 GB RAM:
| Component | RAM | CPU (idle) | Disk |
|---|---|---|---|
| mcp-recall-server | ~1.1 GB | 0% | ~50 MB (image) |
| PostgreSQL + pgvector | ~26 MB | 0% | ~20 MB (200 memories) |
| Total | ~1.1 GB | 0% | - |
| Embedding model (on disk) | - | - | 553 MB (e5-large) |
- Memory usage is dominated by the ONNX model loaded into RAM
- CPU spikes only during embedding generation (~100–200ms per query)
- Re-embedding 200 memories takes ~30 minutes on the Celeron, much less on modern hardware
Project structure
mcp-recall/
├── compose.yml # Docker Compose (2 services)
├── Dockerfile # Server image (node:22-slim)
├── .env.example # Configuration template
├── package.json # 5 dependencies
├── models/ # Embedding models (git-ignored, volume-mounted)
│ └── multilingual-e5-large/
│ ├── config.json
│ ├── tokenizer.json
│ ├── tokenizer_config.json
│ └── onnx/model_quantized.onnx
├── migrations/
│ └── 001_init.sql # Schema: memories table + HNSW index
├── scripts/
│ ├── switch-model.js # Switch models with DB migration + re-embedding
│ ├── benchmark-models.js # A/B compare models against your data
│ ├── download-model.js # Download models from Hugging Face
│ └── seed-examples.js # Load example memories for testing
└── src/
├── index.js # MCP server, Express, dual transport (308 lines)
├── database.js # PostgreSQL CRUD operations (158 lines)
├── embeddings.js # Model-agnostic embedding engine (42 lines)
└── migrate.js # Standalone migration runner (16 lines)
~970 lines of code total. No framework overhead, no unnecessary abstractions.
Database
The schema is simple — one table:
CREATE TABLE memories (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
content TEXT NOT NULL,
metadata JSONB DEFAULT '{}',
tags TEXT[] DEFAULT '{}',
embedding vector(1024) NOT NULL,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
Indexes: HNSW (cosine similarity on embeddings), GIN (tag filtering), B-tree (created_at).
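A search against this schema is a nearest-neighbour query ordered by pgvector's cosine distance operator <=>, optionally filtered by tags. The sketch below builds a parameterized query for node-postgres (client.query(text, values)). It assumes the HNSW index uses vector_cosine_ops and that a tag filter means "shares at least one tag" (the && operator; @> would require all tags); the README does not spell out either detail. buildSearchQuery is hypothetical, not the project's database.js.

```javascript
// Hypothetical semantic search query builder for the memories table.
// <=> is pgvector's cosine distance, so similarity = 1 - distance.

function toVectorLiteral(embedding) {
  // pgvector accepts the text form '[0.1,0.2,...]'
  return `[${embedding.join(',')}]`;
}

function buildSearchQuery(embedding, { tags = [], limit = 5 } = {}) {
  const values = [toVectorLiteral(embedding)];
  let where = '';
  if (tags.length > 0) {
    values.push(tags); // node-postgres sends JS arrays as Postgres arrays
    where = `WHERE tags && $${values.length}`; // any-tag overlap (GIN-indexable)
  }
  values.push(limit);
  const text = `
    SELECT id, content, tags, 1 - (embedding <=> $1) AS similarity
    FROM memories
    ${where}
    ORDER BY embedding <=> $1
    LIMIT $${values.length}`;
  return { text, values };
}
```

Ordering by the distance expression itself (rather than by the computed similarity alias) is what lets the planner use the HNSW index for the ORDER BY ... LIMIT.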
Backup
# Dump the database
docker exec mcp-recall-db pg_dump -U mcp mcp_recall > backup.sql
# Restore
cat backup.sql | docker exec -i mcp-recall-db psql -U mcp mcp_recall
FAQ
Q: Do I need a GPU?
No. The embedding model runs on CPU via ONNX Runtime. It works fine on low-power hardware like a Celeron J1900. Embedding generation takes ~100–200ms per query, which is imperceptible during normal use.
Q: How many memories can it handle?
The HNSW index handles tens of thousands of entries efficiently. For much larger collections, IVFFlat is an alternative: it builds faster and uses less memory than HNSW, at the cost of a weaker speed/recall trade-off.
Q: Can I use it with ChatGPT / other LLMs?
Yes. Any MCP-compatible client works; the server implements the standard Model Context Protocol.
Q: What happens if I switch models?
Your text data (content, tags, metadata) is preserved. Only the vector embeddings are regenerated. The switch-model.js script handles the entire process, including database dimension changes.
Q: Is my data sent anywhere?
No. Embeddings are generated locally. The server has no outbound connections. Your data stays on your hardware. However, when an MCP client retrieves memories, the content flows to whatever LLM provider the client uses.
Dependencies and licenses
| Package | License | Purpose |
|---|---|---|
| @modelcontextprotocol/sdk | MIT | MCP protocol implementation |
| @xenova/transformers | Apache-2.0 | ONNX Runtime for embeddings |
| express | MIT | HTTP server |
| pg | MIT | PostgreSQL client |
| zod | MIT | Schema validation |
| pgvector | PostgreSQL License | Vector similarity search |
All dependencies are permissively licensed (MIT, Apache-2.0, or the PostgreSQL License).
Contributing
Contributions are welcome! This project values simplicity — please keep changes focused and minimal.
License