mcp-recall

A self-hosted MCP memory server that gives AI assistants persistent, semantic memory. Store facts, search by meaning, swap embedding models — all running locally on your own hardware.

No cloud APIs. No GPU required. No API costs.

You: "What do we know about the backup configuration?"
Claude: memory_search → finds relevant memories in milliseconds

What it does

mcp-recall stores memories as vector embeddings in PostgreSQL and makes them searchable via the Model Context Protocol. Your AI assistant can remember things across sessions, projects, and devices.

| Feature | Details |
| --- | --- |
| Semantic search | Find memories by meaning, not keywords |
| Swappable models | Change embedding models without losing data |
| Built-in benchmarking | Compare models against your actual data |
| Dual transport | Streamable HTTP + SSE (legacy) |
| Local embeddings | ONNX models run in-process, no external calls |
| Low resource | Runs on a Celeron J1900 with 16 GB RAM |

Quick start

1. Clone and configure

git clone https://github.com/Jensimogit/mcp-recall.git
cd mcp-recall
npm install
cp .env.example .env
# Generate a random database password (you'll never need to type it)
echo "POSTGRES_PASSWORD=$(openssl rand -base64 32)" >> .env

2. Download an embedding model

Models are not included in the repository because of their size (up to ~560 MB). Download one:

node scripts/download-model.js multilingual-e5-large

Verify the model files are in place:

ls models/multilingual-e5-large/
# Expected: config.json  onnx/  tokenizer.json  tokenizer_config.json

If the directory is empty (rare, depends on cache layout), copy manually:

find node_modules/@xenova/transformers/.cache -name "config.json"
# Copy the directory that contains config.json + tokenizer.json:
cp -r node_modules/@xenova/transformers/.cache/Xenova/multilingual-e5-large/* models/multilingual-e5-large/

3. Start the server

docker compose up -d

That's it. The server starts, runs the database migration, loads the embedding model, and listens on port 3000.

# Verify it's running
curl http://localhost:3000/health
# {"status":"ok","version":"0.2.0","model":"multilingual-e5-large","memories":0,"sessions":0}
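For scripted deployments, the same check can be done in code. The sketch below parses a response of the shape shown above; the helper name `describeHealth` is hypothetical, and the field names are taken from the sample response rather than from a documented schema.

```javascript
// Hypothetical helper: summarizes a /health response body.
// Field names (status, model, memories) come from the sample response above.
function describeHealth(body) {
  const health = JSON.parse(body);
  if (health.status !== "ok") {
    throw new Error(`Server not healthy: ${JSON.stringify(health)}`);
  }
  return `${health.model}: ${health.memories} memories`;
}

const sample =
  '{"status":"ok","version":"0.2.0","model":"multilingual-e5-large","memories":0,"sessions":0}';
console.log(describeHealth(sample));
```

On Node.js 18 or newer, the body could be obtained with the built-in `fetch`, for example `fetch("http://localhost:3000/health").then((r) => r.text())`.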

4. Seed example memories (optional)

Load some example memories to verify search works and to run benchmarks:

docker compose run --rm mcp-recall node scripts/seed-examples.js

This stores 10 memories about mcp-recall itself. You can search them immediately:

# Quick test via the health endpoint — should show memories: 10
curl http://localhost:3000/health

5. Connect your AI assistant

Claude Code:

claude mcp add -s user --transport http mcp-recall http://localhost:3000/mcp

Claude Code (SSE transport):

claude mcp add -s user --transport sse mcp-recall http://localhost:3000/sse

Other MCP clients — point them to http://localhost:3000/mcp (Streamable HTTP) or http://localhost:3000/sse (SSE).

6. Verify it works

Start a new Claude Code session and try the tools:

$ claude

❯ Use memory_stats to check the database

● mcp-recall - memory_stats (MCP)
  ⎿  Total memories: 10
     Unique tags: 14

● The database has 10 memories with 14 unique tags.

❯ Search memories for "backup"

● mcp-recall - memory_search (MCP)(query: "backup")
  ⎿  [1] (79.1% match) Database backup: docker exec mcp-recall-db pg_dump -U mcp
     mcp_recall > backup.sql. Restore: cat backup.sql | docker exec -i
     mcp-recall-db psql -U mcp mcp_recall.
         Tags: operations, backup

❯ Store a new memory: "The deploy key is in 1Password under 'production-deploy'"

● mcp-recall - memory_store (MCP)(content: "The deploy key is in 1Password under
                                 'production-deploy'", tags: ["deployment","credentials"])
  ⎿  Stored memory 33c6f4e8-f0bc-435a-bb46-fd83676698dd:
     The deploy key is in 1Password under 'production-deploy'

❯ Search for "deploy credentials"

● mcp-recall - memory_search (MCP)(query: "deploy credentials")
  ⎿  [1] (85.0% match) The deploy key is in 1Password under 'production-deploy'
         Tags: deployment, credentials

Note how "deploy credentials" matches "deploy key in 1Password" with 85% similarity — that's semantic search in action, matching meaning rather than keywords.
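The ranking behind a result such as the 85% match can be illustrated with cosine similarity. This is a minimal sketch with toy vectors. It assumes the match percentage is a cosine similarity expressed as a percentage, and the function and variable names are illustrative rather than taken from mcp-recall.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 3-dimensional "embeddings"; real models produce hundreds of dimensions.
const query = [1, 0, 1];
const memories = [
  { content: "lunch menu", embedding: [0, 1, 0] },
  { content: "deploy key location", embedding: [1, 0, 1] },
];

// Score every memory against the query and sort best-first.
const ranked = memories
  .map((m) => ({ content: m.content, score: cosineSimilarity(query, m.embedding) }))
  .sort((x, y) => y.score - x.score);

console.log(ranked[0].content);
```

Vectors pointing in the same direction score close to 1, and orthogonal vectors score 0, which is why texts with related meaning can rank highly even without shared keywords.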

Architecture

┌─────────────────────────────────────────────┐
│  MCP Client (Claude Code, claude.ai, etc.)  │
└─────────────────┬───────────────────────────┘
                  │  HTTP (Streamable HTTP or SSE)
                  ▼
┌─────────────────────────────────────────────┐
│  mcp-recall-server (Node.js 22)             │
│  ├── MCP Protocol (6 tools)                 │
│  ├── Express HTTP                           │
│  └── @xenova/transformers (ONNX, local)     │
│       └── Embedding model (volume mount)    │
└─────────────────┬───────────────────────────┘
                  │
┌─────────────────▼───────────────────────────┐
│  PostgreSQL 16 + pgvector                   │
│  └── HNSW index (cosine similarity)         │
└─────────────────────────────────────────────┘

All components run in Docker. The embedding model runs directly in the Node.js process using ONNX Runtime — no Ollama, no Python, no separate inference server.

MCP Tools

Your AI assistant gets these tools:

| Tool | Description |
| --- | --- |
| memory_store | Store a new memory (auto-generates embedding) |
| memory_search | Search by semantic similarity |
| memory_update | Update content, tags, or metadata (re-embeds if content changes) |
| memory_delete | Delete a memory by ID |
| memory_list | List memories, optionally filtered by tags |
| memory_stats | Show database statistics |
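MCP clients invoke these tools with JSON-RPC 2.0 `tools/call` requests. The sketch below only builds such a message so its shape is visible. The helper `buildToolCall` is hypothetical, and the `query` argument name is taken from the example session in the quick start. A real client also performs the MCP initialization handshake first, which client libraries normally handle.

```javascript
// Builds a JSON-RPC 2.0 request for the MCP "tools/call" method.
function buildToolCall(id, name, args) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const request = buildToolCall(1, "memory_search", { query: "backup" });
console.log(JSON.stringify(request, null, 2));
```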

Embedding models

Recommended: multilingual-e5-large (1024d)

This is the default and recommended model. It's trained specifically for information retrieval (short query → long text), which is exactly how memory search works.

Available models

| Model | Dimensions | Size (quantized) | Best for |
| --- | --- | --- | --- |
| multilingual-e5-large | 1024 | ~553 MB | General use (recommended) |
| bge-m3 | 1024 | ~560 MB | Multi-granular retrieval |
| all-MiniLM-L6-v2 | 384 | ~22 MB | Minimal resources, English-only |

You can use any ONNX model compatible with @xenova/transformers. Just place it in models/<name>/ with the standard HuggingFace file structure.
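Before pointing the server at a custom model, it can help to confirm the directory layout. This is a minimal sketch that checks for the entries listed as expected in the quick start; the helper name `missingModelEntries` is hypothetical, and a given model may need additional files.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Entries the quick start lists as expected inside a model directory.
const REQUIRED_ENTRIES = ["config.json", "tokenizer.json", "tokenizer_config.json", "onnx"];

// Returns the required entries that are missing from the given model directory.
function missingModelEntries(modelDir) {
  return REQUIRED_ENTRIES.filter((entry) => !fs.existsSync(path.join(modelDir, entry)));
}

// An empty array means every expected entry is present.
console.log(missingModelEntries(path.join("models", "multilingual-e5-large")));
```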

Benchmark results

We tested three models against 201 real memories with 8 search queries:

| Model | Correct top-1 | Avg similarity | Speed |
| --- | --- | --- | --- |
| multilingual-e5-large | 8/8 (100%) | 85.0% | 0.1/s* |
| bge-m3 | 8/8 (100%) | 61.3% | 0.1/s* |
| cross-en-de-roberta | 2/8 (25%) | 35.3% | 0.5/s* |

* Embedding speed on Intel Celeron J1900. Much faster on modern CPUs.

Key finding: Models trained for information retrieval (e5, bge) dramatically outperform sentence-similarity models (roberta) for memory search, regardless of language specialization.
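Metrics like these can be computed from ranked search results. The sketch below is not the project's benchmark script. It assumes that "Correct top-1" counts queries whose best result is the expected memory and that "Avg similarity" is the mean similarity of that best result. The data shape and the name `scoreBenchmark` are hypothetical.

```javascript
// Each case pairs the memory a query should find with the ranked results returned.
// Assumption: average similarity is taken over the top-ranked result of each query.
function scoreBenchmark(cases) {
  let correct = 0;
  let similaritySum = 0;
  for (const { expectedId, results } of cases) {
    const top = results[0];
    if (top.id === expectedId) correct++;
    similaritySum += top.similarity;
  }
  return { correct, total: cases.length, avgSimilarity: similaritySum / cases.length };
}

const summary = scoreBenchmark([
  { expectedId: "a", results: [{ id: "a", similarity: 0.9 }, { id: "b", similarity: 0.4 }] },
  { expectedId: "b", results: [{ id: "c", similarity: 0.5 }, { id: "b", similarity: 0.3 }] },
]);
console.log(`${summary.correct}/${summary.total} correct`);
```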

Switching models

# Compare models against your data (read-only, no changes)
docker compose run --rm mcp-recall node scripts/benchmark-models.js multilingual-e5-large

# Switch to a different model (migrates DB, re-embeds everything)
docker compose run --rm mcp-recall node scripts/switch-model.js bge-m3

# Restart the server to use the new model
docker compose restart mcp-recall

The switch script handles everything:

  • Detects dimension changes and migrates the database
  • Re-embeds all memories with the new model
  • Updates the .env file
  • Verifies the result

Your text data is never lost. Only the vector embeddings are regenerated. Content, tags, and metadata remain untouched in PostgreSQL.
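The steps listed above can be sketched as a small planner. This is an illustration of the described flow, not the contents of switch-model.js; the function name and step wording are hypothetical, and the dimensions come from the model table.

```javascript
// Hypothetical planner mirroring the documented steps of a model switch.
// A column migration is only needed when the embedding dimension changes.
function planModelSwitch(currentDim, newDim) {
  const steps = [];
  if (currentDim !== newDim) {
    steps.push(`migrate embedding column from vector(${currentDim}) to vector(${newDim})`);
  }
  steps.push("re-embed all memories", "update .env", "verify result");
  return steps;
}

// multilingual-e5-large (1024d) to all-MiniLM-L6-v2 (384d): dimension changes.
console.log(planModelSwitch(1024, 384));
// multilingual-e5-large (1024d) to bge-m3 (1024d): same dimension, no column migration.
console.log(planModelSwitch(1024, 1024));
```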

Configuration

Environment variables

| Variable | Default | Description |
| --- | --- | --- |
| POSTGRES_PASSWORD | (required) | Database password |
| EMBEDDINGS_MODEL | multilingual-e5-large | Model directory name in ./models/ |
| MCP_PORT | 3000 | Server port |
| TRUST_PROXY | 0 | Proxy trust level (set to 1 behind nginx/Caddy) |
| MCP_API_KEY | (none) | Static Bearer token for CLI clients (Claude Code) |
| MCP_AUTH_PIN | (none) | PIN for OAuth 2.1 consent flow (claude.ai, mobile) |
| MCP_BASE_URL | (none) | Public URL of the server (required for OAuth) |

Authentication

mcp-recall supports two optional authentication methods. If neither is configured, all requests are allowed (suitable for local-only use).

API key (for Claude Code and other CLI clients):

# Generate a key and add to .env
echo "MCP_API_KEY=$(openssl rand -base64 32)" >> .env

# Configure Claude Code with the key
claude mcp add -s user --transport http \
  --header "Authorization: Bearer YOUR_API_KEY" \
  mcp-recall http://localhost:3000/mcp
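On the server side, a static Bearer token check generally looks like the sketch below. This is not mcp-recall's actual code. It covers only the API key path, the function name `isAuthorized` is hypothetical, and the constant-time comparison is a common hardening choice rather than something the README states.

```javascript
const crypto = require("node:crypto");

// Returns true when the Authorization header carries the expected Bearer token.
// With no key configured, every request is allowed (local-only use).
function isAuthorized(authorizationHeader, apiKey) {
  if (!apiKey) return true;
  const prefix = "Bearer ";
  if (typeof authorizationHeader !== "string" || !authorizationHeader.startsWith(prefix)) {
    return false;
  }
  const token = authorizationHeader.slice(prefix.length);
  // Hash both values so timingSafeEqual always receives equal-length buffers.
  const given = crypto.createHash("sha256").update(token).digest();
  const expected = crypto.createHash("sha256").update(apiKey).digest();
  return crypto.timingSafeEqual(given, expected);
}

console.log(isAuthorized("Bearer secret-key", "secret-key"));
```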

OAuth 2.1 with PIN (for claude.ai, mobile clients):

# Add to .env
# Choose a secure PIN
MCP_AUTH_PIN=123456
# Public URL of the server
MCP_BASE_URL=https://your-server.example.com

When a web client connects, it's redirected to a PIN entry page. After entering the correct PIN, the client receives an OAuth token (valid 24h, refresh 30 days). Failed PIN attempts are rate-limited with increasing delays.
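Rate limiting with increasing delays is often implemented as exponential backoff. The README does not specify the schedule mcp-recall uses, so the doubling, the 1-second base, and the 60-second cap below are all assumptions chosen for illustration, as is the name `pinRetryDelayMs`.

```javascript
// Hypothetical schedule: the delay doubles with each consecutive failed attempt, up to a cap.
function pinRetryDelayMs(failedAttempts, baseMs = 1000, maxMs = 60000) {
  if (failedAttempts <= 0) return 0;
  return Math.min(baseMs * 2 ** (failedAttempts - 1), maxMs);
}

// Delays in milliseconds after 1, 2, 3, and 10 consecutive failures.
console.log([1, 2, 3, 10].map((n) => pinRetryDelayMs(n)));
```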

Behind a reverse proxy

If you run mcp-recall behind a reverse proxy (nginx, Caddy, Traefik):

  1. Set TRUST_PROXY=1 in .env
  2. Proxy to http://localhost:3000
  3. For Streamable HTTP: proxy POST/GET/DELETE /mcp
  4. For SSE: proxy GET /sse and POST /messages

Resource usage

Measured on an Intel Celeron J1900 (4 cores @ 2.0 GHz) with 16 GB RAM:

| Component | RAM | CPU (idle) | Disk |
| --- | --- | --- | --- |
| mcp-recall-server | ~1.1 GB | 0% | ~50 MB (image) |
| PostgreSQL + pgvector | ~26 MB | 0% | ~20 MB (200 memories) |
| Total | ~1.1 GB | 0% | - |
| Embedding model (on disk) | - | - | 553 MB (e5-large) |

  • Memory usage is dominated by the ONNX model loaded into RAM
  • CPU spikes only during embedding generation (~100–200ms per query)
  • Re-embedding 200 memories takes ~30 minutes on the Celeron, much less on modern hardware
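The re-embedding figure gives a rough way to estimate how long a model switch takes for larger collections. The sketch assumes time scales linearly with the number of memories, which ignores differences in memory length and hardware; the function name is hypothetical.

```javascript
// ~30 minutes for 200 memories on the Celeron J1900 works out to 9 seconds per memory.
const SECONDS_PER_MEMORY = (30 * 60) / 200;

// Estimated re-embedding time in minutes, assuming linear scaling.
function estimateReembedMinutes(memoryCount, secondsPerMemory = SECONDS_PER_MEMORY) {
  return (memoryCount * secondsPerMemory) / 60;
}

console.log(estimateReembedMinutes(1000)); // 150
```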

Project structure

mcp-recall/
├── compose.yml             # Docker Compose (2 services)
├── Dockerfile              # Server image (node:22-slim)
├── .env.example            # Configuration template
├── package.json            # 5 dependencies
├── models/                 # Embedding models (git-ignored, volume-mounted)
│   └── multilingual-e5-large/
│       ├── config.json
│       ├── tokenizer.json
│       ├── tokenizer_config.json
│       └── onnx/model_quantized.onnx
├── migrations/
│   └── 001_init.sql        # Schema: memories table + HNSW index
├── scripts/
│   ├── switch-model.js     # Switch models with DB migration + re-embedding
│   ├── benchmark-models.js # A/B compare models against your data
│   ├── download-model.js   # Download models from Hugging Face
│   └── seed-examples.js    # Load example memories for testing
└── src/
    ├── index.js            # MCP server, Express, dual transport (308 lines)
    ├── database.js         # PostgreSQL CRUD operations (158 lines)
    ├── embeddings.js       # Model-agnostic embedding engine (42 lines)
    └── migrate.js          # Standalone migration runner (16 lines)

~970 lines of code total. No framework overhead, no unnecessary abstractions.

Database

The schema is simple — one table:

CREATE TABLE memories (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    content     TEXT NOT NULL,
    metadata    JSONB DEFAULT '{}',
    tags        TEXT[] DEFAULT '{}',
    embedding   vector(1024) NOT NULL,
    created_at  TIMESTAMPTZ DEFAULT NOW(),
    updated_at  TIMESTAMPTZ DEFAULT NOW()
);

Indexes: HNSW (cosine similarity on embeddings), GIN (tag filtering), B-tree (created_at).
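A similarity search against this schema could be expressed as a parameterized query like the one built below. This is a sketch, not the query in database.js. It uses pgvector's cosine distance operator `<=>` and the array overlap operator `&&`. Matching any of the given tags, rather than all of them, is an assumption, as is the helper name `buildSearchQuery`.

```javascript
// Builds a parameterized pgvector search query for the memories table.
// <=> is pgvector's cosine distance, so similarity is 1 minus the distance.
function buildSearchQuery(embedding, limit = 5, tags = []) {
  // pgvector accepts vectors as text literals of the form "[0.1,0.2,0.3]".
  const values = [`[${embedding.join(",")}]`, limit];
  let where = "";
  if (tags.length > 0) {
    values.push(tags);
    where = "WHERE tags && $3 "; // array overlap: matches any of the given tags
  }
  const text =
    "SELECT id, content, tags, 1 - (embedding <=> $1::vector) AS similarity " +
    "FROM memories " +
    where +
    "ORDER BY embedding <=> $1::vector LIMIT $2";
  return { text, values };
}

const searchQuery = buildSearchQuery([0.1, 0.2, 0.3], 3, ["backup"]);
console.log(searchQuery.text);
```

The returned object has the `{ text, values }` shape that the pg client's `query()` method accepts. Ordering by the distance expression directly is what allows PostgreSQL to use the HNSW index.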

Backup

# Dump the database
docker exec mcp-recall-db pg_dump -U mcp mcp_recall > backup.sql

# Restore
cat backup.sql | docker exec -i mcp-recall-db psql -U mcp mcp_recall

FAQ

Q: Do I need a GPU?
No. The embedding model runs on CPU via ONNX Runtime. It works fine on low-power hardware like a Celeron J1900. Embedding generation takes ~100–200ms per query, which is imperceptible during normal use.

Q: How many memories can it handle?
The HNSW index works efficiently up to tens of thousands of entries. Beyond that scale, consider IVFFlat indexing instead.

Q: Can I use it with ChatGPT / other LLMs?
Yes, any MCP-compatible client works. The server implements the standard Model Context Protocol.

Q: What happens if I switch models?
Your text data (content, tags, metadata) is preserved. Only the vector embeddings are regenerated. The switch-model.js script handles the entire process, including database dimension changes.

Q: Is my data sent anywhere?
Not by mcp-recall. Embeddings are generated locally and the server makes no outbound connections, so your data stays on your hardware. However, when an MCP client retrieves memories, their content flows to whatever LLM provider that client uses.

Dependencies and licenses

| Package | License | Purpose |
| --- | --- | --- |
| @modelcontextprotocol/sdk | MIT | MCP protocol implementation |
| @xenova/transformers | Apache-2.0 | ONNX Runtime for embeddings |
| express | MIT | HTTP server |
| pg | MIT | PostgreSQL client |
| zod | MIT | Schema validation |
| pgvector | PostgreSQL License | Vector similarity search |

All dependencies are permissively licensed (MIT, Apache-2.0, or the PostgreSQL License).

Contributing

Contributions are welcome! This project values simplicity — please keep changes focused and minimal.

License

MIT
