recall-mcp

A shared, local-first memory layer for AI CLIs, providing persistent, layered memory across Claude Code, Gemini CLI, and other MCP-aware clients.

One shared, layered, local-first brain for every AI CLI you use. Claude Code, Gemini CLI, Cursor, Continue, Zed — they all forget. recall-mcp is the memory they share.

License: MIT | Python 3.10+ | MCP | Local-first

<p align="center"> <img src="docs/demo.gif" alt="Gemini CLI calling recall-mcp's memory_recall tool to answer 'what did we ship today and why isn't it called brain-mcp?' — surfacing the rename decision and shipped-today architecture from the layered memory store" width="820"> <br> <em>Gemini CLI recalling today's decisions from a brain it shares with Claude Code and Hermes.</em> </p>

Built on the layered memory engine from Hermes Agent by Nous Research (MIT). recall-mcp packages that engine as a standalone MCP server so any AI client — not just Hermes — can plug into the same brain. Original architecture: theirs. Packaging, MCP surface, cross-CLI integration: this project. See Credits.

Quick start

# Install
pipx install recall-mcp

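# Wire it into Claude Code (one-time)
# Caution: >> only yields valid JSON if ~/.claude.json does not exist yet.
# If the file already has content, merge the entry into its mcpServers
# object by hand instead (see "Configure your AI client").
echo '{"mcpServers":{"recall-mcp":{"type":"stdio","command":"recall-mcp"}}}' >> ~/.claude.json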

# Restart Claude Code. Done.

That's it. Every conversation now writes to and reads from the same persistent brain — and so do Gemini CLI, Cursor, and any other MCP-aware client you wire up the same way.

What it does

flowchart TD
    A[Claude Code] -- MCP --> M[recall-mcp]
    B[Gemini CLI] -- MCP --> M
    C[Cursor / Continue / Zed] -- MCP --> M
    M --> S[(SQLite<br/>facts)]
    M --> V[(ChromaDB<br/>vectors)]
    M --> E[(Entity<br/>graph)]
    M --> T[(Temporal<br/>lineage)]
    M --> F[(FTS5<br/>keyword)]
    classDef client fill:#1f6feb,stroke:#1f6feb,color:#fff,stroke-width:0
    classDef brain fill:#a371f7,stroke:#a371f7,color:#fff,stroke-width:0
    classDef store fill:#0d1117,stroke:#30363d,color:#7d8590
    class A,B,C client
    class M brain
    class S,V,E,T,F store

Every AI CLI has the same blind spot: each new session starts with amnesia. Native save_memory tools store flat lists that bloat the system prompt over time. Cloud memory services require accounts and paid tiers, and they hand your data to a vendor.

recall-mcp gives you one brain shared by every MCP-aware AI client:

  • 🧠 7 memory layers: episodic, semantic (vector similarity), entity graph, temporal lineage, importance scoring, forgetting engine, hybrid retrieval (BM25 keyword + vector + entity + temporal)
  • 🔌 Drop-in via MCP — works with Claude Code, Gemini CLI, Cursor, Continue, Zed, any client speaking Model Context Protocol
  • 🏠 Local-first — SQLite + ChromaDB on your machine. No accounts, no Docker, no cloud lock-in
  • 🔄 Brain-swappable — switch between Claude, Gemini, MiniMax, Qwen — they all share the same memory
  • 🛡️ Graceful degradation — when embeddings hit rate limits, BM25 + entity + temporal carry the load. Never poisons the index
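The graceful-degradation bullet can be pictured roughly as follows. This is a minimal sketch, not the project's code: the EmbeddingUnavailable exception, the run_paths helper, and the stand-in search functions are all hypothetical names used for illustration.

```python
from typing import Callable


class EmbeddingUnavailable(Exception):
    """Hypothetical error for a rate-limited embeddings provider."""


def run_paths(query: str, paths: dict[str, Callable[[str], list[str]]]) -> dict[str, list[str]]:
    """Run every retrieval path and skip any path whose provider is unavailable."""
    results: dict[str, list[str]] = {}
    for name, search in paths.items():
        try:
            results[name] = search(query)
        except EmbeddingUnavailable:
            # Skip the failed layer; nothing is written to the index.
            continue
    return results


def rate_limited_vector(query: str) -> list[str]:
    raise EmbeddingUnavailable("provider returned a rate-limit error")


# Stand-in search functions; real layers would query SQLite / ChromaDB.
demo_paths = {
    "vector": rate_limited_vector,
    "bm25": lambda q: ["fact-1"],
    "entity": lambda q: ["fact-2"],
    "temporal": lambda q: [],
}
hits = run_paths("rename decision", demo_paths)
```

Here the vector path fails, and the remaining three paths still return their results.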

Install

pipx install recall-mcp

Or with uv:

uv tool install recall-mcp

Or from source:

git clone https://github.com/Dhari-Q/recall-mcp
cd recall-mcp
pip install -e .

Configure your AI client

Claude Code

Add to ~/.claude.json under your project's mcpServers:

{
  "mcpServers": {
    "recall-mcp": {
      "type": "stdio",
      "command": "recall-mcp"
    }
  }
}

Gemini CLI

Add to ~/.gemini/settings.json:

{
  "mcpServers": {
    "recall-mcp": {
      "command": "recall-mcp",
      "trust": true
    }
  }
}

Cursor

Add to ~/.cursor/mcp.json:

{
  "mcpServers": {
    "recall-mcp": {
      "command": "recall-mcp"
    }
  }
}

Restart your client. Done.
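All three snippets add the same entry under mcpServers. If a config file already has other settings in it, the entry has to be merged rather than appended. The helper below is a sketch of one way to do that with the standard library; add_mcp_server is a hypothetical name, not something recall-mcp ships.

```python
import json
import tempfile
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, entry: dict) -> dict:
    """Merge one server entry into a JSON config file, keeping existing keys."""
    config: dict = {}
    if config_path.exists() and config_path.read_text().strip():
        config = json.loads(config_path.read_text())
    # Create mcpServers if it is missing, then add or overwrite this entry.
    config.setdefault("mcpServers", {})[name] = entry
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo against a throwaway file rather than a real client config.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "claude.json"
    demo.write_text('{"theme": "dark"}')
    merged = add_mcp_server(demo, "recall-mcp", {"type": "stdio", "command": "recall-mcp"})
```

Back up the real config file before pointing a script like this at it.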

Five tools you'll use

| Tool | Purpose |
| --- | --- |
| memory_recall(query, top_k) | Hybrid search across all layers: vector + BM25 + entity + temporal |
| memory_remember(content, type, confidence, tags) | Store a fact, decision, preference, or gotcha |
| memory_recent_sessions(limit) | List recent session summaries with decisions and bug fixes |
| memory_search_entity(name, limit) | Find memories tied to a specific file, project, person, or tool |
| memory_stats() | Sanity-check counts across every layer |
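For orientation, an MCP client invokes these tools with JSON-RPC 2.0 tools/call requests. The sketch below only builds the request payloads. The tool and parameter names come from the list above, while the value types (a 0 to 1 float for confidence, a list of strings for tags) are assumptions.

```python
import json


def tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


recall_request = tool_call(1, "memory_recall", {"query": "why was it renamed?", "top_k": 5})
remember_request = tool_call(2, "memory_remember", {
    "content": "We chose MIT over GPL",
    "type": "decision",
    "confidence": 0.9,      # assumed to be a float between 0 and 1
    "tags": ["license"],    # assumed to be a list of strings
})
```

In practice the AI client builds and sends these messages itself, so you rarely write them by hand.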

Optional: real semantic search

By default, recall-mcp ships with BM25 keyword + entity graph + temporal retrieval — those work without any API key.

To enable vector / semantic search (queries like "how do I swap the AI" finding "switchable via /model" without shared keywords), point recall-mcp at an embeddings provider:

Create ~/.recall-mcp/.env (or export in your shell):

# MiniMax (global) — fastest path
MINIMAX_API_KEY=sk-...

# Or OpenAI
OPENAI_API_KEY=sk-...

# Or OpenRouter
OPENROUTER_API_KEY=sk-...

The vector layer activates automatically on the next start.

Optional: auto-prefetch hook for Claude Code

The MCP tools above are explicit: the model has to decide to call them. For silent automatic recall on every prompt (like Claude Code's native memory but layered), add a UserPromptSubmit hook. See examples/claude_code_hook.md for the recipe.
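To give a feel for what such a hook does, here is a hedged sketch of the core step: read the hook payload, look up memories for the prompt, and emit text for the client to add as context. The payload shape (a JSON object with a prompt field), the prefetch_context helper, and the recall callable are assumptions. The recipe in examples/claude_code_hook.md is the authoritative version.

```python
import json
from typing import Callable


def prefetch_context(hook_stdin: str, recall: Callable[[str, int], list[str]]) -> str:
    """Turn a hook payload into a short block of remembered facts."""
    payload = json.loads(hook_stdin)
    prompt = payload.get("prompt", "")  # assumed field name
    if not prompt.strip():
        return ""
    memories = recall(prompt, 3)
    if not memories:
        return ""
    return "\n".join(["Relevant memories:"] + [f"- {m}" for m in memories])


def fake_recall(query: str, top_k: int) -> list[str]:
    """Stand-in for a real lookup against the memory store."""
    return ["We chose MIT over GPL"][:top_k]


context = prefetch_context('{"prompt": "which license did we pick?"}', fake_recall)
```

A real hook script would read the payload from stdin and print the result.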

Memory types

When you ask the model to remember something, it picks one of:

| Type | Decay | Examples |
| --- | --- | --- |
| architecture | Permanent | "We use ChromaDB for vectors" |
| decision | Permanent | "We chose MIT over GPL" |
| convention | Permanent | "All API calls go through retry_utils" |
| pattern | Permanent | "Use with statements for sqlite connections" |
| gotcha | Permanent | "MiniMax embeddings are NOT OpenAI-compatible" |
| preference | Permanent | "User prefers terse responses" |
| progress | 7 days | "Finished MCP wiring on 2026-04-28" |
| context | 30 days | Misc. background facts |

Storage location

All data lives in $RECALL_MCP_HOME (defaults to ~/.recall-mcp/):

~/.recall-mcp/
├── memory/          # SQLite — facts + entity graph + temporal lineage
├── episodic/        # SQLite — session summaries
└── chroma/          # ChromaDB — vector embeddings

Set RECALL_MCP_HOME to point multiple machines at a synced folder (e.g., Syncthing) and your AI's memory follows you.

Architecture

recall-mcp exposes seven memory layers (originally designed in Hermes Agent), each backed by a focused storage engine:

  1. Episodic (per-turn / per-session events) — SQLite
  2. Semantic (extracted facts, decisions) — SQLite + ChromaDB
  3. Entity graph (who/what/why, dependencies) — SQLite
  4. Temporal lineage (millisecond timestamps, before/after queries) — SQLite
  5. Importance scoring (not all memories equal) — derived
  6. Forgetting engine (decay + Jaccard dedup) — derived
  7. Hybrid retrieval (BM25 + vector + entity + temporal, fused with optional LLM re-rank) — runtime
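The Jaccard dedup named in layer 6 can be illustrated with a small sketch. It compares whitespace-separated, lowercased token sets. The tokenization and the 0.8 threshold are assumptions chosen for the example, not values taken from the engine.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of two texts: shared tokens over all tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def dedup(memories: list[str], threshold: float = 0.8) -> list[str]:
    """Keep a memory only if it is not a near-duplicate of one already kept."""
    kept: list[str] = []
    for text in memories:
        if all(jaccard(text, existing) < threshold for existing in kept):
            kept.append(text)
    return kept


unique = dedup([
    "we use chromadb for vectors",
    "We use ChromaDB for vectors",   # same tokens after lowercasing, dropped
    "we chose MIT over GPL",
])
```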

When you call memory_recall, all four retrieval paths run in parallel, results are deduplicated, scored by source quality + importance, and returned ranked.
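The fusion step described above might look roughly like this. It is a minimal sketch: the source weights and the additive scoring rule are assumptions, the fuse helper is hypothetical, and parallel execution and the optional LLM re-rank are left out.

```python
# Assumed per-source quality weights; the real values are not documented here.
SOURCE_WEIGHT = {"vector": 1.0, "bm25": 0.8, "entity": 0.6, "temporal": 0.4}


def fuse(hits_by_source: dict[str, list[str]],
         importance: dict[str, float],
         top_k: int = 5) -> list[str]:
    """Deduplicate hits across sources, score them, and return the top ids."""
    best: dict[str, float] = {}
    for source, memory_ids in hits_by_source.items():
        for memory_id in memory_ids:
            score = SOURCE_WEIGHT[source] + importance.get(memory_id, 0.0)
            # Deduplicate: keep the best score seen for each memory.
            best[memory_id] = max(best.get(memory_id, 0.0), score)
    # Highest score first; ties are broken by id for a stable order.
    return sorted(best, key=lambda m: (-best[m], m))[:top_k]


ranked = fuse(
    {"vector": ["m1", "m2"], "bm25": ["m2", "m3"], "entity": ["m3"], "temporal": []},
    {"m1": 0.1, "m2": 0.5, "m3": 0.9},
)
```

In this example m3 ranks first: its high importance outweighs the fact that only the lower-weighted BM25 and entity paths found it.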

Credits

Memory architecture derived from Hermes by Nous Research (MIT). recall-mcp generalizes the layered memory + retrieval engine into a standalone MCP server that any AI client can plug into.

License

MIT — see LICENSE.
