recall-mcp

Turns a local folder of notes and documents into a searchable knowledge base for AI assistants via MCP, enabling semantic search, reading, and adding notes entirely on-device.

🧠 Recall: a local, private knowledge-base MCP server

Recall turns a folder of your own notes and documents into a searchable knowledge base that any AI assistant can use. It is a Model Context Protocol (MCP) server: connect it to Claude Desktop, Claude Code, or any MCP client, and the assistant can search, read, and add to your notes through well-defined tools.

It uses semantic search powered by local embeddings, so it finds passages by meaning rather than by matching keywords, and it runs entirely on your machine. No API key, no cloud; your documents never leave your device.


Why this project is interesting

  • Retrieval-Augmented Generation (RAG) done locally: chunking, embeddings, and cosine-similarity retrieval, the core of modern AI knowledge systems.
  • Hybrid retrieval: fuses semantic and keyword results with Reciprocal Rank Fusion (RRF), the technique production search systems use.
  • Model Context Protocol: exposes capabilities as tools an LLM can call, the emerging standard for connecting AI assistants to real systems.
  • Privacy-first: semantic search runs on-device with a small embedding model; nothing is sent to a third party.
  • Graceful degradation: if the embedding model can't load, it automatically falls back to keyword search instead of breaking.
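
The Reciprocal Rank Fusion named in the list above can be sketched in a few lines. This is an illustration of the general technique, not Recall's actual code: the function name `reciprocal_rank_fusion`, every file name other than git-cheatsheet.md, and the constant `k=60` (a conventional default for RRF) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first.

    Each id earns 1 / (k + rank) from every list it appears in, so an id
    that ranks well in both lists beats one that tops only a single list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id so output is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the two retrievers (illustrative ids only).
semantic = ["git-cheatsheet.md", "docker-notes.md", "python-tips.md"]
keyword = ["python-tips.md", "git-cheatsheet.md", "sql-basics.md"]

fused = reciprocal_rank_fusion([semantic, keyword])
print(fused)
# ['git-cheatsheet.md', 'python-tips.md', 'docker-notes.md', 'sql-basics.md']
```

Because only ranks are used, the two retrievers never need comparable scores, which is what makes RRF convenient for mixing cosine similarity with keyword counts.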

See it in action

Ask Claude (with Recall connected) "search my notes for how to undo a git commit", and it calls the search_documents tool and gives an answer grounded in git-cheatsheet.md, entirely on your machine.

See the difference: keyword vs. semantic

Ask "how do I undo a commit?" against a small dev knowledge base:

| Search mode | Top result | Why |
| --- | --- | --- |
| Keyword | the doc that literally contains the words "undo a commit" | matches exact words |
| Semantic | git-cheatsheet.md → git revert makes a new commit that undoes an earlier one | matches meaning |

Semantic search finds the genuinely useful answer even though the words don't overlap. That is the whole point of embeddings.

Retrieval quality (measured)

A small labelled eval (10 paraphrased queries over the sample docs) compares the three search modes. Semantic beats keyword clearly, especially at recall@1:

| Mode | recall@1 | recall@3 |
| --- | --- | --- |
| Keyword | 40% | 80% |
| Semantic | 80% | 90% |
| Hybrid | 60% | 90% |

Reproduce it with python eval/run_eval.py. The corpus is small and topically overlapping, so treat the numbers as illustrative. (Pure semantic edges out hybrid here; hybrid tends to win when exact keyword matches matter, such as codes, names, and error strings.) The harness is the real point: retrieval quality is measured, not assumed.
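
For readers unfamiliar with the metric, here is a rough sketch of how recall@k can be computed. It assumes each query has exactly one expected source document, which may not match what eval/run_eval.py does; the function name `recall_at_k`, the queries, and the file names other than git-cheatsheet.md are made up for illustration.

```python
def recall_at_k(results, expected, k):
    """Fraction of queries whose expected source appears in the top-k results.

    results:  query -> ranked list of source names, best first
    expected: query -> the single source that should be retrieved
    """
    hits = sum(1 for query, ranked in results.items() if expected[query] in ranked[:k])
    return hits / len(results)


# Hypothetical labels and search output (not the project's eval data).
expected = {
    "undo a commit": "git-cheatsheet.md",
    "list running containers": "docker-notes.md",
}
results = {
    "undo a commit": ["git-cheatsheet.md", "python-tips.md", "docker-notes.md"],
    "list running containers": ["python-tips.md", "docker-notes.md", "git-cheatsheet.md"],
}

print(recall_at_k(results, expected, 1))  # 0.5
print(recall_at_k(results, expected, 3))  # 1.0
```

With ten queries, each hit or miss moves the score by ten percentage points, which is one more reason to read the table above as illustrative.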

What the AI can do (the MCP tools)

| Tool | What it does |
| --- | --- |
| search_documents(query, limit, mode) | Find the most relevant passages. mode can be auto, semantic, keyword, or hybrid. |
| get_document(source) | Return the full text of one document so the assistant can read or summarise it. |
| list_sources() | List the documents currently loaded and the active search mode. |
| add_note(title, content) | Save a new note into the knowledge base; it becomes searchable immediately. |

How it works

        Your documents (.md / .txt / .pdf)
                 │
                 ▼
        ┌────────────────────┐
        │  DocumentStore     │   1. split each file into paragraph "chunks"
        │  (recall/store.py) │   2. embed every chunk into a vector (local model)
        └────────────────────┘
                 │  query
                 ▼
        ┌────────────────────┐
        │  Semantic search   │   embed the query, rank chunks by cosine similarity
        │  (or keyword)      │   (falls back to keyword search if no model)
        └────────────────────┘
                 │  tools
                 ▼
        ┌────────────────────┐        MCP (stdio / JSON-RPC)
        │  FastMCP server    │ ◀───────────────────────────▶  Claude Desktop,
        │  (recall/server.py)│                                 Claude Code, ...
        └────────────────────┘

  1. Chunk: documents (Markdown, plain text, or PDF) are split on blank lines into passages, with each Markdown heading kept attached to the text it introduces, so results land on a precise, self-contained passage.
  2. Embed: each chunk is turned into a vector with a local fastembed model (bge-small-en-v1.5, 384-dimensional vectors).
  3. Retrieve: a query is embedded and compared to every chunk by cosine similarity; the closest chunks win.
  4. Serve: the FastMCP server exposes search/read/write as MCP tools over stdio, so any MCP client can use them.
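
The chunk and retrieve steps above can be sketched with the standard library alone. This is a simplified illustration, not the code in recall/store.py: the names `chunk_markdown`, `cosine`, and `top_k` are hypothetical, and the three-dimensional vectors are hand-made stand-ins for the 384-dimensional embeddings a real model would produce.

```python
import math
import re


def chunk_markdown(text):
    """Split on blank lines, gluing each heading to the paragraph after it."""
    chunks, pending_heading = [], None
    for block in re.split(r"\n\s*\n", text.strip()):
        block = block.strip()
        if not block:
            continue
        if block.startswith("#") and "\n" not in block:
            # A lone heading line: hold it until the text it introduces arrives.
            pending_heading = block if pending_heading is None else pending_heading + "\n" + block
            continue
        if pending_heading is not None:
            block = pending_heading + "\n" + block
            pending_heading = None
        chunks.append(block)
    if pending_heading is not None:
        chunks.append(pending_heading)  # trailing heading with no body
    return chunks


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k=3):
    """Indices of the k chunks most similar to the query, best first."""
    ranked = sorted(range(len(chunk_vecs)), key=lambda i: cosine(query_vec, chunk_vecs[i]), reverse=True)
    return ranked[:k]


doc = (
    "## Undo a commit\n\n"
    "git revert makes a new commit that undoes an earlier one.\n\n"
    "## Stash\n\n"
    "git stash shelves uncommitted work."
)
chunks = chunk_markdown(doc)

# Toy vectors standing in for real embeddings of the two chunks and a query.
chunk_vecs = [[0.9, 0.2, 0.0], [0.0, 0.1, 1.0]]
query_vec = [1.0, 0.1, 0.0]

best = top_k(query_vec, chunk_vecs, k=1)
print(chunks[best[0]])
```

Comparing the query against every chunk is a linear scan, which is fine for a personal notes folder; a much larger corpus would usually call for an approximate nearest-neighbour index instead.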

Quickstart

Requires Python 3.10+.

# 1. Clone and enter the project
git clone https://github.com/jaswanthsurya007-source/recall-mcp.git
cd recall-mcp

# 2. Create and activate a virtual environment
python -m venv .venv
# Windows (PowerShell):
.venv\Scripts\Activate.ps1
# macOS / Linux:
source .venv/bin/activate

# 3. Install
pip install -e .

# 4. Try a search from Python
python -c "from recall.store import DocumentStore; s=DocumentStore('data/documents'); print([r.chunk.source for r in s.search('how do I undo a commit', 1)])"

The first run downloads the embedding model (~66 MB) once, then caches it.

Behind a corporate proxy?

Recall uses truststore to trust your operating system's certificates automatically, so it works on networks that inspect TLS traffic (common at large companies) without extra configuration.

Connect it to Claude Desktop

Add Recall to your claude_desktop_config.json (Settings → Developer → Edit Config):

{
  "mcpServers": {
    "recall": {
      "command": "/absolute/path/to/recall-mcp/.venv/bin/python",
      "args": ["-m", "recall.server"],
      "env": {
        "RECALL_DOCS_DIR": "/absolute/path/to/recall-mcp/data/documents"
      }
    }
  }
}

On Windows, use the full path to python.exe and escape backslashes, e.g. "C:\\path\\to\\recall-mcp\\.venv\\Scripts\\python.exe".
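
If hand-escaping backslashes feels error-prone, one option is to let Python's json module produce the entry. The sketch below is only a convenience, with a hypothetical helper name `recall_config` and the same placeholder paths as above; substitute your real paths before pasting the output into the config file.

```python
import json


def recall_config(python_exe, docs_dir):
    """Build the mcpServers entry shown above for the given paths."""
    return {
        "mcpServers": {
            "recall": {
                "command": python_exe,
                "args": ["-m", "recall.server"],
                "env": {"RECALL_DOCS_DIR": docs_dir},
            }
        }
    }


# Raw strings keep single backslashes in Python; json.dumps doubles them.
config = recall_config(
    r"C:\path\to\recall-mcp\.venv\Scripts\python.exe",
    r"C:\path\to\recall-mcp\data\documents",
)
text = json.dumps(config, indent=2)
print(text)
```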

Restart Claude Desktop, and you'll see Recall's tools available. Ask it things like "Search my notes for how to undo a git commit" or "Save a note titled 'Meeting' with these action items…".

Use your own documents

Point Recall at any folder of .md, .txt, or .pdf files:

# macOS / Linux: set RECALL_DOCS_DIR to your own notes folder
RECALL_DOCS_DIR="/path/to/my/notes" python -m recall.server
# Windows (PowerShell)
$env:RECALL_DOCS_DIR = "C:\path\to\my\notes"; python -m recall.server

The data/documents/ folder ships with a few sample notes so you can try it immediately.

Running the tests

pip install -e ".[dev]"
pytest -q

The test suite runs fully offline (keyword mode), so it needs no model download.

Project structure

recall-mcp/
├── .github/workflows/ # CI: ruff + pytest on every push
├── recall/
│   ├── server.py      # FastMCP server: defines the MCP tools
│   ├── store.py       # load → chunk → search (semantic, keyword, hybrid)
│   └── embeddings.py  # local embedding model wrapper (fastembed)
├── data/documents/    # sample knowledge base (.md and .pdf)
├── tests/             # offline pytest suite (+ fixtures/)
├── eval/              # retrieval-quality eval (recall@k)
├── pyproject.toml     # packaging + tooling config
├── requirements.txt
└── LICENSE

Design notes

  • Why local embeddings? Privacy and zero cost. fastembed uses ONNX runtime rather than PyTorch, so installs are small and inference is fast on CPU.
  • Why chunk by paragraph? It is simple and transparent, and it makes results land on a focused passage. A future version could use overlapping token windows.
  • Why a fallback to keyword search? A tool should never hard-fail. If the model can't be downloaded, search still works, just less cleverly.
  • Re-indexing on write is a full reload for clarity; at larger scale you would embed only the newly added chunks.
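
The keyword fallback described in these notes boils down to a guarded model load. The sketch below shows the pattern under stated assumptions: `resolve_mode` and the loader callables are hypothetical names, and the choice to resolve "auto" to "semantic" when the model loads is a guess at the behaviour, not something taken from Recall's code.

```python
def resolve_mode(requested, load_model):
    """Return (effective_mode, model), degrading to keyword search on failure.

    requested:  "auto", "semantic", "keyword", or "hybrid"
    load_model: zero-argument callable that returns an embedding model
                or raises if the model cannot be loaded
    """
    if requested == "keyword":
        return "keyword", None  # no model needed at all
    try:
        model = load_model()
    except Exception:
        # Download blocked, cache corrupt, dependency missing: keep working.
        return "keyword", None
    mode = "semantic" if requested == "auto" else requested
    return mode, model


def broken_loader():
    # Stand-in for a loader that fails, e.g. when offline on first run.
    raise OSError("model download failed")


print(resolve_mode("auto", broken_loader))          # ('keyword', None)
print(resolve_mode("hybrid", lambda: "fake-model"))  # ('hybrid', 'fake-model')
```

Returning the effective mode alongside the model lets a caller report which mode is actually active, in the spirit of what list_sources() is described as doing.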

Roadmap

  • [x] Retrieval-quality eval harness (recall@k)
  • [x] Hybrid search (Reciprocal Rank Fusion of semantic + keyword)
  • [x] PDF document support
  • [ ] Persist embeddings to disk so startup is instant on large corpora
  • [ ] Support HTML documents
  • [ ] Optional LLM-generated summaries via the Claude API
  • [ ] Expose documents as MCP resources, not just tools

License

MIT
