recall-mcp
Turns a local folder of notes and documents into a searchable knowledge base for AI assistants via MCP, enabling semantic search, reading, and adding notes entirely on-device.
🧠 Recall: a local, private knowledge-base MCP server
Recall turns a folder of your own notes and documents into a searchable knowledge base that any AI assistant can use. It is a Model Context Protocol (MCP) server: connect it to Claude Desktop, Claude Code, or any MCP client, and the assistant can search, read, and add to your notes through well-defined tools.
It uses semantic search powered by local embeddings, so it finds passages by meaning, not just by matching keywords, and it runs entirely on your machine. No API key, no cloud, and your documents never leave your device.
Why this project is interesting
- Retrieval-Augmented Generation (RAG) done locally: chunking, embeddings, and cosine-similarity retrieval, the core of modern AI knowledge systems.
- Hybrid retrieval: fuses semantic and keyword results with Reciprocal Rank Fusion (RRF), the technique production search systems use.
- Model Context Protocol: exposes capabilities as tools an LLM can call, the emerging standard for connecting AI assistants to real systems.
- Privacy-first: semantic search runs on-device with a small embedding model; nothing is sent to a third party.
- Graceful degradation: if the embedding model can't load, it automatically falls back to keyword search instead of breaking.
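The hybrid retrieval bullet above names Reciprocal Rank Fusion. The following is a minimal sketch of the general technique, not Recall's own code: the function name `rrf_fuse`, the constant `k=60` (a commonly used default), and every file name other than git-cheatsheet.md are assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of ids: score(d) = sum of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is stable.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical rankings from the two search modes.
semantic = ["git-cheatsheet.md", "docker.md", "python-tips.md"]
keyword = ["git-cheatsheet.md", "sql.md", "docker.md"]
fused = rrf_fuse([semantic, keyword])
```

A document ranked well by both lists rises to the top, while a document found by only one list still appears further down. RRF uses only ranks, so the two modes' raw scores never need to be on the same scale.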
See it in action
<!-- TODO: capture a screenshot of Claude Desktop calling a Recall tool, save it
as docs/demo.png, then uncomment the line below. -->
<!--
-->
Ask Claude (with Recall connected) "search my notes for how to undo a git commit": it calls the search_documents tool and answers grounded in git-cheatsheet.md, entirely on your machine.
See the difference: keyword vs. semantic
Ask "how do I undo a commit?" against a small dev knowledge base:
| Search mode | Top result | Why |
|---|---|---|
| Keyword | the doc that literally contains the words "undo a commit" | matches exact words |
| Semantic | git-cheatsheet.md: "git revert makes a new commit that undoes an earlier one" | matches meaning |
Semantic search finds the genuinely useful answer even though the words don't overlap. That is the whole point of embeddings.
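To make the idea concrete, here is a small sketch of cosine-similarity ranking. The three-dimensional vectors are hand-written toy values, not output from the real embedding model, and the second passage is invented for contrast.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" (assumed values, for illustration only).
query = [0.9, 0.1, 0.0]  # stands in for "how do I undo a commit?"
chunks = {
    "git revert makes a new commit that undoes an earlier one": [0.8, 0.2, 0.1],
    "docker compose starts several containers": [0.1, 0.1, 0.9],
}

# Rank passages by similarity to the query, closest first.
ranked = sorted(
    chunks,
    key=lambda text: cosine_similarity(query, chunks[text]),
    reverse=True,
)
```

The git passage ranks first even though it shares no words with the query, because its vector points in nearly the same direction as the query vector.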
Retrieval quality (measured)
A small labelled eval (10 paraphrased queries over the sample docs) compares the three search modes. Semantic beats keyword clearly, especially at recall@1:
| Mode | recall@1 | recall@3 |
|---|---|---|
| Keyword | 40% | 80% |
| Semantic | 80% | 90% |
| Hybrid | 60% | 90% |
Reproduce it with python eval/run_eval.py. The corpus is small and topically overlapping, so treat the numbers as illustrative. (Pure semantic edges out hybrid here; hybrid tends to win when exact keyword matches matter: codes, names, error strings.) The harness is the real point: retrieval quality is measured, not assumed.
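For readers unfamiliar with the metric, this sketch shows one way recall@k can be computed when each query has a single expected source. It is an illustration under that assumption, not the contents of eval/run_eval.py; the queries, the file docker.md, and the function name are made up.

```python
def recall_at_k(results, expected, k):
    """Fraction of queries whose expected source appears in the top-k results."""
    hits = sum(1 for query, want in expected.items() if want in results[query][:k])
    return hits / len(expected)


# Hypothetical labels: query -> the source that should be retrieved.
expected = {
    "undo a commit": "git-cheatsheet.md",
    "list containers": "docker.md",
}
# Hypothetical ranked results returned by some search mode.
results = {
    "undo a commit": ["git-cheatsheet.md", "docker.md"],
    "list containers": ["git-cheatsheet.md", "docker.md"],
}

r1 = recall_at_k(results, expected, 1)
r2 = recall_at_k(results, expected, 2)
```

Here one of the two queries has its expected source in first place, and both have it within the top two, so recall@1 is 0.5 and recall@2 is 1.0.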
What the AI can do (the MCP tools)
| Tool | What it does |
|---|---|
| search_documents(query, limit, mode) | Find the most relevant passages. mode can be auto, semantic, keyword, or hybrid. |
| get_document(source) | Return the full text of one document so the assistant can read or summarise it. |
| list_sources() | List the documents currently loaded and the active search mode. |
| add_note(title, content) | Save a new note into the knowledge base; it becomes searchable immediately. |
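An MCP client invokes these tools with JSON-RPC 2.0 `tools/call` requests. The sketch below builds such a request for search_documents. The helper name `tool_call_request` is hypothetical, the argument values are examples, and in practice the client (Claude Desktop, Claude Code) constructs this message for you.

```python
import json


def tool_call_request(request_id, tool, arguments):
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


request = tool_call_request(
    1,
    "search_documents",
    {"query": "how do I undo a commit", "limit": 3, "mode": "hybrid"},
)
wire = json.dumps(request)  # one JSON message, as sent over the stdio transport
```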
How it works
Your documents (.md / .txt / .pdf)
          │
          ▼
┌────────────────────┐
│   DocumentStore    │  1. split each file into paragraph "chunks"
│ (recall/store.py)  │  2. embed every chunk into a vector (local model)
└────────────────────┘
          │ query
          ▼
┌────────────────────┐
│  Semantic search   │  embed the query, rank chunks by cosine similarity
│   (or keyword)     │  (falls back to keyword search if no model)
└────────────────────┘
          │ tools
          ▼
┌────────────────────┐       MCP (stdio / JSON-RPC)
│   FastMCP server   │ ────────────────────────────▶ Claude Desktop,
│ (recall/server.py) │                               Claude Code, ...
└────────────────────┘
- Chunk: documents (Markdown, plain text, or PDF) are split on blank lines into passages, with each Markdown heading kept attached to the text it introduces, so results land on a precise, self-contained passage.
- Embed: each chunk is turned into a vector with a local fastembed model (bge-small-en-v1.5, 384-dimensional vectors).
- Retrieve: a query is embedded and compared to every chunk by cosine similarity; the closest chunks win.
- Serve: the FastMCP server exposes search/read/write as MCP tools over stdio, so any MCP client can use them.
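The chunking step can be sketched as follows. This is an illustration of "split on blank lines, keep each heading with the text it introduces", not the code in recall/store.py; the function name `chunk_markdown` is an assumption, and PDF extraction is left out.

```python
import re


def chunk_markdown(text):
    """Split text on blank lines; attach a standalone heading to the next passage."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]
    chunks = []
    pending = []  # headings waiting for the passage they introduce
    for block in blocks:
        if block.startswith("#") and "\n" not in block:
            pending.append(block)
            continue
        chunks.append("\n".join(pending + [block]))
        pending = []
    if pending:  # a trailing heading with no body still becomes a chunk
        chunks.append("\n".join(pending))
    return chunks


sample = (
    "## Undo a commit\n\n"
    "git revert makes a new commit.\n\n"
    "Use git log to find the hash."
)
chunks = chunk_markdown(sample)
```

In this sample the heading travels with the first passage, so a search hit on that chunk carries its own context, while the second paragraph becomes a separate chunk.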
Quickstart
Requires Python 3.10+.
# 1. Clone and enter the project
git clone https://github.com/jaswanthsurya007-source/recall-mcp.git
cd recall-mcp
# 2. Create and activate a virtual environment
python -m venv .venv
# Windows (PowerShell):
.venv\Scripts\Activate.ps1
# macOS / Linux:
source .venv/bin/activate
# 3. Install
pip install -e .
# 4. Try a search from Python
python -c "from recall.store import DocumentStore; s=DocumentStore('data/documents'); print([r.chunk.source for r in s.search('how do I undo a commit', 1)])"
The first run downloads the embedding model (~66 MB) once, then caches it.
Behind a corporate proxy?
Recall uses truststore to trust
your operating system's certificates automatically, so it works on networks that
inspect TLS traffic (common at large companies) without extra configuration.
Connect it to Claude Desktop
Add Recall to your claude_desktop_config.json
(Settings → Developer → Edit Config):
{
  "mcpServers": {
    "recall": {
      "command": "/absolute/path/to/recall-mcp/.venv/bin/python",
      "args": ["-m", "recall.server"],
      "env": {
        "RECALL_DOCS_DIR": "/absolute/path/to/recall-mcp/data/documents"
      }
    }
  }
}
On Windows, use the full path to python.exe and escape backslashes, e.g.
"C:\\path\\to\\recall-mcp\\.venv\\Scripts\\python.exe".
Restart Claude Desktop, and you'll see Recall's tools available. Ask it things like "Search my notes for how to undo a git commit" or "Save a note titled 'Meeting' with these action items…".
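If hand-escaping Windows backslashes is error-prone, one option is to generate the config entry with Python's json module, which escapes them for you. This is a sketch only: the helper name `desktop_config` is hypothetical and both paths are placeholders to replace with your own.

```python
import json


def desktop_config(python_path, docs_dir):
    """Build the mcpServers entry shown above for a given interpreter and docs folder."""
    return {
        "mcpServers": {
            "recall": {
                "command": python_path,
                "args": ["-m", "recall.server"],
                "env": {"RECALL_DOCS_DIR": docs_dir},
            }
        }
    }


# Placeholder Windows paths, written as raw strings so backslashes stay literal.
config = desktop_config(
    r"C:\path\to\recall-mcp\.venv\Scripts\python.exe",
    r"C:\path\to\recall-mcp\data\documents",
)
text = json.dumps(config, indent=2)  # backslashes come out doubled, as JSON requires
```

Print `text` and merge the "recall" entry into your existing claude_desktop_config.json rather than overwriting the file, since it may already list other servers.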
Use your own documents
Point Recall at any folder of .md, .txt, or .pdf files:
# macOS / Linux: set RECALL_DOCS_DIR to your own notes folder
RECALL_DOCS_DIR="/path/to/my/notes" python -m recall.server
# Windows (PowerShell)
$env:RECALL_DOCS_DIR = "C:\path\to\my\notes"; python -m recall.server
The data/documents/ folder ships with a few sample notes so you can try it
immediately.
Running the tests
pip install -e ".[dev]"
pytest -q
The test suite runs fully offline (keyword mode), so it needs no model download.
Project structure
recall-mcp/
├── .github/workflows/   # CI: ruff + pytest on every push
├── recall/
│   ├── server.py        # FastMCP server: defines the MCP tools
│   ├── store.py         # load → chunk → search (semantic, keyword, hybrid)
│   └── embeddings.py    # local embedding model wrapper (fastembed)
├── data/documents/      # sample knowledge base (.md and .pdf)
├── tests/               # offline pytest suite (+ fixtures/)
├── eval/                # retrieval-quality eval (recall@k)
├── pyproject.toml       # packaging + tooling config
├── requirements.txt
└── LICENSE
Design notes
- Why local embeddings? Privacy and zero cost. fastembed uses ONNX runtime rather than PyTorch, so installs are small and inference is fast on CPU.
- Why chunk by paragraph? It is simple and transparent, and it makes results land on a focused passage. A future version could use overlapping token windows.
- Why a fallback to keyword search? A tool should never hard-fail. If the model can't be downloaded, search still works, just less cleverly.
- Re-indexing on write is a full reload for clarity; at larger scale you would embed only the newly added chunks.
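The keyword fallback described above can be sketched like this. It is a simplified illustration, not Recall's implementation: the names `search`, `keyword_score`, and `broken_embed` are hypothetical, the keyword score is a plain word-overlap ratio, and a dot product stands in for cosine similarity on the assumption that the vectors are normalised.

```python
def keyword_score(query, text):
    """Share of query words that also occur in the text."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def search(query, chunks, embed=None, limit=3):
    """Rank chunks semantically when `embed` works, otherwise by keywords."""
    mode, scores = "keyword", None
    if embed is not None:
        try:
            q = embed(query)
            scores = [sum(a * b for a, b in zip(q, embed(c))) for c in chunks]
            mode = "semantic"
        except Exception:
            scores = None  # model unavailable: fall through to keyword scoring
    if scores is None:
        scores = [keyword_score(query, c) for c in chunks]
    ranked = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)
    return mode, [chunk for _, chunk in ranked[:limit]]


def broken_embed(text):
    """Simulates an embedding model that failed to load."""
    raise RuntimeError("model could not be loaded")


mode, hits = search(
    "undo commit",
    ["how to undo a commit", "docker basics"],
    embed=broken_embed,
)
```

Returning the active mode alongside the results lets the caller report which path was taken, in the same spirit as list_sources() reporting the active search mode.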
Roadmap
- [x] Retrieval-quality eval harness (recall@k)
- [x] Hybrid search (Reciprocal Rank Fusion of semantic + keyword)
- [x] PDF document support
- [ ] Persist embeddings to disk so startup is instant on large corpora
- [ ] Support HTML documents
- [ ] Optional LLM-generated summaries via the Claude API
- [ ] Expose documents as MCP resources, not just tools
License