recall-memory-mcp

Enables AI agents to store, retrieve, and self-improve procedural memories (lessons learned) based on relevance to the current task, pruning unused memories to reduce context load and prevent repetition of past mistakes.

recall-memory-mcp

A relevance-gated, self-improving procedural memory for AI agents, as an MCP server.

Most agent-memory tools remember facts (conversations, preferences). This one stores the lessons an agent learns, surfaces only the ones relevant to the task at hand, and gets better over time by learning from failures, deduping, and pruning what it never uses. As a result, the agent stops dumping its whole history into context and stops repeating its own mistakes.

Why

A long-running agent accretes memory and usually loads all of it every session. That is expensive and slow, and it drowns the current truth in stale history, so the agent drifts back to old, superseded decisions.

Measured on a real 64-day production agent: ~91,000 tokens were loaded every session, and ~90% of them were never used. Relevance-gating cut that to a few hundred tokens per task (about a 99% reduction), and the drift stopped, because stale history only surfaces when a task is actually about it.

The full lifecycle (eight tools)

  • recall(task, k) -- only the lessons and state relevant to what you are about to do, each with an actionable check. Self-tracks which lessons get used.
  • preflight(task, k) -- a pre-action checklist: the specific things to verify before editing a file, sending a message, deploying, or querying a database. Built for a PreToolUse hook so the right guardrails fire automatically, with no prose to re-read.
  • learn(title, body, check) -- turn a failure or insight into a retrievable lesson. Closes the loop: next time the same situation comes up, recall surfaces it. Dedupes -- a recurring failure bumps a seen_count instead of cloning the lesson.
  • memory_audit() -- how much loaded memory is never used (archive candidates) and how much is stale.
  • prune() -- retire learned lessons safely: only those that were never retrieved, are not recurring, and are older than a grace period, so fresh and recurring lessons are never lost.
  • consolidate() -- flag near-duplicate lessons to merge.
  • maintain() -- one self-maintenance pass: audit + safe prune + consolidate report. Safe to run on a schedule or at session wrap.
  • reindex() -- rebuild after the memory files change.
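
The dedup rule in learn and the three-way safety condition in prune can be sketched as follows. This is a minimal illustration, not the project's code: the Lesson fields other than seen_count, the title-normalising dedup key, and the 14-day GRACE_SECONDS value are all assumptions made for the example.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

# Assumed grace period; the README does not state the real value.
GRACE_SECONDS = 14 * 24 * 3600


@dataclass
class Lesson:
    title: str
    body: str
    check: str
    created: float = field(default_factory=time.time)
    seen_count: int = 1   # how many times this failure has been reported
    retrieved: int = 0    # how many times recall has surfaced it


def _key(title: str) -> str:
    # Assumed dedup key: case- and whitespace-normalised title.
    return " ".join(title.lower().split())


def learn(store: dict, title: str, body: str, check: str) -> Lesson:
    """Add a lesson, or bump seen_count if an equivalent one already exists."""
    key = _key(title)
    if key in store:
        store[key].seen_count += 1
        return store[key]
    store[key] = Lesson(title, body, check)
    return store[key]


def prune(store: dict, now: Optional[float] = None) -> list:
    """Retire lessons that are never retrieved AND not recurring AND past the grace period."""
    now = time.time() if now is None else now
    retired = [
        key for key, lesson in store.items()
        if lesson.retrieved == 0
        and lesson.seen_count == 1
        and now - lesson.created > GRACE_SECONDS
    ]
    for key in retired:
        del store[key]
    return retired
```

Because all three conditions must hold, a lesson learned a minute ago or one that has recurred even once survives a prune pass.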

How it works

  • Chunks the agent's markdown memory (rules whole; state, session log, and index at paragraph level), its .claude/skills, and its runtime-learned lessons.
  • Ranks with BM25 (length-normalised, so big stale blocks do not dominate), with source-weighting (real lessons beat index pointers), recency (current decisions beat superseded ones), and a generic-term down-weight (words like "task" or "file" stop inflating noise).
  • preflight adds a concept-overlap gate: a guardrail only fires if the task shares at least two distinctive (non-generic) terms with it, so a single coincidental word never trips a false checklist. This is corpus-size independent.
  • Optional semantic/hybrid retrieval (semantic.py, model2vec static embeddings, CPU, no torch) blends cosine similarity with BM25 so a differently-worded task still finds the right lesson. It degrades gracefully to pure BM25 if the dependency is absent.
  • An on-disk index cache keyed on source-file mtimes keeps preflight fast on the hot path (it runs before every risky tool call); a stale or truncated cache simply fails validation and rebuilds, so it can never serve wrong results.
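
The concept-overlap gate described above can be illustrated with a short sketch. The tokeniser and the GENERIC_TERMS set here are assumptions for the example; the project's actual generic-term list is not shown in this README.

```python
import re

# Assumed generic-term list, for illustration only.
GENERIC_TERMS = {"task", "file", "the", "a", "an", "to", "of", "and", "before", "about"}


def distinctive_terms(text: str) -> set:
    """Lowercased word tokens with generic terms removed."""
    return {t for t in re.findall(r"[a-z0-9_]+", text.lower()) if t not in GENERIC_TERMS}


def gate(task: str, guardrail: str, overlap_min: int = 2) -> bool:
    """True if the task shares at least overlap_min distinctive terms with the guardrail."""
    shared = distinctive_terms(task) & distinctive_terms(guardrail)
    return len(shared) >= overlap_min


guardrail = "Before a production deploy, check migrations"
print(gate("deploy the server to production", guardrail))  # shares "deploy" and "production"
print(gate("edit the deploy notes file", guardrail))       # shares only "deploy"
```

The decision depends only on the task and the single guardrail being tested, which is why it does not shift as the corpus grows.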

Performance

On the production agent, with preflight wired into a PreToolUse hook (fires before every file edit / deploy / risky shell command):

  • Warm hook latency ~80 ms (down from ~330 ms), thanks to the index cache and to skipping the embedding import in fast mode.
  • In-process preflight lookup ~0.5 ms.
  • Per-task context ~99% smaller than loading all memory.
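
A cache keyed on source-file mtimes, of the kind credited for the warm-path latency above, might look roughly like this. It is a sketch under assumptions: the JSON cache format, the load_index name, and the build callback are hypothetical, and the index is assumed to be JSON-serialisable.

```python
import json
import os


def _fingerprint(sources: list) -> dict:
    """Map each source path to its modification time in nanoseconds."""
    return {path: os.stat(path).st_mtime_ns for path in sources}


def load_index(sources: list, cache_path: str, build) -> dict:
    """Return the cached index if it validates, otherwise rebuild and rewrite it."""
    fingerprint = _fingerprint(sources)
    try:
        with open(cache_path, encoding="utf-8") as fh:
            cached = json.load(fh)
        if cached["fingerprint"] == fingerprint:
            return cached["index"]
    except (OSError, ValueError, KeyError, TypeError):
        pass  # missing, truncated, or malformed cache: fall through to a rebuild
    index = build(sources)
    tmp = cache_path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump({"fingerprint": fingerprint, "index": index}, fh)
    os.replace(tmp, cache_path)  # atomic swap so readers never see a partial file
    return index
```

Any read or validation failure is treated the same as a cache miss, so a damaged cache costs one rebuild rather than a wrong answer.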

Tested

A committed test suite covers retrieval precision (the right guardrail fires; benign and irrelevant actions stay silent), latency, hook robustness against malformed and hostile input, and the full auto-learn loop end-to-end (a failure is detected, distilled into a lesson, and becomes retrievable). A second, self-contained smoke test builds a tiny fake agent repo in a tempdir and exercises the whole lifecycle with no external data:

python3 tests/test_smoke.py

Install and use

pip install mcp
RECALL_MEMORY_ROOT=/path/to/your/agent/repo python mcp_server.py   # as a stdio MCP server

By default it indexes .claude/rules/anti-paperclip.md, memory/state.md, memory/session-log.md, and memory/INDEX.md under RECALL_MEMORY_ROOT, plus the .claude/skills it finds and a learned.json it maintains.

To use your own layout, drop a recall.sources.json in RECALL_MEMORY_ROOT (any key you omit falls back to the default):

{
  "rules": [{"path": "docs/rules.md", "split": "\\n###\\s+Rule\\s+"}],
  "paragraphs": [
    {"path": "docs/state.md", "label": "state", "split": "\\n##\\s+"}
  ],
  "skills_dir": ".claude/skills",
  "weights": {"rule": 1.2, "state": 1.0}
}

rules files are chunked whole per section (each carries an extracted check); paragraphs files are chunked per paragraph with a source label and weight. Set skills_dir to null to skip skill indexing.
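
The per-key fallback and the split patterns can be sketched as below. The DEFAULTS dictionary is an assumed, abbreviated stand-in for the real defaults, and load_sources and split_sections are hypothetical names used only for this example.

```python
import json
import os
import re

# Assumed, abbreviated defaults; the real defaults cover more files.
DEFAULTS = {
    "rules": [{"path": ".claude/rules/anti-paperclip.md"}],
    "paragraphs": [{"path": "memory/state.md", "label": "state"}],
    "skills_dir": ".claude/skills",
    "weights": {},
}


def load_sources(root: str) -> dict:
    """Overlay recall.sources.json on the defaults, one top-level key at a time."""
    config = dict(DEFAULTS)
    path = os.path.join(root, "recall.sources.json")
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            config.update(json.load(fh))
    return config


def split_sections(text: str, pattern: str) -> list:
    """Split text on the configured regex and drop empty chunks."""
    return [chunk.strip() for chunk in re.split(pattern, text) if chunk.strip()]


doc = "intro\n### Rule One\nbody\n### Rule Two\nmore"
print(split_sections(doc, r"\n###\s+Rule\s+"))
```

Note that the JSON string "\\n###\\s+Rule\\s+" decodes to the regex \n###\s+Rule\s+, and that re.split consumes the matched heading marker, so each chunk starts with the text that follows it.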

CLI without the MCP runtime:

RECALL_MEMORY_ROOT=/path/to/repo python recall.py "about to publish a repo"
RECALL_MEMORY_ROOT=/path/to/repo python recall.py --preflight "edit the server entrypoint"
RECALL_MEMORY_ROOT=/path/to/repo python recall.py --learn "Title" "What happened" "What to check next time"
RECALL_MEMORY_ROOT=/path/to/repo python recall.py --maintain
RECALL_MEMORY_ROOT=/path/to/repo python recall.py --audit

Tuning

| Env var | Default | Meaning |
| --- | --- | --- |
| RECALL_MEMORY_ROOT | . | Root of the agent repo to index. |
| RECALL_LEARNED_PATH | <root>/harness-memory/learned.json | Where runtime lessons are stored. |
| RECALL_FAST | unset | Skip the embedding import (pure BM25); used on the hook hot path. |
| RECALL_PREFLIGHT_FLOOR | 0 | Minimum score for a preflight check (the overlap gate does the real filtering). |
| RECALL_OVERLAP_MIN | 2 | Distinctive terms a task must share with a guardrail before it fires. |
Status and roadmap

v0.3: retrieve / preflight / learn (with dedup) / audit / safe-prune / consolidate / maintain lifecycle, plus optional semantic retrieval, an index cache, env-tunable precision, and tests. It does what the fact-memory tools (Mem0, Zep, Letta, Cognee) do not: it keeps a procedural, relevance-gated, self-pruning memory of how to do the work, and it learns from its own failures.

Ahead: auto-firing learn from failure signals; generating evals from failures; behavioural model-diffing on new model releases; and federation.

License

MIT.
