# Recall
Long-term memory system for MCP-compatible AI assistants with semantic search and relationship tracking.
## Features
- Persistent Memory Storage: Store preferences, decisions, patterns, and session context
- Semantic Search: Find relevant memories using natural language queries via ChromaDB vectors
- Memory Relationships: Create edges between memories (supersedes, relates_to, caused_by, contradicts)
- Namespace Isolation: Global memories vs project-scoped memories
- Context Generation: Auto-format memories for session context injection
- Deduplication: Content-hash based duplicate detection
## Installation

```bash
# Clone the repository
git clone https://github.com/yourorg/recall.git
cd recall

# Install with uv
uv sync

# Ensure Ollama is running with required models
ollama pull mxbai-embed-large   # Required: embeddings for semantic search
ollama pull llama3.2            # Optional: session summarization for auto-capture hook
ollama serve
```
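Before starting the server, you can sanity-check that Ollama is reachable and the embedding model is pulled. A minimal sketch using only the standard library (`/api/tags` is Ollama's model-listing endpoint):

```python
# check_ollama.py - verify Ollama is up and the embedding model is available
import json
import urllib.request

OLLAMA_HOST = "http://localhost:11434"

with urllib.request.urlopen(f"{OLLAMA_HOST}/api/tags", timeout=5) as resp:
    models = [m["name"] for m in json.load(resp)["models"]]

print("Available models:", models)
if not any(name.startswith("mxbai-embed-large") for name in models):
    raise SystemExit("mxbai-embed-large not found - run: ollama pull mxbai-embed-large")
```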
## Usage

### Run as MCP Server

```bash
uv run python -m recall
```

### CLI Options

```bash
uv run python -m recall --help
```

Options:

```
  --sqlite-path PATH      SQLite database path (default: ~/.recall/recall.db)
  --chroma-path PATH      ChromaDB storage path (default: ~/.recall/chroma_db)
  --collection NAME       ChromaDB collection name (default: memories)
  --ollama-host HOST      Ollama server URL (default: http://localhost:11434)
  --ollama-model MODEL    Embedding model (default: mxbai-embed-large)
  --ollama-timeout SECS   Request timeout (default: 30)
  --log-level LEVEL       DEBUG, INFO, WARNING, ERROR, CRITICAL (default: INFO)
```
### meta-mcp Configuration

Add Recall to your meta-mcp `servers.json`:

```json
{
  "recall": {
    "command": "uv",
    "args": [
      "run",
      "--directory",
      "/path/to/recall",
      "python",
      "-m",
      "recall"
    ],
    "env": {
      "RECALL_LOG_LEVEL": "INFO",
      "RECALL_OLLAMA_HOST": "http://localhost:11434",
      "RECALL_OLLAMA_MODEL": "mxbai-embed-large"
    },
    "description": "Long-term memory system with semantic search",
    "tags": ["memory", "context", "semantic-search"]
  }
}
```
Or for Claude Code / other MCP clients (`claude.json`):

```json
{
  "mcpServers": {
    "recall": {
      "command": "uv",
      "args": [
        "run",
        "--directory",
        "/path/to/recall",
        "python",
        "-m",
        "recall"
      ],
      "env": {
        "RECALL_LOG_LEVEL": "INFO"
      }
    }
  }
}
```
## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `RECALL_SQLITE_PATH` | `~/.recall/recall.db` | SQLite database file path |
| `RECALL_CHROMA_PATH` | `~/.recall/chroma_db` | ChromaDB persistent storage directory |
| `RECALL_COLLECTION_NAME` | `memories` | ChromaDB collection name |
| `RECALL_OLLAMA_HOST` | `http://localhost:11434` | Ollama server URL |
| `RECALL_OLLAMA_MODEL` | `mxbai-embed-large` | Embedding model name |
| `RECALL_OLLAMA_TIMEOUT` | `30` | Ollama request timeout in seconds |
| `RECALL_LOG_LEVEL` | `INFO` | Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL) |
| `RECALL_DEFAULT_NAMESPACE` | `global` | Default namespace for memories |
| `RECALL_DEFAULT_IMPORTANCE` | `0.5` | Default importance score (0.0-1.0) |
| `RECALL_DEFAULT_TOKEN_BUDGET` | `4000` | Default token budget for context |
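Each variable falls back to the default in the table when unset. A minimal sketch of that resolution (illustrative only, not Recall's actual config loader):

```python
import os
from pathlib import Path

# Resolve a few settings the way the table above describes: the environment
# variable if set, otherwise the documented default.
sqlite_path = Path(os.environ.get("RECALL_SQLITE_PATH", "~/.recall/recall.db")).expanduser()
namespace = os.environ.get("RECALL_DEFAULT_NAMESPACE", "global")
importance = float(os.environ.get("RECALL_DEFAULT_IMPORTANCE", "0.5"))
token_budget = int(os.environ.get("RECALL_DEFAULT_TOKEN_BUDGET", "4000"))
```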
## MCP Tool Examples

### memory_store_tool

Store a new memory with semantic indexing. Uses the fast daemon path when available (<10ms) and falls back to synchronous embedding otherwise.

```json
{
  "content": "User prefers dark mode in all applications",
  "memory_type": "preference",
  "namespace": "global",
  "importance": 0.8,
  "metadata": {"source": "explicit_request"}
}
```
Response (fast path via daemon):

```json
{
  "success": true,
  "queued": true,
  "queue_id": 42,
  "namespace": "global"
}
```
Response (sync path fallback):

```json
{
  "success": true,
  "queued": false,
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "content_hash": "a1b2c3d4e5f67890"
}
```
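The `content_hash` drives deduplication: storing the same content twice resolves to the same hash. The exact scheme is internal to Recall; a plausible sketch, assuming a truncated SHA-256 over normalized content (which matches the 16-hex-character shape above but is not confirmed):

```python
import hashlib

def content_hash(content: str) -> str:
    # Assumption: normalize whitespace and case, then truncate a SHA-256
    # digest to 16 hex chars. Recall's real scheme may differ.
    normalized = " ".join(content.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]

# Near-identical phrasings hash the same, so the duplicate is detected.
assert content_hash("User prefers dark mode") == content_hash("user  prefers dark mode")
```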
### daemon_status_tool

Check whether the recall daemon is running:

```json
{}
```

Response:

```json
{
  "running": true,
  "status": {
    "pid": 12345,
    "store_queue": {"pending_count": 5},
    "embed_worker_running": true
  }
}
```
### memory_recall_tool

Search memories by semantic similarity:

```json
{
  "query": "user interface preferences",
  "n_results": 5,
  "namespace": "global",
  "memory_type": "preference",
  "min_importance": 0.5,
  "include_related": true
}
```

Response:

```json
{
  "success": true,
  "memories": [
    {
      "id": "550e8400-e29b-41d4-a716-446655440000",
      "content": "User prefers dark mode in all applications",
      "type": "preference",
      "namespace": "global",
      "importance": 0.8,
      "created_at": "2024-01-15T10:30:00",
      "accessed_at": "2024-01-15T14:22:00",
      "access_count": 3
    }
  ],
  "total": 1,
  "score": 0.92
}
```
### memory_relate_tool

Create a relationship between memories:

```json
{
  "source_id": "mem_new_123",
  "target_id": "mem_old_456",
  "relation": "supersedes",
  "weight": 1.0
}
```

Response:

```json
{
  "success": true,
  "edge_id": 42
}
```
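Edges are directed, from `source_id` to `target_id`, with a `weight` for ranking related memories. A hypothetical SQLite edge table to make the shape concrete (column names are illustrative, not Recall's actual schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical edge table; relation types come from the Features list above.
conn.execute("""
    CREATE TABLE edges (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        source_id TEXT NOT NULL,
        target_id TEXT NOT NULL,
        relation  TEXT NOT NULL CHECK (relation IN
                  ('supersedes', 'relates_to', 'caused_by', 'contradicts')),
        weight    REAL NOT NULL DEFAULT 1.0
    )
""")
cur = conn.execute(
    "INSERT INTO edges (source_id, target_id, relation, weight) VALUES (?, ?, ?, ?)",
    ("mem_new_123", "mem_old_456", "supersedes", 1.0),
)
print({"success": True, "edge_id": cur.lastrowid})
```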
### memory_context_tool

Generate formatted context for session injection:

```json
{
  "query": "coding style preferences",
  "project": "myproject",
  "token_budget": 4000
}
```

Response:

```json
{
  "success": true,
  "context": "# Memory Context\n\n## Preferences\n\n- User prefers dark mode [global]\n- Use 2-space indentation [project:myproject]\n\n## Recent Decisions\n\n- Decided to use FastAPI for the backend [project:myproject]\n",
  "token_estimate": 125
}
```
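Callers can compare `token_estimate` against their budget before injecting the context. The estimator itself is internal to Recall; as a purely illustrative sketch, a common rule of thumb for English text is roughly four characters per token:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 chars/token for English prose); Recall's actual
    # estimator may count differently.
    return max(1, len(text) // 4)

budget = 4000
context = "# Memory Context\n\n## Preferences\n\n- User prefers dark mode [global]\n"
assert estimate_tokens(context) <= budget
```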
### memory_forget_tool

Delete memories by ID or semantic search:

```json
{
  "memory_id": "550e8400-e29b-41d4-a716-446655440000",
  "confirm": true
}
```

Or delete by search:

```json
{
  "query": "outdated preferences",
  "namespace": "project:oldproject",
  "n_results": 10,
  "confirm": true
}
```

Response:

```json
{
  "success": true,
  "deleted_ids": ["550e8400-e29b-41d4-a716-446655440000"],
  "deleted_count": 1
}
```
## Architecture

```
┌──────────────────────────────────────────────────────────────┐
│                     MCP Server (FastMCP)                     │
│ memory_store │ memory_recall │ memory_relate │ memory_forget │
└───────────────────────────┬──────────────────────────────────┘
                            │
              ┌─────────────┴─────────────┐
              │                           │
    ┌─────────▼─────────┐       ┌─────────▼─────────┐
    │     FAST PATH     │       │     SYNC PATH     │
    │       <10ms       │       │      10-60s       │
    └─────────┬─────────┘       └─────────┬─────────┘
              │                           │
    ┌─────────▼─────────┐       ┌─────────▼─────────┐
    │   recall-daemon   │       │    HybridStore    │
    │   (Unix socket)   │       │  (Direct Ollama)  │
    │                   │       └─────────┬─────────┘
    │  ┌─────────────┐  │                 │
    │  │ StoreQueue  │  │     ┌───────────┼───────────┐
    │  │ EmbedWorker │  │     │           │           │
    │  └─────────────┘  │     │           │           │
    └─────────┬─────────┘   ┌─▼─────┐ ┌───▼───┐ ┌─────▼─────┐
              │             │SQLite │ │Chroma │ │  Ollama   │
              └────────────►│Store  │ │ Store │ │  Client   │
                            └───────┘ └───────┘ └───────────┘
```
The daemon provides fast (<10ms) memory storage by queueing operations and processing embeddings asynchronously. When the daemon is unavailable, the MCP server falls back to synchronous embedding (10-60s).
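The fallback behavior might look like the following sketch. The `store` command name and wire format are assumptions (only the `status` command is documented below); the socket path comes from the Daemon Commands section:

```python
import json
import socket

DAEMON_SOCKET = "/tmp/recall-daemon.sock"  # see Daemon Commands below

def store_sync(payload: dict) -> dict:
    # Placeholder for the SYNC PATH (HybridStore embedding via Ollama, 10-60s).
    return {"success": True, "queued": False}

def store_memory(payload: dict) -> dict:
    """Sketch of the FAST PATH / SYNC PATH split from the diagram above."""
    try:
        # FAST PATH: enqueue on the daemon and return immediately (<10ms).
        # Assumption: a "store" command exists alongside the documented "status".
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
            sock.settimeout(0.5)
            sock.connect(DAEMON_SOCKET)
            sock.sendall(json.dumps({"cmd": "store", **payload}).encode() + b"\n")
            return json.loads(sock.recv(65536))
    except OSError:
        # Daemon not running: fall back to synchronous embedding.
        return store_sync(payload)
```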
## Daemon Setup (macOS)
The recall daemon provides fast (<10ms) memory storage by processing embeddings asynchronously. Without the daemon, each store operation blocks for 10-60 seconds waiting for Ollama embeddings.
### Quick Install

```bash
# From the recall directory
./hooks/install-daemon.sh
```

This will:
- Copy hook scripts to `~/.claude/hooks/`
- Install the launchd plist to `~/Library/LaunchAgents/`
- Start the daemon automatically
### Manual Install

```bash
# 1. Copy hook scripts
cp hooks/recall*.py ~/.claude/hooks/
chmod +x ~/.claude/hooks/recall*.py

# 2. Create logs directory
mkdir -p ~/.claude/hooks/logs

# 3. Install plist with path substitution
sed "s|{{HOME}}|$HOME|g; s|{{RECALL_DIR}}|$(pwd)|g" \
  hooks/com.recall.daemon.plist.template > ~/Library/LaunchAgents/com.recall.daemon.plist

# 4. Load the daemon
launchctl load ~/Library/LaunchAgents/com.recall.daemon.plist
```
### Daemon Commands

```bash
# Check status
echo '{"cmd": "status"}' | nc -U /tmp/recall-daemon.sock | jq

# Stop daemon
launchctl unload ~/Library/LaunchAgents/com.recall.daemon.plist

# Start daemon
launchctl load ~/Library/LaunchAgents/com.recall.daemon.plist

# View logs
tail -f ~/.claude/hooks/logs/recall-daemon.log
```
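The same status query works from Python if `nc` is not available (a sketch assuming the daemon answers one JSON message per connection, as the `nc` pipeline above suggests):

```python
import json
import socket

# Query the daemon over its Unix socket, mirroring the nc command above.
with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
    sock.connect("/tmp/recall-daemon.sock")
    sock.sendall(b'{"cmd": "status"}\n')
    reply = json.loads(sock.recv(65536))

print(json.dumps(reply, indent=2))
```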
## Hooks Configuration

Add recall hooks to your Claude Code settings (`~/.claude/settings.json`). See `hooks/settings.example.json` for the full configuration.
## Development

```bash
# Install dev dependencies
uv sync --dev

# Run tests
uv run pytest tests/

# Run tests with coverage
uv run pytest tests/ --cov=recall --cov-report=html

# Type checking
uv run mypy src/recall

# Run specific integration tests
uv run pytest tests/integration/test_mcp_server.py -v
```
## Requirements

- Python 3.13+
- Ollama with:
  - `mxbai-embed-large` model (required for semantic search)
  - `llama3.2` model (optional, for session auto-capture hook)
- ~500MB disk space for ChromaDB indices
## License
MIT