# memory-mcp

Persistent, self-organizing semantic memory for AI agents — served as an MCP server.
## What is this?
memory-mcp is a Model Context Protocol server that gives AI agents durable, searchable memory backed by PostgreSQL and pgvector. Drop it into any MCP-compatible client (Claude Code, Cursor, Windsurf, etc.) and your agent gains the ability to remember, retrieve, and reason over information across sessions — without you managing any schema or storage logic.
**What it does autonomously:**

- Chunks and embeds incoming text
- Categorizes memories into a hierarchical taxonomy (`ltree` dot-paths)
- Deduplicates against existing memories and resolves conflicts
- Synthesizes a System Primer — a compressed, always-current summary of everything it knows — and surfaces it at session start
- Expires stale memories via TTL and prompts for verification of aging facts
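As a rough illustration of the first step, here is a naive paragraph chunker in the spirit of the server's `MIN_SECTION_LENGTH` filter. The real extraction is LLM-driven; this splitter and the `naive_chunks` name are illustrative assumptions, not project code.

```python
# Hypothetical sketch: split raw text into paragraph chunks and drop
# anything shorter than a minimum length (the actual server uses an LLM
# for semantic section extraction; this is only a naive approximation).
MIN_SECTION_LENGTH = 100

def naive_chunks(text: str, min_len: int = MIN_SECTION_LENGTH) -> list[str]:
    parts = [p.strip() for p in text.split("\n\n")]
    return [p for p in parts if len(p) >= min_len]
```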
## Why memory-mcp?

| | memory-mcp | Simple vector DB | LangChain / LlamaIndex memory |
|---|---|---|---|
| Schema management | Automatic | Manual | Manual |
| Deduplication | Semantic + LLM | None | None |
| Taxonomy | Auto-assigned ltree | None | None |
| Session bootstrap | System Primer | Manual RAG | Manual |
| Conflict resolution | LLM-evaluated | None | None |
| Ephemeral context | Built-in (TTL store) | No | No |
| Self-hostable | Yes (Docker) | Varies | No |
| MCP-native | Yes | No | No |
## Architecture

```text
AI Agent (Claude Code / Cursor / Windsurf)
                     │ HTTP (MCP — Streamable HTTP)
                     ▼
┌──────────────────────────────────────────┐
│                server.py                 │
│ ┌─────────────────┐  ┌─────────────────┐ │
│ │ Production MCP  │  │    Admin MCP    │ │
│ │    :8766/mcp    │  │    :8767/mcp    │ │
│ └────────┬────────┘  └────────┬────────┘ │
│          │       tools/       │          │
│  ┌───────▼────────────────────▼───────┐  │
│  │    ingestion · search · context    │  │
│  │ crud · admin_tools · context_store │  │
│  └─────────────────┬──────────────────┘  │
│                    │                     │
│  ┌─────────────────▼──────────────────┐  │
│  │         Background Workers         │  │
│  │    Ingestion Queue · TTL Daemon    │  │
│  │  System Primer Auto-Regeneration   │  │
│  └─────────────────┬──────────────────┘  │
└────────────────────┼─────────────────────┘
                     │ asyncpg
                     ▼
           PostgreSQL + pgvector
          ┌──────────────────┐
          │ memories         │  chunks, embeddings, ltree paths
          │ memory_edges     │  sequence_next, relates_to, supersedes
          │ ingestion_staging│  async job queue
          │ context_store    │  ephemeral TTL store
          └──────────────────┘
                    │
         ┌──────────▼──────────┐
         │   Backup Service    │  pg_dump → private GitHub repo
         └─────────────────────┘
```
**Two servers, one process:**

- **Production** (`:8766`) — tools safe for the agent to call freely
- **Admin** (`:8767`) — a superset that adds destructive tools (delete, prune, bulk-move). Point your agent at production; use admin for maintenance.
## Quickstart (Docker)

Prerequisites: Docker + Docker Compose, an OpenAI API key.

```sh
# 1. Clone
git clone https://github.com/isaacriehm/memory-mcp.git
cd memory-mcp

# 2. Configure
cp .env.example .env
$EDITOR .env   # set OPENAI_API_KEY and DB_PASSWORD at minimum

# 3. Start
docker compose up -d

# Production MCP endpoint: http://localhost:8766/mcp
# Admin MCP endpoint:      http://localhost:8767/mcp
```

To rebuild after code changes:

```sh
docker compose up -d --build memory-api
```
## Connecting to an MCP Client

### Claude Code

Add to your project's `.claude/settings.json` or `~/.claude/settings.json`:

```json
{
  "mcpServers": {
    "memory": {
      "type": "http",
      "url": "http://localhost:8766/mcp"
    }
  }
}
```

Or via the CLI:

```sh
claude mcp add memory --transport http http://localhost:8766/mcp
```

Then add this instruction to your CLAUDE.md so the agent always bootstraps memory at session start:

```markdown
## Memory

At the start of every session, call `initialize_context` before anything else.
This returns your System Primer — your identity, current knowledge taxonomy, and retrieval guide.
Always consult it before answering questions about prior context.
```
### Cursor / Windsurf

Add to your MCP settings (`.cursor/mcp.json` or equivalent):

```json
{
  "mcpServers": {
    "memory": {
      "url": "http://localhost:8766/mcp"
    }
  }
}
```
## MCP Tools

### Production Tools (:8766)

| Tool | Description |
|---|---|
| `initialize_context` | Call first every session. Returns the System Primer + verification prompts for aging memories. |
| `memorize_context` | Ingest raw text. Automatically chunks, embeds, categorizes, and deduplicates. Supports `ttl_days`. |
| `check_ingestion_status` | Poll an async ingestion job by `job_id`. Returns `pending`, `processing`, `complete`, or `failed`. |
| `search_memory` | Hybrid vector + BM25 search with Reciprocal Rank Fusion. Filter by `category_path`. |
| `list_categories` | Return all occupied taxonomy paths with memory counts. |
| `explore_taxonomy` | Drill into a collapsed `[+N more]` branch from `list_categories`. |
| `fetch_document` | Reconstruct a full document by following `sequence_next` edges from a memory ID. |
| `trace_history` | Inspect the full supersession chain (oldest → newest) for a memory. |
| `confirm_memory_validity` | Confirm an aging memory is still accurate. Advances its `verify_after` date. |
| `update_memory` | Rewrite a memory's content in place (preserves identity, edges, history). |
| `set_context` | Write a key/value pair to the ephemeral context store with a TTL. |
| `get_context` | Retrieve an ephemeral context entry by key. |
| `list_context_keys` | List active (non-expired) context keys, optionally filtered by scope. |
| `delete_context` | Explicitly delete a context entry before its TTL expires. |
| `extend_context_ttl` | Push a context entry's expiry forward by N hours. |
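`search_memory` merges its two ranked result lists with Reciprocal Rank Fusion. A minimal sketch of the standard RRF formula is below; the `rrf_fuse` name and the conventional `k=60` constant are illustrative assumptions, not taken from this codebase.

```python
# Sketch of Reciprocal Rank Fusion: each list contributes 1/(k + rank)
# per document, and documents are re-ranked by their summed score.
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["m3", "m1", "m7"]   # ranked by cosine similarity
keyword_hits = ["m1", "m9", "m3"]  # ranked by BM25
print(rrf_fuse([vector_hits, keyword_hits]))  # → ['m1', 'm3', 'm9', 'm7']
```

Documents that appear high in both lists (here `m1` and `m3`) dominate the fused ranking, which is why RRF works well for combining semantic and keyword retrieval without score normalization.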
### Admin-Only Tools (:8767)

| Tool | Description |
|---|---|
| `delete_memory` | Hard-delete a memory by ID (cascades edges). |
| `prune_history` | Batch-delete superseded memories older than N days. |
| `export_memories` | Export all active memories to JSON. |
| `recategorize_memory` | Move a single memory to a new taxonomy path. |
| `bulk_move_category` | Move an entire taxonomy branch (e.g. `old.prefix` → `new.prefix`). |
| `update_memory_metadata` | Patch a memory's metadata JSONB in place. |
| `run_diagnostics` | Report on pool health, memory counts, and ingestion queue depth. |
| `get_ingestion_stats` | Breakdown of ingestion job statuses. |
| `flush_staging` | Clear all completed/failed staging jobs immediately. |
## Taxonomy

Memories are organized into a dot-path hierarchy using PostgreSQL `ltree`. The system assigns paths automatically during ingestion. You can override with `recategorize_memory` or `bulk_move_category`.

Example paths:

```text
user.profile.personal
user.health.medical
projects.myapp.architecture
projects.myapp.decisions
organizations.acme.business
concepts.ai.behavior
reference.system.primer   ← auto-generated System Primer lives here
```

Search is subtree-aware — passing `category_path: "projects.myapp"` returns everything under that branch.
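Subtree filtering behaves like `ltree`'s descendant matching (`path <@ ancestor` in Postgres). A plain-Python equivalent, with the `in_subtree` helper as an illustrative stand-in:

```python
# Sketch of subtree-aware category filtering: a path matches if it equals
# the ancestor or starts with the ancestor followed by a label separator.
def in_subtree(path: str, ancestor: str) -> bool:
    return path == ancestor or path.startswith(ancestor + ".")

paths = [
    "projects.myapp.architecture",
    "projects.myapp.decisions",
    "projects.other.notes",
]
hits = [p for p in paths if in_subtree(p, "projects.myapp")]
```

Note the `"." ` separator check: it prevents `projects.myapp2` from matching a `projects.myapp` filter, mirroring `ltree` label semantics.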
## System Primer

`initialize_context` returns a synthesized summary stored at `reference.system.primer`. It includes:

- A compressed user/agent profile
- The full taxonomy tree with memory counts
- Retrieval guidance

The primer auto-regenerates in the background when ≥10 new memories are ingested or when the previous primer is older than 1 hour. You can force regeneration via the admin tool `synthesize_system_primer`.
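The regeneration policy described above boils down to a two-condition check. A hedged sketch, with `should_regenerate` and `NEW_MEMORY_THRESHOLD` as illustrative names (only `PRIMER_UPDATE_MAX_AGE_S` appears in the actual configuration):

```python
# Sketch of the primer auto-regeneration trigger: fire when enough new
# memories have arrived since the last primer, or the primer is stale.
PRIMER_UPDATE_MAX_AGE_S = 3600   # real env var; default 1 hour
NEW_MEMORY_THRESHOLD = 10        # hypothetical name for the >=10 rule

def should_regenerate(new_memories: int, primer_age_s: float) -> bool:
    return (new_memories >= NEW_MEMORY_THRESHOLD
            or primer_age_s > PRIMER_UPDATE_MAX_AGE_S)
```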
## Environment Variables

Copy `.env.example` to `.env` and fill in your values.

### Required

| Variable | Description |
|---|---|
| `DATABASE_URL` | PostgreSQL connection string (e.g. `postgresql://user:pass@localhost:5432/memory`) |
| `OPENAI_API_KEY` | OpenAI API key for embeddings and LLM calls |
| `DB_PASSWORD` | PostgreSQL password (used by Docker Compose) |
### Optional — Models & Embeddings

| Variable | Default | Description |
|---|---|---|
| `EMBEDDING_MODEL` | `text-embedding-3-small` | OpenAI embedding model |
| `EXTRACT_MODEL` | `gpt-5-mini` | LLM for semantic section extraction and categorization |
| `CONFLICT_MODEL` | `gpt-5-nano` | LLM for conflict/dedup evaluation |
| `EMBED_DIM` | `1536` | Embedding vector dimension (must match the model) |
### Optional — Search & Limits

| Variable | Default | Description |
|---|---|---|
| `DEFAULT_SEARCH_LIMIT` | `10` | Default result count for `search_memory` |
| `DEFAULT_LIST_LIMIT` | `50` | Default result count for `list_categories` |
| `DUP_THRESHOLD` | `0.95` | Cosine similarity threshold for deduplication |
| `CONFLICT_THRESHOLD` | `0.55` | Similarity threshold for conflict detection |
| `RELATES_TO_THRESHOLD` | `0.65` | Similarity threshold for `relates_to` edge creation |
| `MIN_SECTION_LENGTH` | `100` | Minimum character length for a chunk to be stored |
| `MAX_TAXONOMY_PATHS` | `40` | Max taxonomy paths assigned per ingestion |
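The similarity thresholds partition an incoming chunk's fate by its cosine similarity to the nearest existing memory. The `classify` helper below is a hypothetical sketch of that decision, not project code:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def classify(sim: float, dup: float = 0.95, conflict: float = 0.55) -> str:
    if sim >= dup:
        return "duplicate"        # merge with / drop in favor of existing memory
    if sim >= conflict:
        return "conflict-check"   # hand off to the conflict LLM for evaluation
    return "new"                  # store as a fresh memory

print(classify(0.97))  # → duplicate
```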
### Optional — OpenAI & Concurrency

| Variable | Default | Description |
|---|---|---|
| `OPENAI_TIMEOUT_S` | `60` | Per-request OpenAI timeout in seconds |
| `OPENAI_MAX_RETRIES` | `5` | Exponential-backoff retry limit |
| `MAX_CONCURRENT_API_CALLS` | `5` | Semaphore limit for parallel OpenAI requests |
| `EXTRACT_REASONING` | `low` | Reasoning effort for the extraction LLM |
| `CONFLICT_REASONING` | `minimal` | Reasoning effort for the conflict LLM |
### Optional — Database

| Variable | Default | Description |
|---|---|---|
| `PG_POOL_MIN` | `1` | asyncpg minimum pool connections |
| `PG_POOL_MAX` | `10` | asyncpg maximum pool connections |
| `STAGING_RETENTION_DAYS` | `7` | Days to retain completed/failed staging jobs |
### Optional — Server

| Variable | Default | Description |
|---|---|---|
| `PRODUCTION_PORT` | `8766` | Production MCP server port |
| `ADMIN_PORT` | `8767` | Admin MCP server port |
| `MCP_TRANSPORT` | `streamable-http` | FastMCP transport mode |
| `FASTMCP_JSON_RESPONSE` | — | Set to `1` to force JSON responses |
| `LOG_LEVEL` | `INFO` | `DEBUG` / `INFO` / `WARNING` |
### Optional — System Primer

| Variable | Default | Description |
|---|---|---|
| `PRIMER_UPDATE_MAX_AGE_S` | `3600` | Max seconds before automatic primer regeneration |
### Optional — Context Store

| Variable | Default | Description |
|---|---|---|
| `CONTEXT_DEFAULT_TTL_HOURS` | `24` | Default TTL for context store entries |
| `CONTEXT_MAX_VALUE_LENGTH` | `50000` | Max character length for context values |
| `CONTEXT_MAX_KEY_LENGTH` | `200` | Max character length for context keys |
### Optional — Backup Service

| Variable | Description |
|---|---|
| `GITHUB_PAT` | GitHub Personal Access Token with `repo` scope |
| `GITHUB_BACKUP_REPO` | Target repo in `owner/repo` format |
| `BACKUP_INTERVAL_SECONDS` | Seconds between backups (default: `21600` = 6 hours) |
## Running Locally (Development)

Requirements: Python 3.11+, PostgreSQL with pgvector.

```sh
# Create and activate a virtual environment
python3.11 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Configure
cp .env.example .env
$EDITOR .env

# Start the server
python -m server
# Production: http://0.0.0.0:8766
# Admin:      http://0.0.0.0:8767
```
## Backup Service

The `backup/` directory contains a containerized PostgreSQL backup job that:

- Runs `pg_dump` on the configured interval (default: every 6 hours)
- Commits the dump to a private GitHub repository

The backup service starts automatically with `docker compose up`. Set `GITHUB_PAT` and `GITHUB_BACKUP_REPO` in your `.env` to enable it. If those variables are unset, the service will error on startup — remove the `memory-backup` service from `docker-compose.yml` if you don't need backups.
## CLI Scripts

Standalone scripts in `scripts/` (require `DATABASE_URL` in the environment):

```sh
# Export all memories to a timestamped JSON file
python scripts/export_memories.py

# Generate an interactive graph visualization
python scripts/visualize_memories.py
open memory_map.html
```
## Contributing

See CONTRIBUTING.md.

## License