memory-mcp

A self-organizing, persistent semantic memory layer that enables AI agents to store, categorize, and retrieve information using hybrid vector and keyword search. It features autonomous chunking, deduplication, and hierarchical taxonomy management through a PostgreSQL-backed MCP server.

Persistent, self-organizing semantic memory for AI agents — served as an MCP server.

License: MIT · Docker · Python 3.11+


What is this?

memory-mcp is a Model Context Protocol server that gives AI agents durable, searchable memory backed by PostgreSQL and pgvector. Drop it into any MCP-compatible client (Claude Code, Cursor, Windsurf, etc.) and your agent gains the ability to remember, retrieve, and reason over information across sessions — without you managing any schema or storage logic.

What it does autonomously:

  • Chunks and embeds incoming text
  • Categorizes memories into a hierarchical taxonomy (ltree dot-paths)
  • Deduplicates against existing memories and resolves conflicts
  • Synthesizes a System Primer — a compressed, always-current summary of everything it knows — and surfaces it at session start
  • Expires stale memories via TTL and prompts for verification of aging facts

Why memory-mcp?

| | memory-mcp | Simple vector DB | LangChain / LlamaIndex memory |
|---|---|---|---|
| Schema management | Automatic | Manual | Manual |
| Deduplication | Semantic + LLM | None | None |
| Taxonomy | Auto-assigned ltree | None | None |
| Session bootstrap | System Primer | Manual RAG | Manual |
| Conflict resolution | LLM-evaluated | None | None |
| Ephemeral context | Built-in (TTL store) | No | No |
| Self-hostable | Yes (Docker) | Varies | No |
| MCP-native | Yes | No | No |

Architecture

AI Agent (Claude Code / Cursor / Windsurf)
        │  HTTP (MCP — Streamable HTTP)
        ▼
┌──────────────────────────────────────────┐
│              server.py                    │
│  ┌─────────────────┐ ┌─────────────────┐ │
│  │ Production MCP  │ │   Admin MCP     │ │
│  │   :8766/mcp     │ │   :8767/mcp     │ │
│  └────────┬────────┘ └────────┬────────┘ │
│           │  tools/           │           │
│  ┌────────▼──────────────────▼────────┐  │
│  │  ingestion · search · context      │  │
│  │  crud · admin_tools · context_store│  │
│  └────────────────┬───────────────────┘  │
│                   │                       │
│  ┌────────────────▼───────────────────┐  │
│  │         Background Workers          │  │
│  │  Ingestion Queue · TTL Daemon       │  │
│  │  System Primer Auto-Regeneration    │  │
│  └────────────────┬───────────────────┘  │
└───────────────────┼──────────────────────┘
                    │  asyncpg
                    ▼
         PostgreSQL + pgvector
         ┌──────────────────┐
         │ memories         │  chunks, embeddings, ltree paths
         │ memory_edges     │  sequence_next, relates_to, supersedes
         │ ingestion_staging│  async job queue
         │ context_store    │  ephemeral TTL store
         └──────────────────┘
                    │
         ┌──────────▼──────────┐
         │  Backup Service     │  pg_dump → private GitHub repo
         └─────────────────────┘

Two servers, one process:

  • Production (:8766) — tools safe for the agent to call freely
  • Admin (:8767) — superset including destructive tools (delete, prune, bulk-move). Point your agent at production; use admin for maintenance.

Quickstart (Docker)

Prerequisites: Docker + Docker Compose, an OpenAI API key.

# 1. Clone
git clone https://github.com/isaacriehm/memory-mcp.git
cd memory-mcp

# 2. Configure
cp .env.example .env
$EDITOR .env   # set OPENAI_API_KEY and DB_PASSWORD at minimum

# 3. Start
docker compose up -d

# Production MCP endpoint: http://localhost:8766/mcp
# Admin MCP endpoint:      http://localhost:8767/mcp

To rebuild after code changes:

docker compose up -d --build memory-api
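
Once the containers are running, one way to confirm that the production endpoint answers is to send it an MCP `initialize` request. The sketch below uses only the Python standard library. The helper names (`build_initialize_request`, `probe`) and the client name are illustrative and not part of memory-mcp, and the protocol version string is an assumption that should match whatever your client and server negotiate.

```python
import json
import urllib.request

MCP_URL = "http://localhost:8766/mcp"  # production endpoint from the quickstart


def build_initialize_request(url: str = MCP_URL) -> urllib.request.Request:
    """Build a JSON-RPC `initialize` request for a Streamable HTTP MCP server."""
    payload = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",  # assumption: adjust to your setup
            "capabilities": {},
            "clientInfo": {"name": "readme-probe", "version": "0.0.1"},  # hypothetical
        },
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may answer with plain JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )


def probe(url: str = MCP_URL, timeout_s: float = 5.0) -> int:
    """Send the request and return the HTTP status (needs a running server)."""
    request = build_initialize_request(url)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.status
```

Calling `probe()` against a healthy server should return a 2xx status; a connection error usually means the container is not up yet or the port mapping differs from the default.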

Connecting to an MCP Client

Claude Code

Add to your project's .claude/settings.json or ~/.claude/settings.json:

{
  "mcpServers": {
    "memory": {
      "type": "http",
      "url": "http://localhost:8766/mcp"
    }
  }
}

Or via the CLI:

claude mcp add memory --transport http http://localhost:8766/mcp

Then add this instruction to your CLAUDE.md so the agent always bootstraps memory at session start:

## Memory
At the start of every session, call `initialize_context` before anything else.
This returns your System Primer — your identity, current knowledge taxonomy, and retrieval guide.
Always consult it before answering questions about prior context.

Cursor / Windsurf

Add to your MCP settings (.cursor/mcp.json or equivalent):

{
  "mcpServers": {
    "memory": {
      "url": "http://localhost:8766/mcp"
    }
  }
}

MCP Tools

Production Tools (:8766)

| Tool | Description |
|---|---|
| initialize_context | Call first every session. Returns the System Primer + verification prompts for aging memories. |
| memorize_context | Ingest raw text. Automatically chunks, embeds, categorizes, and deduplicates. Supports ttl_days. |
| check_ingestion_status | Poll async ingestion job by job_id. Returns pending, processing, complete, or failed. |
| search_memory | Hybrid vector + BM25 search with Reciprocal Rank Fusion. Filter by category_path. |
| list_categories | Return all occupied taxonomy paths with memory counts. |
| explore_taxonomy | Drill into a collapsed [+N more] branch from list_categories. |
| fetch_document | Reconstruct a full document by following sequence_next edges from a memory ID. |
| trace_history | Inspect the full supersession chain (oldest → newest) for a memory. |
| confirm_memory_validity | Confirm an aging memory is still accurate. Advances its verify_after date. |
| update_memory | Rewrite a memory's content in-place (preserves identity, edges, history). |
| set_context | Write a key/value pair to the ephemeral context store with a TTL. |
| get_context | Retrieve an ephemeral context entry by key. |
| list_context_keys | List active (non-expired) context keys, optionally filtered by scope. |
| delete_context | Explicitly delete a context entry before its TTL expires. |
| extend_context_ttl | Push a context entry's expiry forward by N hours. |
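
search_memory is described as fusing vector and BM25 results with Reciprocal Rank Fusion. As a rough illustration of that technique (not the server's actual code), the sketch below merges two ranked ID lists. The function name, the sample IDs, and the constant k = 60 (a commonly used default for RRF) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked lists: each item scores sum(1 / (k + rank)) over the lists containing it."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


vector_hits = ["m3", "m1", "m7"]   # hypothetical IDs ranked by cosine similarity
keyword_hits = ["m1", "m9", "m3"]  # hypothetical IDs ranked by BM25

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print([item_id for item_id, _ in fused])
```

Items that appear in both lists accumulate two contributions, so they tend to outrank items found by only one retriever, without requiring the two score scales to be comparable.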

Admin-Only Tools (:8767)

| Tool | Description |
|---|---|
| delete_memory | Hard-delete a memory by ID (cascades edges). |
| prune_history | Batch-delete superseded memories older than N days. |
| export_memories | Export all active memories to JSON. |
| recategorize_memory | Move a single memory to a new taxonomy path. |
| bulk_move_category | Move an entire taxonomy branch (e.g. old.prefix → new.prefix). |
| update_memory_metadata | Patch a memory's metadata JSONB in-place. |
| run_diagnostics | Report on pool health, memory counts, ingestion queue depth. |
| get_ingestion_stats | Breakdown of ingestion job statuses. |
| flush_staging | Clear all completed/failed staging jobs immediately. |

Taxonomy

Memories are organized into a dot-path hierarchy using PostgreSQL ltree. The system assigns paths automatically during ingestion. To override an assigned path, use the admin tools recategorize_memory (single memory) or bulk_move_category (whole branch).

Example paths:

user.profile.personal
user.health.medical
projects.myapp.architecture
projects.myapp.decisions
organizations.acme.business
concepts.ai.behavior
reference.system.primer     ← auto-generated System Primer lives here

Search is subtree-aware — passing category_path: "projects.myapp" returns everything under that branch.
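
Subtree-aware filtering means a path matches when it equals the requested branch or sits beneath it; in PostgreSQL ltree this is the relationship the `<@` operator tests. The pure-Python sketch below mirrors that rule label by label, so that `projects.myapp` does not accidentally match `projects.myapp2`. The function name and the extra sample path are illustrative, and how the server actually builds its query is not shown in this README.

```python
def in_subtree(path: str, branch: str) -> bool:
    """Return True if `path` equals `branch` or is a descendant of it."""
    path_labels = path.split(".")
    branch_labels = branch.split(".")
    # Compare whole labels, not raw string prefixes.
    return path_labels[: len(branch_labels)] == branch_labels


paths = [
    "projects.myapp.architecture",
    "projects.myapp.decisions",
    "projects.myapp2.notes",  # hypothetical sibling a naive prefix check would match
    "user.profile.personal",
]
matches = [p for p in paths if in_subtree(p, "projects.myapp")]
print(matches)
```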


System Primer

initialize_context returns a synthesized summary stored at reference.system.primer. It includes:

  • A compressed user/agent profile
  • The full taxonomy tree with memory counts
  • Retrieval guidance

The primer auto-regenerates in the background when ≥10 new memories are ingested or when the previous primer is older than 1 hour. You can force regeneration via the admin tool synthesize_system_primer.
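
The regeneration rule above reduces to a two-condition check. The sketch below states it as a function under the assumption that either condition alone is enough; the function and parameter names are hypothetical, with `max_age_s` corresponding to PRIMER_UPDATE_MAX_AGE_S and the threshold of 10 taken from the description.

```python
def primer_is_stale(new_memories: int, primer_age_s: float,
                    min_new_memories: int = 10, max_age_s: float = 3600) -> bool:
    """True when enough new memories arrived or the primer is older than max_age_s."""
    return new_memories >= min_new_memories or primer_age_s > max_age_s


print(primer_is_stale(new_memories=3, primer_age_s=120))    # few memories, fresh primer
print(primer_is_stale(new_memories=12, primer_age_s=120))   # enough new memories
print(primer_is_stale(new_memories=0, primer_age_s=4000))   # primer older than an hour
```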


Environment Variables

Copy .env.example to .env and fill in your values.

Required

| Variable | Description |
|---|---|
| DATABASE_URL | PostgreSQL connection string (e.g. postgresql://user:pass@localhost:5432/memory) |
| OPENAI_API_KEY | OpenAI API key for embeddings and LLM calls |
| DB_PASSWORD | PostgreSQL password (used by Docker Compose) |

Optional — Models & Embeddings

| Variable | Default | Description |
|---|---|---|
| EMBEDDING_MODEL | text-embedding-3-small | OpenAI embedding model |
| EXTRACT_MODEL | gpt-5-mini | LLM for semantic section extraction and categorization |
| CONFLICT_MODEL | gpt-5-nano | LLM for conflict/dedup evaluation |
| EMBED_DIM | 1536 | Embedding vector dimension (must match model) |

Optional — Search & Limits

| Variable | Default | Description |
|---|---|---|
| DEFAULT_SEARCH_LIMIT | 10 | Default result count for search_memory |
| DEFAULT_LIST_LIMIT | 50 | Default result count for list_categories |
| DUP_THRESHOLD | 0.95 | Cosine similarity threshold for deduplication |
| CONFLICT_THRESHOLD | 0.55 | Similarity threshold for conflict detection |
| RELATES_TO_THRESHOLD | 0.65 | Similarity threshold for relates_to edge creation |
| MIN_SECTION_LENGTH | 100 | Minimum character length for a chunk to be stored |
| MAX_TAXONOMY_PATHS | 40 | Max taxonomy paths assigned per ingestion |
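
To make the three similarity thresholds concrete, here is one plausible way they could partition an incoming chunk's similarity to an existing memory. This is a guess at the decision flow, not the server's implementation: the README only gives the threshold values, so the function names, the action labels, and the choice to short-circuit on duplicates are all assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def candidate_actions(similarity: float, dup: float = 0.95,
                      conflict: float = 0.55, relates_to: float = 0.65) -> list[str]:
    """Map a similarity score to hypothetical follow-up actions."""
    if similarity >= dup:
        return ["skip_as_duplicate"]  # assumed: near-identical content is not stored again
    actions = []
    if similarity >= conflict:
        actions.append("evaluate_conflict")    # assumed: hand the pair to the conflict LLM
    if similarity >= relates_to:
        actions.append("add_relates_to_edge")  # assumed: link the two memories
    return actions


for score in (0.97, 0.70, 0.60, 0.30):
    print(score, candidate_actions(score))
```

With the default values, the conflict threshold sits below the relates_to threshold, so under this reading a pair can be sent for conflict evaluation without being linked.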

Optional — OpenAI & Concurrency

| Variable | Default | Description |
|---|---|---|
| OPENAI_TIMEOUT_S | 60 | Per-request OpenAI timeout in seconds |
| OPENAI_MAX_RETRIES | 5 | Exponential-backoff retry limit |
| MAX_CONCURRENT_API_CALLS | 5 | Semaphore for parallel OpenAI requests |
| EXTRACT_REASONING | low | Reasoning effort for extraction LLM |
| CONFLICT_REASONING | minimal | Reasoning effort for conflict LLM |
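
The retry limit and the semaphore work together: the semaphore caps how many requests are in flight, and each failed request is retried with a growing delay. The sketch below shows that general pattern with `asyncio`; it is not taken from the server's code, and `call_with_backoff`, `base_delay_s`, and the `flaky` stand-in are hypothetical names. A real client would catch only retryable errors rather than every exception.

```python
import asyncio
import random


async def call_with_backoff(make_call, semaphore: asyncio.Semaphore, *,
                            max_retries: int = 5, base_delay_s: float = 0.5):
    """Run `make_call()` under the semaphore, retrying failures with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            async with semaphore:  # limits how many calls run at once
                return await make_call()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            # Delay doubles each attempt; jitter spreads out simultaneous retries.
            delay = base_delay_s * (2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay / 2))


async def demo() -> str:
    attempts = 0

    async def flaky() -> str:  # stand-in for an OpenAI request
        nonlocal attempts
        attempts += 1
        if attempts < 3:
            raise RuntimeError("transient failure")
        return f"ok after {attempts} attempts"

    semaphore = asyncio.Semaphore(5)  # mirrors MAX_CONCURRENT_API_CALLS=5
    return await call_with_backoff(flaky, semaphore, base_delay_s=0.01)


print(asyncio.run(demo()))
```

The semaphore is released while a task sleeps between attempts, so a request that is backing off does not block other requests from proceeding.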

Optional — Database

| Variable | Default | Description |
|---|---|---|
| PG_POOL_MIN | 1 | asyncpg minimum pool connections |
| PG_POOL_MAX | 10 | asyncpg maximum pool connections |
| STAGING_RETENTION_DAYS | 7 | Days to retain completed/failed staging jobs |
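
As an illustration of how the pool variables might be consumed, the sketch below turns an environment mapping into keyword arguments for `asyncpg.create_pool`, which accepts `dsn`, `min_size`, and `max_size`. The helper name `pool_settings` and the sanity check are assumptions; the server's own configuration code may differ.

```python
from collections.abc import Mapping


def pool_settings(env: Mapping[str, str]) -> dict:
    """Translate DATABASE_URL / PG_POOL_MIN / PG_POOL_MAX into create_pool kwargs."""
    settings = {
        "dsn": env["DATABASE_URL"],  # required: raises KeyError if missing
        "min_size": int(env.get("PG_POOL_MIN", "1")),
        "max_size": int(env.get("PG_POOL_MAX", "10")),
    }
    if settings["min_size"] > settings["max_size"]:
        raise ValueError("PG_POOL_MIN must not exceed PG_POOL_MAX")
    return settings


example_env = {
    "DATABASE_URL": "postgresql://user:pass@localhost:5432/memory",
    "PG_POOL_MAX": "20",
}
print(pool_settings(example_env))
# With asyncpg installed, inside an async function:
#   pool = await asyncpg.create_pool(**pool_settings(os.environ))
```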

Optional — Server

| Variable | Default | Description |
|---|---|---|
| PRODUCTION_PORT | 8766 | Production MCP server port |
| ADMIN_PORT | 8767 | Admin MCP server port |
| MCP_TRANSPORT | streamable-http | FastMCP transport mode |
| FASTMCP_JSON_RESPONSE | | Set to 1 to force JSON responses |
| LOG_LEVEL | INFO | DEBUG / INFO / WARNING |

Optional — System Primer

| Variable | Default | Description |
|---|---|---|
| PRIMER_UPDATE_MAX_AGE_S | 3600 | Max seconds before auto primer regeneration |

Optional — Context Store

| Variable | Default | Description |
|---|---|---|
| CONTEXT_DEFAULT_TTL_HOURS | 24 | Default TTL for context store entries |
| CONTEXT_MAX_VALUE_LENGTH | 50000 | Max character length for context values |
| CONTEXT_MAX_KEY_LENGTH | 200 | Max character length for context keys |

Optional — Backup Service

| Variable | Description |
|---|---|
| GITHUB_PAT | GitHub Personal Access Token with repo scope |
| GITHUB_BACKUP_REPO | Target repo in owner/repo format |
| BACKUP_INTERVAL_SECONDS | Seconds between backups (default: 21600 = 6 hours) |

Running Locally (Development)

Requirements: Python 3.11+, PostgreSQL with pgvector.

# Create and activate virtual environment
python3.11 -m venv .venv
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Configure
cp .env.example .env
$EDITOR .env

# Start the server
python -m server
# Production: http://0.0.0.0:8766
# Admin:      http://0.0.0.0:8767

Backup Service

The backup/ directory contains a containerized PostgreSQL backup job that:

  1. Runs pg_dump on the configured interval (default: every 6 hours)
  2. Commits the dump to a private GitHub repository

The backup service starts automatically with docker compose up. Set GITHUB_PAT and GITHUB_BACKUP_REPO in your .env to enable it. If those variables are unset, the service will error on startup — remove the memory-backup service from docker-compose.yml if you don't need backups.


CLI Scripts

Standalone scripts in scripts/ (require DATABASE_URL in environment):

# Export all memories to a timestamped JSON file
python scripts/export_memories.py

# Generate an interactive graph visualization
python scripts/visualize_memories.py
open memory_map.html

Contributing

See CONTRIBUTING.md.

License

MIT
