Memento

Persistent memory system for LLMs with lossless transcript management. Designed for Claude via Claude Code.

Memento operates as an engine-lite: a two-layer memory system that captures everything and forgets nothing. The knowledge layer stores distilled facts, decisions, and preferences. The transcript layer stores full session history with a hierarchical summary DAG, enabling regex search and lossless drill-down across all past conversations.

Inspired by the Lossless Context Management (LCM) architecture.

How It Works

Session Start ──> inject core memories + recent session summaries
                  (if resuming after compaction: inject recovery context)
                       │
                       ▼
  ┌──────────── TURN LOOP ─────────────┐
  │ UserPromptSubmit ──> persist user    │
  │ PostToolUse ────────> persist tools  │
  │ Stop ───────────────> persist reply  │
  └──────────────────────────────────────┘
                       │
              [context fills up]
                       │
  PreCompact ──> generate checkpoint summary ──> inject as context
  PostCompact ─> capture Claude's compact_summary
  SessionStart(compact) ──> inject rich recovery context
                       │
  Session End ──> batch ingest full transcript
                  extract knowledge memories
                  detect file artifacts
                  link related sessions
                  build summary DAG (async)

Architecture

┌──────────────────────────────────────────────────────────────┐
│                    MCP Tools (7)                              │
│  recall · remember · transcript_grep · transcript_expand     │
│  transcript_describe · llm_map                               │
└──────────────────────────┬───────────────────────────────────┘
                           │
┌──────────────────────────▼───────────────────────────────────┐
│  Knowledge Layer                                             │
│  memories + edges + dedup + merge + diversify                │
├──────────────────────────────────────────────────────────────┤
│  Transcript Layer                                            │
│  sessions + messages + FTS5 + summary DAG + artifacts        │
├──────────────────────────────────────────────────────────────┤
│  Engine-Lite (hooks)                                         │
│  real-time ingest + compaction awareness + context recovery   │
└──────────────────────────────────┬──────────────────────────┘
                              ┌────▼────┐
                              │ SQLite  │
                              │  + vec  │
                              └─────────┘

Storage: SQLite (WAL mode) with sqlite-vec for HNSW vector search and FTS5 for full-text search. Single file per project, zero infrastructure.

Embeddings: Ollama with nomic-embed-text (768-dim). Zero external API calls.
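
As a rough illustration of what a local embedding call could look like, the sketch below posts to Ollama's `/api/embeddings` endpoint and checks the 768-dim size mentioned above. The helper name `embedText` and the injected `FetchLike` type are assumptions made for this example; they are not taken from Memento's source.

```typescript
// Minimal shape of the fetch function this sketch needs (hypothetical type).
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const EXPECTED_DIM = 768; // nomic-embed-text dimension stated in this README

// Hypothetical helper: asks a local Ollama instance for one embedding.
async function embedText(
  text: string,
  fetchImpl: FetchLike,
  host = "http://127.0.0.1:11435", // default Ollama host from the config section
  model = "nomic-embed-text",
): Promise<number[]> {
  const res = await fetchImpl(`${host}/api/embeddings`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt: text }),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);

  const data = (await res.json()) as { embedding?: number[] };
  const vec = data.embedding;
  // Reject anything that would not fit a 768-dim vector index.
  if (!Array.isArray(vec) || vec.length !== EXPECTED_DIM) {
    throw new Error(`expected a ${EXPECTED_DIM}-dim embedding`);
  }
  return vec;
}
```

Injecting the fetch function keeps the sketch testable without a running Ollama; on Node.js 20+ the built-in `fetch` should be usable as the second argument.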

Summarization: Ollama with qwen2.5:3b for summaries and extraction. A three-level escalation (LLM summary -> bullet points -> deterministic truncation) guarantees convergence: the final level needs no model, so a summary is always produced.
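
The escalation idea can be sketched as follows. This is a minimal illustration, not Memento's `summarize.ts`: the function names, the character budget (used here as a stand-in for a token budget), and the acceptance rule "non-empty and within budget" are all assumptions.

```typescript
// Hypothetical signature for a model-backed summarizer (levels 1 and 2).
type Summarizer = (text: string, maxChars: number) => Promise<string>;

// Level 3: deterministic, needs no model, always fits the budget.
// Assumes maxChars >= 1.
function truncateSummary(text: string, maxChars: number): string {
  const flat = text.replace(/\s+/g, " ").trim();
  return flat.length <= maxChars ? flat : flat.slice(0, maxChars - 1) + "…";
}

async function summarizeWithEscalation(
  text: string,
  maxChars: number,
  llmSummary: Summarizer, // level 1: free-form LLM summary
  llmBullets: Summarizer, // level 2: terser bullet-point prompt
): Promise<{ level: 1 | 2 | 3; summary: string }> {
  const attempts: Array<{ level: 1 | 2; run: Summarizer }> = [
    { level: 1, run: llmSummary },
    { level: 2, run: llmBullets },
  ];
  for (const attempt of attempts) {
    try {
      const out = (await attempt.run(text, maxChars)).trim();
      // Accept only non-empty output that fits the budget.
      if (out.length > 0 && out.length <= maxChars) {
        return { level: attempt.level, summary: out };
      }
    } catch {
      // Model error or timeout: escalate to the next level.
    }
  }
  // Both model levels failed or overshot: fall back to truncation.
  return { level: 3, summary: truncateSummary(text, maxChars) };
}
```

Because the last level is pure string handling, the function returns a bounded summary even when Ollama is down or keeps overshooting the budget.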

Requirements

  • macOS (Apple Silicon recommended) or Linux
  • Node.js 20+
  • Docker (only for Ollama, or install Ollama natively)

Quick Start

git clone https://github.com/diego-ninja/memento.git
cd memento

# Install deps + start Ollama + build + pull models
make setup

# Start Ollama
make start

# Verify
make status
make test

Infrastructure

| Component | Type | Purpose |
|---|---|---|
| SQLite | Embedded | All storage (memories, transcripts, vectors, FTS) |
| sqlite-vec | Extension | HNSW vector search |
| Ollama | Docker/Native | Embeddings + summarization (local LLM) |

Ollama runs as a Docker container on port 11435 by default. Alternatively, install Ollama natively and point MEMENTO_OLLAMA_HOST to it.

Configure Claude Code

1. MCP Server

Add to your project's .claude/settings.json:

{
  "mcpServers": {
    "memento": {
      "command": "node",
      "args": ["dist/server.js"],
      "cwd": "/path/to/memento"
    }
  }
}

2. Hooks

Add to your global ~/.claude/settings.json:

{
  "hooks": {
    "SessionStart": [
      { "matcher": "startup", "hooks": [{ "type": "command", "command": "/path/to/memento/hooks/session-start.sh", "timeout": 10 }] },
      { "matcher": "compact", "hooks": [{ "type": "command", "command": "/path/to/memento/hooks/session-start.sh", "timeout": 10 }] }
    ],
    "UserPromptSubmit": [
      { "matcher": ".*", "hooks": [{ "type": "command", "command": "/path/to/memento/hooks/user-prompt.sh", "timeout": 3 }] }
    ],
    "Stop": [
      { "matcher": ".*", "hooks": [{ "type": "command", "command": "/path/to/memento/hooks/stop.sh", "timeout": 3 }] }
    ],
    "PostToolUse": [
      { "matcher": ".*", "hooks": [{ "type": "command", "command": "/path/to/memento/hooks/post-tool.sh", "timeout": 3 }] }
    ],
    "PreCompact": [
      { "matcher": ".*", "hooks": [{ "type": "command", "command": "/path/to/memento/hooks/pre-compact.sh", "timeout": 15 }] }
    ],
    "PostCompact": [
      { "matcher": ".*", "hooks": [{ "type": "command", "command": "/path/to/memento/hooks/post-compact.sh", "timeout": 5 }] }
    ],
    "SessionEnd": [
      { "matcher": ".*", "hooks": [{ "type": "command", "command": "/path/to/memento/hooks/session-end.sh", "timeout": 30 }] }
    ]
  }
}

3. CLAUDE.md instructions

Add to your global ~/.claude/CLAUDE.md:

# Memento -- Persistent Memory

You have access to a persistent memory system via MCP (memento).
Use it transparently -- the user should NOT notice you are consulting
or storing memories.

## When to recall (automatic)

- Starting a new session (hooks load context, but recall for specific topics)
- Before architectural or design decisions
- When the user references something "we discussed" or "last time"
- Unsure about a user preference

## When to remember

Call remember() immediately after:
- Completing a brainstorming or design session
- Writing or validating a design document
- User approving a plan
- Solving a complex bug with reusable learnings

## Transcript tools

- transcript_grep(pattern) -- search full session history across all sessions
- transcript_expand(id) -- drill into any summary to see original messages
- transcript_describe(id) -- quick metadata for sessions, summaries, artifacts

MCP Tools

Knowledge Layer

| Tool | Description |
|---|---|
| recall | Hybrid text+vector search over distilled memories. Returns top-3 with graph-based diversity. |
| remember | Store memories with automatic dedup (>0.92 skip), merge (0.80-0.92), and graph edge creation. |
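
The dedup and merge thresholds for `remember` can be read as a three-way decision on the best similarity between a new memory and the stored ones. The sketch below assumes the score is cosine similarity between embeddings, which the README does not state; `classifyCandidate` and its return labels are hypothetical names.

```typescript
type Action = "skip" | "merge" | "insert";

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Hypothetical decision step: compare the candidate with every stored
// embedding and act on the closest match.
function classifyCandidate(
  candidate: number[],
  existing: number[][],
  dedupThreshold = 0.92, // above this: near-duplicate, skip
  mergeThreshold = 0.8, // from here up to dedupThreshold: merge
): Action {
  let best = -1;
  for (const stored of existing) {
    best = Math.max(best, cosineSimilarity(candidate, stored));
  }
  if (best > dedupThreshold) return "skip";
  if (best >= mergeThreshold) return "merge";
  return "insert";
}
```

With the defaults, a candidate that scores 0.85 against its nearest neighbour would be merged into it, while one scoring 0.5 would be stored as a new memory.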

Transcript Layer

| Tool | Description |
|---|---|
| transcript_grep | Substring/FTS5 search across all past session transcripts. Filter by session, role, limit. |
| transcript_expand | Lossless drill-down: summary ID, session ID, or message ID -> original messages with context. |
| transcript_describe | Metadata inspection for sessions (with artifacts, linked sessions), summaries, messages. |

Operators

| Tool | Description |
|---|---|
| llm_map | Process N items in parallel with a prompt template. Configurable concurrency and retries. |
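
A bounded-concurrency map with per-item retries, the pattern `llm_map` describes, might look like the sketch below. It is an illustration under assumptions: `mapWithConcurrency`, `renderPrompt`, the `{{item}}` placeholder, and the result shape are invented for this example and do not reflect `llm-map.ts`.

```typescript
type Outcome<R> = { ok: true; value: R } | { ok: false; error: string };

// Hypothetical template step: substitute one item into a prompt.
const renderPrompt = (template: string, item: string): string =>
  template.split("{{item}}").join(item);

async function mapWithConcurrency<T, R>(
  items: T[],
  worker: (item: T, index: number) => Promise<R>,
  options: { concurrency: number; retries: number },
): Promise<Outcome<R>[]> {
  const results: Outcome<R>[] = new Array(items.length);
  let next = 0; // shared cursor; safe because JS runs one callback at a time

  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      let lastError = "";
      let done = false;
      // One initial attempt plus up to `retries` further attempts.
      for (let attempt = 0; attempt <= options.retries && !done; attempt++) {
        try {
          results[i] = { ok: true, value: await worker(items[i], i) };
          done = true;
        } catch (err) {
          lastError = err instanceof Error ? err.message : String(err);
        }
      }
      if (!done) results[i] = { ok: false, error: lastError };
    }
  }

  const runners = Math.max(1, Math.min(options.concurrency, items.length));
  await Promise.all(Array.from({ length: runners }, () => runner()));
  return results; // same order as `items`
}
```

A failed item is reported in place instead of rejecting the whole batch, so one bad prompt does not discard the other results. In practice the worker would send `renderPrompt(template, item)` to the model.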

Hooks

| Hook | Event | Purpose |
|---|---|---|
| session-start.sh | SessionStart (startup/compact) | Inject core memories + session summaries. Post-compact: inject recovery context. |
| user-prompt.sh | UserPromptSubmit | Real-time capture of user prompts to immutable store. |
| stop.sh | Stop | Real-time capture of assistant responses. |
| post-tool.sh | PostToolUse | Real-time capture of tool calls (Read, Write, Bash, etc.). |
| pre-compact.sh | PreCompact | Generate checkpoint summary and inject as additionalContext (survives compaction). |
| post-compact.sh | PostCompact | Capture Claude Code's compact_summary into the summary DAG. |
| subagent-stop.sh | SubagentStop | Ingest sub-agent transcripts + persist final message in parent session. |
| session-end.sh | SessionEnd | Batch ingest transcript + extract memories + detect artifacts + link sessions + build DAG. |

Data Storage

~/.memento/
├── config.json                  # Optional config overrides
└── projects/
    └── {sha256-hash}/
        ├── memories.db          # Knowledge layer (memories + edges + vector index)
        └── transcripts.db       # Transcript layer (sessions, messages, summaries, artifacts)
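
How a project maps to its `{sha256-hash}` directory is not spelled out here. The sketch below assumes the hash is the SHA-256 of the project's absolute path, which is a guess; `projectPaths` is a hypothetical helper, and only the file names come from the layout above.

```typescript
import { createHash } from "node:crypto";
import * as path from "node:path";

// Hypothetical helper: derive the per-project storage paths.
// ASSUMPTION: the directory name is sha256(absolute project path).
function projectPaths(projectRoot: string, mementoHome: string) {
  const hash = createHash("sha256")
    .update(path.resolve(projectRoot))
    .digest("hex"); // 64 hex characters
  const dir = path.join(mementoHome, "projects", hash);
  return {
    dir,
    memoriesDb: path.join(dir, "memories.db"), // knowledge layer
    transcriptsDb: path.join(dir, "transcripts.db"), // transcript layer
  };
}
```

Hashing the path gives each project a stable, filesystem-safe directory name, so the same checkout always resolves to the same pair of database files.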

Knowledge Layer (memories.db)

| Table | Purpose |
|---|---|
| memories | Distilled knowledge: decisions, learnings, preferences, facts |
| memories_fts | FTS5 full-text search index |
| vec_memories | sqlite-vec HNSW vector index (768-dim embeddings) |
| memory_edges | Semantic graph: bidirectional edges between related memories |

Transcript Layer (transcripts.db)

| Table | Purpose |
|---|---|
| sessions | Session metadata with root summary pointer |
| messages | Immutable verbatim transcript (every message, every turn) |
| messages_fts | FTS5 virtual table for full-text search |
| summaries | Hierarchical DAG nodes (leaf, condensed, compact_capture) |
| summary_sources | DAG edges: summary -> messages/summaries (provenance) |
| artifacts | Tracked file references with exploration summaries |
| session_edges | Cross-session links (continuation, related) |
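
The provenance edges in summary_sources are what make drill-down lossless: starting from any summary, following edges eventually reaches the original messages. The sketch below walks an in-memory stand-in for that table; the `SourceRef` shape and `expandToMessages` are assumptions for illustration, not the actual schema or the `transcript_expand` implementation.

```typescript
// One provenance edge target: a summary points at messages or other summaries.
type SourceRef = { kind: "message" | "summary"; id: string };

// In-memory stand-in for summary_sources: summary id -> direct sources.
type SourceIndex = Map<string, SourceRef[]>;

// Depth-first walk from a summary down to the message ids it covers.
function expandToMessages(
  summaryId: string,
  sources: SourceIndex,
  visited: Set<string> = new Set(),
  out: string[] = [],
): string[] {
  if (visited.has(summaryId)) return out; // shared node in the DAG: expand once
  visited.add(summaryId);
  for (const ref of sources.get(summaryId) ?? []) {
    if (ref.kind === "summary") {
      expandToMessages(ref.id, sources, visited, out);
    } else if (!out.includes(ref.id)) {
      out.push(ref.id); // keep first-seen order, drop duplicates
    }
  }
  return out;
}
```

Because messages are immutable and every summary records its sources, a condensed summary several levels up can still be resolved to the exact turns it was built from.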

Memory Types

| Type | Purpose |
|---|---|
| decision | Architectural or design choices |
| learning | Bugs resolved, patterns discovered |
| preference | User preferences expressed or inferred |
| context | Session summaries, work context |
| fact | Non-obvious codebase facts |

CLI Commands

Knowledge

| Command | Description |
|---|---|
| recall <query> | Search memories |
| stats | Show memory count |
| core | List core memories |
| maintain | Degrade stale core memories (>30 days) |
| extract <transcript> | Extract memories from a transcript file |
| flush | Delete all memories |

Transcript

| Command | Description |
|---|---|
| sessions --recent N | List recent sessions with summaries |
| ingest-message --session <id> --role <role> --content <text> | Persist a single message (used by hooks) |
| ingest-transcript --session <id> --path <file> | Batch ingest a full JSONL transcript |
| build-dag --session <id> | Build hierarchical summary DAG |
| checkpoint --session <id> | Generate session checkpoint for pre-compact |
| session-summary --session <id> | Get root summary of a session |
| store-compact-summary --session <id> --summary <text> | Store Claude's compact_summary |
| detect-artifacts --session <id> | Find and store file artifacts |
| link-sessions | Create edges between related sessions |

Project Structure

src/
├── server.ts                   # MCP server entry point (7 tools)
├── cli.ts                      # CLI for hooks and manual use
├── config.ts                   # Configuration + project paths
├── types.ts                    # Core type definitions
├── extract.ts                  # Transcript extraction (LLM + regex)
├── tools/
│   ├── recall.ts               # Knowledge recall with graph boost + diversify
│   ├── remember.ts             # Knowledge store with dedup pipeline
│   ├── transcript-grep.ts      # Regex/FTS search over transcripts
│   ├── transcript-expand.ts    # Lossless summary -> message drill-down
│   ├── transcript-describe.ts  # Metadata inspection
│   └── llm-map.ts              # Parallel batch processing operator
├── transcript/
│   ├── db.ts                   # TranscriptDb (SQLite: sessions, messages, summaries, artifacts, edges)
│   ├── parse.ts                # Claude Code JSONL transcript parser
│   ├── ingest.ts               # Single message + batch transcript ingestion
│   ├── summarize.ts            # Three-level escalation + DAG construction
│   ├── artifacts.ts            # File artifact detection + exploration summaries
│   ├── session-edges.ts        # Cross-session edge detection
│   └── tokens.ts               # Token estimator
├── storage/
│   ├── unified.ts              # UnifiedStorage (SQLite + sqlite-vec + FTS5)
│   └── pipeline.ts             # Shared dedup/merge pipeline
├── search/
│   ├── hybrid.ts               # Hybrid text+vector search (RRF fusion)
│   └── reranker.ts             # Recency + type weight + graph degree ranking
└── embeddings/
    └── ollama.ts               # Ollama client (embeddings + merge + summarize)

hooks/
├── session-start.sh            # Startup + post-compact recovery
├── user-prompt.sh              # Real-time user prompt capture
├── stop.sh                     # Real-time assistant response capture
├── post-tool.sh                # Real-time tool call capture
├── pre-compact.sh              # Checkpoint summary injection
├── post-compact.sh             # Compact summary capture
└── session-end.sh              # Final ingest + extract + DAG + artifacts + edges

Configuration

Default config (override via ~/.memento/config.json):

{
  "ollama": {
    "host": "http://127.0.0.1:11435",
    "embeddingModel": "nomic-embed-text",
    "generativeModel": "qwen2.5:3b"
  },
  "search": {
    "topK": 20,
    "finalK": 3,
    "deduplicationThreshold": 0.92,
    "mergeThreshold": 0.80,
    "rrfK": 60
  },
  "core": {
    "promoteAfterRecalls": 3,
    "degradeAfterSessions": 30
  }
}
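
To show how `rrfK` and `finalK` could interact, here is a sketch of standard Reciprocal Rank Fusion over a text ranking and a vector ranking. It is the textbook formula, score = sum of 1 / (k + rank), and not necessarily what `hybrid.ts` does; `rrfFuse` and the tie-break by id are assumptions.

```typescript
// Fuse several ranked id lists into one list of the top `finalK` ids.
function rrfFuse(
  rankings: string[][],
  k = 60, // rrfK: dampens the advantage of the very top ranks
  finalK = 3, // finalK: how many fused results to return
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id))
    .slice(0, finalK);
}

// Example: each list would hold up to topK (20) candidates in practice.
const textHits = ["a", "b", "c"];
const vectorHits = ["b", "d", "a"];
const fused = rrfFuse([textHits, vectorHits]);
```

In this example `b` and `a` appear in both lists, so they outrank `d` and `c`, which each appear in only one. RRF uses ranks only, which avoids having to normalize FTS5 scores against vector distances.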

License

MIT
