CozoDB Memory MCP Server


Why Cozo Memory?
LLMs have short-term memory limits. Standard RAG retrieves documents but can't connect facts across time. Cozo Memory gives your AI agent persistent, structured memory: it remembers past conversations, infers relationships, detects contradictions, and explores its knowledge graph. Everything runs on your machine, with optional local LLM integration via Ollama for intelligent actions (cleanup, reflection, summarization, agentic routing).

Most memory stacks combine separate databases: SQLite for facts, Chroma for vector search, NetworkX for graphs. CozoDB replaces all of that with one embedded engine: relational, graph, vector, and full-text search in a single query language, one file, zero sync lag.

Local-first memory for Claude & AI agents with hybrid search, Graph-RAG, and time-travel – runs entirely on your machine. Optional Ollama integration enables LLM-powered actions (cleanup, reflect, summarize, agentic retrieval).

Quick Start

Option 1: Install via npm (Recommended)

# Install globally
npm install -g cozo-memory

# Or run directly with npx (no installation needed)
npx cozo-memory

Option 2: Build from Source

git clone https://github.com/tobs-code/cozo-memory
cd cozo-memory
npm install && npm run build
npm run start

Now add the server to your MCP client (e.g. Claude Desktop) – see Integration below.

Key Features

🔍 Hybrid Search - Combines semantic (HNSW), full-text (FTS), and graph signals via Reciprocal Rank Fusion for intelligent retrieval

🧠 Agentic Retrieval - Auto-routing engine analyzes query intent via local LLM to select optimal search strategy (Vector, Graph, or Community)

⏱️ Time-Travel Queries - Version all changes via CozoDB Validity; query any point in history with full audit trails

🎯 GraphRAG-R1-Inspired Adaptive Retrieval - Learns from usage via Progressive Retrieval Attenuation (PRA) and Cost-Aware F1 (CAF) scoring; conceptually inspired by GraphRAG-R1 (Yu et al., WWW 2026) and adapted for CozoDB

⏳ Temporal Conflict Resolution - Automatic detection and resolution of contradictory observations with semantic analysis and audit preservation

🏠 100% Local - Embeddings via ONNX/Transformers; data stays on your machine. Some advanced features (cleanup, reflect, summarize, agentic search) require an optional Ollama service for local LLM inference, but the core search, CRUD, and graph operations work without any LLM.

🧠 Multi-Hop Reasoning - Logic-aware graph traversal with vector pivots for deep relational reasoning

🗂️ Hierarchical Memory - Multi-level architecture (L0-L3) with intelligent compression and LLM-backed summarization

→ See all features | Version History
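
Reciprocal Rank Fusion, named in the Hybrid Search feature, merges several ranked lists by giving each item a score of 1 / (k + rank) per list and summing. The following is a minimal sketch of the general technique in plain TypeScript, not this project's implementation: the function name, the sample ids, and the constant k = 60 (a commonly used default) are assumptions for illustration.

```typescript
// Reciprocal Rank Fusion: score(id) = sum over lists of 1 / (k + rank),
// where rank is the 1-based position of the id in each ranked list.
type RankedList = string[]; // ids, best match first

function reciprocalRankFusion(lists: RankedList[], k = 60): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

// Hypothetical result lists from the three signals (sample data only).
const semantic = ["alice", "bob", "carol"];
const fullText = ["bob", "alice", "dave"];
const graph = ["bob", "erin"];

const fused = reciprocalRankFusion([semantic, fullText, graph]);
console.log(fused.map(([id]) => id));
```

An id that appears near the top of several lists ("bob" here) outranks one that tops a single list, which is why the fusion needs no score normalization across the different signals.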

Positioning & Comparison

Why CozoDB instead of SQLite + Chroma + NetworkX?

A common first question is: "Why not just combine existing tools?"

| If you need... | Typical separate stack | CozoDB Memory |
| --- | --- | --- |
| Structured data & relations | SQLite / PostgreSQL | ✅ Built-in relational engine |
| Semantic / vector search | Chroma / Qdrant / Pinecone | ✅ HNSW + FTS + RRF in one engine |
| Graph traversal & reasoning | NetworkX / Neo4j | ✅ Native graph queries + PageRank |
| Time-travel / versioning | Custom audit tables | ✅ Built-in Validity time-travel |
| Unified query language | Multiple APIs + glue code | ✅ Single Datalog query across all dimensions |

The core insight: Most memory stacks bolt vector search onto a graph DB, or graph search onto a vector DB. CozoDB is different: it is a single engine that natively combines relational, graph, vector, and full-text search. That means:

  • One query language (Datalog) reaches every dimension.
  • No sync lag between separate indexes.
  • No ETL bridge between "vector results" and "graph expansion."
  • Smaller operational surface: one database file, one process, one dependency chain.
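
To make the "vector results" then "graph expansion" pattern concrete, here is a rough sketch of what the two steps compute, written in plain TypeScript over in-memory data. It is not the project's code and does not use CozoDB; the function name, the edge shape, and the choice to treat relations as undirected are all assumptions. In CozoDB both steps would live in one Datalog query instead of glue code like this.

```typescript
// Hypothetical in-memory data: seed ids stand in for vector-search hits,
// edges stand in for stored relations (treated as undirected here).
type Edge = { from: string; to: string };

function expandFromSeeds(seeds: string[], edges: Edge[], maxDepth: number): Map<string, number> {
  // Build an adjacency list.
  const neighbors = new Map<string, string[]>();
  for (const { from, to } of edges) {
    neighbors.set(from, [...(neighbors.get(from) ?? []), to]);
    neighbors.set(to, [...(neighbors.get(to) ?? []), from]);
  }
  // Breadth-first search, recording hop distance from the nearest seed.
  const depthOf = new Map<string, number>();
  seeds.forEach((seed) => depthOf.set(seed, 0));
  let frontier = seeds;
  for (let depth = 1; depth <= maxDepth; depth++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const neighbor of neighbors.get(node) ?? []) {
        if (!depthOf.has(neighbor)) {
          depthOf.set(neighbor, depth);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return depthOf;
}

const sampleEdges: Edge[] = [
  { from: "alice", to: "acme" },
  { from: "acme", to: "project-x" },
  { from: "project-x", to: "bob" },
];
const reached = expandFromSeeds(["alice"], sampleEdges, 2);
console.log(Array.from(reached.entries()));
```

With a depth of 2, the expansion reaches "acme" and "project-x" but stops before "bob", which is three hops away from the seed.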

Comparison with other memory solutions

Most "Memory" MCP servers fall into two categories:

  1. Simple Knowledge Graphs: CRUD operations on triples, often only text search
  2. Pure Vector Stores: Semantic search (RAG), but little understanding of complex relationships

This server targets the sweet spot in between: a local, database-backed memory engine that combines vector, graph, and keyword signals, powered by CozoDB's unified engine rather than a patchwork of separate databases.

| Feature | CozoDB Memory (This Project) | Official Reference (@modelcontextprotocol/server-memory) | mcp-memory-service (Community) | Database Adapters (Qdrant/Neo4j) |
| --- | --- | --- | --- | --- |
| Backend | CozoDB (Graph + Vector + Relational + FTS in one engine) | JSON file (memory.jsonl) | SQLite / Cloudflare | Specialized DB (only Vector or Graph) |
| Search Logic | Agentic (Auto-Route): Hybrid + Graph + Summaries | Keyword only / Exact Graph Match | Vector + Keyword | Mostly only one dimension |
| Inference | Yes: Built-in engine for implicit knowledge | No | No ("Dreaming" is consolidation) | No (Retrieval only) |
| Community | Yes: Hierarchical Community Summaries | No | No | Only clustering (no summary) |
| Time-Travel | Yes: Queries at any point in time (Validity) | No (current state only) | History available, no native DB feature | No |
| Maintenance | Janitor: LLM-backed cleanup | Manual | Automatic consolidation | Mostly manual |
| Deployment | Local (Node.js + Embedded DB) | Local (Docker/NPX) | Local or Cloud | Often requires external DB server |

The core advantage is intelligence and traceability: combining an agentic retrieval layer with hierarchical GraphRAG lets the system handle both specific factual questions and broad thematic queries, the latter being a case that pure vector stores tend to struggle with.
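
The time-travel entry in the comparison relies on CozoDB's Validity mechanism, where each stored version carries a timestamp and an assert/retract flag. As a conceptual model only (not CozoDB's storage engine and not this project's code), an "as of" lookup can be sketched as follows; the type, the function name, and the sample data are hypothetical.

```typescript
// Each version of a fact records when it became valid and whether it
// asserts the value or retracts it.
type Version = { validFrom: number; asserted: boolean; value: string };

// Return the value visible at `time`: the newest version at or before
// that time, unless that version is a retraction.
function asOf(versions: Version[], time: number): string | undefined {
  let latest: Version | undefined;
  for (const v of versions) {
    if (v.validFrom <= time && (latest === undefined || v.validFrom > latest.validFrom)) {
      latest = v;
    }
  }
  return latest !== undefined && latest.asserted ? latest.value : undefined;
}

// Hypothetical history of one fact ("employer of Alice").
const employer: Version[] = [
  { validFrom: 100, asserted: true, value: "Acme" },
  { validFrom: 200, asserted: true, value: "Globex" },
  { validFrom: 300, asserted: false, value: "Globex" }, // retraction
];

console.log(asOf(employer, 150)); // "Acme"
console.log(asOf(employer, 250)); // "Globex"
console.log(asOf(employer, 350)); // undefined (retracted)
```

Because nothing is overwritten, earlier states stay queryable, which is what makes an audit trail possible.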

Installation

Prerequisites

  • Node.js 20+ (recommended)
  • RAM: 1.7 GB minimum (for default bge-m3 model)
    • Model download: ~600 MB
    • Runtime memory: ~1.1 GB
    • ⚡ Too heavy? Use EMBEDDING_MODEL=Xenova/all-MiniLM-L6-v2 – only ~400 MB RAM needed (see Embedding Model Options)
  • CozoDB native dependency is installed via cozo-node

Optional: Ollama for LLM-powered actions

Some advanced actions use a local LLM via Ollama for intelligent processing. The core server works without Ollama (CRUD, search, graph operations), but the following actions require it:

| Action | Purpose |
| --- | --- |
| cleanup | LLM-backed observation consolidation |
| reflect | Generate insights, detect contradictions |
| summarize_communities | LLM-generated community summaries |
| compact | Session / entity compaction with LLM summarization |
| agentic_search | Query intent classification for auto-routing |

Setup (if you need these features):

# 1. Install Ollama from https://ollama.ai
# 2. Pull a model (e.g. small + fast for dev):
ollama pull demyagent-4b-i1:Q6_K
# 3. Ollama runs automatically on http://localhost:11434

If Ollama is not running, the affected actions gracefully fall back to non-LLM behavior (where possible) or return a clear error message.

Via npm (Easiest)

# Install globally
npm install -g cozo-memory

# Or use npx without installation
npx cozo-memory

From Source

git clone https://github.com/tobs-code/cozo-memory
cd cozo-memory
npm install
npm run build

Windows Quickstart

npm install
npm run build
npm run start

Notes:

  • On first start, @xenova/transformers downloads the embedding model (may take time)
  • Embeddings are processed on the CPU

Embedding Model Options

CozoDB Memory supports multiple embedding models via the EMBEDDING_MODEL environment variable:

| Model | Size | RAM | Dimensions | Best For |
| --- | --- | --- | --- | --- |
| Xenova/bge-m3 (default) | ~600 MB | ~1.7 GB | 1024 | High accuracy, production use |
| Xenova/all-MiniLM-L6-v2 | ~80 MB | ~400 MB | 384 | Low-spec machines, development |
| Xenova/bge-small-en-v1.5 | ~130 MB | ~600 MB | 384 | Balanced performance |

Configuration Options:

Option 1: Using .env file (Easiest for beginners)

# Copy the example file
cp .env.example .env

# Edit .env and set your preferred model
EMBEDDING_MODEL=Xenova/all-MiniLM-L6-v2

Option 2: MCP Server Config (For Claude Desktop / Kiro)

{
  "mcpServers": {
    "cozo-memory": {
      "command": "npx",
      "args": ["cozo-memory"],
      "env": {
        "EMBEDDING_MODEL": "Xenova/all-MiniLM-L6-v2"
      }
    }
  }
}

Option 3: Command Line

# Use lightweight model for development
EMBEDDING_MODEL=Xenova/all-MiniLM-L6-v2 npm run start

Download Model First (Recommended):

# Set model in .env or via command line, then:
EMBEDDING_MODEL=Xenova/all-MiniLM-L6-v2 npm run download-model

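Note: Changing models requires re-embedding existing data, because vectors produced by different models are not comparable (and their dimensions may differ, e.g. 1024 vs. 384). The model is downloaded once on first use.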
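
As a rough illustration of how the EMBEDDING_MODEL setting and the re-embedding note fit together, the sketch below resolves the configured model and checks whether it differs from the one used for existing data. The helper names are hypothetical, and it assumes the name of the previously used model is recorded somewhere; how the server actually tracks this is not documented here.

```typescript
type ModelSpec = { dimensions: number; approxRamMb: number };

// Values copied from the model table in this section.
const KNOWN_MODELS: Record<string, ModelSpec> = {
  "Xenova/bge-m3": { dimensions: 1024, approxRamMb: 1700 },
  "Xenova/all-MiniLM-L6-v2": { dimensions: 384, approxRamMb: 400 },
  "Xenova/bge-small-en-v1.5": { dimensions: 384, approxRamMb: 600 },
};
const DEFAULT_MODEL = "Xenova/bge-m3";

// Hypothetical helper: pick the configured model, or the default when the
// variable is unset or empty.
function resolveModel(env: Record<string, string | undefined>): string {
  return env["EMBEDDING_MODEL"] || DEFAULT_MODEL;
}

// Hypothetical helper: vectors from different models are not comparable,
// even when their dimensions match, so any model change means re-embedding.
function needsReembedding(storedModel: string, env: Record<string, string | undefined>): boolean {
  return resolveModel(env) !== storedModel;
}

const chosen = resolveModel({ EMBEDDING_MODEL: "Xenova/all-MiniLM-L6-v2" });
const spec: ModelSpec | undefined = KNOWN_MODELS[chosen];
console.log(chosen, spec?.dimensions);
console.log(needsReembedding("Xenova/bge-m3", { EMBEDDING_MODEL: chosen }));
```

Passing the environment in as a plain object keeps the helpers easy to test; in a real process the argument would typically be process.env.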

Integration

Claude Desktop

Using npx (Recommended)

{
  "mcpServers": {
    "cozo-memory": {
      "command": "npx",
      "args": ["cozo-memory"]
    }
  }
}

Using global installation

{
  "mcpServers": {
    "cozo-memory": {
      "command": "cozo-memory"
    }
  }
}

Using local build

{
  "mcpServers": {
    "cozo-memory": {
      "command": "node",
      "args": ["C:/Path/to/cozo-memory/dist/index.js"]
    }
  }
}

Framework Adapters

Official adapters for seamless integration with popular AI frameworks:

🦜 LangChain Adapter

npm install @cozo-memory/langchain @cozo-memory/adapters-core

import { CozoMemoryChatHistory, CozoMemoryRetriever } from '@cozo-memory/langchain';

const chatHistory = new CozoMemoryChatHistory({ sessionName: 'user-123' });
const retriever = new CozoMemoryRetriever({ useGraphRAG: true, graphRAGDepth: 2 });

🦙 LlamaIndex Adapter

npm install @cozo-memory/llamaindex @cozo-memory/adapters-core

import { CozoVectorStore } from '@cozo-memory/llamaindex';

const vectorStore = new CozoVectorStore({ useGraphRAG: true });

Documentation: See adapters/README.md for complete examples and API reference.

CLI & TUI

CLI Tool

Full-featured CLI for all operations:

# System operations
cozo-memory system health
cozo-memory system metrics

# Entity operations
cozo-memory entity create -n "MyEntity" -t "person"
cozo-memory entity get -i <entity-id>

# Search
cozo-memory search query -q "search term" -l 10
cozo-memory search agentic -q "agentic query"

# Graph operations
cozo-memory graph pagerank
cozo-memory graph communities

# Export/Import
cozo-memory export json -o backup.json
cozo-memory import file -i data.json -f cozo

# All commands support -f json or -f pretty for output formatting

See CLI help for complete command reference: cozo-memory --help

TUI (Terminal User Interface)

Interactive TUI with mouse support powered by Python Textual:

# Install Python dependencies (one-time)
pip install textual

# Launch TUI
npm run tui
# or directly:
cozo-memory-tui

TUI Features:

  • 🖱️ Full mouse support (click buttons, scroll, select inputs)
  • ⌨️ Keyboard shortcuts (q=quit, h=help, r=refresh)
  • 📊 Interactive menus for all operations
  • 🎨 Rich terminal UI with colors and animations

Architecture Overview

graph TB
    Client[MCP Client<br/>Claude Desktop, etc.]
    Server[MCP Server<br/>FastMCP + Zod Schemas]
    Services[Memory Services]
    Embeddings[Embeddings<br/>ONNX Runtime]
    Search[Hybrid Search<br/>RRF Fusion]
    Cache[Semantic Cache<br/>L1 + L2]
    Inference[Inference Engine<br/>Multi-Strategy]
    DB[(CozoDB SQLite<br/>Relations + Validity<br/>HNSW Indices<br/>Datalog/Graph)]
    
    Client -->|stdio| Server
    Server --> Services
    Services --> Embeddings
    Services --> Search
    Services --> Cache
    Services --> Inference
    Services --> DB
    
    style Client fill:#e1f5ff,color:#000
    style Server fill:#fff4e1,color:#000
    style Services fill:#f0e1ff,color:#000
    style DB fill:#e1ffe1,color:#000

See docs/ARCHITECTURE.md for detailed architecture documentation

MCP Tools Overview

The interface is reduced to 5 consolidated tools:

| Tool | Purpose | Key Actions |
| --- | --- | --- |
| mutate_memory | Write operations | create_entity, update_entity, delete_entity, add_observation, create_relation, transactions, sessions, tasks, update_observation, batch_delete, manage_tags, batch |
| query_memory | Read operations | search, advancedSearch, context, graph_rag, graph_walking, agentic_search, adaptive_retrieval, list_entities, get_entity_detail, get_session_context, list_sessions |
| analyze_graph | Graph analysis | explore, communities, pagerank, betweenness, hits, shortest_path, semantic_walk |
| manage_system | Maintenance | health, metrics, stats, export, import, cleanup, defrag, reflect, snapshots |
| edit_user_profile | User preferences | Edit global user profile with preferences and work style |

See docs/API.md for complete API reference with all parameters and examples
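
For orientation, an MCP client invokes any of these tools with a JSON-RPC tools/call request. The sketch below builds such a request for the four action-based tools. The envelope (jsonrpc, method, params.name, params.arguments) follows the MCP specification, but the argument names action and query are assumptions made for illustration; docs/API.md remains the authority on the real parameters.

```typescript
// The four tools that expose multiple actions (edit_user_profile is left out).
type ActionTool = "mutate_memory" | "query_memory" | "analyze_graph" | "manage_system";

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: ActionTool; arguments: Record<string, unknown> };
}

let nextRequestId = 1;

// Hypothetical helper: assumes the action is selected through an
// `action` argument, which this README does not confirm.
function buildToolCall(
  tool: ActionTool,
  action: string,
  args: Record<string, unknown> = {},
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method: "tools/call",
    params: { name: tool, arguments: { action, ...args } },
  };
}

// `query` is an assumed argument name, used here only as sample data.
const searchRequest = buildToolCall("query_memory", "search", { query: "where does Alice work?" });
console.log(JSON.stringify(searchRequest, null, 2));
```

In practice an MCP client such as Claude Desktop builds and sends this envelope itself, so the sketch is mainly useful for reading logs or for writing a custom client.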

Troubleshooting

Common Issues

First Start Takes Long

  • The embedding model download typically takes 30-90 seconds on first start (Transformers fetches ~600 MB of artifacts for the default bge-m3 model)
  • This is normal and only happens once
  • Subsequent starts are fast (< 2 seconds)

LLM-powered actions require Ollama

  • The following actions use a local LLM for intelligent processing: cleanup, reflect, summarize_communities, compact, agentic_search
  • Install Ollama from https://ollama.ai
  • Pull the desired model: ollama pull demyagent-4b-i1:Q6_K (or your preferred model)
  • Without Ollama, these actions fall back to non-LLM behavior or return a clear error
  • Core features (CRUD, search, graph, infer) work without any LLM

Windows-Specific

  • Embeddings are processed on CPU for maximum compatibility
  • RocksDB backend requires Visual C++ Redistributable if using that option

Performance Issues

  • First query after restart is slower (cold cache)
  • Use the health action to check cache hit rates
  • Consider RocksDB backend for datasets > 100k entities

See docs/BENCHMARKS.md for performance optimization tips

Development

Structure

  • src/index.ts: MCP Server + Tool Registration
  • src/memory-service.ts: Core business logic
  • src/db-service.ts: Database operations
  • src/embedding-service.ts: Embedding Pipeline + Cache
  • src/hybrid-search.ts: Search Strategies + RRF
  • src/inference-engine.ts: Inference Strategies
  • src/api_bridge.ts: Express API Bridge (optional)

Scripts

npm run build        # TypeScript Build
npm run dev          # ts-node Start of MCP Server
npm run start        # Starts dist/index.js (stdio)
npm run bridge       # Build + Start of API Bridge
npm run benchmark    # Runs performance tests
npm run eval         # Runs evaluation suite

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

Apache 2.0 - See LICENSE for details.

Acknowledgments

Research foundations:

  • GraphRAG-R1 (Yu et al., WWW 2026) - conceptual inspiration for adaptive retrieval
  • HopRAG (ACL 2025) - conceptual inspiration for multi-hop reasoning
  • T-GRAG (Li et al., 2025) - conceptual inspiration for temporal conflict resolution
  • FEEG Framework (Samuel et al., 2026) - conceptual inspiration for query intent classification
  • Allan-Poe (arXiv:2511.00855) - conceptual inspiration for dynamic fusion
