Mnemo MCP Server

mcp-name: io.github.n24q02m/mnemo-mcp

Persistent AI memory with hybrid search and embedded sync. Open, free, unlimited.

Features

  • Hybrid search: FTS5 full-text + sqlite-vec semantic + Qwen3-Embedding-0.6B (built-in)
  • Zero config mode: Works out of the box — local embedding, no API keys needed
  • Auto-detect embedding: Set API_KEYS for cloud embedding, auto-fallback to local
  • Embedded sync: rclone auto-downloaded and managed as subprocess
  • Multi-machine: JSONL-based merge sync via rclone (Google Drive, S3, etc.)
  • Proactive memory: Tool descriptions guide AI to save preferences, decisions, facts
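
The JSONL-based merge sync can be sketched as below. The merge policy shown (newest updated_at wins per memory id) and the record fields are illustrative assumptions, not mnemo-mcp's exact implementation.

```python
import json

def merge_jsonl(local_lines, remote_lines):
    """Merge two JSONL memory dumps by id; the newest updated_at wins.
    Policy and field names are illustrative, not mnemo-mcp's exact code."""
    merged = {}
    for line in list(local_lines) + list(remote_lines):
        rec = json.loads(line)
        prev = merged.get(rec["id"])
        if prev is None or rec["updated_at"] > prev["updated_at"]:
            merged[rec["id"]] = rec
    return [json.dumps(rec) for rec in merged.values()]

local = ['{"id": 1, "updated_at": 10, "content": "old"}']
remote = ['{"id": 1, "updated_at": 20, "content": "new"}',
          '{"id": 2, "updated_at": 5, "content": "other"}']
merged = merge_jsonl(local, remote)
```

A merge like this lets two machines edit independently and reconcile on the next sync instead of overwriting each other.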

Quick Start

The recommended way to run this server is via uvx:

uvx mnemo-mcp@latest

Alternatively, you can use pipx run mnemo-mcp.

Option 1: uvx (Recommended)

{
  "mcpServers": {
    "mnemo": {
      "command": "uvx",
      "args": ["mnemo-mcp@latest"],
      "env": {
        // -- optional: LiteLLM Proxy (production, selfhosted gateway)
        // "LITELLM_PROXY_URL": "http://10.0.0.20:4000",
        // "LITELLM_PROXY_KEY": "sk-your-virtual-key",
        // -- optional: cloud embedding (Gemini > OpenAI > Cohere) for semantic search
        // -- without this, uses built-in local Qwen3-Embedding-0.6B (ONNX, CPU)
        // -- first run downloads ~570MB model, cached for subsequent runs
        "API_KEYS": "GOOGLE_API_KEY:AIza...",
        // -- optional: custom embedding endpoint (e.g. modalcom-ai-workers on Modal.com)
        // "EMBEDDING_API_BASE": "https://your-worker.modal.run",
        // "EMBEDDING_API_KEY": "your-key",
        // -- optional: sync memories across machines via rclone
        "SYNC_ENABLED": "true",                    // optional, default: false
        "SYNC_REMOTE": "gdrive",                   // required when SYNC_ENABLED=true
        "SYNC_INTERVAL": "300",                    // optional, auto-sync every 5min (0 = manual only)
        "RCLONE_CONFIG_GDRIVE_TYPE": "drive",      // required when SYNC_ENABLED=true
        "RCLONE_CONFIG_GDRIVE_TOKEN": "<base64>"   // required when SYNC_ENABLED=true, from: uvx mnemo-mcp setup-sync drive
      }
    }
  }
}

Option 2: Docker

{
  "mcpServers": {
    "mnemo": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "--name", "mcp-mnemo",
        "-v", "mnemo-data:/data",                  // persists memories across restarts
        "-e", "LITELLM_PROXY_URL",                 // optional: pass-through from env below
        "-e", "LITELLM_PROXY_KEY",                 // optional: pass-through from env below
        "-e", "API_KEYS",                          // optional: pass-through from env below
        "-e", "EMBEDDING_API_BASE",                // optional: pass-through from env below
        "-e", "EMBEDDING_API_KEY",                 // optional: pass-through from env below
        "-e", "SYNC_ENABLED",                      // optional: pass-through from env below
        "-e", "SYNC_REMOTE",                       // required when SYNC_ENABLED=true: pass-through
        "-e", "SYNC_INTERVAL",                     // optional: pass-through from env below
        "-e", "RCLONE_CONFIG_GDRIVE_TYPE",         // required when SYNC_ENABLED=true: pass-through
        "-e", "RCLONE_CONFIG_GDRIVE_TOKEN",        // required when SYNC_ENABLED=true: pass-through
        "n24q02m/mnemo-mcp:latest"
      ],
      "env": {
        // -- optional: LiteLLM Proxy (production, selfhosted gateway)
        // "LITELLM_PROXY_URL": "http://10.0.0.20:4000",
        // "LITELLM_PROXY_KEY": "sk-your-virtual-key",
        // -- optional: cloud embedding (Gemini > OpenAI > Cohere) for semantic search
        // -- without this, uses built-in local Qwen3-Embedding-0.6B (ONNX, CPU)
        "API_KEYS": "GOOGLE_API_KEY:AIza...",
        // -- optional: custom embedding endpoint (e.g. modalcom-ai-workers on Modal.com)
        // "EMBEDDING_API_BASE": "https://your-worker.modal.run",
        // "EMBEDDING_API_KEY": "your-key",
        // -- optional: sync memories across machines via rclone
        "SYNC_ENABLED": "true",                    // optional, default: false
        "SYNC_REMOTE": "gdrive",                   // required when SYNC_ENABLED=true
        "SYNC_INTERVAL": "300",                    // optional, auto-sync every 5min (0 = manual only)
        "RCLONE_CONFIG_GDRIVE_TYPE": "drive",      // required when SYNC_ENABLED=true
        "RCLONE_CONFIG_GDRIVE_TOKEN": "<base64>"   // required when SYNC_ENABLED=true, from: uvx mnemo-mcp setup-sync drive
      }
    }
  }
}

Pre-install (optional)

Pre-download dependencies before adding to your MCP client config. This avoids slow first-run startup:

# Pre-download embedding model (~570MB) and validate API keys
uvx mnemo-mcp warmup

# With cloud embedding (validates API key, skips local download if cloud works)
API_KEYS="GOOGLE_API_KEY:AIza..." uvx mnemo-mcp warmup

Sync setup (one-time)

# Google Drive
uvx mnemo-mcp setup-sync drive

# Other providers (any rclone remote type)
uvx mnemo-mcp setup-sync dropbox
uvx mnemo-mcp setup-sync onedrive
uvx mnemo-mcp setup-sync s3

This opens a browser for OAuth authorization and prints the env vars (RCLONE_CONFIG_*) to set. Both raw JSON and base64-encoded tokens are supported.
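
On a second machine you can skip setup-sync and set the printed variables directly; for example, running the server from a plain shell (token value is a placeholder, as in the config examples above):

```shell
# rclone remotes can be defined entirely via env vars: RCLONE_CONFIG_<REMOTE>_<KEY>
export SYNC_ENABLED=true
export SYNC_REMOTE=gdrive
export RCLONE_CONFIG_GDRIVE_TYPE=drive
export RCLONE_CONFIG_GDRIVE_TOKEN="<base64>"   # token printed by setup-sync
uvx mnemo-mcp@latest
```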

Configuration

Variable            Default                   Description
DB_PATH             ~/.mnemo-mcp/memories.db  Database location
LITELLM_PROXY_URL   (unset)                   LiteLLM Proxy URL (e.g. http://10.0.0.20:4000); enables proxy mode
LITELLM_PROXY_KEY   (unset)                   LiteLLM Proxy virtual key (e.g. sk-...)
API_KEYS            (unset)                   API keys (ENV:key,ENV:key); optional, enables cloud semantic search (SDK mode)
EMBEDDING_API_BASE  (unset)                   Custom embedding endpoint URL (optional, SDK mode)
EMBEDDING_API_KEY   (unset)                   Custom embedding endpoint key (optional)
EMBEDDING_BACKEND   (auto-detect)             litellm (cloud API) or local (Qwen3); auto: API_KEYS -> litellm, else local
EMBEDDING_MODEL     (auto-detect)             LiteLLM model name (optional)
EMBEDDING_DIMS      0 (auto = 768)            Embedding dimensions (0 = auto-detect, default 768)
SYNC_ENABLED        false                     Enable rclone sync
SYNC_REMOTE         (unset)                   rclone remote name (required when sync enabled)
SYNC_FOLDER         mnemo-mcp                 Remote folder (optional)
SYNC_INTERVAL       0                         Auto-sync interval in seconds (0 = manual only)
LOG_LEVEL           INFO                      Log level (optional)

Embedding (3-Mode Architecture)

Embedding is always available — a local model is built-in and requires no configuration.

Embedding access supports three modes, resolved in priority order:

Priority  Mode   Config                                 Use case
1         Proxy  LITELLM_PROXY_URL + LITELLM_PROXY_KEY  Production (OCI VM, self-hosted gateway)
2         SDK    API_KEYS or EMBEDDING_API_BASE         Dev/local with direct API access
3         Local  (nothing needed)                       Offline; always available as fallback

No cross-mode fallback — if proxy is configured but unreachable, calls fail (no silent fallback to direct API).

  • Local mode: Qwen3-Embedding-0.6B, always available with zero config.
  • GPU auto-detection: If GPU is available (CUDA/DirectML) and llama-cpp-python is installed, automatically uses GGUF model (~480MB) instead of ONNX (~570MB) for better performance.
  • All embeddings stored at 768 dims (default). Switching providers never breaks the vector table.
  • Override with EMBEDDING_BACKEND=local to force local even with API keys.

API_KEYS supports multiple providers in a single string:

API_KEYS=GOOGLE_API_KEY:AIza...,OPENAI_API_KEY:sk-...,COHERE_API_KEY:co-...

Cloud embedding providers (auto-detected from API_KEYS, priority order):

Priority  Env Var (LiteLLM)  Model                        Native Dims  Stored
1         GEMINI_API_KEY     gemini/gemini-embedding-001  3072         768
2         OPENAI_API_KEY     text-embedding-3-large       3072         768
3         COHERE_API_KEY     embed-multilingual-v3.0      1024         768

All embeddings are truncated to 768 dims (default) for storage. This ensures switching models never breaks the vector table. Override with EMBEDDING_DIMS if needed.
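
A minimal sketch of that truncation (whether the stored vector is renormalized afterwards is not specified here):

```python
def truncate_embedding(vec: list[float], dims: int = 768) -> list[float]:
    """Truncate a provider embedding (e.g. a 3072-dim Gemini vector) to the
    stored width, EMBEDDING_DIMS (default 768). Illustrative sketch only."""
    return vec[:dims]
```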

API_KEYS format maps your env var to LiteLLM's expected var (e.g., GOOGLE_API_KEY:key auto-sets GEMINI_API_KEY). Set EMBEDDING_MODEL explicitly for other providers.
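
A sketch of that parsing and aliasing; ALIASES and apply_api_keys are hypothetical names, and only the GOOGLE_API_KEY -> GEMINI_API_KEY mapping comes from the text above.

```python
# GOOGLE_API_KEY -> GEMINI_API_KEY is the one alias documented above;
# other names pass through unchanged.
ALIASES = {"GOOGLE_API_KEY": "GEMINI_API_KEY"}

def apply_api_keys(spec: str, env: dict) -> dict:
    """Parse "ENV:key,ENV:key" pairs and set the (aliased) env vars.
    Hypothetical helper for illustration, not mnemo-mcp's real code."""
    for pair in spec.split(","):
        name, _, key = pair.strip().partition(":")
        if name and key:
            env[ALIASES.get(name, name)] = key
    return env

env = apply_api_keys("GOOGLE_API_KEY:AIza...,OPENAI_API_KEY:sk-...", {})
```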

MCP Tools

memory — Core memory operations

Action  Required      Optional
add     content       category, tags
search  query         category, tags, limit
list    -             category, limit
update  memory_id     content, category, tags
delete  memory_id     -
export  -             -
import  data (JSONL)  mode (merge/replace)
stats   -             -
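
Illustrative calls in the same shorthand the help tool uses; the argument values are made up:

```
memory(action="add", content="User prefers dark mode", category="preference", tags=["ui"])
memory(action="search", query="editor preferences", limit=5)
memory(action="export")
```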

config — Server configuration

Action  Required    Optional
status  -           -
sync    -           -
set     key, value  -

help — Full documentation

help(topic="memory")  # or "config"

MCP Resources

URI             Description
mnemo://stats   Database statistics and server status
mnemo://recent  10 most recently updated memories

MCP Prompts

Prompt          Parameters  Description
save_summary    summary     Generate a prompt to save a conversation summary as a memory
recall_context  topic       Generate a prompt to recall relevant memories about a topic

Architecture

                  MCP Client (Claude, Cursor, etc.)
                         |
                    FastMCP Server
                   /      |       \
             memory    config    help
                |         |        |
            MemoryDB   Settings  docs/
            /     \
        FTS5    sqlite-vec
                    |
              EmbeddingBackend
              /            \
         LiteLLM        Qwen3 ONNX
            |           (local CPU)
  Gemini / OpenAI / Cohere

        Sync: rclone (embedded) -> Google Drive / S3 / ...

Development

# Install
uv sync

# Run
uv run mnemo-mcp

# Lint
uv run ruff check src/
uv run ty check src/

# Test
uv run pytest

Compatible With

Claude Desktop Claude Code Cursor VS Code Copilot Antigravity Gemini CLI OpenAI Codex OpenCode

Also by n24q02m

Server             Description                                   Install
better-notion-mcp  Notion API for AI agents                      npx -y @n24q02m/better-notion-mcp@latest
wet-mcp            Web search, content extraction, library docs  uvx --python 3.13 wet-mcp@latest
better-email-mcp   Email (IMAP/SMTP) for AI agents               npx -y @n24q02m/better-email-mcp@latest
better-godot-mcp   Godot Engine for AI agents                    npx -y @n24q02m/better-godot-mcp@latest

Related Projects

  • modalcom-ai-workers — GPU-accelerated AI workers on Modal.com (embedding, reranking)
  • qwen3-embed — Local embedding/reranking library used by mnemo-mcp

Contributing

See CONTRIBUTING.md

License

MIT - See LICENSE
