# mnemo-mcp

Persistent AI memory with SQLite hybrid search (FTS5 + semantic), built-in Qwen3 embedding, and rclone sync across machines.

## Mnemo MCP Server

mcp-name: io.github.n24q02m/mnemo-mcp

Persistent AI memory with hybrid search and embedded sync. Open, free, unlimited.
## Features

- Hybrid search: FTS5 full-text + sqlite-vec semantic + Qwen3-Embedding-0.6B (built-in)
- Zero-config mode: works out of the box with local embedding; no API keys needed
- Auto-detect embedding: set `API_KEYS` for cloud embedding, with automatic fallback to local
- Embedded sync: rclone is auto-downloaded and managed as a subprocess
- Multi-machine: JSONL-based merge sync via rclone (Google Drive, S3, etc.)
- Proactive memory: tool descriptions guide the AI to save preferences, decisions, and facts
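Hybrid search merges two independently ranked result lists, one from FTS5 keyword matching and one from sqlite-vec semantic similarity. The fusion method mnemo-mcp actually uses is not documented here; a common approach is reciprocal rank fusion, sketched below with illustrative memory IDs:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked ID lists into one, rewarding items
    that rank highly in any individual list."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Keyword (FTS5) ranking and semantic (sqlite-vec) ranking for the same query:
fts_hits = ["m3", "m1", "m7"]
vec_hits = ["m1", "m9", "m3"]
print(reciprocal_rank_fusion([fts_hits, vec_hits]))  # ['m1', 'm3', 'm9', 'm7']
```

Items that appear near the top of both lists ("m1", "m3") outrank items found by only one retriever.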
## Quick Start

The recommended way to run this server is via `uvx`:

```bash
uvx mnemo-mcp@latest
```

Alternatively, you can use `pipx run mnemo-mcp`.
### Option 1: uvx (Recommended)

```jsonc
{
  "mcpServers": {
    "mnemo": {
      "command": "uvx",
      "args": ["mnemo-mcp@latest"],
      "env": {
        // -- optional: LiteLLM Proxy (production, self-hosted gateway)
        // "LITELLM_PROXY_URL": "http://10.0.0.20:4000",
        // "LITELLM_PROXY_KEY": "sk-your-virtual-key",
        // -- optional: cloud embedding (Gemini > OpenAI > Cohere) for semantic search
        // -- without this, uses the built-in local Qwen3-Embedding-0.6B (ONNX, CPU)
        // -- first run downloads a ~570MB model, cached for subsequent runs
        "API_KEYS": "GOOGLE_API_KEY:AIza...",
        // -- optional: custom embedding endpoint (e.g. modalcom-ai-workers on Modal.com)
        // "EMBEDDING_API_BASE": "https://your-worker.modal.run",
        // "EMBEDDING_API_KEY": "your-key",
        // -- optional: sync memories across machines via rclone
        "SYNC_ENABLED": "true",                  // optional, default: false
        "SYNC_REMOTE": "gdrive",                 // required when SYNC_ENABLED=true
        "SYNC_INTERVAL": "300",                  // optional, auto-sync every 5 min (0 = manual only)
        "RCLONE_CONFIG_GDRIVE_TYPE": "drive",    // required when SYNC_ENABLED=true
        "RCLONE_CONFIG_GDRIVE_TOKEN": "<base64>" // required when SYNC_ENABLED=true; from: uvx mnemo-mcp setup-sync drive
      }
    }
  }
}
```
### Option 2: Docker

```jsonc
{
  "mcpServers": {
    "mnemo": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "--name", "mcp-mnemo",
        "-v", "mnemo-data:/data",           // persists memories across restarts
        "-e", "LITELLM_PROXY_URL",          // optional: pass-through from env below
        "-e", "LITELLM_PROXY_KEY",          // optional: pass-through from env below
        "-e", "API_KEYS",                   // optional: pass-through from env below
        "-e", "EMBEDDING_API_BASE",         // optional: pass-through from env below
        "-e", "EMBEDDING_API_KEY",          // optional: pass-through from env below
        "-e", "SYNC_ENABLED",               // optional: pass-through from env below
        "-e", "SYNC_REMOTE",                // required when SYNC_ENABLED=true: pass-through
        "-e", "SYNC_INTERVAL",              // optional: pass-through from env below
        "-e", "RCLONE_CONFIG_GDRIVE_TYPE",  // required when SYNC_ENABLED=true: pass-through
        "-e", "RCLONE_CONFIG_GDRIVE_TOKEN", // required when SYNC_ENABLED=true: pass-through
        "n24q02m/mnemo-mcp:latest"
      ],
      "env": {
        // -- optional: LiteLLM Proxy (production, self-hosted gateway)
        // "LITELLM_PROXY_URL": "http://10.0.0.20:4000",
        // "LITELLM_PROXY_KEY": "sk-your-virtual-key",
        // -- optional: cloud embedding (Gemini > OpenAI > Cohere) for semantic search
        // -- without this, uses the built-in local Qwen3-Embedding-0.6B (ONNX, CPU)
        "API_KEYS": "GOOGLE_API_KEY:AIza...",
        // -- optional: custom embedding endpoint (e.g. modalcom-ai-workers on Modal.com)
        // "EMBEDDING_API_BASE": "https://your-worker.modal.run",
        // "EMBEDDING_API_KEY": "your-key",
        // -- optional: sync memories across machines via rclone
        "SYNC_ENABLED": "true",                  // optional, default: false
        "SYNC_REMOTE": "gdrive",                 // required when SYNC_ENABLED=true
        "SYNC_INTERVAL": "300",                  // optional, auto-sync every 5 min (0 = manual only)
        "RCLONE_CONFIG_GDRIVE_TYPE": "drive",    // required when SYNC_ENABLED=true
        "RCLONE_CONFIG_GDRIVE_TOKEN": "<base64>" // required when SYNC_ENABLED=true; from: uvx mnemo-mcp setup-sync drive
      }
    }
  }
}
```
### Pre-install (optional)

Pre-download dependencies before adding the server to your MCP client config. This avoids a slow first-run startup:

```bash
# Pre-download the embedding model (~570MB) and validate API keys
uvx mnemo-mcp warmup

# With cloud embedding (validates the API key; skips the local download if cloud works)
API_KEYS="GOOGLE_API_KEY:AIza..." uvx mnemo-mcp warmup
```
### Sync setup (one-time)

```bash
# Google Drive
uvx mnemo-mcp setup-sync drive

# Other providers (any rclone remote type)
uvx mnemo-mcp setup-sync dropbox
uvx mnemo-mcp setup-sync onedrive
uvx mnemo-mcp setup-sync s3
```

This opens a browser for OAuth and prints the env vars (`RCLONE_CONFIG_*`) to set. Both raw JSON and base64-encoded tokens are supported.
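Since both raw JSON and base64-encoded tokens are accepted, you can convert between the two yourself. A minimal sketch, with a placeholder token (real values come only from `setup-sync`):

```python
import base64
import json

# Placeholder rclone OAuth token; the real one is printed by `setup-sync`.
token = {
    "access_token": "ya29.example",
    "token_type": "Bearer",
    "refresh_token": "1//example",
    "expiry": "2025-01-01T00:00:00Z",
}

raw = json.dumps(token, separators=(",", ":"))       # raw JSON form
encoded = base64.b64encode(raw.encode()).decode()    # base64 form

# Either string can be used as the value of RCLONE_CONFIG_GDRIVE_TOKEN.
print(encoded)
```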
## Configuration

| Variable | Default | Description |
|---|---|---|
| `DB_PATH` | `~/.mnemo-mcp/memories.db` | Database location |
| `LITELLM_PROXY_URL` | — | LiteLLM Proxy URL (e.g. `http://10.0.0.20:4000`). Enables proxy mode |
| `LITELLM_PROXY_KEY` | — | LiteLLM Proxy virtual key (e.g. `sk-...`) |
| `API_KEYS` | — | API keys (`ENV:key,ENV:key`). Optional: enables cloud semantic search (SDK mode) |
| `EMBEDDING_API_BASE` | — | Custom embedding endpoint URL (optional, for SDK mode) |
| `EMBEDDING_API_KEY` | — | Custom embedding endpoint key (optional) |
| `EMBEDDING_BACKEND` | (auto-detect) | `litellm` (cloud API) or `local` (Qwen3). Auto: `API_KEYS` -> `litellm`, else `local` (always available) |
| `EMBEDDING_MODEL` | (auto-detect) | LiteLLM model name (optional) |
| `EMBEDDING_DIMS` | `0` (auto = 768) | Embedding dimensions (0 = auto-detect, default 768) |
| `SYNC_ENABLED` | `false` | Enable rclone sync |
| `SYNC_REMOTE` | — | rclone remote name (required when sync is enabled) |
| `SYNC_FOLDER` | `mnemo-mcp` | Remote folder (optional) |
| `SYNC_INTERVAL` | `0` | Auto-sync interval in seconds (optional, 0 = manual) |
| `LOG_LEVEL` | `INFO` | Log level (optional) |
## Embedding (3-Mode Architecture)

Embedding is always available: a local model is built in and requires no configuration.

Embedding access supports 3 modes, resolved by priority:

| Priority | Mode | Config | Use case |
|---|---|---|---|
| 1 | Proxy | `LITELLM_PROXY_URL` + `LITELLM_PROXY_KEY` | Production (OCI VM, self-hosted gateway) |
| 2 | SDK | `API_KEYS` or `EMBEDDING_API_BASE` | Dev/local with direct API access |
| 3 | Local | Nothing needed | Offline, always available as fallback |

There is no cross-mode fallback: if a proxy is configured but unreachable, calls fail rather than silently falling back to the direct API.
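The priority resolution above can be sketched as a simple cascade (illustrative only; the server's real logic may differ):

```python
def resolve_embedding_mode(env: dict[str, str]) -> str:
    """Resolve the embedding mode by priority: proxy > SDK > local."""
    if env.get("LITELLM_PROXY_URL") and env.get("LITELLM_PROXY_KEY"):
        return "proxy"
    if env.get("API_KEYS") or env.get("EMBEDDING_API_BASE"):
        return "sdk"
    return "local"  # zero config: always available

print(resolve_embedding_mode({}))                                  # local
print(resolve_embedding_mode({"API_KEYS": "GOOGLE_API_KEY:..."}))  # sdk
```

Note that once "proxy" is selected, the sketch never falls through to "sdk", mirroring the no-cross-mode-fallback rule.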
- Local mode: Qwen3-Embedding-0.6B, always available with zero config.
- GPU auto-detection: if a GPU is available (CUDA/DirectML) and `llama-cpp-python` is installed, the server automatically uses the GGUF model (~480MB) instead of ONNX (~570MB) for better performance.
- All embeddings are stored at 768 dims (default), so switching providers never breaks the vector table.
- Override with `EMBEDDING_BACKEND=local` to force local embedding even when API keys are set.

`API_KEYS` supports multiple providers in a single string:

```bash
API_KEYS=GOOGLE_API_KEY:AIza...,OPENAI_API_KEY:sk-...,COHERE_API_KEY:co-...
```
Cloud embedding providers (auto-detected from `API_KEYS`, in priority order):

| Priority | Env Var (LiteLLM) | Model | Native Dims | Stored |
|---|---|---|---|---|
| 1 | `GEMINI_API_KEY` | `gemini/gemini-embedding-001` | 3072 | 768 |
| 2 | `OPENAI_API_KEY` | `text-embedding-3-large` | 3072 | 768 |
| 3 | `COHERE_API_KEY` | `embed-multilingual-v3.0` | 1024 | 768 |

All embeddings are truncated to 768 dims (default) for storage, so switching models never breaks the vector table. Override with `EMBEDDING_DIMS` if needed.

The `API_KEYS` format maps your env var to LiteLLM's expected var (e.g., `GOOGLE_API_KEY:key` auto-sets `GEMINI_API_KEY`). Set `EMBEDDING_MODEL` explicitly for other providers.
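Truncating every provider's native vector to a fixed 768 dims is what keeps the vector table schema stable. A sketch of the idea; whether mnemo-mcp re-normalizes after truncation is an assumption here, but re-normalizing keeps cosine-similarity scores comparable:

```python
import math

def truncate_embedding(vec: list[float], dims: int = 768) -> list[float]:
    """Keep the first `dims` components and re-normalize to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]

# A toy 4-dim vector truncated to 2 dims:
v = truncate_embedding([3.0, 4.0, 0.0, 0.0], dims=2)
print(v)  # [0.6, 0.8]
```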
## MCP Tools

### memory — Core memory operations

| Action | Required | Optional |
|---|---|---|
| `add` | `content` | `category`, `tags` |
| `search` | `query` | `category`, `tags`, `limit` |
| `list` | — | `category`, `limit` |
| `update` | `memory_id` | `content`, `category`, `tags` |
| `delete` | `memory_id` | — |
| `export` | — | — |
| `import` | `data` (JSONL) | `mode` (merge/replace) |
| `stats` | — | — |
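The `import` action takes JSONL: one memory object per line. The exact field names of the export schema are not specified in this README; the record shape below is hypothetical, chosen to match the `add` action's parameters:

```python
import json

# Hypothetical memory records; the real export schema may differ.
memories = [
    {"content": "User prefers tabs over spaces", "category": "preference", "tags": ["style"]},
    {"content": "Project targets Python 3.13", "category": "fact", "tags": ["env"]},
]

# Serialize as JSONL: one compact JSON object per line.
jsonl = "\n".join(json.dumps(m, ensure_ascii=False) for m in memories)
print(jsonl)

# Round-trip: parse each line back into a dict.
parsed = [json.loads(line) for line in jsonl.splitlines()]
```

In `merge` mode the imported records are combined with existing memories; `replace` discards the existing set first.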
### config — Server configuration

| Action | Required | Optional |
|---|---|---|
| `status` | — | — |
| `sync` | — | — |
| `set` | `key`, `value` | — |

### help — Full documentation

```python
help(topic="memory")  # or "config"
```
## MCP Resources

| URI | Description |
|---|---|
| `mnemo://stats` | Database statistics and server status |
| `mnemo://recent` | 10 most recently updated memories |
## MCP Prompts

| Prompt | Parameters | Description |
|---|---|---|
| `save_summary` | `summary` | Generates a prompt to save a conversation summary as a memory |
| `recall_context` | `topic` | Generates a prompt to recall relevant memories about a topic |
## Architecture

```
MCP Client (Claude, Cursor, etc.)
              |
        FastMCP Server
        /     |     \
   memory   config   help
     |        |        |
 MemoryDB  Settings  docs/
   /    \
 FTS5  sqlite-vec
          |
   EmbeddingBackend
      /         \
 LiteLLM     Qwen3 ONNX
    |        (local CPU)
 Gemini / OpenAI / Cohere

Sync: rclone (embedded) -> Google Drive / S3 / ...
```
## Development

```bash
# Install
uv sync

# Run
uv run mnemo-mcp

# Lint
uv run ruff check src/
uv run ty check src/

# Test
uv run pytest
```
## Compatible With

Any MCP client (Claude, Cursor, etc.).

## Also by n24q02m

| Server | Description | Install |
|---|---|---|
| better-notion-mcp | Notion API for AI agents | `npx -y @n24q02m/better-notion-mcp@latest` |
| wet-mcp | Web search, content extraction, library docs | `uvx --python 3.13 wet-mcp@latest` |
| better-email-mcp | Email (IMAP/SMTP) for AI agents | `npx -y @n24q02m/better-email-mcp@latest` |
| better-godot-mcp | Godot Engine for AI agents | `npx -y @n24q02m/better-godot-mcp@latest` |
## Related Projects

- modalcom-ai-workers — GPU-accelerated AI workers on Modal.com (embedding, reranking)
- qwen3-embed — local embedding/reranking library used by mnemo-mcp

## Contributing

See CONTRIBUTING.md.

## License

MIT. See LICENSE.