______ ____
/ __/ //_ /_____ _ ___ __ _
/ _// /__/ /___/ ' \/ -_) ' \
/_/ /____/_/ /_/_/_/\__/_/_/_/
flux7-memory
A lightweight MCP server in Go for shared memory across AI agents. Single binary, zero cgo, usable standalone over stdio or as a shared daemon behind flux7-mesh. Hybrid markdown + SQLite store with full-text search, optional dense-vector hybrid retrieval, LLM reranking, and three transports: MCP stdio, HTTP JSON-RPC, and MCP SSE. Comes with a Python SDK for provider-agnostic integration.
Features
- 7 MCP tools: `memory_store`, `memory_recall`, `memory_search`, `memory_context`, `memory_get`, `memory_list`, `memory_forget`
- Hybrid storage: append-only markdown workspace as source of truth, SQLite (FTS5) as a rebuildable index
- Field-weighted BM25: FTS5 ranking with tuned weights for object content (5x), entity key (2x), and tags (0.5x)
- Hybrid search (opt-in): BM25 + dense cosine similarity merged via Reciprocal Rank Fusion (RRF). Requires an external embedding provider (Ollama or any OpenAI-compatible API)
- LLM reranking (opt-in): post-RRF listwise reranking via Ollama, with graceful degradation if the reranker is unavailable
- Natural language mode: `mode="natural"` strips stop words, applies wildcard stemming, and OR-joins tokens so agents can query in plain language instead of FTS5 syntax
- Neighbor inclusion: `include_neighbors=true` automatically fetches sequential neighbors (e.g. `t004`, `t006` around `t005`) to capture context spread across consecutive entries
- Access tracking: `access_count` and `last_accessed` are bumped on `memory_recall`, providing usage signals without creating feedback loops
- Three transports: MCP stdio (default, for Claude Code / Cursor), HTTP JSON-RPC via `mem7 serve` (for SDKs and direct API calls), and MCP SSE via `GET /sse` (for flux7-mesh daemon mode: one process, shared DB)
- Snapshot reminder: `POST /memory/snapshot_reminder` (and the matching MCP method) lets an agent runtime inject a pre-compaction instruction into its context
- Rebuildable index: `mem7 rescan` drops the SQLite index and replays the markdown workspace to restore consistency
- Tag filters, agent tracking, TTL
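As a rough illustration of the natural language mode listed above, the sketch below turns a plain-language question into an FTS5 expression by dropping stop words, appending a prefix wildcard, and OR-joining the rest. The stop-word list, the tokenizer regex, and the function name `to_fts5_query` are assumptions made for this example; the rules mem7 actually applies may differ.

```python
import re

# Hypothetical stop-word list; the list used by mem7 is not documented here.
STOP_WORDS = {
    "a", "an", "the", "is", "are", "what", "which", "of",
    "for", "in", "on", "to", "do", "does", "we", "our",
}


def to_fts5_query(text: str) -> str:
    """Lowercase, drop stop words, add a prefix wildcard, and OR-join the tokens."""
    tokens = re.findall(r"[a-z0-9_]+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    return " OR ".join(f"{t}*" for t in kept)


print(to_fts5_query("What is the roadmap for the index?"))
# roadmap* OR index*
```

OR-joining favors recall over precision, which suits agents that cannot know the exact wording of a stored entry.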
Quick start
go install github.com/KTCrisis/flux7-memory/cmd/mem7@latest
Or build from source:
cd flux7-memory
go build -o ~/go/bin/mem7 ./cmd/mem7
Default stdio mode (MCP client spawns the binary):
~/go/bin/mem7
If a mem7 serve daemon is already running, stdio mode auto-detects it and becomes a thin proxy (stdin↔HTTP) instead of opening a second local store. Same command, zero config change.
Daemon mode (shared across multiple clients via HTTP + SSE):
MEM7_TOKEN=mem7_secret123 ~/go/bin/mem7 serve --listen :9070
Exposes /rpc (HTTP JSON-RPC), /sse + /messages (MCP SSE transport), /healthz, and /memory/snapshot_reminder. flux7-mesh connects via SSE for MCP tool calls and via /rpc for decision writes — one daemon, one database.
Rebuild the SQLite index from the markdown workspace:
~/go/bin/mem7 rescan
Drop TTL-expired entries from the index (the markdown workspace is left untouched; `rescan` re-evaluates TTL on replay):
~/go/bin/mem7 prune
Configuration
| Variable | Default | Description |
|---|---|---|
| `MEM7_DIR` | `~/.mem7` | Data directory (hosts `workspace/` and `index.db`) |
| `MEM7_LISTEN` | `:9070` | HTTP bind address when in `serve` mode |
| `MEM7_TOKEN` | (empty) | Bearer token required on `/rpc` and `/memory/*` when set |
| `MEM7_MAX_ENTRIES` | `10000` | Soft ceiling on live entries |
| `MEM7_EMBED_URL` | (empty) | Base URL of the embedding provider. Setting this enables hybrid search |
| `MEM7_EMBED_MODEL` | `nomic-embed-text` | Model name passed to the embedding API |
| `MEM7_EMBED_PROVIDER` | `ollama` | Provider format: `ollama` (`POST /api/embed`) or `openai` (`POST /v1/embeddings`) |
| `MEM7_EMBED_KEY` | (empty) | Bearer token for the embedding API (required for OpenAI, optional for Ollama) |
| `MEM7_RERANK_URL` | (empty) | Base URL of the reranking LLM. Setting this enables LLM reranking after RRF merge |
| `MEM7_RERANK_MODEL` | `gemma4:e4b` | Model name passed to the Ollama generate API for reranking |
Flags on `mem7 serve` mirror `MEM7_LISTEN` and `MEM7_TOKEN`: `--listen :9070 --token mem7_...`.
Hybrid search setup
Hybrid search is entirely opt-in. Without MEM7_EMBED_URL, mem7 uses pure BM25.
With local Ollama:
MEM7_EMBED_URL=http://localhost:11434 \
MEM7_EMBED_MODEL=nomic-embed-text \
~/go/bin/mem7
With OpenAI API:
MEM7_EMBED_URL=https://api.openai.com \
MEM7_EMBED_MODEL=text-embedding-3-small \
MEM7_EMBED_PROVIDER=openai \
MEM7_EMBED_KEY=sk-... \
~/go/bin/mem7
With any OpenAI-compatible endpoint (vLLM, LiteLLM, Azure OpenAI, etc.):
MEM7_EMBED_URL=http://localhost:8000 \
MEM7_EMBED_MODEL=BAAI/bge-small-en-v1.5 \
MEM7_EMBED_PROVIDER=openai \
~/go/bin/mem7
When enabled, memory_store computes and persists an embedding alongside each entry. memory_search retrieves BM25 top-2N and cosine top-2N candidates, then merges them via Reciprocal Rank Fusion (RRF, k=60) into the final top-N. Embeddings are stored as BLOBs in SQLite and cached in memory for sub-ms cosine search.
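The RRF merge described above can be sketched in a few lines. This is a minimal illustration of the standard formula (each list contributes `1 / (k + rank)` per key, with `k=60` as stated), not mem7's Go implementation; the function name `rrf_merge` and the alphabetical tie-break are assumptions.

```python
from collections import defaultdict


def rrf_merge(rankings: list[list[str]], k: int = 60, top_n: int = 10) -> list[str]:
    """Merge ranked key lists with Reciprocal Rank Fusion: score = sum(1 / (k + rank))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, key in enumerate(ranking, start=1):
            scores[key] += 1.0 / (k + rank)
    # Sort by descending score; break ties by key for deterministic output (assumption).
    ordered = sorted(scores, key=lambda key: (-scores[key], key))
    return ordered[:top_n]


bm25_top = ["a", "b", "c", "d"]    # lexical candidates (top-2N)
cosine_top = ["c", "a", "e", "f"]  # dense candidates (top-2N)
print(rrf_merge([bm25_top, cosine_top], top_n=2))
# ['a', 'c']
```

Because RRF uses ranks only, BM25 scores and cosine similarities never need to be normalized onto a common scale.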
LLM reranking setup
LLM reranking is opt-in on top of hybrid search. It over-fetches 3x candidates, merges them via RRF, then uses an LLM to score relevance before returning the final top-N. If the LLM is unavailable, it falls back to the non-reranked results.
MEM7_EMBED_URL=http://localhost:11434 \
MEM7_RERANK_URL=http://localhost:11434 \
MEM7_RERANK_MODEL=gemma4:e4b \
~/go/bin/mem7
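The fallback behavior can be pictured as follows. This is a sketch under stated assumptions: `candidates` stands for the over-fetched, RRF-ordered keys, and `score_fn` is a hypothetical callable standing in for the LLM call; neither name comes from mem7 itself.

```python
from typing import Callable, Sequence


def rerank_with_fallback(
    candidates: Sequence[str],
    score_fn: Callable[[str, Sequence[str]], list[float]],
    query: str,
    top_n: int,
) -> list[str]:
    """Reorder candidates by reranker score; keep the incoming order if scoring fails."""
    try:
        scores = score_fn(query, candidates)
        if len(scores) != len(candidates):
            raise ValueError("reranker returned a malformed score list")
    except Exception:
        # Graceful degradation: return the RRF order untouched.
        return list(candidates[:top_n])
    ranked = sorted(zip(candidates, scores), key=lambda pair: -pair[1])
    return [key for key, _ in ranked[:top_n]]


def fake_scores(query: str, keys: Sequence[str]) -> list[float]:
    return [0.1, 0.9, 0.5]  # stand-in for LLM relevance scores


def unavailable(query: str, keys: Sequence[str]) -> list[float]:
    raise ConnectionError("reranker is down")


print(rerank_with_fallback(["a", "b", "c"], fake_scores, "q", top_n=2))  # ['b', 'c']
print(rerank_with_fallback(["a", "b", "c"], unavailable, "q", top_n=2))  # ['a', 'b']
```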
Python SDK
A provider-agnostic Python client for mem7, wrapping all MCP tools via JSON-RPC over HTTP.
Install
pip install flux7-memory
Or from source:
pip install ./sdk/python
Usage
from mem7 import Mem7
m = Mem7("http://localhost:9070", token="my-token")
# Store a memory
m.store("user.prefs", "prefers dark mode", tags=["user"])
# Search (returns formatted text)
print(m.search("dark mode", limit=5))
# Context (returns structured Memory objects)
for mem in m.context("dark mode", limit=5):
    print(f"{mem.key}: {mem.value}")
# Formatted block for LLM prompt injection
block = m.context_block("user preferences", limit=10)
# Other tools
m.recall(key="user.prefs")
m.list(tags=["user"])
m.get("memory/2026-05-07.md")
m.forget(key="user.prefs")
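Where installing the SDK is not an option, the same `/rpc` endpoint can be reached with the Python standard library alone. The sketch below mirrors the `tools/call` request shape used by the curl example in the HTTP endpoints section; the helper names `build_tool_call` and `call_tool` are hypothetical, and the `Content-Type` header is an assumption about what the server accepts.

```python
import json
import urllib.request
from typing import Optional


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 'tools/call' request body."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def call_tool(base_url: str, name: str, arguments: dict, token: Optional[str] = None) -> dict:
    """POST the request to <base_url>/rpc and return the decoded JSON response."""
    headers = {"Content-Type": "application/json"}  # assumed; not confirmed by the README
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(
        f"{base_url}/rpc",
        data=json.dumps(build_tool_call(name, arguments)).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Only the payload is printed here; call_tool() needs a running `mem7 serve` daemon.
print(json.dumps(build_tool_call("memory_search", {"query": "roadmap*"})))
```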
Workspace layout
~/.mem7/
├── workspace/
│   ├── MEMORY.md          # reserved for long-term notes
│   └── memory/
│       ├── 2026-04-11.md  # append-only daily logs
│       └── 2026-04-12.md
└── index.db               # SQLite (facts + facts_fts + embeddings)
The markdown files are the source of truth; `index.db` is a derived cache that can be dropped and rebuilt from the markdown at any time via `mem7 rescan`.
Each entry is written as a level-2 heading followed by a fenced mem7 envelope (plain key/value metadata) and a free-form body, terminated by a horizontal rule. A human can edit these files in place — the next rescan picks up the changes.
Example:
## example_key
```mem7
op: store
agent: claude
tags: demo, example
created: 2026-04-11T20:00:00Z
updated: 2026-04-11T20:00:00Z
```
Free-form markdown content lives here.
---
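To make the entry format concrete, here is a small parser sketch for a single section of this shape. It is illustrative only: the function name `parse_entry` and the returned dict layout are assumptions, and mem7's own parser in Go may handle edge cases differently.

```python
FENCE = "`" * 3  # built programmatically so this example nests cleanly in markdown


def parse_entry(text: str) -> dict:
    """Parse one '## key' section with a mem7 envelope into a dict."""
    lines = text.strip().splitlines()
    if not lines or not lines[0].startswith("## "):
        raise ValueError("entry must start with a level-2 heading")
    key = lines[0][3:].strip()
    if len(lines) < 2 or lines[1].strip() != FENCE + "mem7":
        raise ValueError("missing mem7 envelope")
    meta: dict[str, str] = {}
    i = 2
    while i < len(lines) and lines[i].strip() != FENCE:
        # Split on the first colon only, so timestamps keep their colons.
        name, _, value = lines[i].partition(":")
        meta[name.strip()] = value.strip()
        i += 1
    body_lines = lines[i + 1:]
    # Drop the terminating horizontal rule, if present.
    if body_lines and body_lines[-1].strip() == "---":
        body_lines = body_lines[:-1]
    return {"key": key, "meta": meta, "body": "\n".join(body_lines).strip()}


sample = "\n".join([
    "## example_key",
    FENCE + "mem7",
    "op: store",
    "agent: claude",
    "tags: demo, example",
    "created: 2026-04-11T20:00:00Z",
    FENCE,
    "Free-form markdown content lives here.",
    "---",
])
entry = parse_entry(sample)
print(entry["key"], entry["meta"]["op"], entry["meta"]["created"])
# example_key store 2026-04-11T20:00:00Z
```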
Usage with flux7-mesh
In your `config.yaml`:
mcp_servers:
  - name: memory
    transport: stdio
    command: /home/user/go/bin/mem7
    env:
      MEM7_DIR: /home/user/.mem7
flux7-mesh discovers the tools via `tools/list`; no per-tool wiring is required. Grants and policies apply as usual.
To share the same memory across several machines behind flux7-mesh, run mem7 serve on one host and point the other hosts at it via the upcoming remote-client mode (Phase 1.5 of the roadmap).
Tools
memory_store
Upsert a memory entry by key. The markdown workspace receives an append-only section; the SQLite index is updated in place. If hybrid search is enabled, an embedding is computed and stored alongside the entry.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `key` | string | yes | Unique key for this memory |
| `value` | string | yes | Content to remember (free-form markdown allowed) |
| `tags` | string[] | no | Tags for filtering and grouping |
| `agent` | string | no | Identifier of the storing agent |
| `ttl` | number | no | Time-to-live in seconds (0 = permanent) |
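The `ttl` semantics can be illustrated with a short expiry check. This sketch assumes the clock starts at the entry's last update; whether mem7 measures from `created` or `updated` is not stated here, and the function name `is_expired` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_expired(updated: datetime, ttl_seconds: int, now: datetime) -> bool:
    """Return True once ttl_seconds have elapsed since `updated`; 0 means permanent."""
    if ttl_seconds <= 0:
        return False
    return now >= updated + timedelta(seconds=ttl_seconds)


stored = datetime(2026, 4, 11, 20, 0, 0, tzinfo=timezone.utc)
print(is_expired(stored, 3600, stored + timedelta(minutes=30)))  # False
print(is_expired(stored, 3600, stored + timedelta(hours=2)))     # True
print(is_expired(stored, 0, stored + timedelta(days=365)))       # False
```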
memory_recall
Recall memories by key, tags, or agent, most recently updated first. Bumps access_count and last_accessed on returned entries.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `key` | string | no | Exact key to recall |
| `tags` | string[] | no | Filter by tags (AND logic) |
| `agent` | string | no | Filter by agent |
| `limit` | number | no | Max results (default 10) |
memory_search
Full-text search over memories using SQLite FTS5, ranked by field-weighted BM25. When hybrid search is enabled, results are merged with dense cosine similarity via RRF. Supports FTS5 operators in raw mode: `foo*` prefix, `AND` / `OR` / `NOT`, quoted phrases.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `query` | string | yes | Search query |
| `mode` | string | no | `raw` (default, FTS5 syntax) or `natural` (plain language, auto-stemmed) |
| `tags` | string[] | no | Post-filter by tags |
| `agent` | string | no | Post-filter by agent |
| `since` | string | no | Lower bound on `updated_at` (RFC3339) |
| `until` | string | no | Upper bound on `updated_at` (RFC3339) |
| `limit` | number | no | Max results (default 10) |
| `include_neighbors` | boolean | no | Fetch sequential neighbors around matching entries (default false) |
| `neighbor_radius` | number | no | How many neighbors to fetch on each side (default 1) |
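One plausible reading of `include_neighbors` and `neighbor_radius` is that neighbors are derived from a numeric key suffix, as in the `t004` / `t006` around `t005` example. The sketch below implements that reading; how mem7 actually defines "sequential" is an assumption here, as is the function name `neighbor_keys`.

```python
import re


def neighbor_keys(key: str, radius: int = 1) -> list[str]:
    """Return keys whose trailing number is within `radius` of the given key's."""
    match = re.fullmatch(r"(.*?)(\d+)", key)
    if match is None:
        return []  # no numeric suffix, so no sequential neighbors
    prefix, digits = match.groups()
    width, index = len(digits), int(digits)
    keys = []
    for offset in range(-radius, radius + 1):
        n = index + offset
        if offset != 0 and n >= 0:
            # Preserve the zero padding of the original key.
            keys.append(f"{prefix}{n:0{width}d}")
    return keys


print(neighbor_keys("t005"))
# ['t004', 't006']
print(neighbor_keys("t005", radius=2))
# ['t003', 't004', 't006', 't007']
```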
memory_context
Same search capabilities as memory_search but returns a JSON array of structured objects instead of formatted markdown. Designed for programmatic use by agent SDKs.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `query` | string | yes | Search query |
| `mode` | string | no | `raw` (default) or `natural` |
| `tags` | string[] | no | Post-filter by tags |
| `agent` | string | no | Post-filter by agent |
| `since` | string | no | Lower bound on `updated_at` (RFC3339) |
| `until` | string | no | Upper bound on `updated_at` (RFC3339) |
| `limit` | number | no | Max results (default 10) |
| `include_neighbors` | boolean | no | Fetch sequential neighbors (default false) |
| `neighbor_radius` | number | no | Neighbors on each side (default 1) |
Returns a JSON array of { "key", "value", "tags", "agent", "updated" } objects.
memory_get
Read a file from the markdown workspace, optionally between from_line and to_line (1-indexed, inclusive). Paths are resolved relative to the workspace root and refused if they escape it.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `path` | string | yes | Workspace-relative path (e.g. `memory/2026-04-11.md`) |
| `from_line` | number | no | First line to read |
| `to_line` | number | no | Last line to read |
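The path containment and line-range rules can be sketched with `pathlib`. This is an illustration of the described behavior, not mem7's Go code; the function name `read_workspace_file` is hypothetical, and the demo writes to a temporary directory rather than a real workspace.

```python
import tempfile
from pathlib import Path
from typing import Optional


def read_workspace_file(
    root: Path,
    rel_path: str,
    from_line: Optional[int] = None,
    to_line: Optional[int] = None,
) -> str:
    """Read a workspace file, refusing paths that resolve outside the root."""
    root = root.resolve()
    target = (root / rel_path).resolve()
    if not target.is_relative_to(root):  # Python 3.9+
        raise PermissionError(f"path escapes workspace: {rel_path}")
    lines = target.read_text(encoding="utf-8").splitlines()
    start = (from_line or 1) - 1  # 1-indexed, inclusive
    end = to_line if to_line is not None else len(lines)
    return "\n".join(lines[start:end])


with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)
    (workspace / "memory").mkdir()
    (workspace / "memory" / "2026-04-11.md").write_text("one\ntwo\nthree\n", encoding="utf-8")
    print(read_workspace_file(workspace, "memory/2026-04-11.md", from_line=2, to_line=3))
    try:
        read_workspace_file(workspace, "../outside.md")
    except PermissionError as err:
        print("refused:", err)
```

Resolving both paths before comparing them is what catches `..` segments and absolute paths alike.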
memory_list
List memory keys with metadata (without values).
| Parameter | Type | Required | Description |
|---|---|---|---|
| `tags` | string[] | no | Filter by tags |
| `agent` | string | no | Filter by agent |
memory_forget
Delete memories by key and/or tags. A tombstone section is appended to the markdown workspace, and the SQLite index soft-deletes the matching rows.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `key` | string | no | Exact key to delete |
| `tags` | string[] | no | Delete all entries matching these tags (AND logic) |
| `agent` | string | no | Recorded on the tombstone |
HTTP endpoints
`mem7 serve` exposes these routes:
| Method | Path | Description |
|---|---|---|
| `GET` | `/healthz` | Liveness probe (always public, no auth) |
| `POST` | `/rpc` | JSON-RPC 2.0 endpoint with the same MCP tool surface as stdio |
| `POST` | `/memory/snapshot_reminder` | Returns a structured instructional payload for an agent runtime to inject into its context before compaction |
Bearer auth is applied to /rpc and /memory/* when MEM7_TOKEN (or --token) is set.
Example:
curl -s -X POST http://localhost:9070/rpc \
  -H "Authorization: Bearer $MEM7_TOKEN" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/call",
       "params":{"name":"memory_search","arguments":{"query":"roadmap*"}}}'
Architecture
    Claude Code / flux7-mesh / Python SDK / scripts
                           │
               MCP stdio ──┴── HTTP JSON-RPC
                           │
                    ┌──────▼──────┐
                    │ Dispatcher  │  ← MCP protocol layer
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │    Store    │  ← orchestrator
                    └──────┬──────┘
      ┌─────────────┬──────┴──────┬─────────────┐
┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐
│ markdown  │ │ sqlite    │ │ reranker  │ │ embedder  │  ← opt-in, external
│ workspace │ │ (facts +  │ │ (Ollama)  │ │ (Ollama / │
│ (truth)   │ │ FTS5 +    │ │ opt-in    │ │ OpenAI)   │
│           │ │ embeds)   │ │           │ │           │
└───────────┘ └───────────┘ └───────────┘ └───────────┘
Every write goes through the markdown writer first and then updates the SQLite index. If hybrid search is enabled, an embedding is computed via the external provider and stored as a BLOB. Reads consult the index only; embeddings are cached in memory for sub-ms cosine search. If the index is corrupted or out of sync, `mem7 rescan` drops it and replays the markdown chronologically to reconstruct a consistent state.
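The replay idea behind `mem7 rescan` can be sketched as a fold over an append-only operation log: later stores overwrite earlier ones, and a tombstone removes the key from the live set. The record shape and the `forget` operation name are assumptions for this example (only `op: store` appears in the entry format above), and the real index soft-deletes rows rather than dropping them.

```python
def replay(operations: list[dict]) -> dict[str, dict]:
    """Rebuild the live key -> entry map from a chronological operation log."""
    index: dict[str, dict] = {}
    for op in operations:
        if op["op"] == "store":
            # Upsert: the last write for a key wins.
            index[op["key"]] = {"value": op["value"], "updated": op["updated"]}
        elif op["op"] == "forget":
            # Tombstone: the key is no longer live after this point.
            index.pop(op["key"], None)
    return index


log = [
    {"op": "store", "key": "a", "value": "v1", "updated": "2026-04-11T20:00:00Z"},
    {"op": "store", "key": "b", "value": "v1", "updated": "2026-04-11T21:00:00Z"},
    {"op": "store", "key": "a", "value": "v2", "updated": "2026-04-12T09:00:00Z"},
    {"op": "forget", "key": "b"},
]
state = replay(log)
print(sorted(state), state["a"]["value"])
# ['a'] v2
```

Because the log is the source of truth, running the replay twice yields the same state, which is what makes the index safe to drop.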
License
Apache 2.0