flux7-memory

Governed multi-agent memory for AI agents. Hybrid markdown + SQLite store with full-text search, vector retrieval, and LLM reranking. Three transports: MCP stdio, HTTP JSON-RPC, and MCP SSE. One Go binary.


   ______ ____                  
  / __/ //_  /_____ _  ___ __ _ 
 / _// /__/ /___/  ' \/ -_)  ' \
/_/ /____/_/   /_/_/_/\__/_/_/_/

flux7-memory

A lightweight MCP server in Go for shared memory across AI agents. Single binary, zero cgo, usable standalone over stdio or as a shared daemon behind flux7-mesh. Hybrid markdown + SQLite store with full-text search, optional dense-vector hybrid retrieval, LLM reranking, and three transports: MCP stdio, HTTP JSON-RPC, and MCP SSE. Comes with a Python SDK for provider-agnostic integration.

Features

  • 7 MCP tools: memory_store, memory_recall, memory_search, memory_context, memory_get, memory_list, memory_forget
  • Hybrid storage: append-only markdown workspace as source of truth, SQLite (FTS5) as a rebuildable index
  • Field-weighted BM25: FTS5 ranking with tuned weights for object content (5x), entity key (2x), and tags (0.5x)
  • Hybrid search (opt-in): BM25 + dense cosine similarity merged via Reciprocal Rank Fusion (RRF). Requires an external embedding provider (Ollama or any OpenAI-compatible API)
  • LLM reranking (opt-in): post-RRF listwise reranking via Ollama, with graceful degradation if the reranker is unavailable
  • Natural language mode: mode="natural" strips stop words, applies wildcard stemming, and OR-joins tokens so agents can query in plain language instead of FTS5 syntax
  • Neighbor inclusion: include_neighbors=true automatically fetches sequential neighbors (e.g. t004, t006 around t005) to capture context spread across consecutive entries
  • Access tracking: access_count and last_accessed are bumped on memory_recall, providing usage signals without creating feedback loops
  • Three transports: MCP stdio (default, for Claude Code / Cursor), HTTP JSON-RPC via mem7 serve (for SDKs and direct API calls), and MCP SSE via GET /sse (for flux7-mesh daemon mode: one process, shared DB)
  • Snapshot reminder: POST /memory/snapshot_reminder (and the matching MCP method) lets an agent runtime inject a pre-compaction instruction into its context
  • Rebuildable index: mem7 rescan drops the SQLite index and replays the markdown workspace to restore consistency
  • Tag filters, agent tracking, TTL
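The natural language mode listed above can be pictured roughly as follows. This is a Python sketch, not mem7's Go implementation: the stop-word list, the tokenizer regex, and the function name to_fts5_query are assumptions made for illustration only.

```python
import re

# Hypothetical stop-word list; the list mem7 actually uses is not documented here.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "on", "for",
              "what", "which", "and", "or", "do", "does", "i", "we", "my"}


def to_fts5_query(text: str) -> str:
    """Turn a plain-language question into an OR-joined FTS5 prefix query."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    # A trailing * makes each token a prefix match, a crude form of stemming.
    return " OR ".join(f"{t}*" for t in kept)


print(to_fts5_query("What is the roadmap for the memory daemon?"))
# → roadmap* OR memory* OR daemon*
```

The resulting string uses only operators that the raw mode already accepts (prefix matches and OR), which is why a translation step like this can sit in front of an unchanged FTS5 query path.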

Quick start

go install github.com/KTCrisis/flux7-memory/cmd/mem7@latest

Or build from source:

git clone https://github.com/KTCrisis/flux7-memory
cd flux7-memory
go build -o ~/go/bin/mem7 ./cmd/mem7

Default stdio mode (MCP client spawns the binary):

~/go/bin/mem7

If a mem7 serve daemon is already running, stdio mode auto-detects it and becomes a thin proxy (stdin↔HTTP) instead of opening a second local store. Same command, zero config change.

Daemon mode (shared across multiple clients via HTTP + SSE):

MEM7_TOKEN=mem7_secret123 ~/go/bin/mem7 serve --listen :9070

Exposes /rpc (HTTP JSON-RPC), /sse + /messages (MCP SSE transport), /healthz, and /memory/snapshot_reminder. flux7-mesh connects via SSE for MCP tool calls and via /rpc for decision writes — one daemon, one database.

Rebuild the SQLite index from the markdown workspace:

~/go/bin/mem7 rescan

Drop TTL-expired entries from the index (the markdown workspace is left untouched; rescan re-evaluates TTL on replay):

~/go/bin/mem7 prune

Configuration

| Variable | Default | Description |
| --- | --- | --- |
| MEM7_DIR | ~/.mem7 | Data directory (hosts workspace/ and index.db) |
| MEM7_LISTEN | :9070 | HTTP bind address when in serve mode |
| MEM7_TOKEN | (empty) | Bearer token required on /rpc and /memory/* when set |
| MEM7_MAX_ENTRIES | 10000 | Soft ceiling on live entries |
| MEM7_EMBED_URL | (empty) | Base URL of the embedding provider. Setting this enables hybrid search |
| MEM7_EMBED_MODEL | nomic-embed-text | Model name passed to the embedding API |
| MEM7_EMBED_PROVIDER | ollama | Provider format: ollama (POST /api/embed) or openai (POST /v1/embeddings) |
| MEM7_EMBED_KEY | (empty) | Bearer token for the embedding API (required for OpenAI, optional for Ollama) |
| MEM7_RERANK_URL | (empty) | Base URL of the reranking LLM. Setting this enables LLM reranking after RRF merge |
| MEM7_RERANK_MODEL | gemma4:e4b | Model name passed to the Ollama generate API for reranking |

Flags on mem7 serve mirror MEM7_LISTEN and MEM7_TOKEN: --listen :9070 --token mem7_...
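For tooling that wraps mem7, the variables and defaults documented above could be mirrored along these lines. This is a Python sketch; the Config class, its field names, and load_config are hypothetical and only restate the documented defaults, not mem7's internal config code.

```python
import os
from dataclasses import dataclass


@dataclass
class Config:
    data_dir: str
    listen: str
    token: str
    max_entries: int
    embed_url: str
    embed_model: str
    embed_provider: str
    embed_key: str
    rerank_url: str
    rerank_model: str

    @property
    def hybrid_enabled(self) -> bool:
        # Per the table above, setting MEM7_EMBED_URL is what enables hybrid search.
        return bool(self.embed_url)


def load_config(env=None) -> Config:
    """Read MEM7_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Config(
        data_dir=os.path.expanduser(env.get("MEM7_DIR", "~/.mem7")),
        listen=env.get("MEM7_LISTEN", ":9070"),
        token=env.get("MEM7_TOKEN", ""),
        max_entries=int(env.get("MEM7_MAX_ENTRIES", "10000")),
        embed_url=env.get("MEM7_EMBED_URL", ""),
        embed_model=env.get("MEM7_EMBED_MODEL", "nomic-embed-text"),
        embed_provider=env.get("MEM7_EMBED_PROVIDER", "ollama"),
        embed_key=env.get("MEM7_EMBED_KEY", ""),
        rerank_url=env.get("MEM7_RERANK_URL", ""),
        rerank_model=env.get("MEM7_RERANK_MODEL", "gemma4:e4b"),
    )


cfg = load_config({"MEM7_EMBED_URL": "http://localhost:11434"})
print(cfg.listen, cfg.embed_model, cfg.hybrid_enabled)
```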

Hybrid search setup

Hybrid search is entirely opt-in. Without MEM7_EMBED_URL, mem7 uses pure BM25.

With local Ollama:

MEM7_EMBED_URL=http://localhost:11434 \
MEM7_EMBED_MODEL=nomic-embed-text \
  ~/go/bin/mem7

With OpenAI API:

MEM7_EMBED_URL=https://api.openai.com \
MEM7_EMBED_MODEL=text-embedding-3-small \
MEM7_EMBED_PROVIDER=openai \
MEM7_EMBED_KEY=sk-... \
  ~/go/bin/mem7

With any OpenAI-compatible endpoint (vLLM, LiteLLM, Azure OpenAI, etc.):

MEM7_EMBED_URL=http://localhost:8000 \
MEM7_EMBED_MODEL=BAAI/bge-small-en-v1.5 \
MEM7_EMBED_PROVIDER=openai \
  ~/go/bin/mem7

When enabled, memory_store computes and persists an embedding alongside each entry. memory_search retrieves BM25 top-2N and cosine top-2N candidates, then merges them via Reciprocal Rank Fusion (RRF, k=60) into the final top-N. Embeddings are stored as BLOBs in SQLite and cached in memory for sub-ms cosine search.
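The RRF merge described above can be illustrated with a small Python sketch. It uses the standard Reciprocal Rank Fusion formula, score(d) = sum over lists of 1 / (k + rank), with k=60 as stated. The function name rrf_merge, the 1-based ranks, and the alphabetical tie-break are assumptions for the example, not details confirmed for mem7.

```python
def rrf_merge(rankings, k=60, top_n=10):
    """Fuse several ranked key lists into one list via Reciprocal Rank Fusion."""
    scores = {}
    for ranking in rankings:
        for rank, key in enumerate(ranking, start=1):
            scores[key] = scores.get(key, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by key so the output is deterministic.
    ordered = sorted(scores, key=lambda key: (-scores[key], key))
    return ordered[:top_n]


bm25_hits = ["a", "b", "c", "d"]    # BM25 candidates, best first
dense_hits = ["c", "a", "e", "f"]   # cosine candidates, best first
print(rrf_merge([bm25_hits, dense_hits], top_n=2))
# → ['a', 'c']
```

Because only ranks enter the formula, the BM25 and cosine scores never need to be normalized onto a common scale, which is the usual reason for choosing RRF over a weighted score sum.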

LLM reranking setup

LLM reranking is opt-in on top of hybrid search. It over-fetches 3x the requested number of candidates, merges them via RRF, then uses an LLM to score relevance before returning the final top-N. It falls back to non-reranked results if the LLM is unavailable.

MEM7_EMBED_URL=http://localhost:11434 \
MEM7_RERANK_URL=http://localhost:11434 \
MEM7_RERANK_MODEL=gemma4:e4b \
  ~/go/bin/mem7
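The fallback behaviour can be sketched in Python as follows. The rank_fn callable is a hypothetical stand-in for the LLM call (it returns the candidate indices in the preferred order); the real prompt, response parsing, and error handling in mem7 are not shown in this README and are not reproduced here.

```python
def rerank_or_fallback(query, candidates, rank_fn, top_n):
    """Apply a listwise reranker, or keep the incoming order if it fails."""
    try:
        order = rank_fn(query, candidates)
        # Reject anything that is not a permutation of the candidate indices.
        if sorted(order) != list(range(len(candidates))):
            raise ValueError("reranker did not return a permutation")
    except Exception:
        # Graceful degradation: return the RRF order untouched.
        return candidates[:top_n]
    return [candidates[i] for i in order][:top_n]


def unavailable(query, candidates):
    raise ConnectionError("reranker offline")


merged = ["k1", "k2", "k3"]  # order produced by the RRF merge
print(rerank_or_fallback("dark mode", merged, lambda q, c: [2, 0, 1], top_n=2))
# → ['k3', 'k1']
print(rerank_or_fallback("dark mode", merged, unavailable, top_n=2))
# → ['k1', 'k2']
```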

Python SDK

A provider-agnostic Python client for mem7, wrapping all MCP tools via JSON-RPC over HTTP.

Install

pip install flux7-memory

Or from source:

pip install ./sdk/python

Usage

from mem7 import Mem7

m = Mem7("http://localhost:9070", token="my-token")

# Store a memory
m.store("user.prefs", "prefers dark mode", tags=["user"])

# Search (returns formatted text)
print(m.search("dark mode", limit=5))

# Context (returns structured Memory objects)
for mem in m.context("dark mode", limit=5):
    print(f"{mem.key}: {mem.value}")

# Formatted block for LLM prompt injection
block = m.context_block("user preferences", limit=10)

# Other tools
m.recall(key="user.prefs")
m.list(tags=["user"])
m.get("memory/2026-05-07.md")
m.forget(key="user.prefs")

Workspace layout

~/.mem7/
├── workspace/
│   ├── MEMORY.md                      # reserved for long-term notes
│   └── memory/
│       ├── 2026-04-11.md              # append-only daily logs
│       └── 2026-04-12.md
└── index.db                           # SQLite (facts + facts_fts + embeddings)

The markdown files are the source of truth; index.db is a derived cache that can be dropped and rebuilt from the markdown at any time via mem7 rescan.
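The idea of a derived, rebuildable index can be shown with a toy replay loop in Python. It assumes each logged section reduces to a dict with an op and a key; the "forget" operation name for tombstones is a guess made for this sketch, and a dict stands in for the SQLite index.

```python
def replay(operations):
    """Rebuild a key -> entry index by applying logged operations in order."""
    index = {}
    for entry in operations:
        if entry["op"] == "store":
            index[entry["key"]] = entry        # a later store overwrites an earlier one
        elif entry["op"] == "forget":          # hypothetical tombstone marker
            index.pop(entry["key"], None)
    return index


log = [
    {"op": "store", "key": "user.prefs", "body": "prefers light mode"},
    {"op": "store", "key": "user.prefs", "body": "prefers dark mode"},
    {"op": "store", "key": "tmp.note", "body": "scratch"},
    {"op": "forget", "key": "tmp.note"},
]
index = replay(log)
print(sorted(index), index["user.prefs"]["body"])
# → ['user.prefs'] prefers dark mode
```

Since the log is append-only, replaying it from the start always yields the same final state, which is what makes dropping the index safe.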

Each entry is written as a level-2 heading followed by a fenced mem7 envelope (plain key/value metadata) and a free-form body, terminated by a horizontal rule. A human can edit these files in place — the next rescan picks up the changes.

Example:

## example_key

```mem7
op: store
agent: claude
tags: demo, example
created: 2026-04-11T20:00:00Z
updated: 2026-04-11T20:00:00Z
```

Free-form markdown content lives here.

---
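A reader for this layout might look like the Python sketch below. It follows only the format described above (level-2 heading, fenced mem7 envelope, body, horizontal rule); parse_entries is a hypothetical helper, not mem7's parser, and it does not handle edge cases such as a body that itself contains a line consisting of three dashes.

```python
FENCE = "`" * 3  # built indirectly so this example can sit inside a fenced block


def parse_entries(text):
    """Split workspace markdown into dicts with key, meta, and body."""
    entries, current, in_envelope = [], None, False
    for line in text.splitlines():
        if current is None:
            if line.startswith("## "):
                current = {"key": line[3:].strip(), "meta": {}, "body": []}
            continue
        if in_envelope:
            if line.strip() == FENCE:
                in_envelope = False
            elif ":" in line:
                # Split on the first colon only, so timestamps keep their colons.
                name, _, value = line.partition(":")
                current["meta"][name.strip()] = value.strip()
        elif line.strip() == FENCE + "mem7":
            in_envelope = True
        elif line.strip() == "---":
            current["body"] = "\n".join(current["body"]).strip()
            entries.append(current)
            current = None
        else:
            current["body"].append(line)
    return entries


sample = "\n".join([
    "## example_key", "",
    FENCE + "mem7",
    "op: store", "agent: claude", "tags: demo, example",
    "created: 2026-04-11T20:00:00Z", "updated: 2026-04-11T20:00:00Z",
    FENCE, "",
    "Free-form markdown content lives here.", "",
    "---", "",
])
entry = parse_entries(sample)[0]
print(entry["key"], entry["meta"]["tags"], entry["body"])
```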

Usage with flux7-mesh

In your config.yaml:

mcp_servers:
  - name: memory
    transport: stdio
    command: /home/user/go/bin/mem7
    env:
      MEM7_DIR: /home/user/.mem7

flux7-mesh discovers the tools via tools/list; no per-tool wiring is required. Grants and policies apply as usual.

To share the same memory across several machines behind flux7-mesh, run mem7 serve on one host and point the other hosts at it via the upcoming remote-client mode (Phase 1.5 of the roadmap).

Tools

memory_store

Upsert a memory entry by key. The markdown workspace receives an append-only section; the SQLite index is updated in place. If hybrid search is enabled, an embedding is computed and stored alongside the entry.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| key | string | yes | Unique key for this memory |
| value | string | yes | Content to remember (free-form markdown allowed) |
| tags | string[] | no | Tags for filtering and grouping |
| agent | string | no | Identifier of the storing agent |
| ttl | number | no | Time-to-live in seconds (0 = permanent) |
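The ttl semantics can be sketched as a small expiry check in Python. Whether mem7 measures the TTL from the created or the updated timestamp is not stated here; this example assumes updated, and is_expired is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def is_expired(updated, ttl_seconds, now=None):
    """Return True once ttl_seconds have elapsed since the last update."""
    if ttl_seconds <= 0:
        return False  # 0 means permanent, as documented in the table above
    now = now or datetime.now(timezone.utc)
    return now >= updated + timedelta(seconds=ttl_seconds)


stored_at = datetime(2026, 4, 11, 20, 0, 0, tzinfo=timezone.utc)
one_hour_later = stored_at + timedelta(hours=1)
print(is_expired(stored_at, 0, one_hour_later))     # → False
print(is_expired(stored_at, 600, one_hour_later))   # → True
print(is_expired(stored_at, 7200, one_hour_later))  # → False
```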

memory_recall

Recall memories by key, tags, or agent, most recently updated first. Bumps access_count and last_accessed on returned entries.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| key | string | no | Exact key to recall |
| tags | string[] | no | Filter by tags (AND logic) |
| agent | string | no | Filter by agent |
| limit | number | no | Max results (default 10) |
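As an illustration of these filters, the Python sketch below applies the same rules to an in-memory list: AND logic on tags, optional key and agent match, newest first, then a limit, with the access counters bumped on whatever is returned. The dict layout and the recall function are assumptions for the example; mem7 performs this against its SQLite index.

```python
from datetime import datetime, timezone


def recall(entries, key=None, tags=None, agent=None, limit=10):
    """Filter entries, return the most recently updated first, bump usage counters."""
    wanted = set(tags or [])
    hits = [
        e for e in entries
        if (key is None or e["key"] == key)
        and wanted.issubset(e["tags"])            # AND logic: every tag must be present
        and (agent is None or e["agent"] == agent)
    ]
    # RFC3339 timestamps in the same UTC format sort correctly as strings.
    hits.sort(key=lambda e: e["updated"], reverse=True)
    hits = hits[:limit]
    stamp = datetime.now(timezone.utc).isoformat()
    for e in hits:
        e["access_count"] = e.get("access_count", 0) + 1
        e["last_accessed"] = stamp
    return hits


store = [
    {"key": "user.prefs", "tags": ["user", "ui"], "agent": "claude", "updated": "2026-04-11T20:00:00Z"},
    {"key": "user.name", "tags": ["user"], "agent": "claude", "updated": "2026-04-12T09:00:00Z"},
    {"key": "build.cmd", "tags": ["dev"], "agent": "cursor", "updated": "2026-04-12T10:00:00Z"},
]
print([e["key"] for e in recall(store, tags=["user"])])
# → ['user.name', 'user.prefs']
```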

memory_search

Full-text search over memories using SQLite FTS5, ranked by field-weighted BM25. When hybrid search is enabled, results are merged with dense cosine similarity via RRF. Supports FTS5 operators in raw mode: foo* prefix, AND / OR / NOT, quoted phrases.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | yes | Search query |
| mode | string | no | raw (default, FTS5 syntax) or natural (plain language, auto-stemmed) |
| tags | string[] | no | Post-filter by tags |
| agent | string | no | Post-filter by agent |
| since | string | no | Lower bound on updated_at (RFC3339) |
| until | string | no | Upper bound on updated_at (RFC3339) |
| limit | number | no | Max results (default 10) |
| include_neighbors | boolean | no | Fetch sequential neighbors around matching entries (default false) |
| neighbor_radius | number | no | How many neighbors to fetch on each side (default 1) |
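To make include_neighbors and neighbor_radius concrete, here is a Python sketch that derives neighbor keys for keys ending in a zero-padded counter, such as t005. How mem7 actually decides which entries are "sequential" is not specified in this README; the numeric-suffix rule and the neighbor_keys helper are assumptions.

```python
import re


def neighbor_keys(key, radius=1):
    """Return the keys up to `radius` steps before and after a numbered key."""
    match = re.fullmatch(r"(.*?)(\d+)", key)
    if not match:
        return []  # no numeric suffix, so no sequential neighbors in this sketch
    prefix, digits = match.groups()
    number, width = int(digits), len(digits)
    keys = []
    for offset in range(-radius, radius + 1):
        if offset == 0 or number + offset < 0:
            continue
        keys.append(f"{prefix}{number + offset:0{width}d}")  # keep the zero padding
    return keys


print(neighbor_keys("t005"))            # → ['t004', 't006']
print(neighbor_keys("t005", radius=2))  # → ['t003', 't004', 't006', 't007']
print(neighbor_keys("user.prefs"))      # → []
```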

memory_context

Same search capabilities as memory_search but returns a JSON array of structured objects instead of formatted markdown. Designed for programmatic use by agent SDKs.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | yes | Search query |
| mode | string | no | raw (default) or natural |
| tags | string[] | no | Post-filter by tags |
| agent | string | no | Post-filter by agent |
| since | string | no | Lower bound on updated_at (RFC3339) |
| until | string | no | Upper bound on updated_at (RFC3339) |
| limit | number | no | Max results (default 10) |
| include_neighbors | boolean | no | Fetch sequential neighbors (default false) |
| neighbor_radius | number | no | Neighbors on each side (default 1) |

Returns a JSON array of { "key", "value", "tags", "agent", "updated" } objects.
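A client could decode that array into typed objects along these lines. The Memory dataclass below is a stand-in written for this Python sketch (it is not the class shipped in the SDK), and the sample payload is made up to match the documented field names.

```python
import json
from dataclasses import dataclass, field

FIELDS = ("key", "value", "tags", "agent", "updated")


@dataclass
class Memory:
    key: str
    value: str
    tags: list = field(default_factory=list)
    agent: str = ""
    updated: str = ""


def parse_context(payload: str):
    """Decode a memory_context JSON array, ignoring any unknown fields."""
    return [
        Memory(**{name: item[name] for name in FIELDS if name in item})
        for item in json.loads(payload)
    ]


payload = json.dumps([
    {"key": "user.prefs", "value": "prefers dark mode", "tags": ["user"],
     "agent": "claude", "updated": "2026-04-11T20:00:00Z"},
])
for mem in parse_context(payload):
    print(f"{mem.key}: {mem.value}")
# → user.prefs: prefers dark mode
```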

memory_get

Read a file from the markdown workspace, optionally between from_line and to_line (1-indexed, inclusive). Paths are resolved relative to the workspace root and refused if they escape it.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| path | string | yes | Workspace-relative path (e.g. memory/2026-04-11.md) |
| from_line | number | no | First line to read |
| to_line | number | no | Last line to read |
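The two rules above (paths may not escape the workspace, and line bounds are 1-indexed and inclusive) can be sketched in Python as follows. Both helpers are hypothetical illustrations of the described behaviour, not mem7's Go code.

```python
from pathlib import Path


def resolve_in_workspace(root, rel_path):
    """Resolve rel_path under root and refuse anything that lands outside it."""
    root = Path(root).resolve()
    target = (root / rel_path).resolve()
    if target != root and root not in target.parents:
        raise ValueError(f"path escapes workspace: {rel_path}")
    return target


def slice_lines(text, from_line=None, to_line=None):
    """Return lines from_line..to_line of text (1-indexed, inclusive)."""
    lines = text.splitlines()
    start = max((from_line or 1) - 1, 0)
    end = len(lines) if to_line is None else to_line
    return "\n".join(lines[start:end])


print(slice_lines("one\ntwo\nthree\nfour", from_line=2, to_line=3))
try:
    resolve_in_workspace("/tmp/ws", "../outside.md")
except ValueError as err:
    print("refused:", err)
```

Resolving the joined path before comparing it with the root is what catches both ../ sequences and absolute paths passed in place of a relative one.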

memory_list

List memory keys with metadata (without values).

Parameter Type Required Description
tags string[] no Filter by tags
agent string no Filter by agent

memory_forget

Delete memories by key and/or tags. A tombstone section is appended to the markdown workspace, and the SQLite index soft-deletes the matching rows.

Parameter Type Required Description
key string no Exact key to delete
tags string[] no Delete all entries matching these tags (AND logic)
agent string no Recorded on the tombstone

HTTP endpoints

mem7 serve exposes these routes:

| Method | Path | Description |
| --- | --- | --- |
| GET | /healthz | Liveness probe (always public, no auth) |
| POST | /rpc | JSON-RPC 2.0 endpoint; same MCP tool surface as stdio |
| POST | /memory/snapshot_reminder | Returns a structured instructional payload for an agent runtime to inject into its context before compaction |

Bearer auth is applied to /rpc and /memory/* when MEM7_TOKEN (or --token) is set.
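For callers that do not use the SDK, a request to /rpc with bearer auth could be assembled with the Python standard library as sketched below. build_rpc_request is a hypothetical helper, the Content-Type header is an assumption (this README does not say whether the server requires it), and the request is built but not sent so the snippet runs without a daemon.

```python
import json
import urllib.request


def build_rpc_request(base_url, token, tool, arguments, request_id=1):
    """Build a JSON-RPC 2.0 tools/call request for the /rpc route."""
    body = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    headers = {"Content-Type": "application/json"}  # assumed, see lead-in
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/rpc",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_rpc_request("http://localhost:9070", "mem7_secret123",
                        "memory_search", {"query": "roadmap*"})
print(req.get_method(), req.full_url)
# To send it against a running daemon:
# with urllib.request.urlopen(req, timeout=5) as resp:
#     print(json.load(resp))
```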

Example:

curl -s -X POST http://localhost:9070/rpc \
  -H "Authorization: Bearer $MEM7_TOKEN" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/call",
       "params":{"name":"memory_search","arguments":{"query":"roadmap*"}}}'

Architecture

     Claude Code / flux7-mesh / Python SDK / scripts
                            │
           MCP stdio / HTTP JSON-RPC / MCP SSE
                            │
                     ┌──────▼──────┐
                     │ Dispatcher  │   ← MCP protocol layer
                     └──────┬──────┘
                            │
                     ┌──────▼──────┐
                     │    Store    │   ← orchestrator
                     └──────┬──────┘
      ┌──────────────┬──────┴───────┬──────────────┐
┌─────▼─────┐  ┌─────▼─────┐  ┌─────▼─────┐  ┌─────▼─────┐
│ markdown  │  │ sqlite    │  │ embedder  │  │ reranker  │
│ workspace │  │ (facts +  │  │ (Ollama / │  │ (Ollama)  │
│ (truth)   │  │ FTS5 +    │  │ OpenAI)   │  │ opt-in    │
│           │  │ embeds)   │  │ opt-in,   │  │           │
│           │  │           │  │ external  │  │           │
└───────────┘  └───────────┘  └───────────┘  └───────────┘

Every write goes through the markdown writer first and then updates the SQLite index. If hybrid search is enabled, an embedding is computed via the external provider and stored as a BLOB. Reads consult the index only; embeddings are cached in memory for sub-ms cosine search. If the index is corrupted or out of sync, mem7 rescan drops it and replays the markdown chronologically to reconstruct a consistent state.
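The embedding path (BLOB in SQLite, in-memory cache, cosine search) can be illustrated with the Python sketch below. The little-endian float32 encoding is an assumption, since the BLOB layout is not documented here, and the brute-force top_k scan is a plain stand-in for whatever mem7 does internally.

```python
import math
import struct


def pack_embedding(vector):
    """Encode a vector as little-endian float32 bytes (assumed BLOB layout)."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_embedding(blob):
    """Decode float32 bytes back into a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, cache, k):
    """Rank cached keys by similarity to the query vector, best first."""
    return sorted(cache, key=lambda key: (-cosine(query, cache[key]), key))[:k]


# Simulate loading BLOBs from the index into an in-memory cache.
blobs = {"a": pack_embedding([1.0, 0.0]), "b": pack_embedding([0.5, 0.5]),
         "c": pack_embedding([0.0, 1.0])}
cache = {key: unpack_embedding(blob) for key, blob in blobs.items()}
print(top_k([1.0, 0.0], cache, k=2))
# → ['a', 'b']
```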

License

Apache 2.0
