thrift-memory
Cost-first memory layer for MCP-capable agents that stores memories cheaply and recalls relevant slices under a hard token budget, logging receipts for every recall.
Thrift Memory
Cost-first memory for AI agent teams. (npm: thrift-memory)
Not affiliated with Apache Thrift, the RPC framework. This is an MCP memory layer for AI agents.
Thrift Memory gives MCP-capable agents a small shared memory layer that optimizes for cost visibility: store memories cheaply, recall only the relevant slice under a hard token budget, and log a receipt for every recall.
savedTokens = baselineTokens - injectedTokens
The goal is practical: help teams of agents stop paying to reload the same broad context on every run.
Status: early 0.0.x. APIs are useful but may still change before v0.1.
What It Does
Thrift has three surfaces:
| Surface | Purpose |
|---|---|
| MCP server | Agent memory tools: remember, recall, search_memory |
| Local dashboard | Savings UI backed by the meter JSONL, plus owner controls (pin/disable, budgets, kill-switch) |
| Proxy | Optional HTTP gateway that trims live LLM requests and retries rate limits |
Be precise about the split:
- MCP manages memory recall and token receipts.
- thrift-proxy manages live request trimming and rate-limit retries.
How It Compares
Mature memory layers — Mem0, Zep, Letta, Cognee — optimize recall quality: LLM-enriched writes, temporal or entity knowledge graphs, deep personalization. They are excellent at that, and far more battle-tested than this project. Thrift Memory does not try to beat them on recall depth.
Thrift optimizes a different axis: cost, locally, with proof. The tradeoffs:
| | Thrift Memory | Quality-first layers (Mem0 / Zep / Letta / Cognee) |
|---|---|---|
| Primary goal | Cut & prove token cost (budget + savings receipt) | Maximize recall quality / reasoning |
| Write path | Cheap: no mandatory LLM enrichment | Often LLM extraction/embedding on write |
| Install | npx thrift-memory (one dependency, local JSONL, no API key, no DB, no Docker) | Typically an LLM key + a vector/graph DB (e.g. Mem0 self-host: API + Postgres/pgvector + Neo4j) |
| Dashboard | Token-savings meter + owner controls, local & read/write | Memory/agent-management UIs (several have one; different purpose) |
| Recall depth | Scoped match under a hard token budget | Knowledge-graph / temporal / semantic ranking |
| Maturity | Early 0.0.x | Production-grade, widely adopted |
Honest summary: if you need the smartest possible recall, use one of the others. If you run a fleet of agents that keep re-paying to reload broad context and you want to measure and cap that cost with no extra infrastructure, that gap is what Thrift fills. The two are not mutually exclusive — Thrift can sit in front of a heavier store as the budget/metering layer.
MCP Tools
remember(scope, text, agentId?, sessionId?, tags?)
Store a memory in org, agent, or session scope.
recall(agentId, tokenBudget, task?, tags?)
Return relevant memories under a hard token budget.
Also returns { injectedTokens, baselineTokens, savedTokens }.
search_memory(agentId, task?, tags?, limit?)
Browse matching memories without applying a small recall budget.
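On the wire, an MCP client invokes these tools with a JSON-RPC tools/call request. The sketch below builds one for recall. The argument names are taken from the signatures above; buildRecallRequest is a hypothetical helper, and the authoritative input schema is whatever the server advertises through tools/list. Most MCP clients construct this message for you.

```javascript
// Build a JSON-RPC 2.0 "tools/call" request for the recall tool.
// buildRecallRequest is a hypothetical helper, not part of thrift-memory.
function buildRecallRequest(id, { agentId, tokenBudget, task, tags }) {
  const args = { agentId, tokenBudget };
  // Optional parameters are only sent when the caller provides them.
  if (task !== undefined) args.task = task;
  if (tags !== undefined) args.tags = tags;
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "recall", arguments: args },
  };
}

const request = buildRecallRequest(1, {
  agentId: "dev",
  tokenBudget: 500,
  task: "how should I store money values?",
});
console.log(JSON.stringify(request, null, 2));
```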
Quick Start
npm install -g thrift-memory
Add Thrift to an MCP-capable client:
{
  "mcpServers": {
    "thrift": {
      "command": "npx",
      "args": ["thrift-memory"]
    }
  }
}
Or run the MCP server directly:
npx thrift-memory \
--store-path=~/.thrift/memories.jsonl \
--meter-path=~/.thrift/meter.jsonl \
--default-budget=2000
60-Second Demo
No agent required — prove the remember → recall → receipt loop with the library.
Save as demo.mjs after npm install thrift-memory, then node demo.mjs:
import { JsonlStore, ScopedRetriever } from "thrift-memory";
const store = new JsonlStore({ path: "./demo.jsonl" });
const now = Date.now();
// 1. remember — store a few org memories (cheap, no LLM enrichment)
store.add({ scope: "org", text: "All money values are stored as integer cents, never floats." }, now);
store.add({ scope: "org", text: "We deploy only on green CI; no Friday-evening releases." }, now);
store.add({ scope: "org", text: "Postgres is the system of record; Redis is cache-only." }, now);
// 2. recall — load only what the task needs, under a hard token budget
const r = new ScopedRetriever().recall(store, {
  agentId: "dev",
  task: "how should I store money values?",
  tokenBudget: 40,
});
// 3. receipt
for (const m of r.memories) console.log("•", m.text);
console.log(`injected ${r.injectedTokens} / baseline ${r.baselineTokens} (saved ${r.savedTokens})`);
• All money values are stored as integer cents, never floats.
injected 15 / baseline 43 (saved 28)
Only the relevant memory is injected — the deploy-cadence and Postgres notes are
dropped because they don't match the task, not merely because of the budget
(recall applies a relevance floor). That gap, baseline - injected, is exactly
what you stop paying for on every run. Relevance here is lexical overlap, so phrase
the task with words your memories actually use; an empty result means nothing
in scope was relevant — which is the honest answer, not noise to pad the budget.
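To make the relevance floor concrete, here is a minimal sketch of lexical-overlap recall under a hard budget. It is not the ScopedRetriever implementation: sketchRecall, the shared-word scoring, and the four-characters-per-token estimate are all assumptions made for illustration. With the three demo memories and the same task, this estimate happens to produce the same 15 / 43 / 28 figures as the receipt above.

```javascript
// Rough token estimate: about 4 characters per token (an assumption).
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Lowercased word set used for the lexical comparison.
const words = (text) => new Set(text.toLowerCase().match(/[a-z0-9]+/g) ?? []);

// Hypothetical recall: score by shared words, drop zero-overlap memories
// (the relevance floor), then add memories while they fit the budget.
function sketchRecall(memories, task, tokenBudget) {
  const taskWords = words(task);
  const scored = memories.map((text) => {
    let overlap = 0;
    for (const w of words(text)) if (taskWords.has(w)) overlap += 1;
    return { text, overlap, tokens: estimateTokens(text) };
  });
  const baselineTokens = scored.reduce((sum, m) => sum + m.tokens, 0);
  const ranked = scored
    .filter((m) => m.overlap > 0)
    .sort((a, b) => b.overlap - a.overlap);

  const picked = [];
  let injectedTokens = 0;
  for (const m of ranked) {
    if (injectedTokens + m.tokens > tokenBudget) continue; // never exceed the budget
    picked.push(m.text);
    injectedTokens += m.tokens;
  }
  return {
    memories: picked,
    injectedTokens,
    baselineTokens,
    savedTokens: baselineTokens - injectedTokens,
  };
}

const result = sketchRecall(
  [
    "All money values are stored as integer cents, never floats.",
    "We deploy only on green CI; no Friday-evening releases.",
    "Postgres is the system of record; Redis is cache-only.",
  ],
  "how should I store money values?",
  40,
);
console.log(result);
```

Because scoring is plain word overlap with no stemming, "store" in the task does not match "stored" in the memory; the match comes from "money" and "values". That is the practical reason to phrase tasks with the words your memories use.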
Dashboard
The optional dashboard is local. It shows whether Thrift is really saving tokens
across real agent runs, and (as of 0.0.3) exposes a small write surface for owner
controls — pin/disable a memory, set per-agent budgets, mute an agent, and a
fleet-wide kill-switch — over local POST/DELETE endpoints. The same controls
are available from the thrift-panel CLI.
npx thrift-panel serve \
--store-path=~/.thrift/memories.jsonl \
--meter-path=~/.thrift/meter.jsonl \
--control-path=~/.thrift/control.json \
--port=8585
Open http://127.0.0.1:8585.
The dashboard shows:
| View | What it proves |
|---|---|
| Fleet summary | Total baseline, injected, saved tokens, and savings rate |
| Daily token flow | Whether savings persist across real days |
| Agent savings | Which agents are expensive and which save the most |
| Recent receipts | The latest metered recall/proxy events |
| Audit paths | The local files backing the numbers |
CLI equivalents:
npx thrift-panel summary --store-path=~/.thrift/memories.jsonl --meter-path=~/.thrift/meter.jsonl
npx thrift-panel agents --store-path=~/.thrift/memories.jsonl --meter-path=~/.thrift/meter.jsonl
npx thrift-panel memories --store-path=~/.thrift/memories.jsonl --scope=org
Measuring Performance
Every recall writes a receipt to THRIFT_METER_PATH when a meter path is
configured:
{"at":1760000000000,"agentId":"dev","injectedTokens":420,"baselineTokens":2100,"savedTokens":1680}
Definitions:
| Field | Meaning |
|---|---|
| baselineTokens | The no-Thrift counterfactual: all in-scope memory that would have been loaded |
| injectedTokens | The slice Thrift actually returned under budget |
| savedTokens | baselineTokens - injectedTokens |
| Savings rate | savedTokens / baselineTokens |
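As a quick check of these definitions, the sketch below parses the sample receipt shown earlier and recomputes savedTokens and the savings rate from the other two fields. checkReceipt is a hypothetical helper, not part of the package.

```javascript
// Parse one meter line and verify it against the field definitions.
// checkReceipt is a hypothetical helper for illustration.
function checkReceipt(jsonLine) {
  const r = JSON.parse(jsonLine);
  const expectedSaved = r.baselineTokens - r.injectedTokens;
  return {
    ...r,
    consistent: r.savedTokens === expectedSaved,
    // Guard against an empty baseline so the rate is never NaN.
    savingsRate: r.baselineTokens > 0 ? r.savedTokens / r.baselineTokens : 0,
  };
}

const sample =
  '{"at":1760000000000,"agentId":"dev","injectedTokens":420,"baselineTokens":2100,"savedTokens":1680}';
const checked = checkReceipt(sample);
console.log(checked.consistent, checked.savingsRate); // true 0.8
```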
Recommended measurement loop:
- Seed memories from your own markdown files or use remember.
- Let real agents call recall during normal work.
- Review thrift-panel summary and thrift-panel agents.
- Validate quality separately by comparing task outcomes with full memory vs Thrift recall.
For a credible public report, publish both token reduction and quality evidence. For example: "saved 72% of memory tokens across 200 real recalls, with 19/20 paired tasks producing the same outcome."
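If you want to cross-check the review step by hand, the per-agent totals can be approximated straight from the meter file. The sketch below is an assumption about how such a rollup could work, not the thrift-panel implementation; rollupByAgent and the sample lines are hypothetical, and it only relies on the receipt fields documented above.

```javascript
// Sum receipts per agent from JSONL text (one JSON object per line).
// rollupByAgent is a hypothetical helper; thrift-panel may compute this differently.
function rollupByAgent(jsonl) {
  const totals = new Map();
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue; // tolerate blank lines
    const r = JSON.parse(line);
    const t = totals.get(r.agentId) ??
      { recalls: 0, baselineTokens: 0, injectedTokens: 0, savedTokens: 0 };
    t.recalls += 1;
    t.baselineTokens += r.baselineTokens;
    t.injectedTokens += r.injectedTokens;
    t.savedTokens += r.savedTokens;
    totals.set(r.agentId, t);
  }
  return [...totals].map(([agentId, t]) => ({
    agentId,
    ...t,
    savingsRate: t.baselineTokens > 0 ? t.savedTokens / t.baselineTokens : 0,
  }));
}

// Made-up receipts for two agents.
const meterText = [
  '{"at":1,"agentId":"dev","injectedTokens":420,"baselineTokens":2100,"savedTokens":1680}',
  '{"at":2,"agentId":"qa","injectedTokens":500,"baselineTokens":1000,"savedTokens":500}',
  '{"at":3,"agentId":"dev","injectedTokens":300,"baselineTokens":1500,"savedTokens":1200}',
].join("\n");
const rollup = rollupByAgent(meterText);
console.table(rollup);
```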
Account for the MCP overhead. Registering any MCP server adds its tool-schema load to each agent's context (often several thousand tokens). The honest figure is net:
savings = recall reduction − MCP schema/tool-call overhead. On a context-heavy agent that reloads broad memory every run, recall usually wins by a wide margin — but confirm it with the meter on your own workload before going fleet-wide, rather than assuming. The receipts exist precisely so you don't have to guess.
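A worked example of the net figure, with made-up numbers: the 3000-token schema overhead and both workloads are illustrative assumptions, not measurements, and netSavedPerRun is a hypothetical helper.

```javascript
// Net tokens saved per run: recall reduction minus the MCP schema/tool-call overhead.
function netSavedPerRun(recallSavedTokens, mcpOverheadTokens) {
  return recallSavedTokens - mcpOverheadTokens;
}

const overhead = 3000; // illustrative guess at per-run schema load
const heavy = netSavedPerRun(12000, overhead); // context-heavy agent
const light = netSavedPerRun(1680, overhead);  // small memory, same overhead
console.log({ heavy, light }); // { heavy: 9000, light: -1320 }
```

A negative result means that, for that agent, registering the server costs more context than recall saves, which is exactly the case the meter is meant to expose.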
Synthetic Benchmark
This repo includes a small synthetic fixture so users can verify the measurement pipeline without any private data:
npm run build
node benchmark/run.mjs
It reads:
- benchmark/fixtures/memories.jsonl
- benchmark/fixtures/meter.jsonl
See docs/case-study.md for a sanitized example of how to interpret the numbers.
Proxy And Rate Limits
The proxy is optional. Use it when an agent can point its LLM base_url at a
local HTTP gateway.
Security: run it locally only. The proxy forwards your real provider API key upstream unchanged. It binds to 127.0.0.1 by default (enforced in code, not just docs), so it is not reachable off-host unless you deliberately opt in with --host=0.0.0.0 / THRIFT_PROXY_HOST. Never expose it on a public interface or share the port. It is a single-tenant developer tool, not a hardened multi-tenant gateway. Responses are also buffered, so SSE streaming is not passed through yet.
npx thrift-proxy \
--upstream=https://api.anthropic.com \
--host=127.0.0.1 \
--port=8787 \
--budget=4000 \
--meter-path=~/.thrift/meter.jsonl
Then configure the agent's LLM base URL as http://localhost:8787 and keep using
the real provider API key.
The proxy:
- trims live request context under a hard token budget,
- writes the same savings receipts as the MCP surface,
- retries upstream 429 and 503 Retry-After responses,
- throttles concurrent upstream requests per provider.
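How the proxy decides what to trim is not described here. As one plausible shape for the first bullet, the sketch below keeps the newest messages that fit the budget and reports the same receipt fields. trimMessages, the newest-first policy, and the four-characters-per-token estimate are all assumptions, not the thrift-proxy implementation.

```javascript
// Rough token estimate: about 4 characters per token (an assumption).
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Hypothetical trimming policy: walk from newest to oldest and keep
// messages until the next one would exceed the budget.
function trimMessages(messages, tokenBudget) {
  const baselineTokens = messages.reduce((sum, m) => sum + estimateTokens(m.content), 0);
  const kept = [];
  let injectedTokens = 0;
  for (let i = messages.length - 1; i >= 0; i -= 1) {
    const cost = estimateTokens(messages[i].content);
    if (injectedTokens + cost > tokenBudget) break;
    kept.unshift(messages[i]); // preserve chronological order
    injectedTokens += cost;
  }
  return {
    messages: kept,
    injectedTokens,
    baselineTokens,
    savedTokens: baselineTokens - injectedTokens,
  };
}

const history = [
  { role: "user", content: "a".repeat(40) },      // 10 estimated tokens
  { role: "assistant", content: "b".repeat(40) }, // 10
  { role: "user", content: "c".repeat(40) },      // 10
];
const trimmed = trimMessages(history, 25);
console.log(trimmed.messages.length, trimmed.injectedTokens, trimmed.savedTokens); // 2 20 10
```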
Rate-limit defaults:
| Setting | Default | Env var |
|---|---|---|
| Max concurrency | 5 | THRIFT_MAX_CONCURRENCY |
| Max retries | 5 | THRIFT_MAX_RETRIES |
| Backoff base | 1000ms | THRIFT_BACKOFF_BASE_MS |
| Max backoff | 60000ms | THRIFT_MAX_BACKOFF_MS |
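To see how these defaults might interact, here is a sketch of a retry delay calculation. Only the default values come from the table above; the doubling schedule, capping Retry-After at the max backoff, and the retryDelayMs helper itself are assumptions about a typical design, not the proxy's confirmed behavior.

```javascript
// Defaults from the table above.
const defaults = { maxRetries: 5, backoffBaseMs: 1000, maxBackoffMs: 60000 };

// Delay before retry number `attempt` (0-based).
// A numeric Retry-After (seconds) wins; otherwise fall back to exponential
// backoff. The HTTP-date form of Retry-After is not handled in this sketch.
function retryDelayMs(attempt, retryAfterHeader, cfg = defaults) {
  if (retryAfterHeader != null && retryAfterHeader !== "") {
    const seconds = Number(retryAfterHeader);
    if (Number.isFinite(seconds) && seconds >= 0) {
      return Math.min(seconds * 1000, cfg.maxBackoffMs);
    }
  }
  return Math.min(cfg.backoffBaseMs * 2 ** attempt, cfg.maxBackoffMs);
}

// Delays for the five default retries when no Retry-After is sent.
const schedule = Array.from({ length: defaults.maxRetries }, (_, i) => retryDelayMs(i));
console.log(schedule); // [ 1000, 2000, 4000, 8000, 16000 ]
```

Under this assumed schedule the five default retries never reach the 60000ms cap; it only matters if THRIFT_MAX_RETRIES or THRIFT_BACKOFF_BASE_MS is raised, or when an upstream sends a long Retry-After.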
thrift-proxy buffers responses in this version; streaming passthrough is a
future improvement.
Import Existing Memories
The import script is generic and local-only. It can import markdown files into a JSONL store:
node scripts/import-memories.mjs \
--source=./memory \
--scope=org \
--store-path=~/.thrift/memories.jsonl \
--dry-run
For agent-scoped memories, put markdown files under project directories and use
--scope=agent:
memory/
  checkout-service/
    dev.md
    qa.md
  docs-site/
    writer.md
node scripts/import-memories.mjs --source=./memory --scope=agent
Library Usage
import { JsonlStore, ScopedRetriever, InMemoryMeter, ThriftMcpServer } from "thrift-memory";
const server = new ThriftMcpServer({
  store: new JsonlStore({ path: "./memories.jsonl" }),
  retriever: new ScopedRetriever(),
  meter: new InMemoryMeter(),
  defaultTokenBudget: 2000,
});
await server.runStdio();
Development
npm install
npm run typecheck
npm run build
npm test
Layout
| Path | Purpose |
|---|---|
| src/mcp/ | MCP stdio server and tool definitions |
| src/store/ | JSONL memory store |
| src/retrieval/ | Scoped budget-bounded recall |
| src/meter/ | Token meter and rollups |
| src/control/ | CLI and local dashboard |
| src/proxy/ | HTTP proxy, context trimming, rate-limit retries |
| benchmark/fixtures/ | Synthetic public benchmark data |
| docs/ | Public docs, screenshot, sanitized case study |
| test/ | Unit and integration tests |
License