Your AI forgets everything between sessions. This fixes that: 98%+ retrieval accuracy (100% on LongMemEval with optional LLM reranking), 99% token savings, 44 MCP tools. Fully local, zero cost.


<p align="center"> <img src="https://raw.githubusercontent.com/JubaKitiashvili/context-mem/main/docs/banner.svg" alt="Context Mem — persistent memory for AI agents" width="100%"/> </p>

<div align="center">

Context Mem

Your AI coding assistant forgets everything between sessions. This fixes that.


</div>


The Problem

Every time you start a new AI session, your assistant has zero memory of what you built yesterday. The architecture decisions, the bugs you fixed, the preferences you stated — all gone. You spend the first 10 minutes re-explaining context.

The Fix

context-mem runs in the background, captures everything automatically, and retrieves exactly the right context when you need it:

  • Longer sessions without losing context (99% token savings)
  • Instant continuity — new sessions pick up where you left off
  • Automatic — no manual saving, no commands to remember
  • Fully local — your code never leaves your machine
  • Free — no API keys, no subscription, no cloud

```bash
npm i context-mem && npx context-mem init
```

One command. Works with Claude Code, Cursor, Windsurf, VS Code, Cline, and Roo Code.


Retrieval Benchmarks

Tested on 4 academic benchmarks. All scores are session-level retrieval recall (did the correct session appear in top-k?), not end-to-end QA accuracy.

Pure Local (zero API calls, fully free)

| Benchmark | Retrieval Recall | Questions | Sessions/conv | Metric |
|---|---|---|---|---|
| LongMemEval | 97.8% R@5 | 500 | ~53 | Session R@5 |
| LoCoMo | 98.1% R@10 | 1,977 | 19-35 | Session R@10 |
| MemBench | 98.0% R@5 | 500 | | Hybrid top-5 |
| ConvoMem | 97.7% R@10 | 250 | | Session R@10 |

With Optional LLM Reranking (Haiku, ~$1 per 500 queries)

| Benchmark | Retrieval Recall |
|---|---|
| LongMemEval | 100.0% R@5 (500/500) |

vs MemPalace (same methodology — session-level retrieval recall)

| Benchmark | Context Mem | MemPalace |
|---|---|---|
| LongMemEval R@5 | 97.8% | 96.6% |
| LoCoMo R@10 | 98.1% | 60.3% |

Both systems achieve 100% on LME with optional Haiku reranking. MemPalace comparison uses identical methodology (session-level, same datasets).

<details> <summary>Benchmark methodology notes</summary>

  • Metric: Session-level retrieval recall — a hit is scored if any correct evidence session appears in the top-k results. This is different from end-to-end QA accuracy (retrieve + generate answer + judge), which would be lower for any system.
  • Granularity: Sessions (all dialog turns joined per session). LoCoMo has 19-35 sessions per conversation, so R@10 selects roughly a third of the candidate pool.
  • Ingestion: LoCoMo benchmark appends dataset-provided metadata (session_summary, observation, event_summary) to session documents. The production system does similar enrichment via summarizers and entity extraction.
  • Synonym expansions: Core query-builder includes general synonyms (movie→film, sibling→brother). Benchmark adapter adds ~50 additional domain-specific expansions derived from failure analysis. Core-only results are ~1-2% lower.
  • Benchmark code: Fully open in benchmarks/ — run them yourself with npm run bench.
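
To make the metric concrete, here is a minimal sketch of session-level recall@k as described in the first bullet; the type and function names are illustrative, not part of the benchmark harness:

```typescript
// A question is a "hit" if ANY of its evidence sessions appears in the
// top-k retrieved session IDs. Recall@k = hits / total questions.
type BenchResult = { evidenceSessions: string[]; retrievedTopK: string[] };

function sessionRecallAtK(results: BenchResult[]): number {
  const hits = results.filter((r) =>
    r.retrievedTopK.some((id) => r.evidenceSessions.includes(id))
  ).length;
  return hits / results.length;
}

// Example: 1 of 2 questions has an evidence session in its top-k.
const demo: BenchResult[] = [
  { evidenceSessions: ["s3"], retrievedTopK: ["s1", "s3", "s9"] }, // hit
  { evidenceSessions: ["s7"], retrievedTopK: ["s2", "s4", "s5"] }, // miss
];
console.log(sessionRecallAtK(demo)); // 0.5
```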

</details>


How It Works

<img src="https://raw.githubusercontent.com/JubaKitiashvili/context-mem/main/docs/architecture.svg" alt="Observation Pipeline" width="100%"/>

Every tool output flows through the pipeline: privacy screening (9 secret detectors) → parallel extraction (entities, importance, topics) → 14 content summarizers → triple storage (verbatim archive, SQLite summaries, knowledge graph) → adaptive compression over time.

Full coding session (50 tool outputs): 365 KB → 3.2 KB (99% savings).
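
The pipeline stages above can be sketched roughly as follows. This is a simplified illustration, not the actual context-mem implementation: the regexes stand in for two of the nine secret detectors, and the one-line summarizer stands in for the 14 content-aware summarizers:

```typescript
// Illustrative shape of the observation pipeline: screen -> extract -> summarize.
type Observation = { raw: string; entities: string[]; summary: string };

// Two example secret patterns (AWS access key, GitHub token) as stand-ins
// for the full detector set.
const SECRET_PATTERNS = [/AKIA[0-9A-Z]{16}/g, /ghp_[A-Za-z0-9]{36}/g];

function screen(raw: string): string {
  // Privacy screening: redact anything matching a secret detector.
  return SECRET_PATTERNS.reduce((s, re) => s.replace(re, "[redacted]"), raw);
}

function extractEntities(text: string): string[] {
  // Entity extraction: CamelCase and ALL_CAPS tokens, per the README.
  return text.match(/\b(?:[A-Z][a-z]+){2,}\b|\b[A-Z_]{3,}\b/g) ?? [];
}

function summarize(text: string): string {
  // Stand-in for the content-aware summarizers: keep a truncated first line.
  return text.split("\n")[0].slice(0, 120);
}

function observe(raw: string): Observation {
  const clean = screen(raw);
  return { raw: clean, entities: extractEntities(clean), summary: summarize(clean) };
}

const obs = observe("DbPool exhausted\nMAX_CONNECTIONS reached, token ghp_" + "a".repeat(36));
console.log(obs.entities); // ["DbPool", "MAX_CONNECTIONS"]
```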


What it is (and isn't)

context-mem is:

  • A retrieval-first memory system (not a chatbot wrapper)
  • A context compression engine (14 content-aware summarizers)
  • Infrastructure for AI agents (44 MCP tools)

context-mem is not:

  • Chat history storage (it extracts meaning, not raw logs)
  • An LLM wrapper (works without any API keys)
  • A cloud service (fully local SQLite)

Quick Start

```bash
npm i context-mem && npx context-mem init
```

`init` auto-detects your editor:

| Editor | What gets created |
|---|---|
| Claude Code | `.mcp.json` + hooks (8 hooks incl. context-triggered injection) + `CLAUDE.md` |
| Cursor | `.cursor/mcp.json` + `.cursor/rules/context-mem.mdc` |
| Windsurf | `.windsurf/mcp.json` + `.windsurf/rules/context-mem.md` |
| VS Code / Copilot | `.vscode/mcp.json` + `.github/copilot-instructions.md` |
| Cline | `.cline/mcp_settings.json` + `.clinerules/context-mem.md` |
| Roo Code | `.roo-code/mcp_settings.json` + `.roo/rules/context-mem.md` |

Real-World Examples

You: "Why did we choose Postgres?"
  → recall returns the exact verbatim quote from March 15, importance 0.95,
    with the full evidence chain: error → file_read → search → decision

You: "What did Sarah work on last sprint?"
  → browse by person shows 14 observations mentioning Sarah,
    grouped by topic (auth, database, deployment)

You: "Generate a PR description"
  → context-mem story --format pr assembles changes, decisions, resolved
    issues, and test plan from the current session

You: "What are we about to forget?"
  → predict_loss shows 8 entries at risk: low importance, 45+ days old,
    never accessed. Pin the critical ones before they decay.

Search Architecture

<img src="https://raw.githubusercontent.com/JubaKitiashvili/context-mem/main/docs/search-architecture.svg" alt="Hybrid Parallel Search" width="100%"/>

BM25 (8 strategies + synonym expansion) and vector search run independently in parallel, then fuse via intent-adaptive weights with IDF-weighted content reranking. Optional LLM judge reranker pushes accuracy to 100%. Fully local by default.
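
The parallel-then-fuse step can be sketched as a weighted sum over per-strategy scores. The weights below come from the default `.context-mem.json`; the function itself is a hypothetical simplification (the real fusion is intent-adaptive, with IDF-weighted reranking on top):

```typescript
// Default fusion weights from .context-mem.json.
const WEIGHTS: Record<string, number> = {
  bm25: 0.45, trigram: 0.15, levenshtein: 0.05, vector: 0.35,
};

type Scores = Record<string, number>; // docId -> normalized score in [0, 1]

// Fixed-weight fusion: each strategy's score contributes proportionally.
function fuse(perStrategy: Record<string, Scores>): Scores {
  const fused: Scores = {};
  for (const [strategy, weight] of Object.entries(WEIGHTS)) {
    for (const [docId, score] of Object.entries(perStrategy[strategy] ?? {})) {
      fused[docId] = (fused[docId] ?? 0) + weight * score;
    }
  }
  return fused;
}

// doc1 ranks first: strong BM25 and vector agreement beats a lone trigram hit.
const fused = fuse({
  bm25: { doc1: 0.9, doc2: 0.2 },
  trigram: { doc2: 1.0 },
  vector: { doc1: 0.8 },
});
console.log(fused.doc1 > fused.doc2); // true
```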


Core Features

| Capability | Description |
|---|---|
| Importance Scoring | Every observation scored 0.0–1.0 with 6 significance flags: DECISION, ORIGIN, PIVOT, CORE, MILESTONE, PROBLEM. Auto-pin for decisions and milestones. |
| Verbatim Recall | Surface original content (not summaries) via the `recall` tool. Dedicated FTS5 index. Importance, type, time, and flag filters. |
| Adaptive Compression | 4-tier progressive: verbatim (0-7d) → light (7-30d) → medium (30-90d) → distilled (90d+). Pinned entries stay verbatim forever. |
| Entity Intelligence | Auto-detect technologies, people, file paths, CamelCase, ALL_CAPS. 100+ aliases (React.js → React). Knowledge graph storage. |
| Temporal Facts | `valid_from`/`valid_to` on knowledge. Supersession chains. `temporal_query`: "what was true about X at time T?" |
| Wake-Up Primer | Token-budgeted context at session start. 4 layers: profile (15%), critical knowledge (40%), decisions (30%), entities (15%). |
| Decision Trails | Evidence chain reconstruction. `explain_decision` walks events backward: file reads → errors → searches → decision. |
| Session Narratives | 4 templates: PR description, standup update, ADR, onboarding guide. CLI: `context-mem story --format pr`. |
| Hybrid Search | BM25 (8 strategies + synonym expansion) + vector (nomic-embed, 768-dim) parallel fusion. Optional LLM judge reranker. Sub-millisecond. |
| Temporal Resolver | Deterministic date parsing for relative time queries ("3 days ago", "last Saturday"). Zero LLM cost. |
| Per-Prompt Injection | `UserPromptSubmit` hook auto-injects relevant memories on every user message. Rate-limited, topic-deduplicated. |
| Knowledge Graph | Entity-relationship model: files, modules, patterns, decisions, bugs, people, libraries, services, APIs, configs. |
| Multi-Agent | Register, claim files, check status, broadcast. Shared memory prevents duplicate work and merge conflicts. |
| Privacy Engine | Fully local. `<private>` tag stripping, custom regex, 9 secret detectors. No telemetry, no cloud. |
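
The adaptive compression schedule (verbatim 0-7d, light 7-30d, medium 30-90d, distilled 90d+, pinned entries exempt) is simple enough to sketch directly; the function name is illustrative:

```typescript
// Pick a compression tier from an observation's age, per the documented
// 4-tier schedule. Pinned entries never decay past verbatim.
type Tier = "verbatim" | "light" | "medium" | "distilled";

function compressionTier(ageDays: number, pinned = false): Tier {
  if (pinned || ageDays < 7) return "verbatim";
  if (ageDays < 30) return "light";
  if (ageDays < 90) return "medium";
  return "distilled";
}

console.log(compressionTier(3));         // "verbatim"
console.log(compressionTier(45));        // "medium"
console.log(compressionTier(200, true)); // "verbatim" (pinned stays verbatim)
```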

Intelligence Dashboard

Real-time web UI with 6 pages; launch with `context-mem dashboard`:

<img src="https://raw.githubusercontent.com/JubaKitiashvili/context-mem/main/docs/screenshots/dashboard-hero.png" alt="Dashboard — Intelligence Overview" width="100%"/>

<details> <summary>More dashboard pages</summary>

Knowledge Graph — force-directed entity visualization with type filtering and depth control:

<img src="https://raw.githubusercontent.com/JubaKitiashvili/context-mem/main/docs/screenshots/dashboard-graph-page.png" alt="Dashboard — Knowledge Graph" width="100%"/>

Topics — topic cloud with observation counts and cross-project tunnels:

<img src="https://raw.githubusercontent.com/JubaKitiashvili/context-mem/main/docs/screenshots/dashboard-topics.png" alt="Dashboard — Topics" width="100%"/>

Timeline — chronological observations with importance badges, flags, and verbatim mode:

<img src="https://raw.githubusercontent.com/JubaKitiashvili/context-mem/main/docs/screenshots/dashboard-timeline.png" alt="Dashboard — Timeline" width="100%"/>

</details>


How It Compares

| | Context Mem v3.2 | MemPalace | claude-mem |
|---|---|---|---|
| Retrieval Recall | 98%+ session recall (4 benchmarks) | 96.6% LME, 60.3% LoCoMo | Not benchmarked |
| Token Savings | 99% (benchmarked) | 0% (stores everything) | ~95% (claimed) |
| Search | BM25 (8 strategies) + vector + LLM judge | ChromaDB | Basic recall |
| Entity Intelligence | Auto-detect + 100 aliases + graph | No | No |
| Importance Scoring | 0.0-1.0 with 6 significance flags | No | No |
| Decision Trails | Evidence chain reconstruction | No | No |
| Session Narratives | PR/Standup/ADR/Onboarding | No | No |
| Cross-Project Memory | Global store + topic tunnels | No | No |
| LLM Dependency | Optional (free by default) | 100% LME requires paid API | Required (~$57/mo) |
| Privacy | Fully local, 9 secret detectors | Local | Local |
| License | MIT | Proprietary | AGPL-3.0 |

Performance

All operations are sub-millisecond with zero LLM dependency:

| Operation | Speed | Latency |
|---|---|---|
| Importance Classification | 556K ops/s | 0.002ms |
| Entity Extraction | 179K ops/s | 0.006ms |
| Topic Detection | 162K ops/s | 0.006ms |
| Compression Tier Calc | 3M ops/s | <0.001ms |
| Verbatim FTS Search | 50K ops/s | 0.020ms |
| BM25 Search | 3.3K ops/s | 0.3ms |
| Wake-Up Primer Assembly | 9K ops/s | 0.111ms |
| Narrative Generation | 6K ops/s | 0.164ms |

MCP Tools (44)

<details> <summary>Click to see all 44 tools</summary>

| Tool | Description |
|---|---|
| **Core** | |
| `observe` | Store observation with auto-summarization + importance scoring |
| `search` | Hybrid search with optional verbatim mode |
| `get` | Retrieve full observation by ID |
| `timeline` | Reverse-chronological list with importance badges |
| `stats` | Token economics for current session |
| `summarize` | Summarize content without storing |
| `configure` | Update runtime configuration |
| `execute` | Run code (JS, TS, Python, Shell, Ruby, Go, Rust, PHP, Perl, R, Elixir) |
| **Content** | |
| `index_content` | Index with code-aware chunking |
| `search_content` | Search indexed chunks |
| **Knowledge** | |
| `save_knowledge` | Save with contradiction detection + temporal validity |
| `search_knowledge` | Search (filters superseded by default) |
| `promote_knowledge` | Promote to global cross-project store |
| `global_search` | Search across all projects |
| `resolve_contradiction` | Resolve conflicts (supersede/merge/keep/archive) |
| `merge_suggestions` | View cross-project duplicate suggestions |
| **Graph** | |
| `graph_query` | Traverse entity relationships |
| `add_relationship` | Link entities |
| `graph_neighbors` | Find connected entities |
| **Session** | |
| `update_profile` | Project profile |
| `budget_status` / `budget_configure` | Token budget management |
| `restore_session` | Restore from snapshot |
| `handoff_session` | Cross-session continuity |
| **Events** | |
| `emit_event` / `query_events` | P1-P4 event tracking |
| **Agents** | |
| `agent_register` / `agent_status` / `claim_files` / `agent_broadcast` | Multi-agent coordination |
| **Intelligence** | |
| `time_travel` | Compare project state at any point in time |
| `ask` | Natural language question answering |
| **Total Recall** | |
| `recall` | Verbatim memory retrieval with importance/flag/time filters |
| `wake_up` | Generate scored session primer (4-layer context) |
| `entity_detect` | Extract entities from text |
| `list_people` | Person entities with relationship counts |
| `temporal_query` | Knowledge valid at specific timestamp |
| `browse` | Navigate by topic, person, or time |
| `list_topics` | Topic list with observation counts |
| `find_tunnels` | Cross-project topic bridges |
| `import_conversations` | Import ChatGPT/Claude/Slack/text conversations |
| `explain_decision` | Decision trail evidence chain |
| `generate_story` | Narrative (PR/standup/ADR/onboarding) |
| `predict_loss` | Memory pressure prediction |

</details>


CLI Commands

```bash
context-mem init                    # Initialize in current project
context-mem serve                   # Start MCP server (stdio)
context-mem status                  # Show database stats
context-mem doctor                  # Run health checks
context-mem dashboard               # Open web dashboard (6 pages)
context-mem why <query>             # Decision trail — why was X decided?
context-mem story --format pr       # Generate narrative (pr/standup/adr/onboarding)
context-mem import-convos <path>    # Import conversations (auto-detect format)
context-mem export                  # Export as JSON
context-mem import                  # Import from JSON
context-mem plugin add|remove|list  # Manage summarizer plugins
```

Configuration

<details> <summary>.context-mem.json</summary>

```json
{
  "storage": "auto",
  "plugins": {
    "summarizers": ["shell", "json", "error", "log", "code"],
    "search": ["bm25", "trigram", "vector"],
    "runtimes": ["javascript", "python"]
  },
  "search_weights": { "bm25": 0.45, "trigram": 0.15, "levenshtein": 0.05, "vector": 0.35 },
  "privacy": { "strip_tags": true, "redact_patterns": [] },
  "lifecycle": { "ttl_days": 30, "max_db_size_mb": 500, "max_observations": 50000 },
  "ai_curation": { "enabled": false, "provider": "auto" }
}
```

</details>


Platform Support

| Platform | Auto-Setup |
|---|---|
| Claude Code, Cursor, Windsurf, VS Code/Copilot, Cline, Roo Code | `context-mem init` |
| Gemini CLI, Antigravity, Goose, OpenClaw, CrewAI, LangChain | See `configs/` |

Documentation

| Doc | Description |
|---|---|
| Benchmark Results | Compression + retrieval benchmarks |
| Contributing | How to contribute |

License

MIT — Juba Kitiashvili


<div align="center">

Get Started

```bash
npm i context-mem && npx context-mem init
```

Read the Docs · View Benchmarks · Report a Bug · Contributing


Context Mem v3.2 — 98%+ accuracy on every benchmark. Your AI never forgets.


</div>
