Memora

A local, persistent, semantically-aware knowledge graph for AI coding agents like Claude Code, providing efficient session memory with minimal token cost and zero runtime network calls.

Persistent, semantic memory for AI coding agents — local-first, MCP-native.

License: MIT Python Status MCP

A local, persistent, semantically-aware knowledge graph for Claude Code (and any MCP-compatible AI coding agent). Auto-loads in every session in every project. Zero per-project setup, zero network calls at runtime, ~85–95% lower per-session context cost than naïve "load it all" memory.

git clone https://github.com/VnemAIDev/memora.git
cd memora
./install.sh --bootstrap

That's the full install. Open Claude Code in any directory; memory auto-loads. See QUICKSTART.md for prerequisites and troubleshooting.


At a glance

  • Install root: ~/.claude-memory/
  • Database: ~/.claude-memory/graph.db (SQLite, WAL mode)
  • Total disk: ~320 MB (venv 227 MB + ONNX model 90 MB + DB ~2 MB)
  • Registered scope: user-level MCP server (listed as memory in claude mcp list)
  • Global protocol: ~/.claude/CLAUDE.md
  • Runtime network calls: zero (model is downloaded once at install time)

Quick start (for a fresh reader)

# Verify the server is registered and reachable
claude mcp list

# Check semantic coverage
~/.claude-memory/.venv/bin/python -c "
import sys; sys.path.insert(0,'$HOME/.claude-memory')
import embeddings; print(embeddings.status())"

# Inspect the DB directly
sqlite3 ~/.claude-memory/graph.db \
  "SELECT project, COUNT(*) FROM entities GROUP BY project;"

# Periodic maintenance (dedupe observations)
~/.claude-memory/.venv/bin/python ~/.claude-memory/compact.py --aggressive --semantic 0.92

File inventory

| File | Purpose |
| --- | --- |
| server.py | FastMCP server, exposes 12 tools over stdio |
| embeddings.py | Lazy-loaded MiniLM-L6-v2 ONNX embedder (fastembed) |
| bootstrap_embeddings.py | One-shot: downloads model + embeds all observations |
| compact.py | Dedupe script (--aggressive, --semantic THRESHOLD) |
| run.sh | Venv-activating launcher (registered with Claude Code) |
| graph.db | SQLite knowledge graph |
| server.log | Server logs |
| models/ | ONNX model cache (populated by bootstrap) |

Build timeline — 4 phases

Phase 1 — Base infrastructure

Minimal MCP server matching the original spec.

  • Schema: 4 tables (entities, observations, relations, tags) + 7 indexes, WAL mode, foreign keys
  • Tools (8): recall_context, create_entity, add_observation, create_relation, search, get_entity, list_projects, forget
  • Project auto-detection: CWD basename → project name ($HOME and / map to "global")
  • Registration: claude mcp add --scope user memory -- ~/.claude-memory/run.sh
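
The project auto-detection rule above can be sketched in a few lines. This is a minimal illustration, not the code in server.py: the function name detect_project and its arguments are hypothetical, and it assumes absolute POSIX paths.

```python
from pathlib import PurePosixPath

def detect_project(cwd: str, home: str) -> str:
    """Map a working directory to a project name (hypothetical helper)."""
    path = PurePosixPath(cwd)
    # $HOME and the filesystem root have no meaningful basename,
    # so both fall into the shared "global" bucket.
    if path == PurePosixPath(home) or path == PurePosixPath(path.anchor):
        return "global"
    # Everywhere else, the directory's basename is the project name.
    return path.name

print(detect_project("/home/dev/work/memora", "/home/dev"))  # memora
print(detect_project("/home/dev", "/home/dev"))              # global
print(detect_project("/", "/home/dev"))                      # global
```

One consequence of keying on the basename alone: two checkouts with the same directory name in different locations would share one project.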

Phase 2 — First token-optimization pass (7 wins)

Targeted the biggest pain point: recall_context() returning ~8 KB of mostly-redundant JSON.

| Win | What changed |
| --- | --- |
| Lean default JSON shape | Drop IDs/timestamps/indent; relations become [from, type, to] triples |
| max_chars budget | Hard cap with truncated: true flag and omitted count |
| summary column + set_summary tool | One-line gist replaces raw observations on long entities |
| archived tag auto-exclusion | Stale entities excluded by default |
| since_days filter | Only entities updated within last N days |
| FTS5 virtual table + triggers | Real ranked text search instead of LIKE %x% |
| compact.py script | Manual + --aggressive dedupe of redundant observations |
| New summarize_project tool | One-line digest per entity; cheapest possible "what's in here?" |
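
To make the lean shape and the max_chars budget concrete, here is a rough sketch under stated assumptions. The function lean_recall, the input dict keys, and the "drop trailing entities until it fits" strategy are all illustrative guesses; only the [from, type, to] triples, the truncated flag, and the omitted count come from the table above.

```python
import json

def lean_recall(entities, relations, max_chars=None):
    """Serialize memory compactly; drop trailing entities until the text fits max_chars."""
    kept = list(entities)
    while True:
        names = {e["name"] for e in kept}
        payload = {
            # Lean shape: only name/type/observations survive; IDs and timestamps are dropped.
            "entities": [{"name": e["name"], "type": e["type"], "obs": e["obs"]} for e in kept],
            # Relations collapse to [from, type, to] triples between kept entities.
            "relations": [[r["from"], r["type"], r["to"]] for r in relations
                          if r["from"] in names and r["to"] in names],
        }
        omitted = len(entities) - len(kept)
        if omitted:
            payload["truncated"] = True
            payload["omitted"] = omitted
        # No indent and tight separators keep the JSON as small as possible.
        text = json.dumps(payload, separators=(",", ":"))
        if max_chars is None or len(text) <= max_chars or not kept:
            return text
        kept.pop()  # over budget: drop the last entity and re-measure

# Toy data, invented for illustration only.
entities = [
    {"id": 1, "name": "auth-choice", "type": "decision", "obs": ["use JWT"], "updated_at": "2025-01-01"},
    {"id": 2, "name": "server.py", "type": "file", "obs": ["entry point"], "updated_at": "2025-01-02"},
]
relations = [{"id": 7, "from": "server.py", "type": "uses", "to": "auth-choice"}]
print(lean_recall(entities, relations))
print(lean_recall(entities, relations, max_chars=120))
```

In this sketch a budget smaller than the empty envelope cannot be honored, so the caller still gets the truncation metadata rather than nothing.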

Phase 3 — External research pass (Caveman / RTK / Supermemory)

Researched 3 token-optimization projects in parallel; ported the high-ROI ideas.

| Win | Source | What it does |
| --- | --- | --- |
| Type-tier ordering | RTK | Decisions/conventions/services kept first when budget trims |
| omitted_names + expand_with | RTK | Truncated response says exactly what to get_entity() for |
| Cross-project search via project="*" | Supermemory | One call hits all your projects |
| Type-tier grouping in summarize_project | Supermemory | Stable concepts above ephemeral work |
| Cross-entity dedup (@dup:<name> sentinel) | RTK | Repeated obs returned once, referenced thereafter |
| New flag_for_summary tool | Supermemory | Lists entities >N obs without a summary; an actionable backlog |
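
Two of these ideas, type-tier ordering and the @dup:<name> sentinel, are easy to illustrate together. The sketch below is an assumption-heavy approximation: the TIER mapping, both function names, and the reading of <name> as "the entity that returned the observation first" are not confirmed by this README.

```python
# Assumed tier order: decisions/conventions/services come first (per the table);
# every other type shares the last tier.
TIER = {"decision": 0, "convention": 1, "service": 2}

def order_by_tier(entities):
    """Stable sort: durable concepts first, the rest in their original order."""
    return sorted(entities, key=lambda e: TIER.get(e["type"], len(TIER)))

def dedupe_across_entities(entities):
    """Replace an observation already returned under another entity with '@dup:<name>'."""
    first_seen = {}  # observation text -> name of the entity that returned it first
    result = []
    for entity in entities:
        obs = []
        for text in entity["obs"]:
            owner = first_seen.setdefault(text, entity["name"])
            obs.append(text if owner == entity["name"] else f"@dup:{owner}")
        result.append({**entity, "obs": obs})
    return result

# Toy data, invented for illustration only.
entities = [
    {"name": "fix-login", "type": "todo", "obs": ["uses JWT"]},
    {"name": "auth-choice", "type": "decision", "obs": ["uses JWT", "rotate keys monthly"]},
]
for entity in dedupe_across_entities(order_by_tier(entities)):
    print(entity["name"], entity["obs"])
# auth-choice ['uses JWT', 'rotate keys monthly']
# fix-login ['@dup:auth-choice']
```

Ordering before deduplicating means the full text stays with the higher-tier entity, which is the one most likely to survive a budget trim.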

Phase 4 — Semantic layer (Supermemory's biggest idea)

Hybrid lexical + semantic search, all local.

| Win | Implementation |
| --- | --- |
| embeddings.py module | Lazy-loaded MiniLM-L6-v2 via fastembed + ONNX, L2-normalized |
| observations.embedding BLOB column | 384-dim float32 vector per observation (~1.5 KB each) |
| Hybrid search() | FTS5 BM25 + cosine top-K, fused via Reciprocal Rank Fusion (k=60) |
| Semantic dedup | add_observation(dedup_threshold=0.92) skips paraphrases |
| compact.py --semantic 0.92 | Batch semantic dedup across whole DB |
| New embedding_status tool | Diagnostic: model availability + coverage % |
| bootstrap_embeddings.py | One-shot: download 90 MB model + embed all observations |
| Final coverage | 620 / 620 observations embedded |
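
Reciprocal Rank Fusion itself is small enough to show in full. The following is a generic sketch of the technique with k=60, not the actual search() implementation; the function name rrf_fuse, the top_n parameter, and the sample entity names are made up for the example.

```python
def rrf_fuse(rankings, k=60, top_n=10):
    """Fuse several ranked lists: each list adds 1 / (k + rank) to an item's score."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by name so the output is deterministic.
    return sorted(scores, key=lambda item: (-scores[item], str(item)))[:top_n]

# Hypothetical result lists: one from lexical (BM25) search, one from cosine similarity.
lexical = ["login_handler", "auth_config", "session_store"]
semantic = ["auth_config", "token_refresh", "login_handler"]
print(rrf_fuse([lexical, semantic]))
# ['auth_config', 'login_handler', 'token_refresh', 'session_store']
```

RRF only looks at ranks, so BM25 scores and cosine similarities never need to be put on a common scale, and items that appear in both lists rise above items that appear in only one.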

The complete tool surface — 12 tools

recall_context       create_entity         add_observation
create_relation      search                get_entity
list_projects        forget                set_summary
summarize_project    embedding_status      flag_for_summary

Measured token savings (real project data)

Numbers from the actual smoke tests during the build, on the demo project with 5 entities and 20 observations:

| Call type | Bytes returned | vs. legacy verbose |
| --- | --- | --- |
| recall_context(verbose=True) (legacy) | 8,393 | (baseline) |
| recall_context() lean default | 3,378 | −60% |
| recall_context() after set_summary on largest entity | 2,715 | −68% |
| summarize_project() triage | 395 | −95% |
| search("rebuild") FTS5 | 477 | −94% |
| recall_context(max_chars=800) | 464 | −94% (hard cap honored) |

Rule of thumb: characters ÷ 4 ≈ tokens. The old startup recall cost ~2,100 tokens (8,393 ÷ 4); the new ritual (summarize_project → selective get_entity) costs ~100-400 tokens depending on what's relevant, a 5-20× reduction per session start.
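
The percentages and token estimates can be reproduced from the byte counts in the table above. The helper names below are illustrative, and the snippet assumes the payloads are mostly ASCII so that bytes and characters are interchangeable.

```python
LEGACY_BYTES = 8393  # recall_context(verbose=True), the legacy baseline

MEASURED = {
    "recall_context() lean default": 3378,
    "recall_context() after set_summary": 2715,
    "summarize_project() triage": 395,
    'search("rebuild") FTS5': 477,
    "recall_context(max_chars=800)": 464,
}

def estimate_tokens(chars: int) -> int:
    """Rough token count using the characters / 4 rule of thumb."""
    return round(chars / 4)

def reduction_pct(size: int, baseline: int = LEGACY_BYTES) -> int:
    """Percentage saved relative to the legacy verbose payload."""
    return round(100 * (1 - size / baseline))

print(f"legacy: ~{estimate_tokens(LEGACY_BYTES)} tokens")
for call, size in MEASURED.items():
    print(f"{call}: ~{estimate_tokens(size)} tokens, -{reduction_pct(size)}%")
```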


Operational benefits — behavioral wins that compound

| Before | After |
| --- | --- |
| Memory file rewritten end-to-end every session via /memory | Persistent SQLite: only deltas written, never the whole file |
| Per-project memory configured manually | CWD basename auto-detects project; works in every dir without setup |
| Memory loaded only when Claude noticed MEMORY.md | Auto-loaded via user-scope MCP + CLAUDE.md protocol nudge |
| Naïve recall returned full payload regardless of project size | Type-tier ordering keeps decisions/conventions; budget caps the rest |
| Searches missed concept-level queries ("auth" missing login_handler) | Hybrid lexical + semantic search finds entities by meaning |
| Repeated observations bloated context | Cross-entity dedup + semantic dedup at write-time |
| No way to know what's in memory without paying full cost | summarize_project() (~400 chars) + flag_for_summary() triage cheaply |
| Cross-project knowledge invisible from another project | search(query, project="*") finds it in one call |

Continuous wins

  1. No per-project setup cost. Every new project already has full memory. Zero friction.
  2. Cross-session continuity. State that previously lived in fragile MEMORY.md files now lives in a queryable DB.
  3. Type-aware retrieval. Asking "what conventions apply here?" returns conventions first.
  4. Semantic recall. You don't have to remember exact words: "The thing about animation" finds entities tagged #hero even if "animation" isn't in any observation.
  5. Cheap upkeep. compact.py --aggressive --semantic 0.92 weekly keeps the DB lean.
  6. Privacy. All-local. Model cached. No telemetry. No data leaves the machine.

Cost accounting

| Cost | Amount |
| --- | --- |
| Disk usage | ~320 MB (.venv 227 MB + model 90 MB + graph.db ~2 MB + log <1 MB) |
| Per-call CPU | <50 ms for lean recall; <100 ms for semantic search on 620 obs |
| Network at runtime | Zero |
| Lock-in | Zero: schema is plain SQLite, inspect anytime with sqlite3 graph.db |
| Bootstrap dependency | One-time ~90 MB download from HuggingFace (qdrant/all-MiniLM-L6-v2-onnx) |

Memory Protocol (what Claude is told to do, from ~/.claude/CLAUDE.md)

Session start ritual

  1. summarize_project() — one line per entity, ~95% cheaper than full recall
  2. flag_for_summary() — for any entity with >5 observations and no summary, call set_summary(name, "<gist>") to short-circuit raw observations on future recalls
  3. recall_context() for the active subset, or get_entity(name) for specific entities
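
The ritual is an instruction to the agent rather than code, but its control flow can be sketched. Everything about the shapes here is hypothetical: the tools dict, the assumption that summarize_project() returns a name-to-digest mapping, that flag_for_summary() returns entity names, and the is_relevant and make_gist callbacks are all stand-ins for decisions the agent makes itself.

```python
def session_start(tools, is_relevant, make_gist):
    """Run the three-step ritual against a dict of tool callables (hypothetical shapes)."""
    # 1. Cheap triage: assumed to return {entity_name: one_line_digest}.
    digest = tools["summarize_project"]()
    # 2. Clear the backlog: assumed to return names of long entities with no summary.
    for name in tools["flag_for_summary"]():
        tools["set_summary"](name, make_gist(tools["get_entity"](name)))
    # 3. Pay for full detail only where the digest looks relevant.
    return {
        name: tools["get_entity"](name)
        for name, line in digest.items()
        if is_relevant(line)
    }

# In-memory fakes standing in for the MCP tools, for illustration only.
store = {
    "auth-choice": {"obs": ["note"] * 6, "summary": None},
    "scratch": {"obs": ["todo"], "summary": None},
}
fake_tools = {
    "summarize_project": lambda: {n: f"{n}: {len(e['obs'])} obs" for n, e in store.items()},
    "flag_for_summary": lambda: [n for n, e in store.items() if len(e["obs"]) > 5 and not e["summary"]],
    "get_entity": lambda name: store[name],
    "set_summary": lambda name, gist: store[name].update(summary=gist),
}
detail = session_start(fake_tools, lambda line: "auth" in line, lambda e: f"{len(e['obs'])} observations")
print(list(detail))  # ['auth-choice']
```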

What to write

  • decision + rationale (entity_type="decision")
  • file purpose / important state (entity_type="file")
  • pending task / blocker (entity_type="todo")
  • user preference / convention (entity_type="convention")
  • external service / API / dependency (entity_type="service")
  • person / team member (entity_type="person")

Active-voice relations only: uses, depends_on, blocks, replaces, owns, reports_to, calls, extends.

Keep tokens low

  • Default lean shape; pass verbose=True only when needed
  • Read omitted_names from truncated responses; use get_entity() for specifics
  • @dup:<name> in observations means deduplicated, not missing
  • Tag stale entities archived for auto-exclusion
  • Use since_days=N for recent-only context
  • Use search(query, project="*") for cross-project lookups
  • Run compact.py --aggressive --semantic 0.92 periodically

Safety

Never store secrets, API keys, passwords, or PII. Reference them by name only (e.g., "uses Stripe API key stored in 1Password as STRIPE_PROD").


Maintenance commands

# Check semantic search status
~/.claude-memory/.venv/bin/python -c "
import sys; sys.path.insert(0,'$HOME/.claude-memory')
import embeddings; print(embeddings.status())"

# Re-embed every observation (after model swap)
~/.claude-memory/.venv/bin/python ~/.claude-memory/bootstrap_embeddings.py --rebuild

# Dry-run dedupe (no writes)
~/.claude-memory/.venv/bin/python ~/.claude-memory/compact.py --dry-run --aggressive --semantic 0.92

# Real dedupe + VACUUM
~/.claude-memory/.venv/bin/python ~/.claude-memory/compact.py --aggressive --semantic 0.92

# Health check
claude mcp list

# Activity log
grep "memory server starting" ~/.claude-memory/server.log

# Errors
grep -E "ERROR|failed|Traceback" ~/.claude-memory/server.log

# Direct DB stats
sqlite3 ~/.claude-memory/graph.db "
  SELECT project, COUNT(*) AS entities FROM entities GROUP BY project;
  SELECT 'total observations: ' || COUNT(*) FROM observations;
  SELECT 'embedded observations: ' || COUNT(*) FROM observations WHERE embedding IS NOT NULL;
  SELECT 'total relations: ' || COUNT(*) FROM relations;
"

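The direct DB stats above can also be gathered from Python with the standard sqlite3 module. The sketch relies only on the tables and columns visible in the queries above (entities.project, observations.embedding, relations); the function name db_stats and the returned keys are made up for the example.

```python
import sqlite3

def db_stats(conn):
    """Collect the same counts as the sqlite3 one-liner, plus embedding coverage."""
    per_project = dict(conn.execute(
        "SELECT project, COUNT(*) FROM entities GROUP BY project"))
    # COUNT(embedding) counts only rows where the column is not NULL.
    total, embedded = conn.execute(
        "SELECT COUNT(*), COUNT(embedding) FROM observations").fetchone()
    relations = conn.execute("SELECT COUNT(*) FROM relations").fetchone()[0]
    return {
        "entities_per_project": per_project,
        "observations": total,
        "embedded": embedded,
        "coverage_pct": round(100.0 * embedded / total, 1) if total else 0.0,
        "relations": relations,
    }

# Usage against the real database (path taken from this README):
#   import os
#   conn = sqlite3.connect(os.path.expanduser("~/.claude-memory/graph.db"))
#   print(db_stats(conn))
```

Since this only reads, it should be safe to run while the server is up; WAL mode lets readers proceed alongside a writer.
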
What this means in practice

A typical session before this work would either (a) ignore memory and re-ask questions Claude should already know the answer to, or (b) load 2-8 KB of memory at session start whether useful or not, possibly missing concept-level matches when searching.

Now: Claude opens the session, calls summarize_project() for ~100 tokens, scans for what's relevant, calls flag_for_summary() to spot bloated entities, then fetches detail only where it matters. Median session-start memory cost: 300-800 tokens, down from 2,000+ tokens. Over 100 sessions a month that's 120K+ tokens saved on memory loading alone — and that ignores the bigger win of finding context lexical search would have missed entirely.
