agentrecall
Provides persistent, searchable memory with hybrid keyword and semantic search, storing memories in a single SQLite file without external dependencies.
Agent memory in a single SQLite file. No vector database, no server, no cloud, no API key.
pip install agentrecall-db
Installs as agentrecall-db on PyPI (the bare agentrecall name was taken). You still import agentrecall and run the agentrecall CLI; only the install name differs.
from agentrecall import Memory
with Memory("agent.db") as mem:  # one SQLite file, nothing else running
    mem.add("The user prefers dark mode", tags=["preference"])
    mem.add("User's name is Aziz; lives in Tashkent", metadata={"kind": "fact"})
    # The core install gives you fast keyword recall (SQLite FTS5). For meaning-based
    # search that matches paraphrases, add the [semantic] extra (see below).
    for hit in mem.search("dark mode preference", k=3):
        print(hit.score, hit.content)
That's the whole setup. agent.db is an ordinary SQLite file you can cp, git diff,
back up, inspect with any SQLite tool, and read from any language. Nothing else is running.
Why another memory library?
Most "memory layers" for agents are infrastructure. To get started you stand up a vector database, run a server, sign up for a cloud, or hand over an API key — and many of them call an LLM on every write to "extract" facts, which is slow, costs tokens, and is non-deterministic.
agentrecall is the opposite. It is a library, the store is one file, recall is
deterministic, and nothing leaves the machine.
| | infra needed | semantic search | offline | stores | LLM call per write |
|---|---|---|---|---|---|
| agentrecall | none (1 file) | ✅ torch-free, opt-in | ✅ | SQLite | ❌ verbatim |
| mem0 | vector DB / cloud | ✅ | ⚠️ | vector + KV + graph | ✅ |
| Letta / MemGPT | server + Postgres | ✅ | ⚠️ | Postgres + pgvector | ✅ |
| Zep | server + datastore | ✅ | ⚠️ | knowledge graph | ✅ |
| official MCP memory server | none | ❌ keyword only | ✅ | JSONL flat file | ❌ |
Three things agentrecall does that nothing else combines:
- Zero infrastructure. The core has no third-party dependencies: keyword recall runs on Python's stdlib sqlite3 (FTS5 + BM25). A fresh pip install agentrecall-db with nothing else works.
- Semantic search with no torch, no GPU, no download server. Add the [semantic] extra and you get hybrid keyword + vector recall powered by model2vec static embeddings (~10 MB, CPU-only) stored in sqlite-vec. Still one file, still offline.
- Verbatim & deterministic. agentrecall never calls an LLM to mutate your memories. What you add() is what is stored: no silent fact-extraction, no cloud round-trip, no surprise token bills.
Install
pip install agentrecall-db # core: keyword recall, stdlib only
pip install "agentrecall-db[semantic]" # + torch-free semantic search (model2vec + sqlite-vec)
pip install "agentrecall-db[mcp]" # + MCP server
pip install "agentrecall-db[all]" # everything
Semantic search (optional, torch-free)
from agentrecall import Memory
# embeddings="auto" (the default) turns semantic on automatically *iff* the
# [semantic] extra is installed, and silently stays keyword-only otherwise.
mem = Memory("agent.db", embeddings="auto")
print(mem.semantic_enabled) # True once you've installed agentrecall-db[semantic]
mem.add("I love hiking in the mountains on weekends")
hits = mem.search("outdoor hobbies") # matches even with zero shared keywords
Search is hybrid: keyword (FTS5/BM25) and vector (cosine) candidates are blended with
Reciprocal Rank Fusion,
so you get the precision of keywords and the recall of embeddings. Bring your own embedder
(OpenAI, a local model, anything) by passing embedder= — any object with .dim and
.embed(texts) -> list[list[float]].
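To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists of memory ids. It illustrates the general technique only and is not agentrecall's internal code: the helper name rrf_fuse, the constant k=60 (a commonly used RRF default) and the sample ids are all assumptions.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of ids with Reciprocal Rank Fusion.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Hypothetical helper, not part of agentrecall.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


keyword_hits = [3, 1, 7]  # sample ids, as if ordered by BM25
vector_hits = [1, 9, 3]   # sample ids, as if ordered by cosine similarity
fused = rrf_fuse([keyword_hits, vector_hits])
print(fused)
```

Ids that rank well in both lists rise to the top, while an id found by only one retriever still survives further down, which is the "precision of keywords, recall of embeddings" behaviour described above.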
Optional ranking boosts:
mem = Memory("agent.db", recency_weight=0.5, importance_weight=0.3)
mem.add("Critical: API key rotates on the 1st", importance=3.0)
mem.search("api key", recency_weight=1.0) # per-call override
Namespaces
Isolate memories per user, per agent, or per session with a namespace:
alice = Memory("app.db", namespace="user:alice")
bob = Memory("app.db", namespace="user:bob") # same file, isolated memories
alice.add("prefers metric units")
bob.search("units") # never sees Alice's memories
As an MCP server
Give Claude (or any MCP client) persistent, searchable memory — an embeddings-capable alternative to the official keyword-only JSONL memory server:
pip install "agentrecall-db[mcp]"
agentrecall serve --db ~/.agent-memory.db
Claude Desktop / Claude Code MCP config:
{
  "mcpServers": {
    "memory": {
      "command": "agentrecall",
      "args": ["serve", "--db", "/Users/me/.agent-memory.db"]
    }
  }
}
Tools exposed: remember, recall, forget, list_memories, memory_stats.
CLI
agentrecall add "Deadline is July 7" --tags project --importance 2
agentrecall search "when is the deadline" -k 3
agentrecall list --limit 10
agentrecall stats
agentrecall forget --keep-last 1000 # prune to the newest 1000 per namespace
agentrecall export --format md > memories.md
Every command honours --db, --namespace, and the AGENTRECALL_DB /
AGENTRECALL_NAMESPACE / AGENTRECALL_EMBEDDINGS environment variables.
API at a glance
mem.add(content, *, tags=None, metadata=None, importance=1.0, namespace=None) -> MemoryRecord
mem.add_many([str | dict, ...]) -> list[MemoryRecord]
mem.search(query, *, k=5, namespace=None, tags=None,
recency_weight=None, importance_weight=None) -> list[MemoryHit]
mem.get(id) / mem.update(id, ...) / mem.delete(id)
mem.all(*, namespace=None, tags=None, limit=None, offset=0) -> list[MemoryRecord]
mem.count(*, namespace=None) -> int
mem.forget(*, before=None, namespace=None, keep_last=None) -> int # deleted count
tags filtering matches memories containing all of the requested tags.
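The "all of the requested tags" rule is plain AND semantics. A small sketch of that rule in isolation, assuming tags are simple strings; matches_all_tags is a hypothetical helper for illustration, not a function exported by agentrecall:

```python
def matches_all_tags(memory_tags, requested_tags):
    """True when every requested tag is present on the memory (AND semantics)."""
    return set(requested_tags).issubset(memory_tags)


# A memory tagged with both tags matches a request for either or both.
print(matches_all_tags(["project", "deadline"], ["project"]))
print(matches_all_tags(["project", "deadline"], ["project", "deadline"]))
# A memory missing one of the requested tags does not match.
print(matches_all_tags(["project"], ["project", "deadline"]))
```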
The database is just SQLite
No magic. Open it with anything:
sqlite3 agent.db "SELECT id, content, importance, created_at FROM memories ORDER BY created_at DESC LIMIT 5;"
Schema: a memories table (with JSON tags/metadata columns), an FTS5 index kept in
sync by triggers, and — only in semantic mode — a sqlite-vec vector table. See
SPEC.md for the full contract.
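The same query works from Python's stdlib sqlite3 module without importing agentrecall at all. The sketch below assumes only the table and column names used in the query above. The stand-in table, its column types, the timestamp format and the sample rows are assumptions made so the example runs anywhere; SPEC.md remains the reference for the real schema.

```python
import sqlite3


def latest_memories(conn, limit=5):
    """Return the newest rows, using the columns named in the query above."""
    cur = conn.execute(
        "SELECT id, content, importance, created_at "
        "FROM memories ORDER BY created_at DESC LIMIT ?",
        (limit,),
    )
    return cur.fetchall()


# Stand-in table (assumed shape). With a real store you would call
# sqlite3.connect("agent.db") and skip the CREATE/INSERT statements.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memories (id INTEGER PRIMARY KEY, content TEXT, "
    "importance REAL, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO memories (content, importance, created_at) VALUES (?, ?, ?)",
    [
        ("The user prefers dark mode", 1.0, "2026-01-01T10:00:00"),
        ("Deadline is July 7", 2.0, "2026-01-02T09:30:00"),
    ],
)
rows = latest_memories(conn, limit=1)
print(rows)
conn.close()
```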
Scope & limits (honest defaults)
- Agent-scale, not web-scale. sqlite-vec uses a linear scan (no ANN index yet) — great for thousands of memories per namespace, not millions of RAG chunks.
- Sync, single-process. Use one Memory per thread. sqlite3 is fast and local; there is no async API by design.
- No knowledge graph / entity-relation modeling. That's Cognee and Zep's lane. agentrecall stays small on purpose.
- No automatic summarization. Memories are stored verbatim. If you want LLM-distilled memories, distill before you add(): your call, your model, your tokens.
- add() is append-only. Re-adding the same text creates a new row (no content dedupe). To revise a memory, keep the integer id returned by add() and call update(id, ...) / delete(id).
- forget(before=..., keep_last=...) deletes the union: rows older than before or beyond the newest keep_last per namespace. With neither argument it's a no-op (to wipe a store, just delete the file).
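The union rule for forget() can be easy to misread, so here is a pure-Python sketch of which rows would be selected. It models the rule as described and is not the library's implementation: ids_to_forget is a hypothetical helper, timestamps are plain numbers for brevity, and "older than" is assumed to mean strictly less than before.

```python
from collections import defaultdict


def ids_to_forget(rows, before=None, keep_last=None):
    """rows: list of (id, namespace, created_at) tuples.

    Returns ids that are older than `before` OR fall outside the newest
    `keep_last` rows of their namespace. With neither argument, nothing
    is selected. Hypothetical helper, not part of agentrecall.
    """
    doomed = set()
    if before is not None:
        doomed |= {rid for rid, _, created in rows if created < before}
    if keep_last is not None:
        by_namespace = defaultdict(list)
        for rid, namespace, created in rows:
            by_namespace[namespace].append((created, rid))
        for items in by_namespace.values():
            items.sort(reverse=True)  # newest first
            doomed |= {rid for _, rid in items[keep_last:]}
    return doomed


rows = [(1, "a", 10), (2, "a", 20), (3, "a", 30), (4, "b", 5), (5, "b", 40)]
print(ids_to_forget(rows, before=15, keep_last=2))
print(ids_to_forget(rows, keep_last=1))
print(ids_to_forget(rows))
```

In the first call, row 4 is within the newest two of namespace "b" yet is still selected because it is older than before: either condition alone is enough.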
Development
pip install -e ".[dev]"
pytest
ruff check .
The keyword (FTS-only) test suite runs with zero third-party dependencies. Semantic
tests are skipped automatically when the [semantic] extra isn't installed.
License
MIT © 2026 Shaxzodbek Qambaraliyev / Blaze