Retavyn provides persistent memory for Claude, automatically storing and injecting memories across sessions to maintain context and recall past information.
<p align="center"> <img src="retavyn_avatar.png" width="180" alt="Retavyn mascot" /> </p>
<h1 align="center">retavyn</h1> <p align="center"><strong>Persistent memory for Claude across sessions.</strong></p>
<p align="center"> <img src="https://img.shields.io/badge/python-3.11+-4f8ef7?style=flat-square" alt="Python 3.11+" /> <img src="https://img.shields.io/badge/PostgreSQL-18-316192?style=flat-square&logo=postgresql&logoColor=white" alt="PostgreSQL 18" /> <img src="https://img.shields.io/badge/pgvector-semantic_search-7c6af7?style=flat-square" alt="pgvector" /> <img src="https://img.shields.io/badge/transport-stdio_%7C_HTTP%2FSSE-22c55e?style=flat-square" alt="stdio | HTTP/SSE" /> <img src="https://img.shields.io/badge/license-MIT-888890?style=flat-square" alt="MIT License" /> </p>
Every Claude session starts cold — no memory of what you worked on yesterday, what decisions you made, what you learned. Retavyn fixes that. It stores what matters and injects it back into Claude's context automatically at the start of every session.
You talk to Claude normally. It remembers.
Features
- Automatic context injection: a `SessionStart` hook dumps all memories to a local cache and injects them into context before the first message
- Hybrid search: full-text (`tsvector`/`tsquery`) and semantic similarity (`pgvector`) combined, for recall that works on exact words or general concepts
- Two transport modes: stdio for Claude Code, HTTP/SSE for claude.ai remote access
- Category tagging: store memories with categories (`ci-cd`, `journal`, `project`, etc.) for filtered recall
- Bulk ingestion: `ingest_path` walks a file or directory tree and stores each file as a memory, with automatic embedding backfill
- Live cache refresh: a `PostToolUse` hook refreshes the local cache immediately after every `remember` call
- OAuth-secured remote access: custom OAuth 2.0 + JWT flow required by the MCP spec for HTTP transport, served behind a Cloudflare Tunnel
How it works
Retavyn runs as an MCP server alongside Claude. When a session starts, a hook fires automatically: it dumps all stored memories to a local cache file and injects them into Claude's context before the first message. A second hook refreshes that cache after every `remember` call, so new memories are available to the very next session.
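The session-start half of that flow can be sketched as a small hook script. This is a minimal sketch, not the repository's actual hook: it assumes the cache lives at `~/.claude/retavyn-cache.md` (the path `python main.py dump` writes to) and that Claude Code accepts a `SessionStart` hook's stdout as JSON with a `hookSpecificOutput.additionalContext` field. The helper names `read_cache` and `build_session_start_output` are hypothetical.

```python
import json
import sys
from pathlib import Path

# Cache file written by `python main.py dump` (see the CLI commands section).
DEFAULT_CACHE = Path.home() / ".claude" / "retavyn-cache.md"


def read_cache(cache_path: Path = DEFAULT_CACHE) -> str:
    """Return the cached memories, or an empty string if no cache exists yet."""
    try:
        return cache_path.read_text(encoding="utf-8")
    except FileNotFoundError:
        # A missing cache should not block the session from starting.
        return ""


def build_session_start_output(memories: str) -> str:
    """Wrap memories in the JSON shape a SessionStart hook can print to stdout."""
    payload = {
        "hookSpecificOutput": {
            "hookEventName": "SessionStart",
            "additionalContext": memories,
        }
    }
    return json.dumps(payload)


if __name__ == "__main__":
    # A real hook would refresh the cache first (e.g. by running `python main.py dump`).
    sys.stdout.write(build_session_start_output(read_cache()))
```

Tolerating a missing cache keeps a fresh install (or a dump that failed) from breaking session start; Claude Code simply starts without injected memories.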
Search is hybrid: full-text (tsvector/tsquery) for exact matches and semantic similarity (pgvector cosine distance) for concept-level recall. Results from both passes are merged and ranked.
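The README does not say how the two passes are combined into one ranking. One plausible approach, sketched here purely as an assumption, is to min-max normalize each pass's scores to [0, 1], convert pgvector cosine distances into similarities so that higher is better in both passes, take a weighted sum (a memory found by only one pass scores 0 on the other), and deduplicate by memory ID. The `Hit` type, `merge_results`, and the `text_weight` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    memory_id: int
    score: float  # higher is better within a single pass


def _normalize(hits: list[Hit]) -> dict[int, float]:
    """Min-max scale one pass's scores into [0, 1], keyed by memory ID."""
    if not hits:
        return {}
    lo = min(h.score for h in hits)
    hi = max(h.score for h in hits)
    span = hi - lo
    # If every hit has the same score, treat them all as equally strong.
    return {h.memory_id: (h.score - lo) / span if span else 1.0 for h in hits}


def merge_results(
    fulltext: list[Hit],
    semantic_distances: list[tuple[int, float]],
    text_weight: float = 0.5,
) -> list[tuple[int, float]]:
    """Combine full-text and semantic hits into one ranked, deduplicated list.

    `semantic_distances` holds (memory_id, cosine_distance) pairs as returned
    by pgvector's <=> operator, where lower means more similar.
    """
    text_scores = _normalize(fulltext)
    # Convert distance to similarity so that higher is better in both passes.
    semantic_hits = [Hit(mid, 1.0 - dist) for mid, dist in semantic_distances]
    semantic_scores = _normalize(semantic_hits)

    combined: dict[int, float] = {}
    for mid in text_scores.keys() | semantic_scores.keys():
        combined[mid] = (
            text_weight * text_scores.get(mid, 0.0)
            + (1.0 - text_weight) * semantic_scores.get(mid, 0.0)
        )
    # Sort by score descending, breaking ties by ID for deterministic output.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
```

Normalizing first matters because `ts_rank` values and cosine similarities live on different scales; summing them raw would let one pass dominate regardless of the weight.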
The server supports two transports. In stdio mode, Claude Code spawns it as a local subprocess, with zero network exposure. In HTTP/SSE mode, it runs on a server behind a Cloudflare Tunnel with OAuth 2.0 + JWT auth, and claude.ai connects to it as a remote MCP server. That same HTTP endpoint also lets multiple machines share one memory pool: every Claude Code install can point its MCP config at the remote server, so your memories follow you across machines.
Architecture
Claude Code (local, stdio)

```
┌──────────────────────────────────────────────────────────┐
│  Claude Code                                             │
│  SessionStart hook ──► inject retavyn-cache.md           │
│  PostToolUse hook  ──► refresh cache after remember      │
└───────────────────────────┬──────────────────────────────┘
                            │ stdio (MCP protocol)
                   ┌────────▼────────┐
                   │     retavyn     │   Python + FastMCP
                   │   MCP server    │
                   └────────┬────────┘
                            │
                   ┌────────▼────────┐
                   │  PostgreSQL 18  │   Docker · port 5433
                   │   + pgvector    │   tsvector + pgvector
                   └─────────────────┘
```

claude.ai (remote, HTTP/SSE)

```
claude.ai → https://mcp.retavyn.com → Cloudflare edge (TLS)
          → cloudflared tunnel → retavyn :8765 → PostgreSQL :5433
```
OAuth flow: claude.ai opens `/authorize`, the user authenticates, the server issues a JWT, and claude.ai sends it as a Bearer token on every subsequent MCP call.
Search internals
When you call `recall("billing pipeline")`, retavyn runs two passes and merges the results:
- Full-text search: `tsvector @@ to_tsquery('billing & pipeline')`, ranked by `ts_rank`
- Semantic search: cosine distance between the query embedding and stored embeddings via pgvector (`embedding <=> $1 < threshold`)
- Results are deduplicated and returned ranked by combined score
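The two passes above can be sketched as follows. `to_and_tsquery` turns free text into the `billing & pipeline` form shown above, and `cosine_distance` computes the same quantity as pgvector's `<=>` operator (1 minus cosine similarity), which is handy for checking thresholds offline. The SQL strings are hypothetical: the `memories` table, `search_vector` and `embedding` columns, and the `'english'` text-search configuration are assumptions, not retavyn's schema.

```python
import math
import re


def to_and_tsquery(query: str) -> str:
    """Turn free text into a to_tsquery expression that ANDs every word.

    Non-word characters are dropped so tsquery operators such as & | ! : ( )
    in user input cannot produce a syntax error.
    """
    words = re.findall(r"\w+", query.lower())
    return " & ".join(words)


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance as pgvector's <=> operator defines it: 1 - cosine similarity."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        # This sketch rejects zero vectors, whose cosine distance is undefined.
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / norm


# Hypothetical queries for the two passes; table and column names are assumptions.
FULLTEXT_SQL = """
SELECT id, ts_rank(search_vector, to_tsquery('english', $1)) AS rank
FROM memories
WHERE search_vector @@ to_tsquery('english', $1)
ORDER BY rank DESC
"""

SEMANTIC_SQL = """
SELECT id, embedding <=> $1::vector AS distance
FROM memories
WHERE embedding IS NOT NULL AND embedding <=> $1::vector < $2
ORDER BY distance
"""
```

PostgreSQL's built-in `plainto_tsquery` also ANDs the words of plain text, so building the expression client-side is mainly useful when you want control over tokenization.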
Embeddings are generated via OpenAI text-embedding-3-small or Cohere embed-english-v3.0 (configurable). Memories without embeddings fall back to full-text only.
MCP tools
| Tool | Description |
|---|---|
| `remember` | Store a memory with optional category tag |
| `recall` | Hybrid full-text + semantic search across memories |
| `update_memory` | Edit an existing memory by ID |
| `forget` | Delete a memory by ID |
| `forget_path` | Delete all memories ingested from a file or directory path |
| `ingest_path` | Bulk-import a file or directory tree as memories |
| `backfill_embeddings` | Generate embeddings for memories that don't have them |
| `ask_infra` | Ask a DevOps question; runs a full agent loop (memory search + live `gcloud`) and returns a synthesized answer |
ask_infra
`ask_infra` is an agent embedded inside retavyn. When called, it spins up its own Claude tool-use loop with two tools: `recall_memory` (hybrid search over your retavyn memories) and `run_gcloud` (read-only live GCP queries). It iterates until it has a complete answer, then returns it as a single response.
From Claude Code's perspective it's one tool call. Under the hood it's a full agent making multiple passes across memory and live infrastructure state before synthesizing an answer.
Example questions:
- "What load balancer setup do we use for Cloud Run services?"
- "Which GKE clusters are running in prod right now?"
- "How do we handle Cloud SQL private service connect?"
The agent is also available as a standalone CLI — see infra-agent/README.md.
Setup
| Guide | What it covers |
|---|---|
| INSTALL.md | Local setup — run retavyn on your machine with Claude Code |
| SERVER.md | Remote server — deploy to a VM for claude.ai and cross-machine access |
Environment variables
| Variable | Default | Description |
|---|---|---|
| `MEMORY_DB_HOST` | `localhost` | PostgreSQL host |
| `MEMORY_DB_PORT` | `5433` | PostgreSQL port |
| `MEMORY_DB_NAME` | `retavyn` | Database name |
| `MEMORY_DB_USER` | `claude` | Database user |
| `MEMORY_DB_PASSWORD` | `claude` | Database password |
| `MEMORY_TRANSPORT` | `stdio` | `stdio` or `streamable-http` |
| `MEMORY_HOST` | `0.0.0.0` | Bind address (HTTP mode) |
| `MEMORY_PORT` | `8765` | Port (HTTP mode) |
| `OAUTH_SECRET` | (none) | JWT signing secret (HTTP mode) |
| `OAUTH_PASSWORD` | (none) | Auth password for browser flow (HTTP mode) |
| `OPENAI_API_KEY` | (none) | For OpenAI embeddings (optional) |
| `COHERE_API_KEY` | (none) | For Cohere embeddings (optional) |
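A minimal sketch of loading the database settings from these variables, using the defaults in the table, and turning them into a `postgresql://` connection URI (a form accepted by common drivers such as psycopg and asyncpg). `DatabaseSettings` and its `dsn()` method are hypothetical; retavyn's own config loading may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping
from urllib.parse import quote


@dataclass(frozen=True)
class DatabaseSettings:
    host: str
    port: int
    name: str
    user: str
    password: str

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "DatabaseSettings":
        # Defaults mirror the Environment variables table.
        return cls(
            host=env.get("MEMORY_DB_HOST", "localhost"),
            port=int(env.get("MEMORY_DB_PORT", "5433")),
            name=env.get("MEMORY_DB_NAME", "retavyn"),
            user=env.get("MEMORY_DB_USER", "claude"),
            password=env.get("MEMORY_DB_PASSWORD", "claude"),
        )

    def dsn(self) -> str:
        """Connection URI, percent-encoding credentials so characters like @ or / survive."""
        user = quote(self.user, safe="")
        password = quote(self.password, safe="")
        return f"postgresql://{user}:{password}@{self.host}:{self.port}/{self.name}"
```

Percent-encoding the password matters once you replace the default: a password containing `@` would otherwise be parsed as the start of the host.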
Documentation
| File | Contents |
|---|---|
| INSTALL.md | Local install: setup.sh, MCP config, hooks |
| SERVER.md | Remote deploy: GCE VM, Cloudflare Tunnel, OAuth, claude.ai |
| TUTORIAL.md | First memory → first recall → journaling |
| API.md | Complete tool reference, search internals, advanced usage |
CLI commands
```bash
python main.py                                # start MCP server (stdio)
python main.py dump                           # export all memories to ~/.claude/retavyn-cache.md
python main.py remember <content> [category]  # store a memory from the CLI
python main.py health                         # check DB connection and memory count
python main.py ingest <path> [category]       # bulk ingest a file or directory
```
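The command surface above maps naturally onto `argparse` subcommands with an optional trailing `category`. This is a hypothetical reconstruction of the argument parsing only (the real `main.py` may read `sys.argv` directly); `build_parser` is an illustrative name, and running with no subcommand leaves `command` as `None`, which would mean "start the MCP server".

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="main.py", description="retavyn memory server")
    # No subcommand -> args.command is None -> start the MCP server over stdio.
    sub = parser.add_subparsers(dest="command")

    sub.add_parser("dump", help="export all memories to the local cache file")
    sub.add_parser("health", help="check DB connection and memory count")

    remember = sub.add_parser("remember", help="store a memory")
    remember.add_argument("content")
    remember.add_argument("category", nargs="?", default=None)

    ingest = sub.add_parser("ingest", help="bulk ingest a file or directory")
    ingest.add_argument("path")
    ingest.add_argument("category", nargs="?", default=None)
    return parser
```

Because `content` is a single positional argument, multi-word memories need shell quoting, e.g. `python main.py remember "use uv for installs" project`.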
© 2026 Matt Bucknam — MIT License