MCP Servers

retavyn

Retavyn provides persistent memory for Claude, automatically storing and injecting memories across sessions to maintain context and recall past information.

README

<h1 align="center">retavyn</h1> <p align="center"><strong>Persistent memory for Claude across sessions.</strong></p>

Every Claude session starts cold — no memory of what you worked on yesterday, what decisions you made, what you learned. Retavyn fixes that. It stores what matters and injects it back into Claude's context automatically at the start of every session.

You talk to Claude normally. It remembers.

Features

Automatic context injection — a SessionStart hook dumps all memories to a local cache and injects them into context before the first message
Hybrid search — full-text (tsvector/tsquery) and semantic similarity (pgvector) combined for recall that works on exact words or general concepts
Two transport modes — stdio for Claude Code, HTTP/SSE for claude.ai remote access
Category tagging — store memories with categories (ci-cd, journal, project, etc.) for filtered recall
Bulk ingestion — ingest_path walks a file or directory tree and stores each file as a memory, with automatic embedding backfill
Live cache refresh — a PostToolUse hook refreshes the local cache immediately after every remember call
OAuth-secured remote access — custom OAuth 2.0 + JWT flow required by the MCP spec for HTTP transport, served behind a Cloudflare Tunnel

How it works

Retavyn runs as an MCP server alongside Claude. When a session starts, a hook fires automatically — it dumps all stored memories to a local cache file and injects them into Claude's context before the first message. A second hook refreshes that cache after every remember call, so new memories are available in the next session immediately.

Search is hybrid: full-text (tsvector/tsquery) for exact matches and semantic similarity (pgvector cosine distance) for concept-level recall. Results from both passes are merged and ranked.

The server supports two transports. In stdio mode, Claude Code spawns it as a local subprocess — zero network exposure. In HTTP/SSE mode, it runs on a server behind a Cloudflare Tunnel with OAuth 2.0 + JWT auth, and claude.ai connects to it as a remote MCP server. That same HTTP endpoint is also what lets multiple machines share one memory pool — every Claude Code install can point its MCP config at the remote database, so your memories follow you across machines.

Architecture

Claude Code (local, stdio)

┌──────────────────────────────────────────────────────────┐
│                      Claude Code                          │
│   SessionStart hook ──► inject retavyn-cache.md          │
│   PostToolUse hook  ──► refresh cache after remember      │
└───────────────────────────┬──────────────────────────────┘
                            │ stdio  (MCP protocol)
                   ┌────────▼────────┐
                   │    retavyn      │  Python + FastMCP
                   │   MCP server   │
                   └────────┬────────┘
                            │
                   ┌────────▼────────┐
                   │  PostgreSQL 18  │  Docker · port 5433
                   │  + pgvector     │  tsvector + pgvector
                   └─────────────────┘

claude.ai (remote, HTTP/SSE)

claude.ai  →  https://mcp.retavyn.com  →  Cloudflare edge (TLS)
           →  cloudflared tunnel  →  retavyn :8765  →  PostgreSQL :5433

OAuth flow: claude.ai opens /authorize, user authenticates, server issues a JWT, claude.ai uses it as a Bearer token on all subsequent MCP calls.

Search internals

When you call recall("billing pipeline"), retavyn runs two passes and merges the results:

Full-text search — tsvector @@ to_tsquery('billing & pipeline'), ranked by ts_rank
Semantic search — cosine distance between the query embedding and stored embeddings via pgvector (embedding <=> $1 < threshold)
Results are deduplicated and returned ranked by combined score

Embeddings are generated via OpenAI text-embedding-3-small or Cohere embed-english-v3.0 (configurable). Memories without embeddings fall back to full-text only.

MCP tools

Tool	Description
`remember`	Store a memory with optional category tag
`recall`	Hybrid full-text + semantic search across memories
`update_memory`	Edit an existing memory by ID
`forget`	Delete a memory by ID
`forget_path`	Delete all memories ingested from a file or directory path
`ingest_path`	Bulk-import a file or directory tree as memories
`backfill_embeddings`	Generate embeddings for memories that don't have them
`ask_infra`	Ask a DevOps question — runs a full agent loop (memory search + live gcloud) and returns a synthesized answer

ask_infra

ask_infra is an agent embedded inside retavyn. When called, it spins up its own Claude tool-use loop with two tools — recall_memory (hybrid search over your retavyn memories) and run_gcloud (read-only live GCP queries) — iterates until it has a complete answer, then returns it as a single response.

From Claude Code's perspective it's one tool call. Under the hood it's a full agent making multiple passes across memory and live infrastructure state before synthesizing an answer.

Example questions:

"What load balancer setup do we use for Cloud Run services?"

"Which GKE clusters are running in prod right now?"

"How do we handle Cloud SQL private service connect?"

The agent is also available as a standalone CLI — see infra-agent/README.md.

Setup

Guide	What it covers
INSTALL.md	Local setup — run retavyn on your machine with Claude Code
SERVER.md	Remote server — deploy to a VM for claude.ai and cross-machine access

Environment variables

Variable	Default	Description
`MEMORY_DB_HOST`	`localhost`	PostgreSQL host
`MEMORY_DB_PORT`	`5433`	PostgreSQL port
`MEMORY_DB_NAME`	`retavyn`	Database name
`MEMORY_DB_USER`	`claude`	Database user
`MEMORY_DB_PASSWORD`	`claude`	Database password
`MEMORY_TRANSPORT`	`stdio`	`stdio` or `streamable-http`
`MEMORY_HOST`	`0.0.0.0`	Bind address (HTTP mode)
`MEMORY_PORT`	`8765`	Port (HTTP mode)
`OAUTH_SECRET`	—	JWT signing secret (HTTP mode)
`OAUTH_PASSWORD`	—	Auth password for browser flow (HTTP mode)
`OPENAI_API_KEY`	—	For OpenAI embeddings (optional)
`COHERE_API_KEY`	—	For Cohere embeddings (optional)

Documentation

File	Contents
INSTALL.md	Local install: setup.sh, MCP config, hooks
SERVER.md	Remote deploy: GCE VM, Cloudflare Tunnel, OAuth, claude.ai
TUTORIAL.md	First memory → first recall → journaling
API.md	Complete tool reference, search internals, advanced usage

CLI commands

python main.py          # start MCP server (stdio)
python main.py dump     # export all memories to ~/.claude/retavyn-cache.md
python main.py remember <content> [category]  # store a memory from the CLI
python main.py health   # check DB connection and memory count
python main.py ingest <path> [category]  # bulk ingest a file or directory

Recommended Servers

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official

Featured

TypeScript

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official

Featured

Local

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official

Featured

TypeScript

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official

Featured

Python

E2B

Using MCP to run code via e2b.

Official

Featured

Neon Database

MCP server for interacting with Neon Management API and databases

Official

Featured

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official

Featured

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official

Featured