agent-memory-mcp
Gives AI coding assistants persistent memory by storing, semantically searching, and organizing notes on Cloudflare's free tier.
Give your AI coding assistant a long-term memory.
AI assistants like Claude Code, Cursor, and OpenCode forget everything between sessions. This fixes that. agent-memory-mcp is a small server you deploy to your own Cloudflare account that stores memories, searches them by meaning, and keeps them organized -- so your AI gets smarter the longer you use it.
It runs entirely on Cloudflare's free tier. Your data never leaves your account.
What can you do with it?
<picture> <source media="(prefers-color-scheme: dark)" srcset="docs/use-cases.svg"> <img alt="Use cases: remember preferences, track learnings, know your team, search past sessions, project context, self-improving memory" src="docs/use-cases.svg" width="100%"> </picture>
Some real examples:
- "Always use pnpm, not npm" -- Tell your AI once. It remembers in every future session.
- "How did I fix that CORS bug last week?" -- Search past conversations by meaning, not keywords.
- "Alice owns the auth service" -- Your AI knows who to ask about what.
- "We decided to use REST, not GraphQL" -- Project decisions persist across sessions.
- "Port 8787 is already used by wrangler" -- Your AI won't make the same mistake twice.
How it works
<picture> <source media="(prefers-color-scheme: dark)" srcset="docs/how-it-works.svg"> <img alt="How it works: Write, Store, Index, Search" src="docs/how-it-works.svg" width="100%"> </picture>
- Write -- Your AI saves a note (a preference, a lesson, a decision).
- Store -- The file is saved to R2 with full version history. You can roll back any change.
- Index -- Workers AI turns the text into a vector embedding and adds it to a searchable index.
- Search -- Later, your AI (or you) can find that memory by asking a question in plain English. Semantic search matches by meaning, not exact words.
Recent memories rank higher than old ones automatically.
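To make the search step concrete, here is a minimal sketch of meaning-based ranking with a recency boost. It is an illustration, not the server's implementation: the server uses an HNSW index rather than the brute-force scan below, and the `IndexedMemory` shape, the exponential decay, and the 30-day half-life are all assumptions made for the example.

```typescript
// Hypothetical shape of an indexed memory; the real schema may differ.
interface IndexedMemory {
  path: string;
  embedding: number[];
  updatedAt: number; // Unix time in milliseconds
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Assumed decay: the weight halves every `halfLifeDays` (not taken from the project).
function recencyWeight(updatedAt: number, now: number, halfLifeDays: number = 30): number {
  const ageDays = Math.max(0, now - updatedAt) / 86400000;
  return Math.pow(0.5, ageDays / halfLifeDays);
}

// Score every memory against the query embedding and sort best-first.
function rankMemories(
  query: number[],
  memories: IndexedMemory[],
  now: number
): { path: string; score: number }[] {
  return memories
    .map((m) => ({
      path: m.path,
      score: cosineSimilarity(query, m.embedding) * recencyWeight(m.updatedAt, now),
    }))
    .sort((x, y) => y.score - x.score);
}
```

With two equally similar notes, the one updated more recently comes first; an unrelated note scores near zero regardless of age.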
Architecture
<picture> <source media="(prefers-color-scheme: dark)" srcset="docs/architecture.svg"> <img alt="Architecture diagram showing AI assistants connecting to a Cloudflare Worker backed by R2, Durable Objects, and Workers AI" src="docs/architecture.svg" width="100%"> </picture>
Three Cloudflare services, one Worker:
| Component | What it does |
|---|---|
| R2 | Stores your memory files with version history |
| Durable Object | Runs the HNSW vector index + SQLite for semantic search |
| Workers AI | Generates embeddings (bge-m3) and powers the daily reflection |
Quick start
Option A: One-click deploy
Click the button, follow the prompts, and you'll have a running server in under 2 minutes:
After deploying, set your auth token:
npx wrangler secret put MEMORY_AUTH_TOKEN
# Enter a secure random token when prompted
Option B: Clone and deploy manually
git clone https://github.com/jonnyparris/agent-memory-mcp.git
cd agent-memory-mcp
npm install
# Create your R2 bucket
npx wrangler r2 bucket create agent-memory
# Set your auth token
npx wrangler secret put MEMORY_AUTH_TOKEN
# Enter a secure random token when prompted
# Deploy
npm run deploy
Connect your AI assistant
Once deployed, connect your AI assistant to the server. Replace YOUR_SUBDOMAIN with your Cloudflare Workers subdomain (find it in the Cloudflare dashboard under Workers & Pages).
Claude Code
export MEMORY_AUTH_TOKEN="your-secret-token"
claude mcp add --transport http agent-memory \
https://agent-memory-mcp.YOUR_SUBDOMAIN.workers.dev/mcp \
--header "Authorization: Bearer $MEMORY_AUTH_TOKEN"
Cursor
Go to Settings > MCP Servers > Add:
- URL: https://agent-memory-mcp.YOUR_SUBDOMAIN.workers.dev/mcp
- Headers: Authorization: Bearer YOUR_TOKEN
OpenCode
Add to .opencode/opencode.json:
{
  "mcp": {
    "agent-memory": {
      "type": "remote",
      "url": "https://agent-memory-mcp.YOUR_SUBDOMAIN.workers.dev/mcp",
      "headers": {
        "Authorization": "Bearer {env:MEMORY_AUTH_TOKEN}"
      }
    }
  }
}
Any MCP-compatible client
The server speaks the standard Model Context Protocol. Any MCP client can connect via HTTP with a Bearer token header.
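For a custom client, the sketch below shows roughly what a tool call looks like on the wire. MCP messages are JSON-RPC 2.0, and tools are invoked with the `tools/call` method. The helper name `buildToolCall` and the `query` argument passed to `search` are assumptions for illustration; check the tool schema your client discovers for the real argument names.

```typescript
// JSON-RPC 2.0 request envelope used by MCP.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

interface HttpRequestParts {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build (but do not send) an authenticated tools/call request.
// `buildToolCall` is a hypothetical helper, not part of this project.
function buildToolCall(
  baseUrl: string,
  token: string,
  id: number,
  toolName: string,
  args: Record<string, unknown>
): HttpRequestParts {
  const payload: JsonRpcRequest = {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
  return {
    // Strip trailing slashes so the path is always exactly /mcp.
    url: `${baseUrl.replace(/\/+$/, "")}/mcp`,
    method: "POST",
    headers: {
      "Authorization": `Bearer ${token}`,
      "Content-Type": "application/json",
      "Accept": "application/json, text/event-stream",
    },
    body: JSON.stringify(payload),
  };
}
```

The result can be passed to `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. A real client also performs the MCP `initialize` handshake before calling tools, which is why an existing MCP client library is usually the easier route.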
Available tools
The server exposes 23 MCP tools. Your AI assistant discovers and uses them automatically -- you don't need to call them yourself.
Core memory
| Tool | What it does |
|---|---|
| read | Read one file or up to 50 files from memory (pass a string or array of paths) |
| write | Save a file (auto-indexes for search; extracts tags: frontmatter for filtering). Pass wait_for_index: false to defer the embedding update for sub-second response, or detect_overlaps: false to skip the post-write similarity search |
| write_many | Write up to 50 files in one call. R2 writes run in parallel and embedding updates are issued concurrently; defaults to detect_overlaps: false since bulk callers usually know what they're writing |
| list | List files in a directory (optionally filter by tags) |
| list_tags | List all tags with the file count for each |
| search | Find memories by meaning (semantic search; optionally filter by tags) |
| history | See previous versions of a file |
| rollback | Restore a file to an earlier version |
| get_backlinks | List files that link to a target via [[wikilinks]] |
| execute | Run JavaScript queries against your memory (trust notes) |
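The `write` tool's tag extraction can be pictured with a small sketch. This is not the server's parser: it assumes the file opens with a `---` delimited frontmatter block and that `tags:` is either an inline list (`tags: [a, b]` or `tags: a, b`) or a YAML-style block list. The function name `extractTags` is made up for the example.

```typescript
// Pull tags out of a leading frontmatter block; returns [] when there is none.
function extractTags(markdown: string): string[] {
  // Frontmatter must start on the first line and be closed by a second `---` line.
  const match = /^---\r?\n([\s\S]*?)\r?\n---(?:\r?\n|$)/.exec(markdown);
  if (!match) return [];

  const stripQuotes = (s: string): string => s.trim().replace(/^["']|["']$/g, "");
  const tags: string[] = [];
  let inBlockList = false;

  for (const line of match[1].split(/\r?\n/)) {
    const tagLine = /^tags:\s*(.*)$/.exec(line);
    if (tagLine) {
      const value = tagLine[1].trim();
      if (value === "") {
        // `tags:` with nothing after it starts a block list on the next lines.
        inBlockList = true;
        continue;
      }
      // Inline form: drop optional brackets, then split on commas.
      for (const raw of value.replace(/^\[|\]$/g, "").split(",")) {
        const tag = stripQuotes(raw);
        if (tag) tags.push(tag);
      }
      inBlockList = false;
      continue;
    }
    if (inBlockList) {
      const item = /^\s*-\s+(.*)$/.exec(line);
      if (item) {
        tags.push(stripQuotes(item[1]));
      } else {
        inBlockList = false; // any other line ends the block list
      }
    }
  }
  return tags;
}
```

A note saved with `tags: [cors, debugging]` in its frontmatter would then be findable through the tag filters on `list` and `search`.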
Conversations
| Tool | What it does |
|---|---|
| index_conversations | Import past AI sessions for search |
| search_conversations | Search across past conversations by meaning |
| expand_conversation | Get full context around a search result |
| conversation_stats | See how many conversations are indexed |
Reminders
| Tool | What it does |
|---|---|
| schedule_reminder | Set a one-time or recurring reminder |
| check_reminders | Poll for fired reminders (called on startup) |
| list_reminders | List all active reminders |
| remove_reminder | Delete a reminder |
Reflection
| Tool | What it does |
|---|---|
| list_pending_reflections | See proposed memory improvements |
| apply_reflection_changes | Apply a suggested improvement |
| archive_reflection | Dismiss a suggestion |
Recommended memory structure
You can organize your memory however you like. Here's a structure that works well:
memory/
├── learnings.md # Lessons learned, gotchas, corrections
├── preferences.md # Your coding style, tool preferences
├── people.md # Teammates, roles, availability
├── projects.md # Active projects, architecture decisions
│
├── patterns/ # Reusable patterns and templates
│ ├── git.md
│ ├── code-review.md
│ └── debugging.md
│
├── workload/ # Current tasks and priorities
│ ├── active.md
│ ├── backlog.md
│ └── archive/
│
└── archive/ # Old context you might need someday
Scheduled reflection
The server includes an automated self-improvement system. Every day at 6am UTC, it reviews your memory files and cleans them up.
Quick scan (GLM 4.7 Flash) catches simple issues -- typos, broken formatting, duplicate entries -- and fixes them automatically.
Deep analysis (Moonshot Kimi K2.6, 262k context, agentic) looks for contradictions, outdated information, gaps, orphaned files, and missing cross-references. It proposes changes for you to review, including adding [[wikilinks]] where files clearly relate but don't reference each other. It uses the backlink index to understand which files are hubs and which are orphans before proposing merges or deletes.
Override the defaults with REFLECTION_MODEL and REFLECTION_MODEL_FAST in wrangler.jsonc or as secrets if you want to try a different pair.
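The hub and orphan idea behind the backlink index can be sketched in a few lines. This is an illustration under stated assumptions, not the project's code: it assumes wikilink targets are written as full file paths (for example `[[people.md]]`), treats an orphan as a file with no inbound links, and ignores links to files that do not exist. All function names are hypothetical.

```typescript
// Collect [[wikilink]] targets from a file body; `[[target|label]]` yields `target`.
function extractWikilinks(text: string): string[] {
  const pattern = /\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]/g;
  const targets: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    targets.push(match[1].trim());
  }
  return targets;
}

// Map each known file to the set of files that link to it.
function buildBacklinks(files: Record<string, string>): Map<string, Set<string>> {
  const backlinks = new Map<string, Set<string>>();
  for (const path of Object.keys(files)) {
    backlinks.set(path, new Set<string>());
  }
  for (const source of Object.keys(files)) {
    for (const target of extractWikilinks(files[source])) {
      const inbound = backlinks.get(target);
      // Skip self-links and links to unknown files in this sketch.
      if (inbound && target !== source) inbound.add(source);
    }
  }
  return backlinks;
}

// Orphans: files that nothing links to, in alphabetical order.
function findOrphans(backlinks: Map<string, Set<string>>): string[] {
  const orphans: string[] = [];
  backlinks.forEach((inbound, path) => {
    if (inbound.size === 0) orphans.push(path);
  });
  return orphans.sort();
}

// Hubs: files ordered by inbound link count, highest first.
function rankHubs(backlinks: Map<string, Set<string>>): { path: string; inbound: number }[] {
  const ranked: { path: string; inbound: number }[] = [];
  backlinks.forEach((inbound, path) => {
    if (inbound.size > 0) ranked.push({ path, inbound: inbound.size });
  });
  return ranked.sort((a, b) => b.inbound - a.inbound || a.path.localeCompare(b.path));
}
```

A reflection pass could use output like this to avoid proposing the deletion of a hub file, or to suggest linking an orphan from a related note.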
You get a notification summary after each run. You can also trigger it manually:
curl -X POST "https://your-worker.workers.dev/reflect" \
-H "Authorization: Bearer YOUR_TOKEN"
What the scheduled reflection is not
The cron-driven reflection is a lightweight, autonomous scan. It runs unattended on a budget of a few iterations and writes proposals to memory/reflections/pending/{date}.md (empty proposals get archived to memory/reflections/archive/{date}.md to make zero-output runs visible).
This is not the same as a deep reflection workflow you might drive from your agent — e.g. a /nightly-reflect slash command that pulls your entire week of activity (calendar, git log, scratch notes, chat history) and writes a multi-section improvement plan. The cron has none of that context. It only sees what's already in memory.
Concretely:
- Cron output lives in memory/reflections/pending/ and memory/reflections/archive/. Expect terse summaries and small auto-fixes.
- Agent-driven deep reflection output should live somewhere else (e.g. memory/workload/plans/{date}-improvement-proposals.md). Don't conflate the two -- a healthy memory/reflections/archive/ does not mean your weekly reflection ran.
If your agent has a separate deep-reflection workflow, watch its output directory directly. The cron is a janitor; the agent is the architect.
The execute tool
The execute tool runs arbitrary JavaScript against your memory. It's fast and convenient for complex queries (group files by tag, aggregate word counts, etc.) but it's not a sandbox:
- Code runs in the same V8 isolate as the Worker.
- It has access to global fetch, crypto, and other Web APIs.
- It runs with the Worker's CPU limit as the only time bound (plus a 10s wall-clock timeout from the tool).
- It does NOT have access to env bindings, your auth token, or other secrets -- those never touch the global scope.
Guidance:
- Only use execute when you trust who's calling it. The auth token protects the MCP endpoint; anyone with the token can run code.
- Don't expose this MCP to untrusted users without stripping the execute tool first.
- Use search + read instead when you can -- they're safer and usually faster.
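One way to picture "stripping the execute tool" is a denylist applied to less-trusted callers. The sketch below is hypothetical: this README does not show how the server registers its tools, so `ToolDescriptor`, `isToolAllowed`, and `visibleTools` are illustrative names rather than project APIs.

```typescript
// Minimal stand-in for a tool registration entry (assumed shape).
interface ToolDescriptor {
  name: string;
  description: string;
}

// Tools withheld from callers that are not fully trusted.
const UNTRUSTED_DENYLIST: readonly string[] = ["execute"];

// Gate used on the call path: reject denied tools even if the caller knows the name.
function isToolAllowed(name: string, trusted: boolean): boolean {
  return trusted || UNTRUSTED_DENYLIST.indexOf(name) === -1;
}

// Gate used on the listing path: hide denied tools from discovery.
function visibleTools(tools: ToolDescriptor[], trusted: boolean): ToolDescriptor[] {
  return tools.filter((tool) => isToolAllowed(tool.name, trusted));
}
```

Hiding a tool from the listing is not enough on its own, so the same check should also run when a call arrives.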
Cost
Runs entirely within Cloudflare's free tier for personal use:
| Service | Free tier limit | Typical usage | Cost |
|---|---|---|---|
| R2 Storage | 10 GB/month | ~1 MB | $0 |
| R2 Operations | 10M reads, 1M writes | ~3K reads, ~600 writes | $0 |
| Workers | 100K requests/day | ~6K/month (~200/day) | $0 |
| Workers AI | 10K neurons/day | ~60 | $0 |
| Durable Objects | 100K requests/day | ~200 | $0 |
Total: $0/month
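To see how much headroom those numbers leave, the arithmetic can be checked with a short sketch. The figures are copied from the table above for rows with a single numeric limit and usage; the sketch assumes each limit and its typical usage cover the same period. The names `QuotaRow`, `percentUsed`, and `headroomFactor` are made up for the example.

```typescript
interface QuotaRow {
  service: string;
  limit: number;   // free-tier allowance, from the cost table
  typical: number; // typical usage, from the cost table
}

// Assumption: limit and typical usage in each row refer to the same period.
const freeTierRows: QuotaRow[] = [
  { service: "R2 reads", limit: 10000000, typical: 3000 },
  { service: "R2 writes", limit: 1000000, typical: 600 },
  { service: "Workers AI neurons", limit: 10000, typical: 60 },
  { service: "Durable Object requests", limit: 100000, typical: 200 },
];

// Share of the allowance consumed, as a percentage.
function percentUsed(row: QuotaRow): number {
  return (row.typical * 100) / row.limit;
}

// How many times usage could grow before reaching the limit.
function headroomFactor(row: QuotaRow): number {
  return row.limit / row.typical;
}

for (const row of freeTierRows) {
  console.log(
    `${row.service}: ${percentUsed(row).toFixed(2)}% used, ~${Math.round(headroomFactor(row))}x headroom`
  );
}
```

Every row comes out well under 1% of its allowance, with Workers AI the tightest at roughly 0.6%.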
Development
npm install # Install dependencies
npm run dev # Run locally
npm test # Run all tests
npm run test:unit # Unit tests only
npm run deploy # Deploy to Cloudflare
Migrating existing memory files
If you have local memory files you want to upload:
npm run migrate # Upload local files to your deployed server
npm run export # Download all files from the server to local disk
License
MIT