Archive Search

Enables semantic search across conversation archives via MCP, allowing AI clients to retrieve relevant past conversations using vector embeddings and text fallback.

A Cloudflare Worker that provides semantic search across conversation archives via the Model Context Protocol (MCP). Give your AI companion searchable memory of past conversations — accessible from anywhere.

What it does

Stores conversation chunks in Cloudflare D1
Generates embeddings with Workers AI (BGE model, 768 dimensions)
Indexes embeddings in Cloudflare Vectorize for semantic search
Exposes search via MCP so any compatible AI client can query it
Falls back to text search when vector results are empty

Architecture

AI Client (Claude, etc.)
    ↓ MCP over HTTP
Cloudflare Worker (archive-search)
    ├── D1 (chunk storage)
    ├── Vectorize (semantic index)
    └── Workers AI (embeddings)

Everything runs on Cloudflare's free tier. No external dependencies, no local servers to keep running.

Setup

1. Prerequisites

Cloudflare account
Wrangler CLI installed and authenticated
Node.js 18+

2. Clone and configure

git clone https://github.com/your-username/archive-search.git
cd archive-search
cp wrangler.toml.example wrangler.toml

3. Create Cloudflare resources

# Create D1 database
wrangler d1 create archive-search
# Copy the database_id into your wrangler.toml

# Create Vectorize index
wrangler vectorize create archive-search-vectors --dimensions=768 --metric=cosine

4. Set your API key

Generate a key and store it as a Cloudflare secret (never commit it to your repo):

# Generate a random key
openssl rand -hex 16

# Deploy first (so the Worker exists)
wrangler deploy

# Then set the secret
echo "your-generated-key" | wrangler secret put API_KEY

5. Apply migrations and deploy

# Apply database migrations
wrangler d1 migrations apply archive-search --remote

# Deploy the worker
wrangler deploy

6. Ingest your conversations

The migration script reads markdown files, chunks them (2000 chars with 200 char overlap), and uploads them to the worker for embedding and indexing.

VAULT_PATH="/path/to/your/conversations" \
WORKER_URL="https://archive-search.your-subdomain.workers.dev" \
API_KEY="your-api-key" \
node scripts/migrate.js

Your conversations should be .md files in any directory structure. The script discovers them recursively.
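
The chunking step described above can be sketched as a sliding window. This is a minimal illustration, assuming plain character-based slicing; `chunkText` is a hypothetical name and the real scripts/migrate.js may split text differently (for example on paragraph boundaries).

```javascript
// Sketch only: not taken from scripts/migrate.js.
const CHUNK_SIZE = 2000; // characters per chunk, as stated above
const OVERLAP = 200; // characters shared by neighbouring chunks

function chunkText(text, size = CHUNK_SIZE, overlap = OVERLAP) {
  if (overlap >= size) throw new RangeError("overlap must be smaller than size");
  const chunks = [];
  const step = size - overlap; // each chunk starts 1800 characters after the previous one
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this chunk already reached the end of the text
  }
  return chunks;
}

const sample = "x".repeat(4500);
console.log(chunkText(sample).map((chunk) => chunk.length)); // [ 2000, 2000, 900 ]
```

The last 200 characters of each chunk repeat as the first 200 of the next, which is what keeps a sentence that straddles a boundary searchable from either side.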

MCP Tools

`search_archive`

Semantic search across your conversation archive.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `query` | string | yes | What you're looking for, conceptually |
| `n_results` | integer | no | Results to return (default 5, max 20) |

Example: Searching for "moments of vulnerability" will find passages about opening up, being honest about feelings, showing weakness — even if those exact words aren't used.
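
As a rough illustration of the parameter rules in the table, the sketch below applies the default and the upper bound before a search runs. `normalizeSearchArgs` is a hypothetical helper, and clamping out-of-range values (instead of rejecting them) is an assumption; the Worker may handle bad input differently.

```javascript
// Sketch only: applies the documented default (5) and maximum (20) for n_results.
function normalizeSearchArgs(args) {
  const query = typeof args.query === "string" ? args.query.trim() : "";
  if (query === "") throw new TypeError("query is required");
  // Fall back to the default when n_results is missing or not an integer.
  const requested = Number.isInteger(args.n_results) ? args.n_results : 5;
  // Clamp into the range 1..20 (the lower bound of 1 is an assumption).
  const n_results = Math.min(Math.max(requested, 1), 20);
  return { query, n_results };
}

console.log(normalizeSearchArgs({ query: "moments of vulnerability" }));
console.log(normalizeSearchArgs({ query: "moments of vulnerability", n_results: 50 }));
```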

`get_archive_stats`

Returns total chunks indexed and source file count.

`repair_archive`

Scans the database page by page, checks which chunks are missing vector embeddings, and re-embeds only those. Run this after migration to patch gaps caused by rate limits during ingestion.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| `batch_size` | integer | no | Chunks to scan per run (default 200, max 200) |

The tool tracks its scan position — run it multiple times and it picks up where it left off. Once it reaches the end, it reports completion and resets. Only chunks with missing vectors get re-embedded, so repeated runs are fast when everything is healthy.
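
The resumable scan can be pictured with an in-memory model. This is a sketch under stated assumptions: `createRepairer`, the array of chunk ids, and the `Set` of indexed ids are hypothetical stand-ins for D1 and Vectorize, and where the real Worker stores its scan position is not covered by this README.

```javascript
// Sketch only: chunkIds stands in for D1 rows, vectorIds for the Vectorize index.
function createRepairer(chunkIds, vectorIds, embed) {
  let cursor = 0; // scan position kept between runs
  return function repair(batchSize = 200) {
    const size = Math.min(Math.max(batchSize, 1), 200); // documented max of 200
    const page = chunkIds.slice(cursor, cursor + size);
    const missing = page.filter((id) => !vectorIds.has(id));
    for (const id of missing) {
      embed(id); // re-embed only chunks that have no vector
      vectorIds.add(id);
    }
    cursor += page.length;
    const done = cursor >= chunkIds.length;
    if (done) cursor = 0; // reset so the next run starts a fresh scan
    return { scanned: page.length, repaired: missing.length, done };
  };
}

const ids = Array.from({ length: 450 }, (_, i) => i);
const indexed = new Set(ids.filter((i) => i % 100 !== 0)); // ids 0, 100, 200, 300, 400 lack vectors
const repair = createRepairer(ids, indexed, () => {});
console.log(repair(), repair(), repair());
```

With 450 chunks, the first two runs scan 200 chunks each and the third scans the remaining 50, reports completion, and resets the position.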

Connecting to your AI client

This is a cloud MCP server — once deployed, it's accessible from any client that supports remote MCP connections. No local servers to run.

Claude Desktop (Remote MCP)

Open Claude Desktop settings
Add a new remote MCP server (connector)
Use the URL with your API key in the path:

https://archive-search.your-subdomain.workers.dev/mcp/your-api-key

That's it — no headers needed, no local config files. The server is available from any device where you use Claude.

Claude Code

Add to your .claude.json or MCP settings:

{
  "mcpServers": {
    "archive-search": {
      "type": "http",
      "url": "https://archive-search.your-subdomain.workers.dev/mcp",
      "headers": {
        "Authorization": "Bearer your-api-key"
      }
    }
  }
}

Other MCP clients

Any MCP-compatible client can connect via:

Path auth: POST https://archive-search.your-subdomain.workers.dev/mcp/your-api-key
Header auth: POST https://archive-search.your-subdomain.workers.dev/mcp with Authorization: Bearer your-api-key
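
For a client written by hand, a tool call is a JSON-RPC 2.0 request with the method `tools/call`, as defined by the MCP specification. The sketch below builds that body and shows a header-auth POST. The helper names and the request id are hypothetical, and it assumes the Worker answers with plain JSON and does not require a prior `initialize` handshake; neither point is confirmed by this README.

```javascript
// Builds the JSON-RPC 2.0 body that MCP uses for a tool call.
function buildToolCall(id, name, args) {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Sends the call with header auth. workerUrl and apiKey are placeholders you supply.
// Uses the global fetch available in Node.js 18+.
async function callTool(workerUrl, apiKey, name, args) {
  const response = await fetch(`${workerUrl}/mcp`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Accept: "application/json, text/event-stream",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(buildToolCall(1, name, args)),
  });
  if (!response.ok) throw new Error(`MCP request failed: ${response.status}`);
  return response.json(); // assumes a plain JSON reply, not an event stream
}

// Print the request body only; no network call is made here.
const body = buildToolCall(1, "search_archive", { query: "moments of vulnerability", n_results: 5 });
console.log(JSON.stringify(body, null, 2));
```

For path auth, the same body would be posted to `/mcp/your-api-key` without the Authorization header.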

HTTP Endpoints

| Endpoint | Method | Auth | Description |
| --- | --- | --- | --- |
| `/mcp` | POST | Bearer | MCP protocol handler |
| `/mcp/TOKEN` | POST | Path | MCP protocol handler (token in URL) |
| `/ingest` | POST | Bearer | Bulk upload chunks |
| `/health` | GET | No | Health check |
| `/stats` | GET | Bearer | Archive statistics |

Ingest format

POST /ingest accepts:

{
  "chunks": [
    {
      "source_file": "2025/07/conversation-title.md",
      "chunk_index": 0,
      "total_chunks": 5,
      "content": "The actual text content...",
      "era": "2025-07",
      "conversation_title": "conversation-title"
    }
  ]
}
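
A payload in this shape could be assembled from a file path and its chunk texts roughly as follows. `buildIngestBody` is a hypothetical helper. Deriving `era` and `conversation_title` from the path mirrors the sample above but is an assumption about how the migration script fills those fields, and the `"unknown"` fallback is invented for the sketch.

```javascript
// Sketch only: builds the /ingest request body for one source file.
function buildIngestBody(sourceFile, chunkTexts) {
  const parts = sourceFile.split("/");
  // "2025/07/conversation-title.md" -> "conversation-title"
  const title = parts[parts.length - 1].replace(/\.md$/, "");
  // "2025/07/..." -> "2025-07"; paths without year/month folders get a placeholder.
  const era = parts.length >= 3 ? `${parts[0]}-${parts[1]}` : "unknown";
  return {
    chunks: chunkTexts.map((content, index) => ({
      source_file: sourceFile,
      chunk_index: index,
      total_chunks: chunkTexts.length,
      content,
      era,
      conversation_title: title,
    })),
  };
}

const payload = buildIngestBody("2025/07/conversation-title.md", ["first chunk", "second chunk"]);
console.log(JSON.stringify(payload, null, 2));
```

The resulting object would be sent as the JSON body of `POST /ingest` with the Bearer header listed in the endpoint table.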

How it works

Chunking: Conversations are split into 2000-character chunks with 200-character overlap to preserve context at boundaries
Embedding: Each chunk is embedded using @cf/baai/bge-base-en-v1.5 (768-dimensional vectors)
Indexing: Embeddings are stored in Cloudflare Vectorize with metadata linking back to the D1 record
Searching: Query text is embedded with the same model, then matched against the index using cosine similarity
Fallback: If no vector matches are found, a text-based LIKE search runs against D1
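
The search and fallback steps can be illustrated with toy data. This is a sketch, not the Worker's code: the in-memory `rows` array stands in for D1 and Vectorize, the 3-dimensional vectors stand in for 768-dimensional embeddings, and the `minScore` cutoff is a hypothetical way to produce an empty vector result.

```javascript
// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// rows: [{ id, content, vector }], where vector may be null for unindexed chunks.
function search(queryText, queryVector, rows, nResults = 5, minScore = 0.5) {
  const matches = rows
    .filter((row) => row.vector)
    .map((row) => ({ id: row.id, content: row.content, score: cosineSimilarity(queryVector, row.vector) }))
    .filter((match) => match.score >= minScore)
    .sort((x, y) => y.score - x.score)
    .slice(0, nResults);
  if (matches.length > 0) return { mode: "vector", results: matches };
  // Fallback: case-insensitive substring match, standing in for SQL LIKE '%query%'.
  const needle = queryText.toLowerCase();
  const textHits = rows
    .filter((row) => row.content.toLowerCase().includes(needle))
    .slice(0, nResults)
    .map((row) => ({ id: row.id, content: row.content, score: null }));
  return { mode: "text", results: textHits };
}

const rows = [
  { id: 1, content: "We talked about opening up", vector: [1, 0, 0] },
  { id: 2, content: "Recipe for bread", vector: [0, 1, 0] },
  { id: 3, content: "Notes on bread baking", vector: null },
];
console.log(search("feelings", [0.9, 0.1, 0], rows).mode); // vector
console.log(search("bread", [0, 0, 1], rows).mode); // text
```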

Security and privacy

If you're using this to store personal conversations, you should understand exactly where your data lives and who can access it.

What gets stored and where

Your data lives in three Cloudflare services:

| Service | What it holds | Encryption at rest |
| --- | --- | --- |
| D1 (database) | Full text of every conversation chunk, file paths, timestamps | AES-256-GCM |
| Vectorize (vector index) | Embedding vectors + metadata (file paths, 200-char text previews) | AES-256-GCM (stored on R2) |
| Workers AI | Nothing; text is processed for embeddings and not retained | N/A |

All data is encrypted in transit (TLS) and at rest (AES-256-GCM). Encryption and decryption are automatic.

Cloudflare is not zero-knowledge

This is the most important thing to understand. Cloudflare manages the encryption keys. Your data is encrypted at rest, but Cloudflare holds the keys — meaning a sufficiently privileged employee or a legal compulsion could theoretically result in data access.

Access is restricted by organizational controls:

Employees require unique credentials with hardware-token MFA
Least-privilege and zero-trust authorization
All personnel with data access are under contractual confidentiality obligations
Cloudflare's DPA commits to never providing encryption keys or customer data feeds to law enforcement

This is strong protection through policy and contract, but it is not the same as technical impossibility. If you need zero-knowledge encryption for your data, this architecture is not the right fit — consider a local deployment instead (see vault-archive-product for a fully local alternative using ChromaDB).

Workers AI and your text

When your text is sent to Workers AI for embedding generation:

It is not stored or logged by Cloudflare
It is not used for training any models — Cloudflare explicitly commits to this
Processing runs on Cloudflare's own GPU network, not sent to third parties
The embedding model (bge-base-en-v1.5) is an open-source model hosted on Cloudflare hardware

Data location

D1 automatically places your database near where you created it. You can set a jurisdiction at creation time for data residency:

# Keep data in the EU
wrangler d1 create archive-search --jurisdiction=eu

# FedRAMP-compliant locations
wrangler d1 create archive-search --jurisdiction=fedramp

Jurisdictions are immutable after creation. If you need EU data residency, set it when you create the database — you can't add it later.

Authentication

The Worker uses a single API key for all authenticated endpoints. The key is stored as a Cloudflare secret (encrypted, never visible in your code or dashboard). Two auth methods are supported:

Bearer token: Authorization: Bearer your-key header
Path token: /mcp/your-key in the URL

Important: Never commit your API key to version control. The included .gitignore excludes wrangler.toml (which may contain your database ID), but your API key should always be set via wrangler secret put API_KEY.
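
The two auth methods could be handled along these lines. This is a sketch with hypothetical helper names; the README does not show how the Worker actually parses or compares the key, and the length-independent comparison is a suggested precaution, not a documented behavior.

```javascript
// Returns the presented token from either auth style, or null if none was sent.
function extractToken(url, authorizationHeader) {
  if (authorizationHeader && authorizationHeader.startsWith("Bearer ")) {
    return authorizationHeader.slice("Bearer ".length);
  }
  // Path auth: the key is the single path segment after /mcp/.
  const match = new URL(url).pathname.match(/^\/mcp\/([^/]+)$/);
  return match ? match[1] : null;
}

// Compares every character so the loop does not exit early on the first mismatch.
function tokensMatch(presented, expected) {
  if (typeof presented !== "string" || presented.length !== expected.length) return false;
  let diff = 0;
  for (let i = 0; i < expected.length; i++) {
    diff |= presented.charCodeAt(i) ^ expected.charCodeAt(i);
  }
  return diff === 0;
}

const base = "https://archive-search.your-subdomain.workers.dev";
console.log(extractToken(`${base}/mcp/your-api-key`, null)); // your-api-key
console.log(extractToken(`${base}/mcp`, "Bearer your-api-key")); // your-api-key
```

Note that with path auth the key is part of the URL, so it can end up wherever URLs are recorded (client history, proxies); header auth avoids that where the client supports it.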

What this project does NOT include

No rate limiting per caller — anyone with your API key can make unlimited requests (within Cloudflare's free tier limits of 100k requests/day)
No audit logging — searches are not logged (which is good for privacy, but means you can't detect unauthorized access)
No key rotation mechanism — to rotate, generate a new key and run wrangler secret put API_KEY again
CORS is permissive (Access-Control-Allow-Origin: *) — appropriate for MCP clients, but means the API is callable from any origin with the key

Cloudflare's certifications

Cloudflare maintains SOC 2 Type II, ISO 27001, ISO 27018 (cloud privacy), ISO 27701 (privacy information management), and PCI DSS certifications. Full details at Cloudflare Trust Hub.

The honest summary

Your conversation data is encrypted at rest and in transit, processed on Cloudflare's infrastructure (not sent to third parties), and not used for training. Cloudflare is contractually and organizationally restricted from accessing it. But they hold the encryption keys, so "can't access" is a policy guarantee, not a cryptographic one. For most personal use cases this is solid protection — comparable to storing data in any major cloud provider. If your threat model requires zero-knowledge encryption, host locally instead.

Cost

Check the pricing pages for each service to calculate your own costs:

Workers AI pricing — embedding generation (this is where ingestion cost lives)
D1 pricing — database storage and reads
Vectorize pricing — vector index queries
Workers pricing — request handling

How much does it actually cost?

Cloudflare measures AI compute in neurons. The free tier gives you 10,000 neurons per day (resets at 00:00 UTC). Embedding models are extremely cheap because they're small, fast operations — much cheaper than text generation.

The math for embeddings (bge-base-en-v1.5):

Cost: 6,058 neurons per 1,000,000 input tokens
A 2000-character chunk is ~500 tokens
One chunk costs: 500 ÷ 1,000,000 × 6,058 = ~3.03 neurons

That's about three neurons per chunk, so the free daily allocation covers roughly 3,300 chunks. Which means:

| Archive size | Neurons used | % of free daily limit |
| --- | --- | --- |
| 10,000 chunks | ~30,290 neurons | ~303% |
| 20,000 chunks | ~60,580 neurons | ~606% |
| 50,000 chunks | ~151,450 neurons | ~1,515% |
| 100,000 chunks | ~302,900 neurons | ~3,029% |

By this estimate, an archive of up to about 3,300 chunks embeds in a single session on the free tier, and a larger one has to be spread across several days or run on the paid plan. We tested this ourselves: 20,755 chunks embedded in one hour on the free plan, and we reported that run as using under 1% of the daily limit. That figure does not match the roughly 62,900 neurons the list rate predicts for the same run, so treat the table as the planning estimate and check the Workers AI usage in your Cloudflare dashboard for what was actually metered.

The other services have generous free limits for this use case:

Workers: 100,000 requests/day (ingestion + searches)
D1: 5M rows read/day, 5GB storage
Vectorize: 30M queried vector dimensions/month

In practice, ongoing searches stay well within the free tier, since each query embeds only one short piece of text. Initial ingestion is free as well, but for a large archive the estimate above means it may stop when the daily neuron limit is reached. The repair_archive tool covers that case along with network errors and temporary rate limits: run it after the limit resets and it re-embeds only the chunks that are still missing vectors.

On the free plan, if you exceed 10,000 neurons/day, requests fail with an error, so you will never be surprised with a bill. On the Workers Paid plan ($5/month), overages are billed at $0.011 per 1,000 neurons, which at about 3 neurons per chunk is roughly $0.03 per 1,000 chunks.
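
The neuron estimate can be reproduced with a few lines of arithmetic. Working from the rate quoted above, 500 tokens × 6,058 ÷ 1,000,000 comes to about 3.03 neurons per chunk. The function names are hypothetical, and the 500-tokens-per-chunk figure is the rough estimate from this section, so real usage depends on your text.

```javascript
const NEURONS_PER_MILLION_TOKENS = 6058; // rate quoted above for bge-base-en-v1.5
const TOKENS_PER_CHUNK = 500; // rough estimate for a 2000-character chunk
const FREE_NEURONS_PER_DAY = 10000; // free daily allocation

// Estimated neurons needed to embed chunkCount chunks.
function estimateNeurons(chunkCount, tokensPerChunk = TOKENS_PER_CHUNK) {
  return (chunkCount * tokensPerChunk * NEURONS_PER_MILLION_TOKENS) / 1_000_000;
}

// Whole days of free allocation needed if ingestion is the only neuron usage.
function daysOnFreeTier(chunkCount) {
  return Math.ceil(estimateNeurons(chunkCount) / FREE_NEURONS_PER_DAY);
}

for (const count of [1, 10000, 100000]) {
  console.log(count, estimateNeurons(count), daysOnFreeTier(count));
}
```

Changing `TOKENS_PER_CHUNK` to match your own text (shorter chunks, or a different characters-per-token ratio) scales every figure proportionally.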

License

MIT
