adl-context-collector
MCP server providing persistent memory for LLMs via semantic search over captures, structured facts with temporal tracking, and document ingestion. Enables any MCP-compatible client to maintain a searchable knowledge base across sessions.
README
adl-context-collector
Your context is your most valuable asset when working with LLMs. But it's locked inside conversation histories that disappear, scattered across tools, and tied to whichever provider you're using today.
context-collector is a personal MCP server that gives any LLM persistent memory — semantic search over your captures, structured facts with temporal tracking, and document ingestion. It works with Claude, ChatGPT, Gemini, or any MCP-compatible client. Your knowledge stays yours.
By default it uses OpenAI for embeddings and metadata extraction. Optionally, run everything locally with Ollama — no data leaves your machine.
Want to get started fast? Fork the adl-context-vault starter template.
Prerequisites
- Node.js 20+
- PostgreSQL with the pgvector extension
- OpenAI API key (or Ollama for fully local operation)
- An MCP client — Claude Code, Claude Desktop, or any app that supports the Model Context Protocol
Installing PostgreSQL with pgvector
macOS (Homebrew):
brew install postgresql@17
brew install pgvector
brew services start postgresql@17
Linux (Ubuntu/Debian):
sudo apt install postgresql postgresql-contrib
sudo apt install postgresql-17-pgvector
Docker:
docker run -d --name context-db \
-e POSTGRES_PASSWORD=yourpassword \
-p 5432:5432 \
pgvector/pgvector:pg17
Quick Start
1. Clone context-collector
git clone https://github.com/ammonsdatalabs/adl-context-collector.git
cd adl-context-collector
npm install
2. Create a database
createdb my_context
psql my_context -c "CREATE EXTENSION IF NOT EXISTS vector"
psql my_context < sql/schema.sql
3. Set up a vault
A vault is a directory that holds your configuration and data. The simplest way to start is to fork the adl-context-vault template. Or create one manually:
mkdir my-vault && cd my-vault
Create a .env file:
DATABASE_URL=postgresql://localhost:5432/my_context
OPENAI_API_KEY=sk-your-key-here
Optionally create a config file for custom domains, categories, and tool descriptions. See Configuration below.
4. Register the MCP server
Claude Code:
claude mcp add my-vault \
-e DOTENV_CONFIG_PATH=/absolute/path/to/my-vault/.env \
-- npx tsx /absolute/path/to/adl-context-collector/src/index.ts
Claude Desktop — add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"my-vault": {
"command": "npx",
"args": ["tsx", "/absolute/path/to/adl-context-collector/src/index.ts"],
"env": {
"DOTENV_CONFIG_PATH": "/absolute/path/to/my-vault/.env"
}
}
}
}
5. Start using it
Restart your MCP client. You now have 8 tools available:
> search_context("what did we decide about the API redesign?")
> save_fact(domain: "project", key: "api_redesign_status", value: "approved, starting sprint 12")
> capture_thought("We should use pagination instead of cursor-based — simpler for our use case")
Configuration
context-collector uses a JSON config file with sensible defaults. You only need to override what you want to change.
Minimal config
No config file needed — defaults work out of the box if you have OPENAI_API_KEY set.
Custom config
Create a JSON file and point to it via your .env:
COLLECTOR_CONFIG=/path/to/vault-config.json
Example config (see config.example.json for the full version):
{
"serverName": "my-vault",
"categories": ["status", "cost", "date", "contact", "preference"],
"tools": {
"search_context": {
"description": "Search my project knowledge base using semantic similarity."
},
"save_fact": {
"keyExamples": ["deploy_status", "team_velocity", "sprint_end_date"],
"currencies": ["USD", "EUR"],
"units": ["weekly", "monthly", "annual"]
}
}
}
You only need to include fields you want to override — everything else inherits from defaults via deep merge.
Embedder configuration
By default, context-collector uses OpenAI's text-embedding-3-small for embeddings. To use a different provider or model:
{
"embedder": {
"url": "https://api.openai.com/v1/embeddings",
"model": "text-embedding-3-small",
"dimensions": 1536,
"apiKey": "env:OPENAI_API_KEY"
}
}
url— any OpenAI-compatible embeddings endpointmodel— the model name to requestdimensions— vector dimensions (must match the model's output)apiKey—"env:VAR_NAME"reads from an environment variable,nullskips authentication (for local servers)
Metadata extraction
By default, context-collector uses GPT-4o-mini to extract metadata (type, domain, topics, people, action items, dates) from ingested content. To disable or configure:
{
"metadataExtractor": {
"enabled": true,
"url": "https://api.openai.com/v1/chat/completions",
"model": "gpt-4o-mini",
"apiKey": "env:OPENAI_API_KEY"
}
}
Set "enabled": false to skip metadata extraction entirely — content is still searchable via embeddings, but won't have auto-extracted topics, domains, or action items.
Custom domains and capture types
Via config .env:
COLLECTOR_DOMAINS=platform,taxonomy,ingestion,operations
COLLECTOR_CAPTURE_TYPES=thought,decision,analysis,note,meeting,insight,plan,reference
Or use the defaults: general, project, reference for domains and 8 capture types.
Tools Reference
| Tool | Description |
|---|---|
search_context |
Semantic similarity search across all captures |
lookup_fact |
Look up a specific fact by key |
search_facts |
Search or filter structured facts |
capture_thought |
Save a thought, decision, or note (auto-embedded and metadata-extracted) |
save_fact |
Save or update a structured fact with temporal tracking |
list_recent |
List recently captured thoughts |
context_stats |
Knowledge base statistics (counts, domains, last activity) |
ingest_document |
Ingest a file or directory (Markdown, PDF, plain text) |
Ingestion
Via MCP tool
From any conversation with your LLM:
> ingest_document(path: "/path/to/my-vault/rawdocs/meeting-notes.md")
> ingest_document(path: "/path/to/my-vault/rawdocs/", recursive: true)
Via CLI
# Single file
env $(grep -v '^#' /path/to/vault/.env | xargs) npm run ingest -- /path/to/document.md
# Directory
env $(grep -v '^#' /path/to/vault/.env | xargs) npm run ingest -- /path/to/rawdocs/
GitHub issues
env $(grep -v '^#' /path/to/vault/.env | xargs) npm run ingest-github -- --repo owner/repo --limit 50
Supported formats
- Markdown (
.md) — split by headings - PDF (
.pdf) — split by pages with cross-page sentence handling - Plain text (
.txt) — split by paragraphs
Documents are chunked, embedded, deduplicated by content hash, and metadata-extracted automatically.
Multiple Vaults
You can run separate vaults for different contexts (personal, work, side projects) — each with its own database, config, and domains — all sharing the same context-collector installation.
Example setup
~/vaults/personal/
.env → DATABASE_URL=...personal_db, COLLECTOR_CONFIG=...
vault-config.json → serverName: "personal", domains: finance,family,travel
~/vaults/work/
.env → DATABASE_URL=...work_db, COLLECTOR_CONFIG=...
vault-config.json → serverName: "work", domains: platform,ops,team
Register each separately:
# Personal vault (project-scoped — only available when working in that directory)
cd ~/vaults/personal
claude mcp add personal-vault \
-e DOTENV_CONFIG_PATH=$PWD/.env \
-- npx tsx /path/to/adl-context-collector/src/index.ts
# Work vault (user-scoped — available everywhere)
claude mcp add -s user work-vault \
-e DOTENV_CONFIG_PATH=~/vaults/work/.env \
-- npx tsx /path/to/adl-context-collector/src/index.ts
Each vault appears as a separate MCP server with its own tool prefix (personal-vault - search_context, work-vault - save_fact, etc.).
Local Setup (Ollama)
To keep all data on your machine — no API calls to OpenAI — use Ollama as your embedder and (optionally) metadata extractor.
1. Install Ollama and pull models
# Install from https://ollama.com
ollama pull nomic-embed-text # embeddings (required)
ollama pull qwen2.5:7b # metadata extraction (optional, needs 16GB+ RAM)
2. Configure your vault
Add to your vault's config JSON:
{
"embedder": {
"url": "http://localhost:11434/v1/embeddings",
"model": "nomic-embed-text",
"dimensions": 768,
"apiKey": null
},
"metadataExtractor": {
"enabled": true,
"url": "http://localhost:11434/v1/chat/completions",
"model": "qwen2.5:7b",
"apiKey": null
}
}
Important: Use Ollama's OpenAI-compatible endpoints (/v1/...), not the native API endpoints (/api/...).
Remove OPENAI_API_KEY from your .env — it's no longer needed.
3. Generate embeddings for the new model
env $(grep -v '^#' /path/to/vault/.env | xargs) npm run migrate-embeddings
This generates embeddings for all captures using the configured model. Existing OpenAI embeddings are preserved — you can switch back at any time by changing the config.
Hardware requirements
- 8GB RAM — embeddings only (metadata extraction disabled)
- 16GB+ RAM — embeddings + metadata extraction with a 7B model
Switching back to OpenAI
Change your config's embedder back to OpenAI settings. If you previously had OpenAI embeddings, they're still in the database — search works immediately. No re-embedding needed.
Security
context-collector runs as a local stdio process — it has no network listener and no authentication layer. The trust boundary is your local operating system user account. If an attacker can run code as your user, they already have access to everything the MCP server can see.
This is a meaningful improvement over storing context in cloud conversation histories — a provider account compromise no longer exposes your entire knowledge base. But local-only is not a sandbox. The strongest practical controls are OS-level.
Recommendations
-
Enable full-disk encryption — FileVault (macOS) or LUKS (Linux). Protects against physical theft.
-
Restrict vault file permissions — your
.envfiles contain database credentials and API keys:chmod 600 /path/to/vault/.env chmod 600 /path/to/vault/vault-config.json -
Bind PostgreSQL to localhost only — verify
listen_addresses = 'localhost'inpostgresql.conf. Use a real password, not a default. -
Keep sensitive vaults project-scoped — register with
claude mcp add(default, project-scoped) rather thanclaude mcp add -s user(global). Project-scoped servers are only available when Claude is running in that directory. -
Go local for embeddings — by default, ingested content is sent to OpenAI for embedding. If nothing leaving the machine matters, switch to Ollama.
-
Treat MCP queries as trusted execution — when an LLM queries your vault, it sees what you'd see. This is the point, but be aware that any MCP client you connect has full read/write access to your vault's database.
Migration
If you have an existing installation that used the old schema (embeddings stored directly in the captures table), run the schema migration:
# Back up first
pg_dump your_database > backup.sql
# Run migration (defaults to your current embedder config for legacy model info)
env $(grep -v '^#' /path/to/vault/.env | xargs) npm run migrate-schema
# Or specify the legacy model explicitly
env $(grep -v '^#' /path/to/vault/.env | xargs) npm run migrate-schema -- \
--legacy-url "https://api.openai.com/v1/embeddings" \
--legacy-model "text-embedding-3-small" \
--legacy-dims 1536
This creates the capture_embeddings table, copies existing embeddings with their model metadata, verifies the copy, and drops the old column.
License
ISC
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.