MCP Servers

2BToRePensieve

Builds a persistent knowledge graph from notes and conversations, enabling semantic search, entity exploration, and GTD task management from any MCP-compatible AI assistant.

README

2BToRePensieve

Status (2026-03-24): Active development. See Release Notes for the latest changes.

A cloud-hosted personal knowledge graph you can talk to from any AI assistant.

Second Brain + Total Recall + Pensieve — capture everything, forget nothing, recall instantly.

2BToRePensieve builds a persistent knowledge graph from your notes, conversations, emails, YouTube videos, and Notion pages. It extracts entities, relationships, and observations automatically, then makes everything searchable via semantic search with LLM reranking — accessible from ChatGPT, Claude, Cursor, or any MCP-compatible client.

How It Works

┌─────────────────────────────────────────────────────────────────┐
│                        INPUT CHANNELS                           │
│  ChatGPT  Claude  Notion  YouTube  Telegram  Email  Local Files │
└──────────────────────────┬──────────────────────────────────────┘
                           │
                    ┌──────▼──────┐
                    │   Ingest    │  LLM extraction + embedding
                    │  Pipeline   │  (2-3 LLM calls per chunk)
                    └──────┬──────┘
                           │
              ┌────────────▼────────────────┐
              │      Supabase + pgvector    │
              │  ┌────────┐  ┌───────────┐  │
              │  │Entities│──│ Relations │  │
              │  └────┬───┘  └───────────┘  │
              │  ┌────▼──────┐ ┌─────────┐  │
              │  │Observations│ │  Tasks  │  │
              │  └────────────┘ └─────────┘  │
              └────────────┬────────────────┘
                           │
                    ┌──────▼──────┐
                    │ MCP Server  │  12 tools, LLM reranking
                    └──────┬──────┘
                           │
              ┌────────────▼────────────────┐
              │       ACCESS POINTS         │
              │  ChatGPT  Claude  Cursor    │
              │  Telegram Bot  Any MCP app  │
              └─────────────────────────────┘

Features

Knowledge Graph — Entities, relations, and observations extracted automatically from any text
Semantic Search — pgvector cosine similarity + LLM reranking for high-relevance results
12 MCP Tools — search, add thoughts, manage tasks, explore entities, view stats
GTD Task System — inbox/next/waiting/someday/done with priorities and projects
7 Input Channels — ChatGPT, Claude, Notion, YouTube, Telegram, Email, local files
5-Layer Dedup — Content hash, semantic similarity, entity name+type, relation edges, observation hash
Daily Sync — GitHub Actions for Notion, local Task Scheduler for YouTube (cloud IPs blocked by YouTube)
Batched Pipeline — 2-3 LLM calls + 2 embedding calls per chunk (not per entity)

Quick Start

1. Set up Supabase

Create a Supabase project (free tier works). Run the migrations in order:

# In Supabase SQL Editor, run each file in supabase/migrations/:
# 001_create_knowledge_graph.sql
# 002_add_stats_functions.sql
# 003_add_dedup_constraints.sql
# 004_add_tasks_and_sync.sql
# 005_add_search_similar_entities.sql

2. Set up OpenRouter

Create an OpenRouter account and add credits. Get your API key.

Default models (configurable):

Chat/Extraction: openai/gpt-4o-mini (~$0.15/1M input tokens)
Embeddings: openai/text-embedding-3-small (~$0.02/1M tokens)

3. Deploy Edge Functions

# Install Supabase CLI
npm i -g supabase

# Link your project
supabase link --project-ref your-project-ref

# Set secrets
supabase secrets set \
  OPENROUTER_API_KEY=sk-or-v1-your-key \
  OPEN_BRAIN_ACCESS_KEY=$(python -c "import secrets; print(secrets.token_hex(32))")

# Deploy all functions
supabase functions deploy ingest --no-verify-jwt
supabase functions deploy mcp-server --no-verify-jwt
supabase functions deploy telegram-capture --no-verify-jwt
supabase functions deploy email-capture --no-verify-jwt
supabase functions deploy slack-capture --no-verify-jwt

4. Connect Your AI Client

Claude Code / Cursor

Add to your MCP config (.claude/mcp.json or .cursor/mcp.json):

{
  "mcpServers": {
    "open-brain": {
      "type": "url",
      "url": "https://your-project.supabase.co/functions/v1/mcp-server",
      "headers": {
        "Authorization": "Bearer YOUR_ACCESS_KEY"
      }
    }
  }
}

ChatGPT

Use a ChatGPT MCP connector plugin. Set the server URL to:

https://your-project.supabase.co/functions/v1/mcp-server?key=YOUR_ACCESS_KEY

5. Install Python Dependencies

pip install supabase openai httpx python-dotenv yt-dlp youtube-transcript-api PyMuPDF

6. Configure Environment

cp .env.example .env
# Edit .env with your credentials

Documentation

Usage Guide — How to use the system day to day: searching, adding thoughts, managing tasks, importing content
Channel Setup — How to configure each input channel (Telegram, Email, Notion, YouTube, etc.)
Supabase Setup — Database and Edge Function setup
Architecture — Technical design and data flow

MCP Tools

Tool	Description
`search_brain`	Semantic search with LLM reranking
`get_entity`	Look up entity by name/ID with full context
`explore_neighborhood`	Traverse entity relations N hops deep
`add_thought`	Capture any content into the knowledge graph
`list_entities`	Browse entities by type or recency
`list_thoughts`	Browse recent captures with filters
`thought_stats`	Aggregate stats: counts, types, top entities
`add_task`	Create GTD task with priority/project/context
`list_tasks`	List tasks with status/category/project filters
`update_task`	Update any task field
`complete_task`	Mark task done
`get_source`	Find source content by title keyword

Connectors

Connector	Type	How
ChatGPT	Python CLI	Export conversations JSON, ingest via `chatgpt_conversations.py`
Claude	Python CLI	Export conversations JSON, ingest via `claude_conversations.py`
Notion	Python CLI + Cron	Syncs database pages with incremental cursor via `notion_database.py`
YouTube	Python CLI + Cron	Extracts transcripts from playlist videos via `youtube.py`
Telegram	Edge Function	Bot captures messages, searches brain, replies with context
Email	Edge Function	Resend inbound webhook captures emails + PDF attachments
Slack	Edge Function	Bot captures channel messages
Local Files	Python CLI	Bulk ingest .md/.txt files via `local_bulk.py` or watch folder via `local_sync.py`

Project Structure

2BToRePensieve/
├── open_brain/                    # Python package
│   ├── config.py                  # Environment-based configuration
│   ├── db.py                      # Supabase client + all DB operations
│   ├── embeddings.py              # Cloud (OpenRouter) + local (LM Studio) embeddings
│   ├── ingest.py                  # Core ingestion pipeline
│   ├── chunking.py                # Text chunking with sentence-boundary splitting
│   ├── extraction/
│   │   ├── extractor.py           # LLM knowledge extraction
│   │   ├── entity_resolver.py     # Batch entity resolution + merge confirmation
│   │   └── prompts.py             # LLM prompt templates
│   ├── connectors/
│   │   ├── chatgpt_conversations.py
│   │   ├── claude_conversations.py
│   │   ├── notion_database.py
│   │   ├── youtube.py
│   │   ├── local_bulk.py
│   │   ├── local_sync.py
│   │   ├── whatsapp_export.py
│   │   └── pdf_ingest.py
│   └── backup/
│       └── backup.py              # pg_dump + JSONL export
├── supabase/
│   ├── config.toml
│   ├── migrations/                # Run these in order
│   │   ├── 001_create_knowledge_graph.sql
│   │   ├── 002_add_stats_functions.sql
│   │   ├── 003_add_dedup_constraints.sql
│   │   ├── 004_add_tasks_and_sync.sql
│   │   └── 005_add_search_similar_entities.sql
│   └── functions/
│       ├── ingest/                # Universal ingestion Edge Function
│       ├── mcp-server/            # MCP protocol server (12 tools)
│       ├── telegram-capture/      # Telegram bot webhook
│       ├── email-capture/         # Resend inbound email webhook
│       └── slack-capture/         # Slack event webhook
├── scripts/
│   └── sync-youtube.ps1           # Local YouTube sync (Task Scheduler)
└── .github/
    └── workflows/
        └── daily-sync.yml         # Cron: Notion daily sync

Database Schema

6 tables, 3 RPC functions, pgvector HNSW indexes:

sources — Raw ingested content with content_hash dedup
entities — People, concepts, projects, tools, decisions, events (with embeddings)
relations — Directed edges between entities
observations — Facts, insights, decisions linked to entities (with embeddings)
tasks — GTD task system with embeddings for semantic search
sync_state — Cursor tracking for incremental connector sync

RPC functions:

search_knowledge — Union search across entities + observations + tasks
get_entity_context — Full entity context with relations, observations, tasks
search_similar_entities — Fast entity-only similarity search for ingestion
get_top_connected_entities — Most connected entities by relation count

Ingestion Pipeline

Each chunk goes through this optimized pipeline:

Dedup check — SHA-256 content hash (DB only)
Store source — Insert raw content (DB only)
Extract knowledge — 1 LLM call extracts entities, relations, observations
Batch embed entities — 1 API call for all entity texts
Search candidates — DB calls to search_similar_entities RPC
Batch merge confirmation — 0-1 LLM call for all merge candidates
Upsert entities — Create new or merge into existing (DB only)
Store relations — Dedup by (source, target, type) edge (DB only)
Batch embed observations — 1 API call for all observation texts
Dedup + store observations — Hash + semantic dedup (DB only)

Total: 2-3 LLM calls + 2 embedding calls per chunk.

Cost Estimate

With gpt-4o-mini + text-embedding-3-small via OpenRouter:

Activity	Estimated Cost
Ingest 100 pages/articles	~$0.10-0.30
Daily Notion sync (50 pages)	~$0.05-0.15
Daily YouTube sync (10 videos)	~$0.05-0.20
100 MCP searches with reranking	~$0.02-0.05
Telegram: 50 messages/day	~$0.03-0.08

Typical monthly cost: $5-15 for moderate personal use.

Daily Sync

Notion (GitHub Actions)

The included workflow runs daily at 6 AM UTC:

Syncs pages from a Notion database, 50 pages per run
Two-phase sync: re-ingests modified pages, then backfills un-ingested pages
Safe limit for the 6-hour GitHub Actions timeout: ~300 pages per run

Set these GitHub Actions secrets:

SUPABASE_URL, SUPABASE_SERVICE_ROLE_KEY
OPENROUTER_API_KEY
NOTION_API_TOKEN, NOTION_DATABASE_ID
TELEGRAM_BOT_TOKEN, TELEGRAM_NOTIFY_CHAT_ID (optional, for notifications)

YouTube (Local Task Scheduler)

Why not GitHub Actions? YouTube blocks transcript requests from cloud provider IPs (AWS, GCP, Azure). All GitHub Actions runners use cloud IPs, so every transcript fetch fails with RequestBlocked. See YouTube IP Blocking for details and alternatives.

YouTube sync runs locally via Windows Task Scheduler using your home IP:

# Register the scheduled task (run once)
$repoRoot = "C:\path	oBToRePensieve"
$scriptPath = Join-Path $repoRoot "scripts\sync-youtube.ps1"

$action = New-ScheduledTaskAction `
    -Execute "powershell.exe" `
    -Argument "-NoProfile -ExecutionPolicy Bypass -File `"$scriptPath`"" `
    -WorkingDirectory $repoRoot

$trigger = New-ScheduledTaskTrigger -Daily -At "6:00AM"

$settings = New-ScheduledTaskSettingsSet `
    -AllowStartIfOnBatteries `
    -DontStopIfGoingOnBatteries `
    -WakeToRun `
    -StartWhenAvailable `
    -ExecutionTimeLimit (New-TimeSpan -Hours 1)

Register-ScheduledTask `
    -TaskName "OpenBrain-YouTube-Sync" `
    -Description "Daily YouTube playlist sync for knowledge graph" `
    -Action $action `
    -Trigger $trigger `
    -Settings $settings

The -WakeToRun flag wakes the computer from sleep to run the sync, then it goes back to sleep.

Before running: Edit scripts/sync-youtube.ps1 and set your playlist URL.

Inspiration

Inspired by Nate B. Jones' Open Brain guide, which demonstrated the core idea: Supabase + OpenRouter + MCP to give every AI tool you use the same persistent memory via a single URL.

The problem is simple — your knowledge lives in too many places. Zotero, browser bookmarks, Notion, YouTube watch-later playlists, ChatGPT conversations, Claude chats, Slack threads, emails. None of them talk to each other, and none of them are accessible when you're working in a different tool.

2BToRePensieve takes the Open Brain concept and extends it from a Slack capture + 4 MCP tools into a full knowledge graph with:

Entity extraction and resolution — not just storing text, but building a graph of people, concepts, projects, and their relationships
7 input channels instead of just Slack — ChatGPT, Claude, Notion, YouTube, Telegram, Email, local files
12 MCP tools — search, capture, entity exploration, task management, stats
Batched pipeline — optimized from N LLM calls per entity down to 2-3 calls per chunk
5-layer dedup — content hash, semantic similarity, entity merge, relation dedup, observation dedup
GTD task system — embedded in the knowledge graph for cross-referencing
Daily automated sync — GitHub Actions cron for Notion and YouTube

The name combines Second Brain, Total Recall, and Pensieve (Harry Potter) — one ring to rule them all.

Extensions & Ideas

Ways to extend this that we haven't built yet:

Extension	Description
Browser extension	Capture highlights, bookmarks, and full pages as you browse
Voice capture	Whisper transcription from voice memos (phone app or Telegram voice messages)
Calendar integration	Auto-ingest meeting notes from Google Calendar / Outlook
RSS/newsletter	Ingest articles from RSS feeds or email newsletters
Twitter/X bookmarks	Sync saved tweets and threads
Readwise	Import highlights from Kindle, articles, podcasts
Graph visualization	D3.js or Obsidian-style graph view of entities and relations
Spaced repetition	Surface forgotten knowledge on a schedule
Conflict detection	Flag contradictory observations across sources
Multi-user	Shared knowledge graphs with access control
Self-hosted LLM	Run extraction with Ollama/llama.cpp instead of OpenRouter
Webhooks out	Trigger actions when new entities/observations match patterns

YouTube IP Blocking

YouTube's transcript API blocks requests from cloud provider IPs. This affects any CI/CD runner (GitHub Actions, GitLab CI, CircleCI, etc.) because they all use cloud infrastructure.

Symptoms:

RequestBlocked or IpBlocked exception from youtube-transcript-api
Error: "YouTube is blocking requests from your IP"
All transcript fetches fail, 0 videos ingested

Solutions (pick one):

Approach	Pros	Cons
Local Task Scheduler (recommended)	Simple, free, uses home IP	PC must be on/sleeping (not off)
Self-hosted GitHub Actions runner	Same workflow file, logs in GitHub UI	Must keep agent running
Residential proxy	Works from any CI/CD	Costs money, adds complexity
Cookie authentication	Quick fix from cloud	Risks account ban, cookies expire

This project uses the Local Task Scheduler approach via scripts/sync-youtube.ps1.

Known Issues

Issues identified during code review (2026-03-04). Fixes in progress.

Security

#	Severity	Component	Issue
1	Critical	`mcp-server`	Global `McpServer` instance reconnected per request — may leak state under concurrent sessions. Fix: create server per request via factory function.
2	Critical	`slack-capture`	No Slack signing secret verification — any POST to the endpoint is accepted. Fix: add HMAC-SHA256 signature check with `SLACK_SIGNING_SECRET`.
3	Critical	`ingest`	No authentication — the endpoint is callable by anyone if the URL is known. Fix: validate service role key in Authorization header.
4	Important	`ingest`, `mcp-server`	Embedding API errors (rate limit, bad key) crash with unguarded `.data` access. Fix: check `res.ok` and `data.data` before use.

Code & Config

#	Severity	Component	Issue
5	~~Important~~	`mcp-server`	~~`get_entity` silently returns `null` on RPC error instead of an error message.~~ Fixed. Root cause: SQL bug in `get_entity_context` RPC (ORDER BY outside jsonb_agg) + swallowed error in TypeScript.
6	Important	`config.toml`	References `seed.sql` that doesn't exist — `supabase db reset` will fail locally.
7	~~Important~~	`requirements.txt`	~~Missing `ijson` dependency — ChatGPT connector fails on fresh install.~~ Fixed.
8	~~Important~~	`daily-sync.yml`	~~`NOTION_DATABASE_ID` injected unquoted into shell command.~~ Fixed.

Documentation

#	Severity	Component	Issue
9	~~Important~~	`setup-channels.md`	~~ChatGPT/Claude connector examples use `--file` flag — actual flag is `--in`.~~ Fixed.
10	~~Important~~	`setup-channels.md`	~~`local_sync` documented as continuous watcher with `--interval` flag — it's actually a one-shot scanner.~~ Fixed.
11	Important	`setup-supabase.md`	Verification curl uses old hand-rolled JSON-RPC format — stale after SDK rewrite.

V2.0 Roadmap

What's planned for the next major version:

Multimodal ingestion — images (OCR + vision LLM descriptions), audio (Whisper transcription), screenshots, diagrams
Agentic workflows — the knowledge graph reasons over itself: auto-link related observations, suggest connections, generate weekly digests
Temporal awareness — "What did I know about X last month?" vs "What do I know now?" — versioned observations with time-travel queries
Confidence scoring — track observation reliability: primary source vs hearsay vs LLM-generated, with confidence decay over time
Graph RAG — multi-hop retrieval: "What do my colleagues think about the tools I'm considering for the project?" traverses person→opinion→tool→project
Mobile app — native iOS/Android for quick capture with photo, voice, and location context
Federated sync — merge knowledge graphs across devices/instances without a central server
Plugin system — drop-in connector SDK so anyone can build new input channels

Release Notes

v0.3.4 (2026-03-24)

Telegram intent detection fix + search optimization

Fixed intent classifier misrouting personal questions (calendar events, travel plans, meetings) to ambiguous fallback instead of search_knowledge. Questions like "When is my flight?" or "Where am I staying?" now correctly trigger a knowledge graph search.
Updated classifier prompt rules: any question (who/what/when/where/why/how) now defaults to search_knowledge — the knowledge graph contains personal notes, calendar events, travel plans, and conversations, so personal questions should always search.
Flipped ambiguity bias: "when in doubt between search_knowledge and ambiguous, prefer search_knowledge" (was: prefer ambiguous). Only greetings, single words, emojis, and prompt injection attempts trigger the fallback.
Optimized searchBrain() with batch entity/source fetching — 2 queries instead of N+1 per search result.
Added 7 new intent detection test cases covering personal/calendar/event questions.

v0.3.3 (2026-03-19)

YouTube backfill improvements

Added --newest-first flag to YouTube connector — indexes most recent playlist additions first instead of oldest-first, so new content is available sooner during backfill.
Bumped Task Scheduler daily limit from 10 to 15 videos/day.

v0.3.2 (2026-03-18)

Structured output schema enforcement + DB-layer type safety

Replaced json_object response format with json_schema structured output. Entity and observation types are now enum-constrained at the token generation level — the LLM physically cannot produce an invalid type.
Added defense-in-depth _safe_entity_type() in db.py with an extended alias map (30+ biomedical/science types like organ, bacteria, journal → valid types). Acts as a second safety net if schema enforcement is unavailable on the model.
Improved upsert_entity duplicate key fallback with ilike + eq cascade for more robust entity resolution.
Fixed Windows cp1252 encoding crash in retry_failed.py when source titles contain Greek characters or other non-ASCII.
Result: all 450+ previously failed source extractions resolved (0 failures remaining).

v0.3.1 (2026-03-18)

Entity type expansion + extraction resilience

Added 3 new entity types: technology, event, decision (total: 9). Common LLM-generated types outside this set are auto-mapped to the nearest valid type (e.g., platform -> tool, place -> concept).
Fixed upsert_entity fallback lookup: the unique constraint is on lower(name) only, but the fallback was matching on name+type, causing crashes when the same entity was extracted with different types across chunks.
Added retry_failed.py — standalone script to re-run extraction on sources with status='failed' without re-inserting or re-embedding. Sends Telegram notification on completion.
Migration 008: expands the entity type check constraint and reclassifies existing entities.

v0.3.0 (2026-03-18)

Notion backfill sync fix

The --sync flag previously only queried Notion for pages modified after last_edited_time, which meant un-ingested pages in the backlog were never picked up. Now fetches all pages and locally filters into two groups: (a) pages modified since last sync (re-ingest) and (b) pages never ingested (backfill). Prioritizes modified pages, then fills the backlog up to --limit.

YouTube sync moved to local execution

YouTube blocks transcript requests from cloud provider IPs (all GitHub Actions runners). Moved YouTube sync to a local Windows Task Scheduler script (scripts/sync-youtube.ps1) that uses your home IP.
Added --cookies flag to youtube.py for optional cookie-based authentication.
Fixed UnicodeEncodeError crash on Windows when video titles contain emoji/unicode characters.

Other fixes

N+1 query fix + HNSW search optimization (20x speedup)
MCP server concurrency fix (per-request server instances)
Defensive error handling on all edge function API calls
Tiered PDF extraction (unpdf v1.4 + OpenAI vision fallback)
Telegram notification for daily sync results

v0.2.0 (2026-03-05)

Initial public release with core knowledge graph, 7 input channels, 12 MCP tools, and daily sync via GitHub Actions.

License

MIT

Recommended Servers

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official

Featured

TypeScript

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official

Featured

Local

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official

Featured

TypeScript

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official

Featured

Python

E2B

Using MCP to run code via e2b.

Official

Featured

Neon Database

MCP server for interacting with Neon Management API and databases

Official

Featured

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official

Featured

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official

Featured