second-brain-worker

Provides tools to retrieve, ingest, and reindex a serverless personal knowledge base built from Obsidian wiki and external files, with hybrid semantic + keyword search via the Model Context Protocol.

Second Brain Worker

A serverless personal knowledge base on Cloudflare Workers that indexes your Obsidian wiki and external files, with hybrid search (semantic + keyword) accessible via MCP (Model Context Protocol) directly from your IDE.

What it is

Second Brain Worker is a Cloudflare Worker that:

Indexes markdown files from the Obsidian wiki and external files (PDFs, articles, text)
Chunks content along section boundaries (##/###), preserving frontmatter and wikilinks
Generates semantic embeddings via Workers AI (@cf/baai/bge-base-en-v1.5, 768 dimensions)
Stores everything in R2 (raw), D1 (metadata + FTS5 keyword search), and Vectorize (embeddings)
Exposes 5 MCP tools (retrieve, ingest, reindex, read, grep) accessible from your IDE via GitHub OAuth
Syncs the wiki automatically via GitHub webhook: when you push to main, the changed .md files are re-indexed

All within the Cloudflare free tier.
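The section-aware chunking step listed above can be illustrated with a minimal sketch. It is not the project's chunker.ts: the `Chunk` shape and the `chunkBySection` name are assumptions made for illustration, and overlap handling and code-fence awareness are left out.

```typescript
// Hypothetical sketch of section-based chunking; not the real chunker.ts.
interface Chunk {
  section: string; // heading line that opened the chunk, "" for the preamble
  text: string;    // chunk body, including its heading line
}

function chunkBySection(markdown: string): Chunk[] {
  const chunks: Chunk[] = [];
  let section = "";
  let buffer: string[] = [];

  // Emit the buffered lines as one chunk, skipping empty buffers.
  const flush = (): void => {
    const text = buffer.join("\n").trim();
    if (text.length > 0) chunks.push({ section, text });
    buffer = [];
  };

  for (const line of markdown.split("\n")) {
    // Only level-2 and level-3 headings open a new chunk; deeper
    // headings stay inside the current one.
    if (/^#{2,3} /.test(line)) {
      flush();
      section = line.trim();
    }
    buffer.push(line);
  }
  flush();
  return chunks;
}
```

Because lines are never rewritten, frontmatter and `[[wikilinks]]` pass through unchanged; the frontmatter simply ends up in the preamble chunk.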

Architecture

                    ┌─────────────────────────────────────────────┐
                    │              Cloudflare Worker               │
                    │                                              │
   GitHub Webhook ──▶  /webhook/github  ──▶  fetch .md  ──▶  R2   │
                    │                                  │           │
   REST API ────────▶  /api/ingest      ──▶  chunker    │           │
                    │  /api/retrieve    ──▶  embedder   │           │
                    │  /api/reindex     ──▶  D1 + Vectorize        │
                    │  /api/health      │           │               │
                    │                   │           │               │
   MCP Client ─────▶  /mcp (OAuth)      │           │               │
                    │  /authorize       │           │               │
                    │  /callback        │           │               │
                    └─────────────────────────────────────────────┘
                           │         │           │
                    ┌──────▼──┐ ┌───▼────┐ ┌────▼──────┐
                    │   R2    │ │   D1   │ │ Vectorize │
                    │ (raw)   │ │(meta + │ │ (embedding│
                    │         │ │  FTS5) │ │  vectors) │
                    └─────────┘ └────────┘ └───────────┘

Components

R2 (RAW_BUCKET): storage for the raw file content
D1 (DB): SQLite database with the files and chunks tables, plus the FTS5 virtual table chunks_fts for keyword search
Vectorize (VECTORIZE): semantic embedding index (768 dimensions)
Workers AI (AI): bge-base-en-v1.5 model for embedding generation
KV (OAUTH_KV): storage for state/CSRF tokens in the OAuth flow
Durable Object (SECOND_BRAIN_MCP): McpAgent that holds the MCP session
OAuth Provider: @cloudflare/workers-oauth-provider integration with GitHub OAuth

MCP Tools

The MCP server exposes 5 tools accessible from your IDE:

`retrieve`

Hybrid search over the knowledge base: a semantic query via Vectorize plus a keyword query via D1 FTS5. The two result sets are merged with 50/50 weights, sorted by combined score, and enriched with metadata from D1.

Parameters:

query (string, required): natural-language or keyword query
limit (number, default 10, max 50): maximum number of results
file_type ("wiki_page" | "ingested", optional): filter by file type
section_prefix (string, optional): filter by section prefix (e.g. ## Architecture)
file_key_prefix (string, optional): filter by file key prefix (e.g. wiki/concepts/)
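The 50/50 merge described above might look roughly like the sketch below. It assumes both score lists are already normalized to the 0..1 range (raw FTS5 ranks are not, so some normalization would be needed first); the `Hit` and `MergedHit` types and the `mergeHits` name are illustrative, not taken from retrieve.ts.

```typescript
// Hypothetical sketch of weighted score fusion; assumes scores in [0, 1].
interface Hit {
  chunkId: string;
  score: number;
}

interface MergedHit {
  chunkId: string;
  semantic: number; // 0 when the chunk was not found semantically
  keyword: number;  // 0 when the chunk was not found by keyword
  score: number;    // weighted combination of the two
}

function mergeHits(
  semantic: Hit[],
  keyword: Hit[],
  limit = 10,
  semanticWeight = 0.5,
): MergedHit[] {
  const keywordWeight = 1 - semanticWeight;
  const byId = new Map<string, MergedHit>();

  // Fetch or create the merged entry for a chunk.
  const entry = (chunkId: string): MergedHit => {
    let hit = byId.get(chunkId);
    if (!hit) {
      hit = { chunkId, semantic: 0, keyword: 0, score: 0 };
      byId.set(chunkId, hit);
    }
    return hit;
  };

  semantic.forEach((h) => { entry(h.chunkId).semantic = h.score; });
  keyword.forEach((h) => { entry(h.chunkId).keyword = h.score; });

  const merged: MergedHit[] = [];
  byId.forEach((hit) => {
    hit.score = semanticWeight * hit.semantic + keywordWeight * hit.keyword;
    merged.push(hit);
  });

  // Highest combined score first, capped at the documented maximum of 50.
  return merged.sort((a, b) => b.score - a.score).slice(0, Math.min(limit, 50));
}
```

A chunk found by both searches naturally outranks one found by only one of them at a similar score, which is the main point of fusing the two lists.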

`ingest`

Uploads a file into the knowledge base: the content is chunked, embedded, and saved to R2 + D1 + Vectorize. If the file already exists, its old chunks and vectors are replaced.

Parameters:

file_key (string, required): unique identifier. For wiki pages: relative path (e.g. wiki/concepts/Tool Attention.md). For external files: filename:uuid
content (string, required): full text content
file_type ("wiki_page" | "ingested", required): file type
title (string, optional): file title
source (string, optional): source URL or path
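As a hedged illustration of the two file_key formats, the sketch below builds an ingest payload from the documented parameters. The `buildIngestRequest` helper and its `options` argument are hypothetical; only the field names and key formats come from the list above.

```typescript
type FileType = "wiki_page" | "ingested";

// Field names follow the documented ingest parameters.
interface IngestRequest {
  file_key: string;
  content: string;
  file_type: FileType;
  title?: string;
  source?: string;
}

// Hypothetical helper: builds the payload and derives file_key.
function buildIngestRequest(
  fileType: FileType,
  nameOrPath: string,
  content: string,
  options: { uuid?: string; title?: string; source?: string } = {},
): IngestRequest {
  if (content.length === 0) throw new Error("content is required");

  let fileKey = nameOrPath; // wiki pages use their relative path as key
  if (fileType === "ingested") {
    // External files are keyed as filename:uuid.
    if (!options.uuid) throw new Error("external files need a uuid");
    fileKey = `${nameOrPath}:${options.uuid}`;
  }

  const request: IngestRequest = { file_key: fileKey, content, file_type: fileType };
  if (options.title !== undefined) request.title = options.title;
  if (options.source !== undefined) request.source = options.source;
  return request;
}
```

The resulting object can be passed as arguments to the ingest tool or serialized as the JSON body of POST /api/ingest.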

`reindex`

Re-indexes a specific file or all files. Reads the raw content from R2, re-runs chunking and embedding, and updates Vectorize and D1.

Parameters:

file_key (string, optional): file to re-index. If omitted, all files are re-indexed

`read`

Reads the raw text of an indexed file from R2, with optional offset and limit. Useful for reading the full context around a chunk found via retrieve.

Parameters:

file_key (string, required): file key of the file to read
offset (number, optional, default 0): character offset to start from
max_chars (number, optional, default 2000, max 10000): maximum number of characters to return

`grep`

Searches for a regex pattern in the raw text of an indexed file. Returns the matches with optional surrounding context. Useful for extracting structured data (dates, amounts, IDs) from documents.

Parameters:

file_key (string, required): file key of the file to search
pattern (string, required): JavaScript regex pattern
max_matches (number, optional, default 10, max 50): maximum number of matches
context (number, optional, default 40, max 200): characters of context around each match
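To make the max_matches and context semantics concrete, here is a minimal sketch of a regex search over already-loaded text. It is an assumption about how such a tool could behave, not the code in grep.ts; `grepText` and `GrepMatch` are illustrative names.

```typescript
interface GrepMatch {
  match: string;   // the matched substring
  index: number;   // character offset of the match in the text
  context: string; // the match plus surrounding characters
}

// Hypothetical sketch: defaults and caps mirror the documented parameters.
function grepText(
  text: string,
  pattern: string,
  maxMatches = 10,
  context = 40,
): GrepMatch[] {
  const limit = Math.min(Math.max(maxMatches, 1), 50);
  const pad = Math.min(Math.max(context, 0), 200);
  const regex = new RegExp(pattern, "g");
  const results: GrepMatch[] = [];

  let m: RegExpExecArray | null;
  while (results.length < limit && (m = regex.exec(text)) !== null) {
    const found = m[0];
    if (found.length === 0) {
      // Empty matches do not advance lastIndex; step manually to avoid looping.
      regex.lastIndex++;
      continue;
    }
    const start = Math.max(0, m.index - pad);
    const end = Math.min(text.length, m.index + found.length + pad);
    results.push({ match: found, index: m.index, context: text.slice(start, end) });
  }
  return results;
}
```

For example, the pattern `\d{4}-\d{2}-\d{2}` would pull ISO dates out of an ingested invoice, each with a few characters of surrounding text.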

REST API

In addition to the MCP tools, the Worker exposes REST endpoints (unauthenticated, useful for scripts and setup):

Method	Endpoint	Description
GET	`/api/health`	Worker status: indexed files, total chunks
POST	`/api/ingest`	Ingest a file (same format as the MCP tool)
POST	`/api/retrieve`	Hybrid retrieve (same format as the MCP tool)
POST	`/api/reindex`	Reindex one file or all files
POST	`/api/read`	Read raw text from R2 with offset/limit
POST	`/api/grep`	Regex search over indexed content
GET	`/api/metrics`	Aggregate retrieve metrics (latency, score, zero-result rate)
POST	`/webhook/github`	GitHub webhook for automatic sync
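Calling one of these endpoints from a script could look like the sketch below. The request body reuses the documented retrieve parameters; the base URL is whatever your deployment uses, and the response is typed as `unknown` because the README does not document its shape. The helper names are illustrative.

```typescript
// Documented parameters of the retrieve tool / endpoint.
interface RetrieveParams {
  query: string;
  limit?: number;
  file_type?: "wiki_page" | "ingested";
  section_prefix?: string;
  file_key_prefix?: string;
}

interface JsonPost {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Pure helper: builds the POST request without sending it.
function buildRetrieveRequest(baseUrl: string, params: RetrieveParams): JsonPost {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/api/retrieve`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(params),
    },
  };
}

// Sends the request with the global fetch (available in Node.js 22).
async function retrieve(baseUrl: string, params: RetrieveParams): Promise<unknown> {
  const { url, init } = buildRetrieveRequest(baseUrl, params);
  const response = await fetch(url, init);
  if (!response.ok) throw new Error(`retrieve failed: HTTP ${response.status}`);
  return response.json();
}
```

The same pattern applies to the other POST endpoints by swapping the path and body.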

Prerequisites

Node.js 22+
Cloudflare account (the free tier is enough)
GitHub OAuth App, for MCP authentication
Obsidian wiki in a GitHub repo (if you want to use automatic sync)

Full setup

1. Clone and install

git clone https://github.com/50R1Paps/second-brain-worker.git
cd second-brain-worker
npm install

2. Create the Cloudflare resources

Run these wrangler commands to create the required resources:

# Create the R2 bucket
npx wrangler r2 bucket create second-brain-raw

# Create the D1 database
npx wrangler d1 create second-brain
# Note the database_id from the output

# Create the KV namespace for OAuth
npx wrangler kv namespace create OAUTH_KV
# Note the id from the output

# Create the Vectorize index (768 dimensions for bge-base-en-v1.5)
npx wrangler vectorize create second-brain-embeddings --dimensions 768 --metric cosine

3. Configure `wrangler.toml`

Copy the example file and fill in the real values:

cp wrangler.toml.example wrangler.toml

Edit wrangler.toml, replacing <your-d1-database-id> and <your-kv-namespace-id> with the values obtained in step 2.

4. Apply the D1 migration

# Local (for dev)
npm run db:migrate

# Remote (for production)
npm run db:migrate:remote

5. Create a GitHub OAuth App

Go to GitHub Settings > Developer settings > OAuth Apps > New OAuth App
Fill in:
- Application name: Second Brain MCP
- Homepage URL: https://second-brain.<tuo-subdomain>.workers.dev
- Authorization callback URL: https://second-brain.<tuo-subdomain>.workers.dev/callback
Note the Client ID and generate a Client Secret

6. Set the secrets

# OAuth
npx wrangler secret put GITHUB_CLIENT_ID
npx wrangler secret put GITHUB_CLIENT_SECRET
npx wrangler secret put COOKIE_ENCRYPTION_KEY

# GitHub webhook sync (optional, only if you use automatic sync)
npx wrangler secret put WEBHOOK_SECRET
npx wrangler secret put GITHUB_TOKEN
npx wrangler secret put GITHUB_TOKEN_EXPIRY

For COOKIE_ENCRYPTION_KEY you can generate a random string with:

openssl rand -hex 32

For GITHUB_TOKEN, create a Personal Access Token with the repo scope (used to read the .md files via the API).

For WEBHOOK_SECRET, generate another random string and use it as the secret of the GitHub webhook as well.

For GITHUB_TOKEN_EXPIRY, enter the PAT expiry date in ISO format (e.g. 2026-09-25T00:00:00Z). The daily Cron Trigger checks this date and automatically opens an issue on second-brain-worker 2 days before expiry as a reminder.

7. Deploy

npm run deploy

Note the Worker URL (e.g. https://second-brain.<tuo-subdomain>.workers.dev).

8. Initialize the knowledge base (setup script)

If you have the Obsidian wiki locally, you can ingest all the .md files with the setup script:

# Local (during dev)
npm run setup -- --wiki-dir /path/to/wiki --url http://localhost:8787

# Remote (after deploy)
npm run setup -- --wiki-dir /path/to/wiki --url https://second-brain.<tuo-subdomain>.workers.dev

# Dry run (lists files without ingesting)
npm run setup:dry -- --wiki-dir /path/to/wiki

9. Configure the GitHub webhook (optional)

To sync the wiki automatically when you push to main:

Go to GitHub > Your Repo > Settings > Webhooks > Add webhook
Fill in:
- Payload URL: https://second-brain.<tuo-subdomain>.workers.dev/webhook/github
- Content type: application/json
- Secret: the same value as WEBHOOK_SECRET
- Trigger: "Just the push event"
Save

From now on, every push to main that modifies .md files in wiki/ triggers automatic re-indexing.
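The README does not show how the Worker validates incoming webhooks. GitHub signs each delivery with the configured secret and sends the result in the X-Hub-Signature-256 header as `sha256=<hex HMAC-SHA256 of the raw body>`, so a check along the following lines is a reasonable assumption. The sketch uses Node's crypto module so it can be run locally; inside a Worker the same check would more typically use Web Crypto. Function names are illustrative.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Computes the value GitHub would send in X-Hub-Signature-256.
function signPayload(secret: string, rawBody: string): string {
  const digest = createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
  return `sha256=${digest}`;
}

// Hypothetical validation step: compares the header to the expected value.
function isValidSignature(
  secret: string,
  rawBody: string,
  signatureHeader: string | null,
): boolean {
  if (!signatureHeader) return false;
  const expected = Buffer.from(signPayload(secret, rawBody), "utf8");
  const received = Buffer.from(signatureHeader, "utf8");
  // timingSafeEqual throws on length mismatch, so check lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

The signature must be computed over the raw request body, before any JSON parsing, otherwise whitespace differences will make valid deliveries fail.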

Secret management

Secrets are managed via wrangler secret put and stored by Cloudflare (never in the code). Here is when to update them:

Secret	Expires?	When to update
`GITHUB_TOKEN`	Yes, according to the PAT configuration	When the Personal Access Token expires: `npx wrangler secret put GITHUB_TOKEN` and paste the new token
`GITHUB_TOKEN_EXPIRY`	Yes, must be updated together with the token	When you update `GITHUB_TOKEN`: `npx wrangler secret put GITHUB_TOKEN_EXPIRY` with the new expiry date
`GITHUB_CLIENT_ID`	No	Only if you revoke/recreate the OAuth App on GitHub
`GITHUB_CLIENT_SECRET`	No	Only if you revoke/recreate the OAuth App on GitHub
`COOKIE_ENCRYPTION_KEY`	No	Never (unless you want to invalidate all active sessions)
`WEBHOOK_SECRET`	No	Never (it must match the secret configured in the webhook settings on GitHub)

Rotating the `GITHUB_TOKEN`

The GITHUB_TOKEN (Personal Access Token with the repo scope) is the only secret that expires. A daily Cron Trigger checks GITHUB_TOKEN_EXPIRY and automatically opens an issue on second-brain-worker 2 days before expiry as a reminder.

When you receive the notification (or when the token has already expired):

Create a new token at GitHub Settings > Tokens with the repo scope
Update the secrets on Cloudflare:

npx wrangler secret put GITHUB_TOKEN
npx wrangler secret put GITHUB_TOKEN_EXPIRY

Paste the new token and the new expiry date (ISO format, e.g. 2026-09-25T00:00:00Z)
Close the reminder issue on GitHub

No other secret or Cloudflare configuration needs to be updated.
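The daily expiry check can be pictured as a small date comparison. The sketch below is an assumption about the logic, not the code in cron.ts: it treats "2 days before expiry" as "2 days or less remaining", and it leaves out how the real Worker opens the issue or avoids opening duplicates.

```typescript
const REMINDER_WINDOW_DAYS = 2;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Days left until the ISO expiry timestamp; negative once expired.
function daysUntilExpiry(expiryIso: string, now: Date): number {
  const expiry = Date.parse(expiryIso);
  if (Number.isNaN(expiry)) {
    throw new Error(`invalid GITHUB_TOKEN_EXPIRY: ${expiryIso}`);
  }
  return (expiry - now.getTime()) / MS_PER_DAY;
}

// Hypothetical decision: remind when the window is reached or passed.
function shouldOpenReminder(expiryIso: string, now: Date): boolean {
  return daysUntilExpiry(expiryIso, now) <= REMINDER_WINDOW_DAYS;
}
```

Passing `now` explicitly rather than reading the clock inside the function keeps the check easy to test.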

Configuring the MCP client in the IDE

Windsurf / Cursor

Create or edit the mcp_config.json file in your IDE (in Windsurf: Settings > MCP Servers):

{
  "mcpServers": {
    "second-brain": {
      "command": "npx",
      "args": [
        "workers-mcp",
        "proxy",
        "https://second-brain.<tuo-subdomain>.workers.dev/mcp"
      ]
    }
  }
}

Replace <tuo-subdomain> with your actual subdomain.

On first use, the IDE opens the browser for GitHub authentication. After login, the 5 tools (retrieve, ingest, reindex, read, grep) are available in the AI assistant.

Claude Desktop

Add to the claude_desktop_config.json file:

{
  "mcpServers": {
    "second-brain": {
      "command": "npx",
      "args": [
        "workers-mcp",
        "proxy",
        "https://second-brain.<tuo-subdomain>.workers.dev/mcp"
      ]
    }
  }
}

Development

Available commands

Command	Description
`npm run dev`	Starts the Worker locally with `wrangler dev`
`npm run deploy`	Deploys to Cloudflare
`npm test`	Runs the tests (vitest)
`npm run test:watch`	Tests in watch mode
`npm run typecheck`	Type checking with `tsc --noEmit`
`npm run db:migrate`	Applies the D1 migrations locally
`npm run db:migrate:remote`	Applies the D1 migrations remotely
`npm run setup`	Wiki ingest script
`npm run setup:dry`	Dry run of the setup script

Project structure

src/
├── worker.ts          # Entry point: OAuthProvider + routing
├── mcp.ts             # SecondBrainMCP (McpAgent) with the 5 tools
├── types.ts           # Shared types: Env, request/response interfaces
├── http.ts            # HTTP utilities: jsonResponse, handleCORS
├── health.ts          # Health check endpoint
├── ingest.ts          # Ingestion: chunking, embedding, R2+D1+Vectorize, GitHub push
├── retrieve.ts        # Hybrid retrieve: semantic + keyword search, merge, metrics
├── metrics.ts         # Aggregate retrieve metrics (latency, score, zero-result)
├── reindex.ts         # Reindex a single file or all files
├── read.ts            # Read raw text from R2 with offset/limit
├── grep.ts            # Regex grep over indexed content
├── chunker.ts         # Markdown chunker (split on ##/###, overlap, wikilink-safe)
├── github-handler.ts  # Hono app: REST API + OAuth flow (/authorize, /callback)
├── oauth-utils.ts     # OAuth utilities: state, CSRF, cookie, approval dialog
├── webhook.ts         # GitHub webhook handler for automatic sync
├── cron.ts            # Cron Trigger: GITHUB_TOKEN expiry reminder via GitHub Issue
└── setup.ts           # Setup script logic (bulk ingest)

migrations/
├── 0001_initial_schema.sql       # D1 schema: files, chunks, chunks_fts + trigger
└── 0002_retrieve_metrics.sql     # retrieve_metrics table for observability

scripts/
└── setup.ts           # CLI entry point for the setup script

test/                  # Test suite (vitest + @cloudflare/vitest-pool-workers)

Tests

npm test           # all tests
npm run typecheck  # type checking

The tests use @cloudflare/vitest-pool-workers to simulate the Workers environment with D1, R2, and AI bindings.

Maintenance operations

Deploying the Worker

After changing the code:

npm run typecheck   # check types
npm test            # run tests
npx wrangler deploy # deploy to Cloudflare

There is no need to reindex after a deploy that does not change the ingest/retrieve logic.

Applying a new D1 migration

When you add a migration (e.g. 0002_retrieve_metrics.sql):

# Local (for dev)
npm run db:migrate

# Remote (for production)
npx wrangler d1 migrations apply second-brain --remote

The migration must be applied before the deploy if the code depends on the new table.

Reindexing documents (populating Vectorize)

If vectors are missing or you want to regenerate the embeddings:

# Reindex a single file
curl -X POST https://second-brain.<tuo-subdomain>.workers.dev/api/reindex \
  -H "Content-Type: application/json" \
  -d '{"file_key": "wiki/concepts/Esempio.md"}'

# Reindex all files (caution: may hit Workers AI rate limits)
curl -X POST https://second-brain.<tuo-subdomain>.workers.dev/api/reindex \
  -H "Content-Type: application/json" \
  -d '{}'

Note: reindexing all files (113+) can fail because of Workers AI rate limiting. If semantic_hits is 0 after a full reindex, reindex the files one at a time with a pause of ~1 second between calls. Check the state with:

npx wrangler d1 execute second-brain --remote \
  --command "SELECT COUNT(*) as total, COUNT(vector_id) as with_vectors FROM chunks"
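The one-file-at-a-time workaround can be scripted. The sketch below is one possible way to do it, assuming you already have the list of file keys (for example from your local wiki folder); the function names and the injectable `post` argument are illustrative, and only the endpoint path and body shape come from the curl examples above.

```typescript
// Resolves to true when the server answered with a 2xx status.
type PostFn = (url: string, body: string) => Promise<boolean>;

// Default sender, using the global fetch available in Node.js 22.
const postJson: PostFn = async (url, body) => {
  const response = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body,
  });
  return response.ok;
};

// Reindexes files sequentially with a pause between calls and
// returns the keys whose request failed, so they can be retried.
async function reindexOneByOne(
  baseUrl: string,
  fileKeys: string[],
  pauseMs = 1000,
  post: PostFn = postJson,
): Promise<string[]> {
  const failed: string[] = [];
  for (const fileKey of fileKeys) {
    const ok = await post(`${baseUrl}/api/reindex`, JSON.stringify({ file_key: fileKey }));
    if (!ok) failed.push(fileKey);
    // Pause to stay under the Workers AI rate limit.
    await new Promise<void>((resolve) => setTimeout(resolve, pauseMs));
  }
  return failed;
}
```

Running the D1 count query afterwards shows whether every chunk now has a vector.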

Checking the retrieve metrics

After running a few retrieves, check the aggregate metrics:

# Last 24 hours (default)
curl https://second-brain.<tuo-subdomain>.workers.dev/api/metrics

# Last hour
curl https://second-brain.<tuo-subdomain>.workers.dev/api/metrics?period=1h

# Last 7 days
curl https://second-brain.<tuo-subdomain>.workers.dev/api/metrics?period=7d

# Last 30 days
curl https://second-brain.<tuo-subdomain>.workers.dev/api/metrics?period=30d

Valid periods: 1h, 24h, 7d, 30d.

The metrics include: total queries, zero-result rate, avg/p50/p95 latency, score distribution, and search type breakdown (semantic/keyword/hybrid).
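For readers unfamiliar with these aggregates, the sketch below shows how zero-result rate and latency percentiles could be computed from per-query samples. It uses the nearest-rank percentile method, which is an assumption; the README does not say which method metrics.ts uses, and the field names here are illustrative rather than the /api/metrics response shape.

```typescript
interface RetrieveSample {
  latencyMs: number;
  resultCount: number;
}

interface MetricsSummary {
  totalQueries: number;
  zeroResultRate: number; // fraction of queries that returned nothing
  avgLatencyMs: number;
  p50LatencyMs: number;
  p95LatencyMs: number;
}

// Nearest-rank percentile over an ascending-sorted array.
function percentile(sortedAsc: number[], p: number): number {
  if (sortedAsc.length === 0) return 0;
  const rank = Math.max(1, Math.ceil((p / 100) * sortedAsc.length));
  return sortedAsc[rank - 1] ?? 0;
}

function summarize(samples: RetrieveSample[]): MetricsSummary {
  const total = samples.length;
  const latencies = samples.map((s) => s.latencyMs).sort((a, b) => a - b);
  const zeroResults = samples.filter((s) => s.resultCount === 0).length;
  const latencySum = latencies.reduce((sum, value) => sum + value, 0);
  return {
    totalQueries: total,
    zeroResultRate: total === 0 ? 0 : zeroResults / total,
    avgLatencyMs: total === 0 ? 0 : latencySum / total,
    p50LatencyMs: percentile(latencies, 50),
    p95LatencyMs: percentile(latencies, 95),
  };
}
```

A rising zero-result rate is usually the first sign that vectors are missing and a reindex is needed.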

Documents

Limits and notes

The chunker is optimized for markdown with YAML frontmatter and [[Name]] wikilinks (Obsidian format)
The bge-base-en-v1.5 embedding model produces 768-dimensional vectors; make sure the Vectorize index is created with --dimensions 768
Hybrid retrieve uses weights of 50% semantic + 50% keyword, with top-K = 20 for each mode
The content of each result is truncated to 2000 characters per chunk
The OAuth flow supports only GitHub as identity provider
The webhook sync processes only .md files in the wiki/ folder on pushes to main
The Worker's automatic commits to GitHub include [skip ci] so they do not trigger the lint pipeline
Retrieve metrics are best-effort: if persisting them to D1 fails, the retrieve is not interrupted
