vectorise-mcp

Local MCP server that turns folders of documents into a hybrid vector + keyword index that Claude Desktop can search. Runs fully offline after the first model download.

Stack

MCP: mcp (FastMCP), stdio transport
Embeddings: BAAI/bge-small-en-v1.5 (384-dim)
Reranker: BAAI/bge-reranker-base cross-encoder
Vector DB: sqlite-vec
Keyword DB: SQLite FTS5 (BM25)
Fusion: Reciprocal Rank Fusion → cross-encoder rerank
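
Reciprocal Rank Fusion merges the vector and BM25 result lists using only each chunk's rank, so the two incompatible score scales never need to be normalised against each other. The following is a minimal sketch. The constant `k = 60` is the value commonly used in the RRF literature and is an assumption here (vectorise-mcp's own constant isn't stated), as are the `rrf` name and the chunk ids.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion over ranked lists of chunk ids.

    score(d) = sum over lists of 1 / (k + rank_d), with ranks starting at 1.
    k=60 is an assumed constant, not necessarily vectorise-mcp's value.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; chunks found by both retrievers tend to rise.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


vector_hits = ["c1", "c2", "c3"]  # hypothetical top-N from sqlite-vec
bm25_hits = ["c3", "c1", "c4"]    # hypothetical top-N from FTS5
fused = rrf([vector_hits, bm25_hits])
print([chunk_id for chunk_id, _ in fused])  # ['c1', 'c3', 'c2', 'c4']
```

Presumably only the top `candidate_pool` fused ids are then re-scored by the cross-encoder, which is much slower per pair than either retriever. The search parameter name suggests this, but the README doesn't spell it out.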

Install

pip install vectorise-mcp                 # core
pip install "vectorise-mcp[ocr]"          # + OCR for scanned PDFs / images
pip install "vectorise-mcp[notify]"       # + desktop toast on job completion
pip install "vectorise-mcp[ocr,notify]"   # everything

vectorise-mcp setup                       # pre-download models (~250MB)

Requires Python ≥ 3.10.

Wire into Claude Desktop

claude_desktop_config.json:

{
  "mcpServers": {
    "vectorise": {
      "command": "vectorise-mcp",
      "args": ["serve"]
    }
  }
}

Config file location:

Windows: %APPDATA%\Claude\claude_desktop_config.json
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Linux: ~/.config/Claude/claude_desktop_config.json

Restart Claude Desktop.
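
If you'd rather script the edit than change the JSON by hand, the following minimal sketch adds the `vectorise` entry while keeping any servers already configured. The helper names `config_path` and `add_vectorise_server` are hypothetical and not part of vectorise-mcp. The paths come from the list above.

```python
import json
import os
import sys
from pathlib import Path


def config_path(platform: str = sys.platform) -> Path:
    """Claude Desktop config location per OS, as listed in this README."""
    if platform.startswith("win"):
        return Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json"
    if platform == "darwin":
        return (Path.home() / "Library" / "Application Support" / "Claude"
                / "claude_desktop_config.json")
    return Path.home() / ".config" / "Claude" / "claude_desktop_config.json"


def add_vectorise_server(path: Path) -> dict:
    """Merge the vectorise entry into the config, preserving other servers."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["vectorise"] = {"command": "vectorise-mcp", "args": ["serve"]}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

Usage: `add_vectorise_server(config_path())`, then restart Claude Desktop.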

File support

Format	Notes
`.pdf`	text + OCR fallback for scanned pages
`.docx`, `.pptx`, `.xlsx`, `.xlsm`, `.xls`	full content + tables
`.txt`, `.md`, `.markdown`	UTF-8
`.png`, `.jpg`, `.jpeg`, `.tiff`, `.bmp`, `.webp`	OCR (requires `[ocr]`)
`.doc`, `.ppt`	detected, skipped, reported
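
The table amounts to routing files by extension. Here is a minimal sketch of that routing, assuming case-insensitive matching. The category names and the `classify`/`scan` helpers are illustrative and do not represent vectorise-mcp's parser API.

```python
from pathlib import Path

# Extension groups copied from the table above; category names are illustrative.
SUPPORTED: dict[str, set[str]] = {
    "pdf": {".pdf"},
    "office": {".docx", ".pptx", ".xlsx", ".xlsm", ".xls"},
    "text": {".txt", ".md", ".markdown"},
    "image": {".png", ".jpg", ".jpeg", ".tiff", ".bmp", ".webp"},  # needs [ocr]
}
LEGACY_SKIPPED = {".doc", ".ppt"}


def classify(path: str) -> str:
    """Return a handler category, 'skip' for legacy formats, or 'unsupported'."""
    ext = Path(path).suffix.lower()  # assumption: matching is case-insensitive
    if ext in LEGACY_SKIPPED:
        return "skip"
    for category, extensions in SUPPORTED.items():
        if ext in extensions:
            return category
    return "unsupported"


def scan(paths: list[str]) -> tuple[dict[str, list[str]], list[str]]:
    """Group indexable files by category and collect legacy files to report."""
    groups: dict[str, list[str]] = {}
    skipped: list[str] = []
    for p in paths:
        kind = classify(p)
        if kind == "skip":
            skipped.append(p)  # detected, skipped, reported
        elif kind != "unsupported":
            groups.setdefault(kind, []).append(p)
    return groups, skipped
```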

Tools exposed to Claude

Tool	What it does
`vectorise_list_projects`	list all indexed projects
`vectorise_index_project(folder, project, mode)`	start an indexing job; returns a `job_id` immediately
`vectorise_reindex_project(project)`	SHA1-incremental rescan of all sources
`vectorise_index_status(job_id)`	instant job snapshot, including progress and ETA
`vectorise_await_index(job_id, timeout_sec)`	optional blocking wait
`vectorise_list_jobs(active_only)`	list jobs from the current server session
`vectorise_search(project, query, k, candidate_pool, file_glob, subdirectory, page_min, page_max, min_similarity)`	hybrid + reranked search
`vectorise_delete_project(project)`	delete the project's `.db` file

`mode` for `vectorise_index_project`: `auto` (default; incremental if the path is already indexed, error on conflict), `replace`, `append`, or `fail`.
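
Because `vectorise_index_project` returns immediately, a client can either call `vectorise_await_index` or poll `vectorise_index_status` itself. The following is a minimal polling sketch. `call_tool` is a hypothetical synchronous wrapper around your MCP client's tool call (for example, one that drives `ClientSession.call_tool` in the MCP Python SDK). The README doesn't document the fields of the status snapshot, so the caller supplies the `is_finished` check.

```python
import time
from typing import Any, Callable

Snapshot = dict[str, Any]


def wait_for_job(
    call_tool: Callable[[str, dict[str, Any]], Snapshot],
    job_id: str,
    is_finished: Callable[[Snapshot], bool],
    poll_sec: float = 2.0,
    timeout_sec: float = 600.0,
) -> Snapshot:
    """Poll vectorise_index_status until is_finished(snapshot) is true or time runs out."""
    deadline = time.monotonic() + timeout_sec
    while True:
        snapshot = call_tool("vectorise_index_status", {"job_id": job_id})
        if is_finished(snapshot):
            return snapshot
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still running after {timeout_sec}s")
        time.sleep(poll_sec)
```

Client-side polling keeps each tool call short. The blocking `vectorise_await_index` is the simpler choice when waiting is acceptable.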

Architecture

Each indexing job runs in a daemon thread with its own asyncio event loop, so the MCP server's main loop stays free to serve `vectorise_index_status` and `vectorise_search` calls however heavy the embedding/OCR work gets. Status calls return instantly, and search runs against the partial index while a job is still in progress.
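
The threading pattern described above can be stripped down to the following sketch, which assumes a simple shared counter as the job state. `Job`, `start_job`, `serve_status_calls` and the sleep that stands in for embedding/OCR work are all hypothetical.

```python
import asyncio
import threading


class Job:
    """Hypothetical job record shared by the worker thread and the server loop."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.done = 0
        self.finished = threading.Event()
        self._lock = threading.Lock()

    def advance(self) -> None:
        with self._lock:
            self.done += 1

    def snapshot(self) -> dict:
        # What a status call would return: a cheap read of shared state.
        with self._lock:
            return {"done": self.done, "total": self.total,
                    "finished": self.finished.is_set()}


async def _index(job: Job) -> None:
    for _ in range(job.total):
        await asyncio.sleep(0.01)  # stand-in for parsing/embedding/OCR one chunk
        job.advance()
    job.finished.set()


def start_job(total: int) -> Job:
    job = Job(total)
    # asyncio.run creates a brand-new event loop inside the daemon thread,
    # so indexing never blocks the server's own loop.
    threading.Thread(target=asyncio.run, args=(_index(job),), daemon=True).start()
    return job


async def serve_status_calls() -> None:
    job = start_job(total=20)
    while not job.finished.is_set():
        print(job.snapshot())  # the server loop stays responsive during the job
        await asyncio.sleep(0.05)
    print(job.snapshot())


if __name__ == "__main__":
    asyncio.run(serve_status_calls())
```

Making the thread a daemon means an unfinished job won't keep the process alive when Claude Desktop shuts the server down.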

folder
  ↓  parsers.parse                        (.pdf .docx .pptx .xlsx ...)
chunks (sentence-aware, 384 tok / 96 overlap, single-sentence hard-split)
  ↓  embedder.embed_passages              (BGE-small)
sqlite-vec   +   FTS5 (BM25)              ← per-file SHA1 dedup, basename collision auto-rename
  ↓  search                               (vector top-N + BM25 top-N)
RRF fusion → cross-encoder rerank → top-K
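
The chunking step in the diagram packs whole sentences into windows and splits a sentence only when it alone exceeds the window. The sketch below makes several assumptions. It uses whitespace-separated words as a stand-in for model tokens (vectorise-mcp presumably counts real tokenizer tokens) and a naive regex sentence splitter. `chunk` is a hypothetical helper.

```python
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after . ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def chunk(text: str, max_tokens: int = 384, overlap: int = 96) -> list[str]:
    """Pack whole sentences into chunks of <= max_tokens words, carrying ~overlap words over."""
    units: list[list[str]] = []
    for sentence in split_sentences(text):
        words = sentence.split()
        # Hard-split a single sentence that is longer than a whole chunk.
        for i in range(0, len(words), max_tokens):
            units.append(words[i:i + max_tokens])

    chunks: list[str] = []
    current: list[list[str]] = []
    size = 0
    for unit in units:
        if current and size + len(unit) > max_tokens:
            chunks.append(" ".join(w for u in current for w in u))
            # Carry trailing whole sentences (<= overlap words) into the next chunk.
            tail: list[list[str]] = []
            tail_size = 0
            for u in reversed(current):
                if tail_size + len(u) > overlap:
                    break
                tail.insert(0, u)
                tail_size += len(u)
            current, size = tail, tail_size
            # Drop carried sentences if they would push the new chunk over the limit.
            while current and size + len(unit) > max_tokens:
                size -= len(current.pop(0))
        current.append(unit)
        size += len(unit)
    if current:
        chunks.append(" ".join(w for u in current for w in u))
    return chunks
```

In this sketch, overlap is carried as whole sentences, so a final sentence longer than `overlap` means no overlap for that boundary. The real chunker may handle this case differently.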

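The per-file SHA1 dedup and basename auto-rename shown in the diagram presumably work like this: unchanged files are skipped on a rescan, and a second `report.pdf` from a different folder gets a new stored name instead of overwriting the first. The following is a minimal sketch of that bookkeeping. The `index` mapping (stored name → source path and SHA1), the ` (2)` rename style, and every helper name are assumptions, not vectorise-mcp's schema.

```python
import hashlib
from pathlib import Path


def sha1_of(path: Path) -> str:
    """Stream the file through SHA1 so large PDFs are not read into memory at once."""
    h = hashlib.sha1()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def unique_name(name: str, taken: set[str]) -> str:
    """Append ' (2)', ' (3)', ... before the extension until the name is free."""
    if name not in taken:
        return name
    stem, suffix = Path(name).stem, Path(name).suffix
    n = 2
    while f"{stem} ({n}){suffix}" in taken:
        n += 1
    return f"{stem} ({n}){suffix}"


def plan(files: list[Path], index: dict[str, tuple[str, str]]) -> list[tuple[str, Path]]:
    """Return (stored_name, file) pairs for new or changed files only.

    index maps stored name -> (resolved source path, sha1); hypothetical schema.
    """
    by_source = {src: name for name, (src, _) in index.items()}
    taken = set(index)
    todo: list[tuple[str, Path]] = []
    for f in files:
        digest = sha1_of(f)
        src = str(f.resolve())
        if src in by_source:
            name = by_source[src]
            if index[name][1] == digest:
                continue  # unchanged since the last index: skip
        else:
            name = unique_name(f.name, taken)  # same basename from another folder
            taken.add(name)
        todo.append((name, f))
    return todo
```
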
Each project DB lives at ~/.vectorise-mcp/<name>.db and is self-contained: the source folder can be deleted after indexing.
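
Because each project is a single SQLite file, the DB directory can be inspected with the standard library alone. The following is a minimal sketch. `list_projects` and `table_names` are hypothetical helpers; the first is roughly what `vectorise_list_projects` presumably reports, but it is not that tool's implementation. sqlite-vec's virtual tables will be listed by name, but you need to load the extension before you can query them.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

DATA_DIR = Path.home() / ".vectorise-mcp"  # location stated in this README


def list_projects(data_dir: Path = DATA_DIR) -> list[str]:
    """Project names are the .db file stems: ~/.vectorise-mcp/<name>.db -> <name>."""
    return sorted(p.stem for p in data_dir.glob("*.db"))


def table_names(db_path: Path) -> list[str]:
    """List tables (including virtual tables) without modifying the DB."""
    uri = db_path.resolve().as_uri() + "?mode=ro"  # read-only open
    with closing(sqlite3.connect(uri, uri=True)) as con:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```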

Config (env vars)

Var	Default	Purpose
`VECTORISE_MCP_EMBED_MODEL`	`BAAI/bge-small-en-v1.5`	embedding model; must be 384-dim
`VECTORISE_MCP_RERANKER_MODEL`	`BAAI/bge-reranker-base`	cross-encoder reranker model
`VECTORISE_MCP_EMBED_BATCH`	`32`	embedding batch size
`VECTORISE_MCP_RERANKER_BATCH`	`16`	reranker batch size
`VECTORISE_MCP_OCR_MIN_CONFIDENCE`	`0.5`	drop OCR lines below this confidence
`VECTORISE_MCP_OCR_WORKERS`	`4`	parallel page-OCR threads
`VECTORISE_MCP_OCR_DPI`	`200`	PDF rasterisation DPI for OCR
`VECTORISE_MCP_OCR_MAX_DIM`	`4000`	downscale huge images to this max dimension before OCR
`VECTORISE_MCP_NOTIFY`	`1`	toggle the desktop toast on job completion
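
The following minimal sketch shows how these variables could map to typed settings, which is handy for checking what a given environment resolves to. The `Settings` dataclass, `load_settings`, and the rule that only `0` disables notifications are assumptions, not vectorise-mcp's own parsing.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    embed_model: str
    reranker_model: str
    embed_batch: int
    reranker_batch: int
    ocr_min_confidence: float
    ocr_workers: int
    ocr_dpi: int
    ocr_max_dim: int
    notify: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read VECTORISE_MCP_* variables, falling back to the documented defaults."""
    def get(name: str, default: str) -> str:
        return env.get(f"VECTORISE_MCP_{name}", default)

    return Settings(
        embed_model=get("EMBED_MODEL", "BAAI/bge-small-en-v1.5"),
        reranker_model=get("RERANKER_MODEL", "BAAI/bge-reranker-base"),
        embed_batch=int(get("EMBED_BATCH", "32")),
        reranker_batch=int(get("RERANKER_BATCH", "16")),
        ocr_min_confidence=float(get("OCR_MIN_CONFIDENCE", "0.5")),
        ocr_workers=int(get("OCR_WORKERS", "4")),
        ocr_dpi=int(get("OCR_DPI", "200")),
        ocr_max_dim=int(get("OCR_MAX_DIM", "4000")),
        # Assumption: any value other than "0" leaves the toast enabled.
        notify=get("NOTIFY", "1") != "0",
    )
```

With Claude Desktop, these variables can be set per server through an `"env"` object next to `"command"` and `"args"` in `claude_desktop_config.json`.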

Performance

Metric	CPU	GPU
Indexing throughput	~80 chunks/sec	5–10× faster
Search latency (k=5, ≤500K chunks)	~150ms	similar
Disk per chunk	~2 KB
Cold start	~5s (lazy model load)
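
The table's figures support a quick sizing estimate. The following small worked sketch takes the CPU throughput at face value. It assumes embeddings are stored as float32, so the 384-dim vector would account for roughly 1.5 KB of the ~2 KB per chunk; chunk text and the keyword index presumably make up the rest.

```python
EMBED_DIM = 384            # BGE-small output size
BYTES_PER_FLOAT32 = 4      # assumption: vectors stored as float32
DISK_PER_CHUNK_KB = 2      # "~2 KB" from the table above
CPU_CHUNKS_PER_SEC = 80    # "~80 chunks/sec" from the table above


def estimate(n_chunks: int) -> dict[str, float]:
    """Rough disk and CPU indexing-time estimate for a project of n_chunks."""
    return {
        "vector_bytes_per_chunk": EMBED_DIM * BYTES_PER_FLOAT32,
        "disk_gib": n_chunks * DISK_PER_CHUNK_KB / 1024**2,
        "cpu_index_minutes": n_chunks / CPU_CHUNKS_PER_SEC / 60,
    }


print(estimate(500_000))
```

By this estimate, a project at the 500K-chunk edge of the search-latency figure needs roughly 1 GB on disk and about 104 minutes of CPU indexing. Given the 5–10× figure, that would presumably drop to roughly 10 to 20 minutes on a GPU.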

Local dev

git clone https://github.com/jameslovespancakes/Vectorised-Embedding-MCP
cd Vectorised-Embedding-MCP
pip install -e ".[ocr,notify]"

# tests bypass MCP transport, drive indexer + tools directly
python tests/smoke_test.py
python tests/smoke_test_projects.py
python tests/smoke_test_jobs.py
python tests/smoke_test_filters.py
python tests/smoke_test_office.py
python tests/smoke_test_chunking.py
python tests/smoke_test_legacy_skip.py

License

MIT.
