sifter-mcp
Structure any document, query it like a database. Sifter turns documents into typed, schema-defined records, queryable in natural language from any MCP client.
Sifter
Structure any document. Query it like a database. Build on top via API.
Open-source document intelligence engine — schema-driven extraction, NL query, MCP server, Python and TypeScript SDKs. Self-hostable under MIT.

Why not RAG?
RAG is built for retrieval: "find me chunks similar to this query." It breaks down on homogeneous collections such as invoices, contracts, or receipts, where every document looks alike and the question is an aggregation, not a search.

Sifter's approach: extract structured fields once (client, date, total), store them as typed records, query with real filters and aggregations. The answer is exact and reproducible — because it's a database query, not a similarity search.
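To make the contrast concrete, here is a minimal sketch in plain Python. It is not Sifter's implementation; it only assumes records have already been extracted into the `client`, `date`, `total_amount` shape used in the SDK example, and the sample values are invented for illustration.

```python
from datetime import date

# Hypothetical extracted records; the values are made up for illustration.
records = [
    {"client": "Acme Corp", "date": date(2024, 1, 15), "total_amount": 1500.0},
    {"client": "Acme Corp", "date": date(2024, 2, 3), "total_amount": 250.0},
    {"client": "Globex", "date": date(2024, 1, 20), "total_amount": 980.0},
    {"client": "Acme Corp", "date": date(2023, 12, 30), "total_amount": 400.0},
]


def total_for_client(rows, client, start, end):
    """Sum total_amount for one client with start <= date < end."""
    return sum(
        r["total_amount"]
        for r in rows
        if r["client"] == client and start <= r["date"] < end
    )


# A filter plus a sum: the same input always yields the same answer.
q1_total = total_for_client(records, "Acme Corp", date(2024, 1, 1), date(2024, 4, 1))
print(q1_total)  # 1750.0
```

A similarity search over document chunks has no equivalent of this step: it can return invoices that look relevant, but it does not guarantee that every matching invoice is counted exactly once.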
Quickstart
git clone https://github.com/sifter-ai/sifter
cd sifter/code
cp server/.env.example server/.env.local # set SIFTER_DEFAULT_API_KEY (required)
docker compose up -d
Open http://localhost:3000 — create a sift, upload documents, query results.
Python SDK
pip install sifter-ai
from sifter import Sifter
s = Sifter(api_key="sk-...")
sift = s.create_sift("Invoices", "client name, date, total amount")
sift.upload("./invoices/")
sift.wait()
for record in sift.records():
    print(record["extracted_data"])
    # {"client": "Acme Corp", "date": "2024-01-15", "total_amount": 1500.0}
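Each record's `extracted_data` is a plain dict, as in the comment above. The following is a small sketch of loading it into a typed object for downstream code. The `Invoice` dataclass and the `to_invoice` helper are hypothetical names, not part of the SDK, and the field names are taken from the example output.

```python
import datetime as dt
from dataclasses import dataclass


@dataclass(frozen=True)
class Invoice:
    """Hypothetical typed view of one extracted record (not an SDK class)."""
    client: str
    date: dt.date
    total_amount: float


def to_invoice(extracted: dict) -> Invoice:
    """Convert an extracted_data dict into an Invoice, parsing the ISO date."""
    return Invoice(
        client=extracted["client"],
        date=dt.date.fromisoformat(extracted["date"]),
        total_amount=float(extracted["total_amount"]),
    )


invoice = to_invoice(
    {"client": "Acme Corp", "date": "2024-01-15", "total_amount": 1500.0}
)
print(invoice.date.year, invoice.total_amount)  # 2024 1500.0
```

Parsing at the boundary means a malformed date or a missing field fails immediately with a `ValueError` or `KeyError`, rather than surfacing later in reporting code.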
TypeScript SDK
npm install @sifter-ai/sdk
import { Sifter } from "@sifter-ai/sdk";
const client = new Sifter({ apiKey: "sk-..." });
const sift = await client.createSift("Invoices", "client, date, total amount");
await sift.upload("./invoices/");
await sift.wait();
const records = await sift.records();
console.log(records);
MCP server (Claude Desktop / Cursor / AI agents)
{
  "mcpServers": {
    "sifter": {
      "command": "uvx",
      "args": ["sifter-mcp", "--base-url", "http://localhost:8000"],
      "env": { "SIFTER_API_KEY": "sk-dev" }
    }
  }
}
Then ask Claude: "What's the total unpaid across all invoices from last quarter?"
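The feature list says natural-language questions are turned into MongoDB aggregation pipelines. As a rough illustration of what a pipeline for a question like the one above could look like (a guess at the shape, not output captured from Sifter), the sketch below computes the previous calendar quarter and builds a `$match` plus `$group` pipeline. The `status` field, the `"unpaid"` value, and the flat field layout are assumptions. Dates are assumed to be ISO strings as in the SDK example, so string comparison orders them correctly.

```python
from datetime import date


def previous_quarter(today: date) -> tuple[date, date]:
    """Return [start, end) of the calendar quarter before the one containing today."""
    first_month = 3 * ((today.month - 1) // 3) + 1  # first month of current quarter
    end = date(today.year, first_month, 1)  # exclusive upper bound
    if first_month == 1:
        start = date(today.year - 1, 10, 1)
    else:
        start = date(today.year, first_month - 3, 1)
    return start, end


def unpaid_total_pipeline(today: date) -> list[dict]:
    """Build a hypothetical pipeline: filter unpaid invoices, then sum totals."""
    start, end = previous_quarter(today)
    return [
        {"$match": {
            "status": "unpaid",  # assumed field, not in the README's schema
            "date": {"$gte": start.isoformat(), "$lt": end.isoformat()},
        }},
        {"$group": {"_id": None, "unpaid_total": {"$sum": "$total_amount"}}},
    ]


pipeline = unpaid_total_pipeline(date(2024, 5, 10))
print(pipeline[0]["$match"]["date"])
# {'$gte': '2024-01-01', '$lt': '2024-04-01'}
```

With a driver such as PyMongo, a list like this could be passed to `collection.aggregate(pipeline)`. Being able to read the pipeline is what makes the answer auditable: the filter and the sum are explicit.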
Want a remote MCP URL without running a local server? → Sifter Cloud
What's included
- Schema-driven extraction — describe what to extract in natural language; schema is inferred automatically and exported as Pydantic / TypeScript types
- NL query — ask questions in plain language; Sifter generates inspectable MongoDB aggregation pipelines
- MCP server — stdio transport, read + write tools, zero custom integration code
- REST API + SDKs — full OpenAPI spec, typed clients for Python and TypeScript
- Webhooks — HMAC-signed HTTP callbacks on every extraction event
- Spec-driven dashboards — short NL spec → auto-generated board (KPI, breakdown, table, time series)
- CLI — sifter extract, sifter records, sifter sifts for terminal workflows and CI
- Self-hostable — Docker Compose, bring your own MongoDB and LLM API key
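Because webhooks are HMAC-signed, a receiver should verify the signature before trusting a payload. The README does not specify the header name, hash algorithm, or encoding, so this sketch assumes HMAC-SHA256 over the raw request body with a hex digest; the secret and payload are hypothetical, and the docs should be checked for the actual scheme.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 of the raw body (assumed scheme)."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, received_signature: str) -> bool:
    """Check a received signature using a constant-time comparison."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected, received_signature)


secret = b"whsec-example"  # hypothetical shared secret
body = b'{"event": "extraction.completed"}'  # hypothetical payload
signature = sign(secret, body)

print(verify(secret, body, signature))         # True
print(verify(secret, body + b" ", signature))  # False
```

The signature is computed over the exact bytes received, so it should be checked before the body is parsed as JSON. `hmac.compare_digest` is used instead of `==` to avoid leaking timing information.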
Don't want to run infrastructure?
Sifter Cloud is the managed version — no Mongo, no ops, remote MCP endpoint, Google Drive and email ingress. Free tier available.
Docs
Full documentation at docs.sifter.run — quickstart, SDK reference, MCP guide, cookbook, self-hosting.
License
MIT — see LICENSE.
Created by Bruno Fortunato.