sifter-mcp

sifter-mcp

Structure any document, query it like a database. Sifter turns documents into typed, schema-defined records, queryable in natural language from any MCP client.

Category
Visit Server

README

Sifter

CI codecov PyPI npm Python Node License: MIT

Structure any document. Query it like a database. Build on top via API.

Open-source document intelligence engine — schema-driven extraction, NL query, MCP server, Python and TypeScript SDKs. Self-hostable under MIT.

Sifter demo


Why not RAG?

RAG is built for retrieval — find me chunks similar to this query. It breaks on homogeneous collections like invoices, contracts, or receipts where every document looks alike and the question is an aggregation, not a search.

RAG vs Sifter

Sifter's approach: extract structured fields once (client, date, total), store them as typed records, query with real filters and aggregations. The answer is exact and reproducible — because it's a database query, not a similarity search.


Quickstart

git clone https://github.com/sifter-ai/sifter
cd sifter/code
cp server/.env.example server/.env.local    # set SIFTER_DEFAULT_API_KEY (required)
docker compose up -d

Open http://localhost:3000 — create a sift, upload documents, query results.


Python SDK

pip install sifter-ai
from sifter import Sifter

s = Sifter(api_key="sk-...")

sift = s.create_sift("Invoices", "client name, date, total amount")
sift.upload("./invoices/")
sift.wait()

for record in sift.records():
    print(record["extracted_data"])
# {"client": "Acme Corp", "date": "2024-01-15", "total_amount": 1500.0}

TypeScript SDK

npm install @sifter-ai/sdk
import { Sifter } from "@sifter-ai/sdk";

const client = new Sifter({ apiKey: "sk-..." });

const sift = await client.createSift("Invoices", "client, date, total amount");
await sift.upload("./invoices/");
await sift.wait();

const records = await sift.records();
console.log(records);

MCP server (Claude Desktop / Cursor / AI agents)

{
  "mcpServers": {
    "sifter": {
      "command": "uvx",
      "args": ["sifter-mcp", "--base-url", "http://localhost:8000"],
      "env": { "SIFTER_API_KEY": "sk-dev" }
    }
  }
}

Then ask Claude: "What's the total unpaid across all invoices from last quarter?"

Want a remote MCP URL without running a local server? → Sifter Cloud


What's included

  • Schema-driven extraction — describe what to extract in natural language; schema is inferred automatically and exported as Pydantic / TypeScript types
  • NL query — ask questions in plain language; Sifter generates inspectable MongoDB aggregation pipelines
  • MCP server — stdio transport, read + write tools, zero custom integration code
  • REST API + SDKs — full OpenAPI spec, typed clients for Python and TypeScript
  • Webhooks — HMAC-signed HTTP callbacks on every extraction event
  • Spec-driven dashboards — short NL spec → auto-generated board (KPI, breakdown, table, time series)
  • CLIsifter extract, sifter records, sifter sifts for terminal workflows and CI
  • Self-hostable — Docker Compose, bring your own MongoDB and LLM API key

Don't want to run infrastructure?

Sifter Cloud is the managed version — no Mongo, no ops, remote MCP endpoint, Google Drive and email ingress. Free tier available.


Docs

Full documentation at docs.sifter.run — quickstart, SDK reference, MCP guide, cookbook, self-hosting.


License

MIT — see LICENSE.

Created by Bruno Fortunato.

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
E2B

E2B

Using MCP to run code via e2b.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured