corpuskit

Enables AI coding agents to search documentation and query constraints via MCP tools kr_search and kr_constraints.

Config-driven knowledge-corpus runtime plus compression orchestration for AI coding agents: multi-agent via MCP, and zero-config on any folder of docs.

It turns a folder of markdown into a searchable, constraint-aware institutional memory that your agent can query instead of re-deriving everything from scratch, and it orchestrates external token-compression tools. It was generalized from a working Claude Code system; a project is just one corpus.yaml.

What you get

  • Manifest + FTS5 search over full document bodies, ranked with BM25 (SQLite from the Python stdlib; no embeddings or torch needed).
  • Machine-readable constraints: @constraint TYPE | target= | rule= | adr= | severity= lines in your decision/ADR docs, queryable by component.
  • MCP server exposing kr_search / kr_constraints (names/descriptions configurable) — works with any MCP-capable agent (Claude Code, Cursor, …).
  • Agent install — idempotent registration of the MCP server + write-back (SessionEnd) + constraint-injection (UserPromptSubmit) hooks.
  • Compression orchestration — discover/install/lifecycle/health over external RTK (shell-output) + Headroom (transport proxy), used AS-IS (never vendored).
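For readers curious how full-body FTS5 search with BM25 ranking fits together, here is a minimal sketch using only Python's built-in sqlite3 module. It assumes the interpreter's SQLite was compiled with FTS5 (true of most builds), and the table name docs with its path/body columns is hypothetical, not corpuskit's actual schema.

```python
import sqlite3


def build_index(docs: dict[str, str]) -> sqlite3.Connection:
    """Index {relative_path: markdown_body} into an in-memory FTS5 table."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one FTS5 row per document; path and full body are both indexed.
    conn.execute("CREATE VIRTUAL TABLE docs USING fts5(path, body)")
    conn.executemany("INSERT INTO docs(path, body) VALUES (?, ?)", docs.items())
    conn.commit()
    return conn


def search(conn: sqlite3.Connection, text: str, limit: int = 5) -> list[str]:
    """Return matching paths, best BM25 match first."""
    # bm25() yields lower (more negative) scores for better matches,
    # so ascending order puts the most relevant document first.
    rows = conn.execute(
        "SELECT path FROM docs WHERE docs MATCH ? ORDER BY bm25(docs) LIMIT ?",
        (text, limit),
    ).fetchall()
    return [path for (path,) in rows]


if __name__ == "__main__":
    conn = build_index({
        "specs/payments-api/rate-limiter.md": "Rate limiter design: token bucket per tenant.",
        "decisions/0001-ledger-db.md": "Use Postgres for the ledger.",
    })
    print(search(conn, "rate limiter design"))
```

Space-separated terms in an FTS5 query are implicitly ANDed, so every word has to appear somewhere in a matching document.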

Install

pip install corpuskit            # core (CLI + manifest/index/constraints)
pip install "corpuskit[mcp]"     # + MCP server
pip install "corpuskit[all]"
# until published: pip install git+https://github.com/SupaKang/corpuskit

Quickstart (zero-config)

cd my-docs/                 # any folder of *.md
corpus index build         # project_key = top-level dirname; full-body FTS5 index
corpus index query "rate limiter design"
corpus constraints --component payments-api
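The constraint lines are simple enough to parse with string splitting. The sketch below illustrates the idea and is not corpuskit's parser. It assumes one constraint per line, that no field value contains a literal |, and that the value passed to --component is matched against the target= field. The MUST/SHOULD types and the ADR text are made-up examples.

```python
from dataclasses import dataclass

PREFIX = "@constraint"


@dataclass
class Constraint:
    kind: str               # the TYPE token that follows @constraint
    fields: dict[str, str]  # target=, rule=, adr=, severity=, ...


def parse_constraints(text: str) -> list[Constraint]:
    """Collect every '@constraint TYPE | key=value | ...' line in a document."""
    found = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith(PREFIX):
            continue
        head, *parts = (p.strip() for p in line.split("|"))
        fields = {}
        for part in parts:
            # partition on the first '=' so rule text may itself contain '='
            key, sep, value = part.partition("=")
            if sep:
                fields[key.strip()] = value.strip()
        found.append(Constraint(head[len(PREFIX):].strip(), fields))
    return found


def for_component(constraints: list[Constraint], component: str) -> list[Constraint]:
    """Assumption: a constraint applies to a component when target= names it."""
    return [c for c in constraints if c.fields.get("target") == component]


EXAMPLE = """# ADR-0007: Idempotent payment writes
@constraint MUST | target=payments-api | rule=every POST requires an Idempotency-Key header | adr=ADR-0007 | severity=high
@constraint SHOULD | target=web-ui | rule=disable the submit button while a payment is pending | adr=ADR-0007 | severity=low
"""
```

Here for_component(parse_constraints(EXAMPLE), "payments-api") keeps only the MUST line.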

No corpus.yaml needed: the defaults are auto_layout, dirname-based project keys, and standard YAML or bulleted frontmatter.
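To make the frontmatter default concrete, here is a deliberately small reader for flat key: value pairs between --- lines, plus an alias lookup in the spirit of the frontmatter fields setting shown in the config below. Both function names are hypothetical. Treating "bulleted" frontmatter as lines like "- status: active" is an assumption. The sketch ignores YAML quoting, lists, and nesting; a real implementation would more likely use a YAML parser.

```python
def parse_frontmatter(text: str) -> dict[str, str]:
    """Read flat 'key: value' pairs from a leading '---' ... '---' block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: dict[str, str] = {}
    for line in lines[1:]:
        item = line.strip()
        if item == "---":
            return meta
        # Assumption: "bulleted" frontmatter means lines such as "- status: active".
        if item.startswith("- "):
            item = item[2:]
        key, sep, value = item.partition(":")
        if sep and key.strip():
            meta[key.strip().lower()] = value.strip()
    return {}  # no closing '---': treat the document as having no frontmatter


def resolve_fields(meta: dict[str, str], aliases: dict[str, list[str]]) -> dict[str, str]:
    """Map each canonical field name to the value of the first alias present."""
    resolved = {}
    for canonical, keys in aliases.items():
        for key in keys:
            if key in meta:
                resolved[canonical] = meta[key]
                break
    return resolved
```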

Config (corpus.yaml) — opt in when you need it

knowledge:
  keyed_roots: { specs: spec, decisions: decision }   # relpath -> doc_type (key = subdir)
  flat_roots:  { daily: daily }
  frontmatter: { style: auto, fields: { project: [project], status: [status] } }
  constraints: { decisions_root: decisions }
agent: { type: claude-code }       # claude-code (MCP+hooks) | cursor (MCP) | standalone | cline/mcp-json
compression: { enabled: false }
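One plausible reading of keyed_roots and flat_roots is sketched below. A file under a keyed root gets that root's doc_type, and the first subdirectory becomes its key. A file under a flat root gets only a doc_type. This interpretation and the classify helper are assumptions drawn from the inline comments, not corpuskit's code.

```python
from pathlib import PurePosixPath
from typing import Optional


def classify(
    relpath: str,
    keyed_roots: dict[str, str],
    flat_roots: dict[str, str],
) -> Optional[tuple[str, Optional[str]]]:
    """Return (doc_type, key) for a doc path relative to the corpus root, or None."""
    parts = PurePosixPath(relpath).parts
    for root, doc_type in keyed_roots.items():
        prefix = PurePosixPath(root).parts
        n = len(prefix)
        # keyed layout: <root>/<key>/.../<file>.md, where the key is the first subdir
        if parts[:n] == prefix and len(parts) > n + 1:
            return doc_type, parts[n]
    for root, doc_type in flat_roots.items():
        prefix = PurePosixPath(root).parts
        # flat layout: any file under <root>, with no per-project key
        if parts[:len(prefix)] == prefix and len(parts) > len(prefix):
            return doc_type, None
    return None  # outside every configured root


KEYED = {"specs": "spec", "decisions": "decision"}
FLAT = {"daily": "daily"}
```

With the config above, specs/payments-api/rate-limiter.md would classify as ("spec", "payments-api") and daily/2024-05-01.md as ("daily", None).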

corpus init scaffolds one. See examples/overmind.yaml for a full localized (Korean) instance.

Agent integration

corpus install --agent claude-code     # idempotent: MCP + SessionEnd + UserPromptSubmit (backs up settings.json)
corpus status --agent claude-code
corpus uninstall --agent claude-code
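The idempotent install can be pictured as a merge that only adds what is missing, so running it a second time changes nothing and skips the write. This is a sketch, not corpuskit's installer. The mcpServers and hooks key layout is modeled loosely on Claude Code's JSON settings and should be checked against the agent's own documentation. The server name, the use of corpus serve-mcp as the server command, and the hook commands are placeholders.

```python
import json
import shutil
from pathlib import Path


def register(settings: dict, server_name: str, server_cmd: list[str],
             hooks: dict[str, str]) -> bool:
    """Add the MCP server and hook commands if missing; return True if settings changed."""
    changed = False
    # Assumed layout: {"mcpServers": {name: {"command": ..., "args": [...]}}}
    servers = settings.setdefault("mcpServers", {})
    entry = {"command": server_cmd[0], "args": server_cmd[1:]}
    if servers.get(server_name) != entry:
        servers[server_name] = entry
        changed = True
    # Assumed layout: {"hooks": {event: [{"hooks": [{"type": "command", "command": ...}]}]}}
    events = settings.setdefault("hooks", {})
    for event, command in hooks.items():
        groups = events.setdefault(event, [])
        present = any(h.get("command") == command
                      for g in groups for h in g.get("hooks", []))
        if not present:  # skipping existing commands is what makes re-runs no-ops
            groups.append({"hooks": [{"type": "command", "command": command}]})
            changed = True
    return changed


def install(path: Path, **kwargs) -> bool:
    """Merge into a JSON settings file, backing it up first; no write if unchanged."""
    settings = json.loads(path.read_text()) if path.exists() else {}
    if not register(settings, **kwargs):
        return False
    if path.exists():
        shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_text(json.dumps(settings, indent=2) + "\n")
    return True
```

Unrelated keys in the settings file are left untouched, and the backup is only taken when something is actually about to change.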

Restart your agent. It then gains the kr_search and kr_constraints tools, active constraints injected automatically when you submit a prompt, and an index that updates itself at the end of each session.

Compression

corpus compression install   # ensure RTK + Headroom present
corpus compression start     # launch Headroom proxy, print ANTHROPIC_BASE_URL
corpus compression health    # versions, native-Windows degradation, port
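A rough version of the discovery and health checks can be written with the standard library: look the external tools up on PATH and probe whether the proxy port accepts TCP connections. The executable names rtk and headroom, the host, and any port passed in are assumptions. corpuskit's real checks, including the native-Windows degradation report, may work differently.

```python
import shutil
import socket
from typing import Optional


def discover(tools: tuple[str, ...] = ("rtk", "headroom")) -> dict[str, Optional[str]]:
    """Map each (assumed) executable name to its resolved path on PATH, or None."""
    return {name: shutil.which(name) for name in tools}


def port_is_listening(host: str, port: int, timeout: float = 0.5) -> bool:
    """True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        return s.connect_ex((host, port)) == 0


def base_url(host: str, port: int) -> str:
    """The value a client would export as ANTHROPIC_BASE_URL to route through the proxy."""
    return f"http://{host}:{port}"
```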

CLI

corpus init | index build|query | constraints | serve-mcp | install|uninstall|status | compression … | doctor

License

Apache-2.0 (this code). External tools RTK and Headroom are separate Apache-2.0 projects, used as-is — see THIRD_PARTY.md.
