ClickHouse MCP Agent
Enables querying ClickHouse databases using natural language with AI models, supporting multiple providers and access restrictions via per-call allow-lists.
README
ClickHouse MCP Agent
AI agent for ClickHouse database analysis via MCP (Model Context Protocol).
A single MCP server (mcp-clickhouse) driven by a single agent instance. Access restriction is performed via explicit allow-lists you pass per call (allowed_tables, allowed_databases), rather than managing multiple keys or fan-out across multiple agents.
Features
- Query ClickHouse databases using natural language with AI models
- Structured output:
analysis,confidence,sql_used - Easy connection management (predefined or custom)
- Conversational context with message-history pruning/summarization
- No CLI or external .env required — configure at runtime
- Access restriction via per-call allow-lists (
allowed_tables,allowed_databases) - Streamable results via
run_stream() - Persistent MCP server mode via
async with ClickHouseAgent() - Typed exception hierarchy for reliable error handling
Supported Providers
- OpenAI
- Anthropic
- Google Gemini
- Groq
- Mistral
- Cohere
Local Development (Docker)
The fastest way to get started — no cloud ClickHouse account needed:
# Start ClickHouse with seeded demo data (orders + products)
docker compose up -d
# Install the package with dev dependencies
pip install -e ".[dev]"
# Run the examples (set your API key first)
GOOGLE_API_KEY=... python examples/example_minimal.py
GOOGLE_API_KEY=... python examples/example_stream.py
The docker/init.sql file seeds demo.orders (25 orders, May 2026) and demo.products (10 products across Electronics, Sports, Home, Books) automatically on first start.
Quickstart
import asyncio
from agent.clickhouse_agent import ClickHouseAgent
from agent.config import config
config.set_ai_model("openai:gpt-4o-mini")
config.set_model_api_key("openai", "your_api_key_here")
config.set_clickhouse(host="localhost", port="8123", user="default", password="", secure="false")
async def main():
agent = ClickHouseAgent()
result = await agent.run(
allowed_tables=["orders", "products"],
allowed_databases=["demo"],
query="give me some insights on the recent data",
)
print("Analysis:", result.analysis)
print("Confidence:", result.confidence)
print("SQL used:", result.sql_used)
asyncio.run(main())
Persistent server (multiple queries)
Use the context manager to keep the MCP subprocess alive across calls — avoids subprocess startup overhead on every query:
async def main():
async with ClickHouseAgent() as agent:
r1 = await agent.run(query="how many orders were placed last week?")
r2 = await agent.run(query="which products are selling fastest?", message_history=r1.messages)
Switching providers
All providers use the same interface — just swap the model string and key:
# Anthropic
config.set_ai_model("anthropic:claude-haiku-4-5-20251001")
config.set_model_api_key("anthropic", "your_key")
# Google
config.set_ai_model("google-gla:gemini-2.0-flash")
config.set_model_api_key("google", "your_key")
# Groq
config.set_ai_model("groq:llama-3.3-70b-versatile")
config.set_model_api_key("groq", "your_key")
Message History & Summarization
Pass message_history between calls for multi-turn conversations. When token usage exceeds summarize_config.token_limit, older messages are automatically summarized into a compact form by a separate summarizer agent.
summarize_config.set_token_limit(10000)
summarize_config.set_ai_model("gemini-2.5-flash")
Output
Each call to ClickHouseAgent.run() returns a RunResult:
| Field | Description |
|---|---|
analysis |
Natural-language result text from the model |
confidence |
Confidence level (1–10) |
sql_used |
List of SQL strings executed during the run |
messages |
Full (possibly pruned/summarized) message history |
new_messages |
Only messages created in the latest turn |
last_message |
The last message in the conversation |
usage |
Token/usage statistics for the run |
Error Handling
All errors raise from a typed hierarchy so you can catch at the right level:
from agent.exceptions import ClickHouseMCPError, MCPConnectionError, AgentExecutionError
try:
result = await agent.run(query="...")
except MCPConnectionError:
# MCP subprocess failed to start or connection dropped
...
except AgentExecutionError:
# Agent logic failed during the run
...
except ClickHouseMCPError:
# Any library error
...
Requirements
- Python 3.10+
- AI API key for your provider (OpenAI, Anthropic, Google/Gemini, Groq, Mistral, Cohere)
All dependencies are managed via pyproject.toml.
Roadmap
✅ Done
- MCP integration via
pydantic_ai.mcp.MCPServerStdio - SQL generation/execution via MCP tools
- Schema inspection (databases/tables/columns)
- Config-driven connections (playground/local/custom)
- Access restriction via per-call allow-lists (
allowed_tables,allowed_databases) - Runtime provider/model selection and API key management
- Structured outputs (
ClickHouseOutput) andRunResultwithsql_used - Message history pruning/summarization
- Streaming results via
run_stream() - Persistent MCP server via
async with ClickHouseAgent() - Typed exception hierarchy
- Local development via Docker (
docker compose up -d) rufflinting, Python 3.13 support, CI hardened
⚙️ 0.11 — Agent API expansion
- Async batch queries (parallel queries in one call)
ClickHouseAgent.reset()for lifecycle control without re-instantiationstructlogoptional dep for structured observabilitypydantic-aiAPI audit and model ref updates (Claude 4 / GPT family)
🔒 0.12 — Stable
- API locked — no breaking changes without a major version
- All known bugs resolved
py.typedcheck added to CI
🔭 Post-1.0 — Future
- Database-agnostic abstraction (Elasticsearch, MongoDB, Postgres)
- FastAPI standalone deployment option
Contributing
Open an issue or pull request for features or fixes.
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.