duckdb_analytics_mcp
Provides read-only SQL analytics on local CSV, Parquet, and JSON datasets using DuckDB, with pagination, security guardrails, and health monitoring.
README
DuckDB Analytics MCP
Read-only MCP server for querying local datasets (.csv, .parquet, .json, .jsonl) through DuckDB with explicit SQL guardrails, pagination, timeouts, and optional bearer-token auth.
What This Server Provides
- MCP transports:
stdioandstreamable-http - Tool responses in either
markdownorjson - Dataset catalog + schema resources
- Read-only SQL execution over a registered
sourcetable alias - Guardrails for query size, statement shape, and banned SQL keywords
- Optional auth with required scope enforcement
Supported Dataset Formats
Files are discovered recursively under DATASET_DIR.
.csv.parquet.json.jsonl
Setup
git clone https://github.com/pypi-ahmad/duckdb-analytics-mcp.git
cd duckdb-analytics-mcp
uv python pin 3.12.10
uv sync --dev
cp .env.example .env
Configuration
Runtime is configured through environment variables (loaded from .env by default).
| Variable | Default | Description |
|---|---|---|
SERVER_NAME |
duckdb_analytics_mcp |
MCP server name |
LOG_LEVEL |
INFO |
Log level (DEBUG/INFO/WARNING/ERROR/CRITICAL) |
HOST |
127.0.0.1 |
Bind host for HTTP transport |
PORT |
8000 |
Bind port for HTTP transport |
TRANSPORT |
streamable-http |
Default transport (stdio or streamable-http) |
DATASET_DIR |
data |
Dataset root; relative paths resolve from project root |
DEFAULT_LIMIT |
25 |
Default page size |
MAX_LIMIT |
200 |
Upper bound for list/query page size |
MAX_QUERY_CHARS |
4000 |
Max SQL length |
QUERY_TIMEOUT_SECONDS |
20 |
Timeout for describe/query operations |
MAX_SAMPLE_ROWS |
25 |
Max rows returned for dataset samples |
DUCKDB_THREADS |
4 |
DuckDB execution threads |
DUCKDB_MEMORY_LIMIT |
1GB |
DuckDB memory limit |
STATIC_BEARER_TOKEN |
unset | Enables token verification when auth URLs are also set |
AUTH_ISSUER_URL |
unset | Auth issuer URL (required with STATIC_BEARER_TOKEN) |
AUTH_RESOURCE_SERVER_URL |
unset | Resource server URL (required with STATIC_BEARER_TOKEN) |
AUTH_REQUIRED_SCOPE |
analytics.read |
Required access scope |
Auth note:
- If
STATIC_BEARER_TOKENis set, bothAUTH_ISSUER_URLandAUTH_RESOURCE_SERVER_URLmust also be set.
Run
Streamable HTTP
uv run duckdb-analytics-mcp run --transport streamable-http
Endpoint: http://127.0.0.1:8000/mcp
stdio
uv run duckdb-analytics-mcp run --transport stdio
MCP Client Integration
Generic stdio server config
{
"command": "uv",
"args": ["run", "duckdb-analytics-mcp", "run", "--transport", "stdio"],
"cwd": "/path/to/duckdb-analytics-mcp",
"env": {
"DATASET_DIR": "data"
}
}
Generic streamable-http config
Use base URL:
http://127.0.0.1:8000/mcp
Tool Reference
duckdb_analytics_health
Returns server and catalog health metadata.
Parameters:
response_format(markdownorjson, defaultmarkdown)
duckdb_analytics_list_datasets
Lists datasets with pagination.
Parameters:
limit(default25)offset(default0)response_format(markdownorjson, defaultmarkdown)
duckdb_analytics_describe_dataset
Returns dataset metadata, schema, row count, and sample rows.
Parameters:
dataset(relative catalog name, for examplesales/orders.csv)sample_rows(default10, bounded byMAX_SAMPLE_ROWS)response_format(markdownorjson, defaultmarkdown)
duckdb_analytics_query_dataset
Runs guarded read-only SQL against the dataset using the table alias source.
Parameters:
dataset(relative catalog name)sql(must start withSELECTorWITH)limit(default25, bounded byMAX_LIMIT)offset(default0)response_format(markdownorjson, defaultmarkdown)
Error Behavior
describe/queryreturn tool-level errors for invalid dataset names, SQL guard violations, and timeout conditions.- In
jsonmode, tool errors are returned as{ "error": "..." }. - In
markdownmode, tool errors are returned asError: ....
Resource Reference
dataset://catalog— full dataset catalogdataset://schema/{dataset_name}— schema + sample rows for one dataset
SQL Safety Model
duckdb_analytics_query_dataset enforces:
- single statement only
- no SQL comments (
--,/* ... */) - query must start with
SELECTorWITH - banned keywords (for example
INSERT,UPDATE,DELETE,DROP,ALTER,CREATE,PRAGMA,COPY,INSTALL,LOAD)
CLI Commands
uv run duckdb-analytics-mcp --help
uv run duckdb-analytics-mcp run --help
uv run duckdb-analytics-mcp doctor
doctor prints deployment-readiness JSON, including health and dataset counts.
Docker
docker build -t duckdb-analytics-mcp:latest .
docker run --rm -p 8000:8000 \
-e HOST=0.0.0.0 \
-e PORT=8000 \
-e TRANSPORT=streamable-http \
duckdb-analytics-mcp:latest
Or with compose:
docker compose up --build
Development Verification
uv run ruff check .
uv run mypy src
uv run pytest -q
Troubleshooting
Dataset '<name>' not found:- Run
duckdb_analytics_list_datasetsfirst and use an exact dataset name from catalog.
- Run
Operation timed out after ... seconds:- Simplify query, lower scan volume, or increase
QUERY_TIMEOUT_SECONDS.
- Simplify query, lower scan volume, or increase
- Empty catalog:
- Confirm files exist under
DATASET_DIRand use supported suffixes.
- Confirm files exist under
- Auth startup error:
- If
STATIC_BEARER_TOKENis set, also setAUTH_ISSUER_URLandAUTH_RESOURCE_SERVER_URL.
- If
Project Structure
src/duckdb_analytics_mcp/
cli.py
config.py
server.py
service.py
catalog.py
security.py
formatter.py
data/
tests/
notebooks/
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.