duckdb_analytics_mcp

duckdb_analytics_mcp

Provides read-only SQL analytics on local CSV, Parquet, and JSON datasets using DuckDB, with pagination, security guardrails, and health monitoring.

Category
Visit Server

README

DuckDB Analytics MCP

Read-only MCP server for querying local datasets (.csv, .parquet, .json, .jsonl) through DuckDB with explicit SQL guardrails, pagination, timeouts, and optional bearer-token auth.

What This Server Provides

  • MCP transports: stdio and streamable-http
  • Tool responses in either markdown or json
  • Dataset catalog + schema resources
  • Read-only SQL execution over a registered source table alias
  • Guardrails for query size, statement shape, and banned SQL keywords
  • Optional auth with required scope enforcement

Supported Dataset Formats

Files are discovered recursively under DATASET_DIR.

  • .csv
  • .parquet
  • .json
  • .jsonl

Setup

git clone https://github.com/pypi-ahmad/duckdb-analytics-mcp.git
cd duckdb-analytics-mcp
uv python pin 3.12.10
uv sync --dev
cp .env.example .env

Configuration

Runtime is configured through environment variables (loaded from .env by default).

Variable Default Description
SERVER_NAME duckdb_analytics_mcp MCP server name
LOG_LEVEL INFO Log level (DEBUG/INFO/WARNING/ERROR/CRITICAL)
HOST 127.0.0.1 Bind host for HTTP transport
PORT 8000 Bind port for HTTP transport
TRANSPORT streamable-http Default transport (stdio or streamable-http)
DATASET_DIR data Dataset root; relative paths resolve from project root
DEFAULT_LIMIT 25 Default page size
MAX_LIMIT 200 Upper bound for list/query page size
MAX_QUERY_CHARS 4000 Max SQL length
QUERY_TIMEOUT_SECONDS 20 Timeout for describe/query operations
MAX_SAMPLE_ROWS 25 Max rows returned for dataset samples
DUCKDB_THREADS 4 DuckDB execution threads
DUCKDB_MEMORY_LIMIT 1GB DuckDB memory limit
STATIC_BEARER_TOKEN unset Enables token verification when auth URLs are also set
AUTH_ISSUER_URL unset Auth issuer URL (required with STATIC_BEARER_TOKEN)
AUTH_RESOURCE_SERVER_URL unset Resource server URL (required with STATIC_BEARER_TOKEN)
AUTH_REQUIRED_SCOPE analytics.read Required access scope

Auth note:

  • If STATIC_BEARER_TOKEN is set, both AUTH_ISSUER_URL and AUTH_RESOURCE_SERVER_URL must also be set.

Run

Streamable HTTP

uv run duckdb-analytics-mcp run --transport streamable-http

Endpoint: http://127.0.0.1:8000/mcp

stdio

uv run duckdb-analytics-mcp run --transport stdio

MCP Client Integration

Generic stdio server config

{
  "command": "uv",
  "args": ["run", "duckdb-analytics-mcp", "run", "--transport", "stdio"],
  "cwd": "/path/to/duckdb-analytics-mcp",
  "env": {
    "DATASET_DIR": "data"
  }
}

Generic streamable-http config

Use base URL:

http://127.0.0.1:8000/mcp

Tool Reference

duckdb_analytics_health

Returns server and catalog health metadata.

Parameters:

  • response_format (markdown or json, default markdown)

duckdb_analytics_list_datasets

Lists datasets with pagination.

Parameters:

  • limit (default 25)
  • offset (default 0)
  • response_format (markdown or json, default markdown)

duckdb_analytics_describe_dataset

Returns dataset metadata, schema, row count, and sample rows.

Parameters:

  • dataset (relative catalog name, for example sales/orders.csv)
  • sample_rows (default 10, bounded by MAX_SAMPLE_ROWS)
  • response_format (markdown or json, default markdown)

duckdb_analytics_query_dataset

Runs guarded read-only SQL against the dataset using the table alias source.

Parameters:

  • dataset (relative catalog name)
  • sql (must start with SELECT or WITH)
  • limit (default 25, bounded by MAX_LIMIT)
  • offset (default 0)
  • response_format (markdown or json, default markdown)

Error Behavior

  • describe/query return tool-level errors for invalid dataset names, SQL guard violations, and timeout conditions.
  • In json mode, tool errors are returned as { "error": "..." }.
  • In markdown mode, tool errors are returned as Error: ....

Resource Reference

  • dataset://catalog — full dataset catalog
  • dataset://schema/{dataset_name} — schema + sample rows for one dataset

SQL Safety Model

duckdb_analytics_query_dataset enforces:

  • single statement only
  • no SQL comments (--, /* ... */)
  • query must start with SELECT or WITH
  • banned keywords (for example INSERT, UPDATE, DELETE, DROP, ALTER, CREATE, PRAGMA, COPY, INSTALL, LOAD)

CLI Commands

uv run duckdb-analytics-mcp --help
uv run duckdb-analytics-mcp run --help
uv run duckdb-analytics-mcp doctor

doctor prints deployment-readiness JSON, including health and dataset counts.

Docker

docker build -t duckdb-analytics-mcp:latest .
docker run --rm -p 8000:8000 \
  -e HOST=0.0.0.0 \
  -e PORT=8000 \
  -e TRANSPORT=streamable-http \
  duckdb-analytics-mcp:latest

Or with compose:

docker compose up --build

Development Verification

uv run ruff check .
uv run mypy src
uv run pytest -q

Troubleshooting

  • Dataset '<name>' not found:
    • Run duckdb_analytics_list_datasets first and use an exact dataset name from catalog.
  • Operation timed out after ... seconds:
    • Simplify query, lower scan volume, or increase QUERY_TIMEOUT_SECONDS.
  • Empty catalog:
    • Confirm files exist under DATASET_DIR and use supported suffixes.
  • Auth startup error:
    • If STATIC_BEARER_TOKEN is set, also set AUTH_ISSUER_URL and AUTH_RESOURCE_SERVER_URL.

Project Structure

src/duckdb_analytics_mcp/
  cli.py
  config.py
  server.py
  service.py
  catalog.py
  security.py
  formatter.py

data/
tests/
notebooks/

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
E2B

E2B

Using MCP to run code via e2b.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured