Databricks MCP Server
A Model Context Protocol server that lets LLM clients answer business questions in natural language over a Databricks dataset without writing SQL by hand.
Databricks MCP Server — Natural-Language Analytics POC
A small Model Context Protocol server that lets an LLM client (e.g. Claude Desktop) answer business questions in natural language over a Databricks dataset — without writing SQL by hand.
It runs against the public `samples.nyctaxi.trips` dataset that ships with Databricks workspaces, so anyone with a workspace can reproduce it.
What it exposes (the three MCP primitives)
| Primitive | Name | Purpose |
|---|---|---|
| Tool | `run_query` | Executes a read-only SQL query against `samples.nyctaxi.trips` and returns the rows. |
| Resource | `schema://nyctaxi` | Curated schema + metric definitions and gotchas: the context layer that makes the generated SQL correct. |
| Prompts | `revenue_by_month`, `busiest_pickup_zones`, `trips_by_hour`, `fare_distance_summary` | Ready-made business questions. |
Safety / governance
Two layers, on purpose:
- App-level guard (`is_read_only`): only a single `SELECT`/`WITH` statement is accepted; any write/DDL keyword (`INSERT`, `UPDATE`, `DROP`, ...) is rejected, and a `LIMIT 1000` is appended when missing.
- The real guarantee: connect with a Databricks token whose grants are read-only on the catalog. App guards reduce footguns; permissions are what actually protect the data. Never give an LLM a write-capable credential.
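A minimal sketch of what an `is_read_only`-style guard can look like, assuming a regex-based check; the repo's actual implementation may differ, and `guard_sql` and the keyword list are hypothetical. Keyword matching like this is deliberately conservative (it also rejects a harmless `SELECT` that mentions `update` inside a string literal), which is acceptable because the read-only token, not this function, is the real protection.

```python
import re

# Hypothetical write/DDL keyword list; extend it to match your SQL dialect.
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|MERGE|DROP|CREATE|ALTER|TRUNCATE|GRANT|REVOKE|COPY|OPTIMIZE|VACUUM)\b",
    re.IGNORECASE,
)
# A trailing "LIMIT n" (optionally followed by "OFFSET m") counts as already limited.
HAS_LIMIT = re.compile(r"\bLIMIT\s+\d+(\s+OFFSET\s+\d+)?\s*$", re.IGNORECASE)


def is_read_only(sql: str) -> bool:
    """True only for a single SELECT/WITH statement with no write/DDL keywords."""
    stmt = sql.strip().rstrip(";").strip()
    if not stmt or ";" in stmt:
        return False  # empty, or more than one statement
    if not re.match(r"(SELECT|WITH)\b", stmt, re.IGNORECASE):
        return False
    return FORBIDDEN.search(stmt) is None


def guard_sql(sql: str, max_rows: int = 1000) -> str:
    """Reject non-read-only SQL; append a LIMIT when the query has none."""
    if not is_read_only(sql):
        raise ValueError("only a single read-only SELECT/WITH statement is allowed")
    stmt = sql.strip().rstrip(";").strip()
    if not HAS_LIMIT.search(stmt):
        stmt = f"{stmt} LIMIT {max_rows}"
    return stmt
```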
Architecture
```
Claude Desktop ──stdio──► MCP server (this repo) ──Databricks SQL connector──► samples.nyctaxi.trips
   (client)              tool · resource · prompts                                  (read-only)
```
`run_query` doesn't open the connection in-process: it shells out to
`query_runner.py` (`subprocess.run(..., stdin=subprocess.DEVNULL, capture_output=True)`).
See the note below for why.
Implementation note: why run_query uses a subprocess
Both points below were reproduced and verified on Windows with the FastMCP stdio
transport (Claude Desktop and the MCP Inspector). The symptom was the same in both:
the tool call hangs and the client returns `MCP error -32001: Request timed out` after ~60s,
even though the same query runs in ~4s when the connector is called directly.
- `sql.connect()` stalls ~60s when called inside the server process. From a clean child process it connects in ~2s; inside the FastMCP process it blocks until the client's request times out. It stalls both on the event-loop thread and on a worker thread, so it's a process-level interaction with the connector, not just the event loop being blocked. Running the query in a child process avoids it. (Disabling telemetry / `use_cloud_fetch` does not help.)
- `stdin=subprocess.DEVNULL` is required on the child. A stdio MCP server's own stdin is the JSON-RPC pipe from the client. A child started with the default `stdin=None` inherits that pipe handle and hangs until the client gives up (~60s). Detaching stdin makes it return at query speed. `capture_output=True` already redirects stdout/stderr to pipes; stdin is the one that's easy to miss, so moving the query into a subprocess without it does not fix the hang.
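One way to see what detaching stdin changes, as a stand-alone sketch rather than the repo's code: the child below tries to read its stdin. With `stdin=subprocess.DEVNULL` it gets end-of-file immediately and exits; with the default `stdin=None` inside a stdio MCP server it would instead be attached to the client's JSON-RPC pipe, which stays open for as long as the client is connected. This only illustrates the inheritance; whether the Databricks connector actually reads stdin, or the hang comes from the inherited handle itself, is not established here.

```python
import subprocess
import sys

# A child that blocks until its stdin reaches end-of-file, then reports the byte count.
CHILD = "import sys; data = sys.stdin.read(); print(len(data))"


def read_stdin_in_child() -> str:
    """Run CHILD with stdin detached and return what it printed."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.DEVNULL,  # child sees an empty input that is already at EOF
        capture_output=True,
        text=True,
        timeout=10,  # a hang here would raise subprocess.TimeoutExpired
    )
    return proc.stdout.strip()


print(read_stdin_in_child())  # prints 0: the child read nothing and exited at once
```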
Gotcha: don't launch the Inspector from Git Bash on Windows. MSYS2 rewrites the POSIX-looking
`DATABRICKS_HTTP_PATH` (`/sql/1.0/warehouses/…` → `C:/Program Files/Git/sql/1.0/warehouses/…`), so the server gets a 404, not a timeout. Use PowerShell or `cmd`. Claude Desktop passes env vars directly and is unaffected.
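A cheap defence is to have the server check `DATABRICKS_HTTP_PATH` at startup, so a rewritten value fails fast with a readable message instead of a 404 from the warehouse. A minimal sketch, assuming the rewrite always shows up as a Windows drive prefix such as `C:/`; `check_http_path` is a hypothetical helper, not part of the repo.

```python
import re

# A Windows drive prefix such as "C:/" or "C:\" means the POSIX-style path was rewritten.
DRIVE_PREFIX = re.compile(r"^[A-Za-z]:[\\/]")


def check_http_path(value: str) -> str:
    """Return the HTTP path unchanged, or raise if a shell rewrote it into a file path."""
    if DRIVE_PREFIX.match(value):
        raise ValueError(
            f"DATABRICKS_HTTP_PATH looks like a Windows file path ({value!r}); "
            "was the server launched from Git Bash? Use PowerShell or cmd instead."
        )
    return value
```

Matching on the drive prefix rather than requiring a leading `/sql/` keeps the check narrow: it targets exactly the MSYS2 rewrite described above and leaves other HTTP path formats alone.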
Run it
Prereqs: Python 3.11+, uv, a Databricks workspace
with a running SQL Warehouse and the samples catalog.
Windows / PowerShell (recommended on Windows — see the Git Bash gotcha above):
```powershell
cd "C:\path\to\databricks-mcp"
uv sync # first time only

# from SQL Warehouses -> Connection details, plus a personal access token.
# These live only in THIS PowerShell window (nothing is written to disk):
$env:DATABRICKS_HOST = "dbc-xxxxxxxx-xxxx.cloud.databricks.com"
$env:DATABRICKS_HTTP_PATH = "/sql/1.0/warehouses/xxxxxxxxxxxxxxxx"
$env:DATABRICKS_TOKEN = "dapixxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# launch the browser inspector, then run a query from its UI:
npx @modelcontextprotocol/inspector uv run server.py
```
<details> <summary><b>macOS / Linux (bash)</b></summary>

```bash
uv sync
export DATABRICKS_HOST="adb-....azuredatabricks.net"
export DATABRICKS_HTTP_PATH="/sql/1.0/warehouses/...."
export DATABRICKS_TOKEN="dapi...."
npx @modelcontextprotocol/inspector uv run server.py
```

</details>
Connect to Claude Desktop
Merge `claude_desktop_config.example.json` into your Claude Desktop config
(`%APPDATA%\Claude\claude_desktop_config.json` on Windows,
`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS) without overwriting any servers
already configured there, fill in your real values, and restart Claude Desktop. Then ask things like:
"What were the busiest pickup zones, and how does monthly revenue trend?"
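For orientation, Claude Desktop registers stdio servers under an `mcpServers` key, each entry with a `command`, `args` and `env`. The repo's `claude_desktop_config.example.json` remains the template to use; the sketch below only generates the general shape from Python. The server key `"databricks"` and the use of `uv --directory` to point at the checkout are assumptions, not taken from the example file.

```python
import json


def claude_desktop_entry(project_dir: str, host: str, http_path: str, token: str) -> dict:
    """Build a hypothetical mcpServers entry that launches server.py with uv."""
    return {
        "mcpServers": {
            "databricks": {  # hypothetical server name shown in Claude Desktop
                "command": "uv",
                # uv's --directory option changes into the project before "run server.py".
                "args": ["--directory", project_dir, "run", "server.py"],
                "env": {
                    "DATABRICKS_HOST": host,
                    "DATABRICKS_HTTP_PATH": http_path,
                    "DATABRICKS_TOKEN": token,
                },
            }
        }
    }


# Placeholder values only; never commit a file containing a real token.
print(json.dumps(claude_desktop_entry(r"C:\path\to\databricks-mcp", "dbc-xxxxxxxx-xxxx.cloud.databricks.com",
                                      "/sql/1.0/warehouses/xxxxxxxxxxxxxxxx", "dapi..."), indent=2))
```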
Notes
- `samples.nyctaxi.trips` is a public Databricks dataset; no private data is used.
- Secrets live in env vars / the Claude Desktop config, both kept out of git.