Databricks MCP Server

Databricks MCP Server — Natural-Language Analytics POC

A small Model Context Protocol server that lets an LLM client (e.g. Claude Desktop) answer business questions in natural language over a Databricks dataset — without writing SQL by hand.

It runs against the public samples.nyctaxi.trips dataset that ships with every Databricks workspace, so it's reproducible by anyone.

What it exposes (the three MCP primitives)

| Primitive | Name | Purpose |
| --- | --- | --- |
| Tool | run_query | Executes a read-only SQL query against samples.nyctaxi.trips and returns the rows. |
| Resource | schema://nyctaxi | Curated schema + metric definitions and gotchas: the context layer that makes the generated SQL correct. |
| Prompts | revenue_by_month, busiest_pickup_zones, trips_by_hour, fare_distance_summary | Ready-made business questions. |

Safety / governance

Two layers, on purpose:

  1. App-level guard (is_read_only): only a single SELECT/WITH statement is accepted; any write/DDL keyword (INSERT, UPDATE, DROP, ...) is rejected, and a LIMIT 1000 is appended when missing.
  2. The real guarantee: connect with a Databricks token whose grants are read-only on the catalog. App guards reduce footguns; permissions are what actually protect the data. Never give an LLM a write-capable credential.
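
The app-level guard described in item 1 could be sketched roughly as follows. This is a minimal illustration, not the repo's actual is_read_only: the keyword list, the regexes, and the helper name with_default_limit are assumptions made for the example.

```python
import re

# Illustrative (assumed) list of write/DDL keywords; the real guard may differ.
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|MERGE|DROP|ALTER|CREATE|TRUNCATE|GRANT|REVOKE)\b",
    re.IGNORECASE,
)
STARTS_READ_ONLY = re.compile(r"(SELECT|WITH)\b", re.IGNORECASE)
HAS_LIMIT = re.compile(r"\bLIMIT\s+\d+\b", re.IGNORECASE)


def _normalize(sql: str) -> str:
    """Trim whitespace and any trailing semicolons."""
    return sql.strip().rstrip(";").strip()


def is_read_only(sql: str) -> bool:
    """True only for a single SELECT/WITH statement with no write/DDL keyword."""
    stmt = _normalize(sql)
    if not stmt or ";" in stmt:  # empty input, or more than one statement
        return False
    if not STARTS_READ_ONLY.match(stmt):
        return False
    return FORBIDDEN.search(stmt) is None


def with_default_limit(sql: str, limit: int = 1000) -> str:
    """Append LIMIT <limit> when the statement has no LIMIT clause at all."""
    stmt = _normalize(sql)
    if HAS_LIMIT.search(stmt):
        return stmt
    return f"{stmt} LIMIT {limit}"
```

A keyword scan like this is deliberately conservative: it also rejects harmless queries whose string literals happen to contain a semicolon or a word such as "update", and it treats a LIMIT inside a subquery as sufficient. That is acceptable for a footgun guard, since the read-only token in item 2 is what actually protects the data.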

Architecture

Claude Desktop  ──stdio──►  MCP server (this repo)  ──Databricks SQL connector──►  samples.nyctaxi.trips
   (client)                  tool · resource · prompts                              (read-only)

run_query doesn't open the connection in-process — it shells out to query_runner.py (subprocess.run(..., stdin=subprocess.DEVNULL, capture_output=True)). See the note below for why.

Implementation note: why run_query uses a subprocess

Both points below were reproduced and verified on Windows with the FastMCP stdio transport (Claude Desktop and the MCP Inspector). The symptom is the same in both cases: the tool call hangs and the client returns MCP error -32001: Request timed out at ~60s, even though the same query runs in ~4s when the connector is used directly.

  1. sql.connect() stalls ~60s when called inside the server process. From a clean child process it connects in ~2s; inside the FastMCP process it blocks until the client's request times out. It stalls on the event-loop thread and on a worker thread, so it's a process-level interaction with the connector — not just the event loop being blocked. Running the query in a child process avoids it. (Disabling telemetry / use_cloud_fetch does not help.)

  2. stdin=subprocess.DEVNULL is required on the child. A stdio MCP server's own stdin is the JSON-RPC pipe from the client. A child started with the default stdin=None inherits that pipe handle and hangs until the client gives up (~60s). Detaching stdin makes it return at query speed. capture_output=True already detaches stdout/stderr — stdin is the one that's easy to miss, so piping the query out to a subprocess without it does not fix the hang.
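
The two points above boil down to one subprocess.run call. The sketch below shows that call shape in isolation; the helper name run_in_child, the timeout value, and the error handling are assumptions for illustration, and the real command line and output format of query_runner.py are not shown in this README.

```python
import subprocess
import sys


def run_in_child(cmd: list[str], timeout: float = 30.0) -> str:
    """Run cmd in a child process that is detached from this process's stdio."""
    result = subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,  # do NOT inherit the server's JSON-RPC stdin pipe
        capture_output=True,       # child stdout/stderr go to pipes, not to the client
        text=True,
        timeout=timeout,           # assumed value; fail before the client's ~60s limit
        check=False,
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"child exited with {result.returncode}: {result.stderr.strip()}"
        )
    return result.stdout


if __name__ == "__main__":
    # Stand-in for query_runner.py: a child that reads its stdin.
    # With stdin detached it sees EOF immediately instead of blocking.
    probe = "import sys; print(repr(sys.stdin.read()))"
    print(run_in_child([sys.executable, "-c", probe]))
```

In the server, cmd would be whatever launches query_runner.py with the SQL text; that part is left out here because its interface is not documented above.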

Gotcha: don't launch the Inspector from Git Bash on Windows. MSYS2 rewrites the POSIX-looking DATABRICKS_HTTP_PATH (/sql/1.0/warehouses/… → C:/Program Files/Git/sql/1.0/warehouses/…), so the server gets a 404, not a timeout. Use PowerShell or cmd. Claude Desktop passes env vars directly and is unaffected.
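
One way to surface this early is a startup check that fails fast when the path has been rewritten. The helper below is a hypothetical addition, not part of this repo; it assumes a mangled value can be recognized by a Windows drive-letter prefix, as in the example above.

```python
import os
import re

# A rewritten path starts with a drive letter, e.g. "C:/Program Files/Git/sql/...".
_DRIVE_PREFIX = re.compile(r"^[A-Za-z]:[\\/]")


def looks_msys_mangled(http_path: str) -> bool:
    """True if the value looks like a Windows path rather than an HTTP path."""
    return bool(_DRIVE_PREFIX.match(http_path))


if __name__ == "__main__":
    path = os.environ.get("DATABRICKS_HTTP_PATH", "")
    if looks_msys_mangled(path):
        raise SystemExit(
            "DATABRICKS_HTTP_PATH looks like a Windows path; "
            "launch from PowerShell or cmd instead of Git Bash."
        )
```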

Run it

Prereqs: Python 3.11+, uv, a Databricks workspace with a running SQL Warehouse and the samples catalog.

Windows / PowerShell (recommended on Windows — see the Git Bash gotcha above):

cd "C:\path\to\databricks-mcp"
uv sync                                    # first time only

# from SQL Warehouses -> Connection details, plus a personal access token.
# These live only in THIS PowerShell window (nothing is written to disk):
$env:DATABRICKS_HOST      = "dbc-xxxxxxxx-xxxx.cloud.databricks.com"
$env:DATABRICKS_HTTP_PATH = "/sql/1.0/warehouses/xxxxxxxxxxxxxxxx"
$env:DATABRICKS_TOKEN     = "dapixxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# launch the browser inspector, then run a query from its UI:
npx @modelcontextprotocol/inspector uv run server.py

<details> <summary><b>macOS / Linux (bash)</b></summary>

uv sync
export DATABRICKS_HOST="adb-....azuredatabricks.net"
export DATABRICKS_HTTP_PATH="/sql/1.0/warehouses/...."
export DATABRICKS_TOKEN="dapi...."
npx @modelcontextprotocol/inspector uv run server.py

</details>

Connect to Claude Desktop

Copy claude_desktop_config.example.json into your Claude Desktop config (%APPDATA%\Claude\claude_desktop_config.json on Windows, ~/Library/Application Support/Claude/claude_desktop_config.json on macOS), fill in your real values, and restart Claude Desktop. Then ask things like:

"What were the busiest pickup zones, and how does monthly revenue trend?"

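For orientation, a Claude Desktop entry for a uv-launched stdio server generally has the shape produced below. This is a sketch, not the contents of claude_desktop_config.example.json: the server key "databricks-nyctaxi", the function name, and the exact uv arguments are assumptions, and the values are the same placeholders used in the Run it section.

```python
import json


def build_claude_config(project_dir: str, host: str, http_path: str, token: str) -> dict:
    """Build a Claude Desktop config dict with a single stdio MCP server entry."""
    return {
        "mcpServers": {
            "databricks-nyctaxi": {  # hypothetical server name
                "command": "uv",
                # Assumed launch command: run server.py from the project directory.
                "args": ["--directory", project_dir, "run", "server.py"],
                "env": {
                    "DATABRICKS_HOST": host,
                    "DATABRICKS_HTTP_PATH": http_path,
                    "DATABRICKS_TOKEN": token,
                },
            }
        }
    }


if __name__ == "__main__":
    config = build_claude_config(
        project_dir=r"C:\path\to\databricks-mcp",
        host="dbc-xxxxxxxx-xxxx.cloud.databricks.com",
        http_path="/sql/1.0/warehouses/xxxxxxxxxxxxxxxx",
        token="dapixxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    )
    print(json.dumps(config, indent=2))
```

If the example file in the repo differs from this shape, prefer the file.
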
Notes

  • samples.nyctaxi.trips is a public Databricks dataset; no private data is used.
  • Secrets live in env vars / the Claude Desktop config, both git-ignored.
