Databricks MCP Server
Enables LLM-powered tools to interact with Databricks clusters, jobs, notebooks, SQL warehouses, and Unity Catalog through the Model Context Protocol. Provides comprehensive access to Databricks REST API functionality including cluster management, job execution, workspace operations, and data catalog operations.
<div align="center">
Databricks Custom MCP Demo
</div>
<br>
Databricks MCP Server
A Model Context Protocol (MCP) server for Databricks that provides access to Databricks functionality via the MCP protocol. This allows LLM-powered tools to interact with Databricks clusters, jobs, notebooks, and more.
Credit for the initial version goes to @JustTryAI and Markov
Features
- MCP Protocol Support: Implements the MCP protocol to allow LLMs to interact with Databricks
- Databricks API Integration: Provides access to Databricks REST API functionality
- Tool Registration: Exposes Databricks functionality as MCP tools
- Async Support: Built with asyncio for efficient operation
Available Tools
The Databricks MCP Server exposes the following tools:
Cluster Management
- list_clusters: List all Databricks clusters
- create_cluster: Create a new Databricks cluster
- terminate_cluster: Terminate a Databricks cluster
- get_cluster: Get information about a specific Databricks cluster
- start_cluster: Start a terminated Databricks cluster
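For illustration only, a cluster round trip through an MCP client might look like the sketch below. The parameter names (cluster_name, spark_version, node_type_id, num_workers) are assumptions modeled on the Databricks Clusters API rather than this server's confirmed schema, and an active client session named session is assumed (see Usage Examples below).

# Sketch: create a small cluster, then list clusters to confirm it exists.
# Field names mirror the Databricks Clusters API and may differ here.
await session.call_tool("create_cluster", {
    "cluster_name": "mcp-demo-cluster",    # illustrative name
    "spark_version": "14.3.x-scala2.12",   # assumed runtime label
    "node_type_id": "Standard_DS3_v2",     # assumed node type
    "num_workers": 1
})

await session.call_tool("list_clusters", {})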
Job Management
- list_jobs: List all Databricks jobs
- run_job: Run a Databricks job
- run_notebook: Submit and wait for a one-time notebook run
- create_job: Create a new Databricks job
- delete_job: Delete a Databricks job
- get_run_status: Get status information for a job run
- list_job_runs: List recent runs for a job
- cancel_run: Cancel a running job
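As a hedged sketch (parameter names such as notebook_path and job_id are assumptions, not the server's confirmed schema), a one-time notebook run and a look at a job's recent runs might be requested like this:

# Submit a one-time notebook run, then list recent runs of an existing job.
await session.call_tool("run_notebook", {
    "notebook_path": "/Repos/me/project/etl_notebook"  # illustrative path
})

await session.call_tool("list_job_runs", {"job_id": 789})  # illustrative job id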
Workspace Files
- list_notebooks: List notebooks in a workspace directory
- export_notebook: Export a notebook from the workspace
- import_notebook: Import a notebook into the workspace
- delete_workspace_object: Delete a notebook or directory
- get_workspace_file_content: Retrieve content of any workspace file (JSON, notebooks, scripts, etc.)
- get_workspace_file_info: Get metadata about workspace files
File System
- list_files: List files and directories in a DBFS path
- dbfs_put: Upload a small file to DBFS
- dbfs_delete: Delete a DBFS file or directory
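A minimal sketch of a DBFS upload followed by a directory listing; the parameter names ("path", "contents") and the accepted content encoding (plain text vs. base64) are assumptions to verify against the tool's actual schema:

# Upload a small text file to DBFS, then list the target directory.
await session.call_tool("dbfs_put", {
    "path": "dbfs:/tmp/mcp-demo/hello.txt",
    "contents": "hello from the Databricks MCP server"  # encoding is an assumption
})

await session.call_tool("list_files", {"path": "dbfs:/tmp/mcp-demo"})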
Cluster Libraries
- install_library: Install libraries on a cluster
- uninstall_library: Remove libraries from a cluster
- list_cluster_libraries: Check installed libraries on a cluster
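For example, installing a PyPI package on a cluster might be requested as below. The nested library specification follows the shape of the Databricks Libraries API, which this server may or may not mirror exactly:

# Install a PyPI library on a running cluster, then verify what is installed.
# The {"pypi": {"package": ...}} shape is an assumption.
await session.call_tool("install_library", {
    "cluster_id": "abc-123",
    "libraries": [{"pypi": {"package": "pandas==2.2.2"}}]
})

await session.call_tool("list_cluster_libraries", {"cluster_id": "abc-123"})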
Repos
- create_repo: Clone a Git repository
- update_repo: Update an existing repo
- list_repos: List repos in the workspace
- pull_repo: Pull the latest commit for a Databricks repo
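A hedged example of cloning a repository and pulling its latest commit; the url, provider, path, and repo_id parameter names are assumptions based on the Databricks Repos API:

# Clone a Git repository into /Repos, then pull the latest commit.
await session.call_tool("create_repo", {
    "url": "https://github.com/your-org/your-project.git",   # illustrative URL
    "provider": "gitHub",
    "path": "/Repos/user@domain.com/your-project"
})

await session.call_tool("pull_repo", {"repo_id": 123})  # id from the create response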
Unity Catalog
- list_catalogs: List catalogs
- create_catalog: Create a catalog
- list_schemas: List schemas in a catalog
- create_schema: Create a schema
- list_tables: List tables in a schema
- create_table: Execute a CREATE TABLE statement
- get_table_lineage: Fetch lineage information for a table
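A sketch of a typical catalog walk, assuming the tools accept the obvious catalog_name and schema_name parameters:

# Walk Unity Catalog: catalogs -> schemas -> tables -> lineage.
await session.call_tool("list_catalogs", {})
await session.call_tool("list_schemas", {"catalog_name": "main"})
await session.call_tool("list_tables", {
    "catalog_name": "main",
    "schema_name": "default"
})
await session.call_tool("get_table_lineage", {
    "table_name": "main.default.my_table"  # assumed fully qualified name
})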
Composite
- sync_repo_and_run_notebook: Pull a repo and execute a notebook in one call
SQL Execution
- execute_sql: Execute a SQL statement (warehouse_id optional if DATABRICKS_WAREHOUSE_ID env var is set)
Manual Installation
Prerequisites
- Python 3.10 or higher
- uv package manager (recommended for MCP servers)
Setup
1. Install uv if you don't have it already:

   # MacOS/Linux
   curl -LsSf https://astral.sh/uv/install.sh | sh

   # Windows (in PowerShell)
   irm https://astral.sh/uv/install.ps1 | iex

   Restart your terminal after installation.

2. Clone the repository:

   git clone https://github.com/robkisk/databricks-mcp.git
   cd databricks-mcp

3. Run the setup script:

   # Linux/Mac
   ./scripts/setup.sh

   # Windows (PowerShell)
   .\scripts\setup.ps1

   The setup script will:
   - Install uv if not already installed
   - Create a virtual environment
   - Install all project dependencies
   - Verify the installation works

   Alternative manual setup:

   # Create and activate virtual environment
   uv venv

   # On Windows
   .\.venv\Scripts\activate
   # On Linux/Mac
   source .venv/bin/activate

   # Install dependencies in development mode
   uv pip install -e .

   # Install development dependencies
   uv pip install -e ".[dev]"

4. Set up environment variables:

   # Required variables
   # Windows
   set DATABRICKS_HOST=https://your-databricks-instance.azuredatabricks.net
   set DATABRICKS_TOKEN=your-personal-access-token

   # Linux/Mac
   export DATABRICKS_HOST=https://your-databricks-instance.azuredatabricks.net
   export DATABRICKS_TOKEN=your-personal-access-token

   # Optional: Set default SQL warehouse (makes warehouse_id optional in execute_sql)
   export DATABRICKS_WAREHOUSE_ID=sql_warehouse_12345

   You can also create an .env file based on the .env.example template.
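For reference, a .env along those lines needs nothing more than the same placeholders (keep real tokens out of version control; the warehouse entry is optional):

# .env (placeholder values only)
DATABRICKS_HOST=https://your-databricks-instance.azuredatabricks.net
DATABRICKS_TOKEN=your-personal-access-token
# Optional: default SQL warehouse used by execute_sql when warehouse_id is omitted
DATABRICKS_WAREHOUSE_ID=sql_warehouse_12345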
Running the MCP Server
Standalone
To start the MCP server directly for testing or development, run:
# Activate your virtual environment if not already active
source .venv/bin/activate
# Run the start script (handles finding env vars from .env if needed)
./scripts/start_mcp_server.sh
This is useful for seeing direct output and logs.
Integrating with AI Clients
To use this server with AI clients like Cursor or Claude CLI, you need to register it.
Cursor Setup
1. Open your global MCP configuration file located at ~/.cursor/mcp.json (create it if it doesn't exist).

2. Add the following entry within the mcpServers object, replacing placeholders with your actual values and ensuring the path to start_mcp_server.sh is correct:

   {
     "mcpServers": {
       // ... other servers ...
       "databricks-mcp-local": {
         "command": "/absolute/path/to/your/project/databricks-mcp-server/start_mcp_server.sh",
         "args": [],
         "env": {
           "DATABRICKS_HOST": "https://your-databricks-instance.azuredatabricks.net",
           "DATABRICKS_TOKEN": "dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
           "DATABRICKS_WAREHOUSE_ID": "sql_warehouse_12345",
           "RUNNING_VIA_CURSOR_MCP": "true"
         }
       }
       // ... other servers ...
     }
   }

3. Important: Replace /absolute/path/to/your/project/databricks-mcp-server/ with the actual absolute path to this project directory on your machine.

4. Replace the DATABRICKS_HOST and DATABRICKS_TOKEN values with your credentials.

5. Save the file and restart Cursor.

6. You can now invoke tools using databricks-mcp-local:<tool_name> (e.g., databricks-mcp-local:list_jobs).
Claude CLI Setup
1. Use the claude mcp add command to register the server. Provide your credentials using the -e flag for environment variables and point the command to the start_mcp_server.sh script using -- followed by the absolute path:

   claude mcp add databricks-mcp-local \
     -s user \
     -e DATABRICKS_HOST="https://your-databricks-instance.azuredatabricks.net" \
     -e DATABRICKS_TOKEN="dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
     -e DATABRICKS_WAREHOUSE_ID="sql_warehouse_12345" \
     -- /absolute/path/to/your/project/databricks-mcp-server/start_mcp_server.sh

2. Important: Replace /absolute/path/to/your/project/databricks-mcp-server/ with the actual absolute path to this project directory on your machine.

3. Replace the DATABRICKS_HOST and DATABRICKS_TOKEN values with your credentials.

4. You can now invoke tools using databricks-mcp-local:<tool_name> in your Claude interactions.
Querying Databricks Resources
The repository includes utility scripts to quickly view Databricks resources:
# View all clusters
uv run scripts/show_clusters.py
# View all notebooks
uv run scripts/show_notebooks.py
Usage Examples
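The examples below assume an already-initialized MCP client session named session. One way to obtain such a session with the MCP Python SDK is sketched here; the launch command (the start script path) is a placeholder to adapt to your machine, and you can equally point it at whatever command you registered with Cursor or Claude:

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch the server over stdio; adjust the command/path for your setup.
    server_params = StdioServerParameters(
        command="/absolute/path/to/databricks-mcp/scripts/start_mcp_server.sh",
        args=[],
    )
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("list_clusters", {})
            print(result)

asyncio.run(main())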
SQL Execution with Default Warehouse
# With DATABRICKS_WAREHOUSE_ID set, warehouse_id is optional
await session.call_tool("execute_sql", {
"statement": "SELECT * FROM my_table LIMIT 10"
})
# You can still override the default warehouse
await session.call_tool("execute_sql", {
"statement": "SELECT * FROM my_table LIMIT 10",
"warehouse_id": "sql_warehouse_specific"
})
Workspace File Content Retrieval
# Get JSON file content from workspace
await session.call_tool("get_workspace_file_content", {
"workspace_path": "/Users/user@domain.com/config/settings.json"
})
# Get notebook content in Jupyter format
await session.call_tool("get_workspace_file_content", {
"workspace_path": "/Users/user@domain.com/my_notebook",
"format": "JUPYTER"
})
# Get file metadata without downloading content
await session.call_tool("get_workspace_file_info", {
"workspace_path": "/Users/user@domain.com/large_file.py"
})
Repo Sync and Notebook Execution
await session.call_tool("sync_repo_and_run_notebook", {
"repo_id": 123,
"notebook_path": "/Repos/user/project/run_me"
})
Create Nightly ETL Job
job_conf = {
"name": "Nightly ETL",
"tasks": [
{
"task_key": "etl",
"notebook_task": {"notebook_path": "/Repos/me/etl.py"},
"existing_cluster_id": "abc-123"
}
]
}
await session.call_tool("create_job", job_conf)
Project Structure
databricks-mcp/
├── databricks_mcp/                  # Main package (renamed from src/)
│   ├── __init__.py                  # Package initialization
│   ├── __main__.py                  # Main entry point for the package
│   ├── main.py                      # Entry point for the MCP server
│   ├── api/                         # Databricks API clients
│   │   ├── clusters.py              # Cluster management
│   │   ├── jobs.py                  # Job management
│   │   ├── notebooks.py             # Notebook operations
│   │   ├── sql.py                   # SQL execution
│   │   └── dbfs.py                  # DBFS operations
│   ├── core/                        # Core functionality
│   │   ├── config.py                # Configuration management
│   │   ├── auth.py                  # Authentication
│   │   └── utils.py                 # Utilities
│   ├── server/                      # Server implementation
│   │   ├── __main__.py              # Server entry point
│   │   ├── databricks_mcp_server.py # Main MCP server
│   │   └── app.py                   # FastAPI app for tests
│   └── cli/                         # Command-line interface
│       └── commands.py              # CLI commands
├── tests/                           # Test directory
│   ├── test_clusters.py             # Cluster tests
│   ├── test_mcp_server.py           # Server tests
│   └── test_*.py                    # Other test files
├── scripts/                         # Helper scripts (organized)
│   ├── start_mcp_server.ps1         # Server startup script (Windows)
│   ├── start_mcp_server.sh          # Server startup script (Unix)
│   ├── run_tests.ps1                # Test runner script (Windows)
│   ├── run_tests.sh                 # Test runner script (Unix)
│   ├── setup.ps1                    # Setup script (Windows)
│   ├── setup.sh                     # Setup script (Unix)
│   ├── show_clusters.py             # Script to show clusters
│   ├── show_notebooks.py            # Script to show notebooks
│   ├── setup_codespaces.sh          # Codespaces setup
│   └── test_setup_local.sh          # Local test setup
├── examples/                        # Example usage
│   ├── direct_usage.py              # Direct usage examples
│   └── mcp_client_usage.py          # MCP client examples
├── docs/                            # Documentation (organized)
│   ├── AGENTS.md                    # Agent documentation
│   ├── project_structure.md         # Detailed structure docs
│   ├── new_features.md              # Feature documentation
│   └── phase1.md                    # Development phases
├── .gitignore                       # Git ignore rules
├── .cursor.json                     # Cursor configuration
├── pyproject.toml                   # Package configuration
├── uv.lock                          # Dependency lock file
└── README.md                        # This file
See docs/project_structure.md for a more detailed view of the project structure.
Development
Code Standards
- Python code follows PEP 8 style guide with a maximum line length of 100 characters
- Use 4 spaces for indentation (no tabs)
- Use double quotes for strings
- All classes, methods, and functions should have Google-style docstrings
- Type hints are required for all code except tests
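As a small illustration of those conventions (Google-style docstring, type hints, double quotes, four-space indentation); this is not code taken from the project:

def summarize_cluster(cluster_name: str, num_workers: int) -> str:
    """Build a short, human-readable summary of a cluster.

    Args:
        cluster_name: Display name of the cluster.
        num_workers: Number of worker nodes.

    Returns:
        A one-line description of the cluster.
    """
    return f"{cluster_name} ({num_workers} workers)"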
Linting
The project uses the following linting tools:
# Run all linters
uv run pylint databricks_mcp/ tests/
uv run flake8 databricks_mcp/ tests/
uv run mypy databricks_mcp/
Testing
The project uses pytest for testing. To run the tests:
# Run all tests with our convenient script
.\scripts\run_tests.ps1
# Run with coverage report
.\scripts\run_tests.ps1 -Coverage
# Run specific tests with verbose output
.\scripts\run_tests.ps1 -Verbose -Coverage tests/test_clusters.py
You can also run the tests directly with pytest:
# Run all tests
uv run pytest tests/
# Run with coverage report
uv run pytest --cov=databricks_mcp tests/ --cov-report=term-missing
A minimum code coverage of 80% is the goal for the project.
Documentation
- API documentation is generated using Sphinx and can be found in the docs/api directory
- All code includes Google-style docstrings
- See the examples/ directory for usage examples
Examples
Check the examples/ directory for usage examples. To run examples:
# Run example scripts with uv
uv run examples/direct_usage.py
uv run examples/mcp_client_usage.py
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Ensure your code follows the project's coding standards
- Add tests for any new functionality
- Update documentation as necessary
- Verify all tests pass before submitting