Databricks MCP Server
Enables LLM-powered tools to interact with Databricks clusters, jobs, notebooks, SQL warehouses, and Unity Catalog through the Model Context Protocol. Provides comprehensive access to Databricks REST API functionality including cluster management, job execution, workspace operations, and data catalog operations.
<div align="center">
Databricks Custom MCP Demo
</div>
<br>
Databricks MCP Server
A Model Context Protocol (MCP) server for Databricks that provides access to Databricks functionality via the MCP protocol. This allows LLM-powered tools to interact with Databricks clusters, jobs, notebooks, and more.
Credit for the initial version goes to @JustTryAI and Markov
Features
- MCP Protocol Support: Implements the MCP protocol to allow LLMs to interact with Databricks
- Databricks API Integration: Provides access to Databricks REST API functionality
- Tool Registration: Exposes Databricks functionality as MCP tools
- Async Support: Built with asyncio for efficient operation
Available Tools
The Databricks MCP Server exposes the following tools:
Cluster Management
- list_clusters: List all Databricks clusters
- create_cluster: Create a new Databricks cluster
- terminate_cluster: Terminate a Databricks cluster
- get_cluster: Get information about a specific Databricks cluster
- start_cluster: Start a terminated Databricks cluster
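For illustration only, a cluster round trip through an MCP client might look like the sketch below. The parameter names (cluster_name, spark_version, node_type_id, num_workers) are assumptions modeled on the Databricks Clusters API rather than this server's confirmed schema, and an active client session named session is assumed (see Usage Examples below).

# Sketch: create a small cluster, then list clusters to confirm it exists.
# Field names mirror the Databricks Clusters API and may differ here.
await session.call_tool("create_cluster", {
    "cluster_name": "mcp-demo-cluster",    # illustrative name
    "spark_version": "14.3.x-scala2.12",   # assumed runtime label
    "node_type_id": "Standard_DS3_v2",     # assumed node type
    "num_workers": 1
})

await session.call_tool("list_clusters", {})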
Job Management
- list_jobs: List all Databricks jobs
- run_job: Run a Databricks job
- run_notebook: Submit and wait for a one-time notebook run
- create_job: Create a new Databricks job
- delete_job: Delete a Databricks job
- get_run_status: Get status information for a job run
- list_job_runs: List recent runs for a job
- cancel_run: Cancel a running job
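As a hedged sketch (parameter names such as notebook_path and job_id are assumptions, not the server's confirmed schema), a one-time notebook run and a look at a job's recent runs might be requested like this:

# Submit a one-time notebook run, then list recent runs of an existing job.
await session.call_tool("run_notebook", {
    "notebook_path": "/Repos/me/project/etl_notebook"  # illustrative path
})

await session.call_tool("list_job_runs", {"job_id": 789})  # illustrative job id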
Workspace Files
- list_notebooks: List notebooks in a workspace directory
- export_notebook: Export a notebook from the workspace
- import_notebook: Import a notebook into the workspace
- delete_workspace_object: Delete a notebook or directory
- get_workspace_file_content: Retrieve content of any workspace file (JSON, notebooks, scripts, etc.)
- get_workspace_file_info: Get metadata about workspace files
File System
- list_files: List files and directories in a DBFS path
- dbfs_put: Upload a small file to DBFS
- dbfs_delete: Delete a DBFS file or directory
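A minimal sketch of a DBFS upload followed by a directory listing; the parameter names ("path", "contents") and the accepted content encoding (plain text vs. base64) are assumptions to verify against the tool's actual schema:

# Upload a small text file to DBFS, then list the target directory.
await session.call_tool("dbfs_put", {
    "path": "dbfs:/tmp/mcp-demo/hello.txt",
    "contents": "hello from the Databricks MCP server"  # encoding is an assumption
})

await session.call_tool("list_files", {"path": "dbfs:/tmp/mcp-demo"})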
Cluster Libraries
- install_library: Install libraries on a cluster
- uninstall_library: Remove libraries from a cluster
- list_cluster_libraries: Check installed libraries on a cluster
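For example, installing a PyPI package on a cluster might be requested as below. The nested library specification follows the shape of the Databricks Libraries API, which this server may or may not mirror exactly:

# Install a PyPI library on a running cluster, then verify what is installed.
# The {"pypi": {"package": ...}} shape is an assumption.
await session.call_tool("install_library", {
    "cluster_id": "abc-123",
    "libraries": [{"pypi": {"package": "pandas==2.2.2"}}]
})

await session.call_tool("list_cluster_libraries", {"cluster_id": "abc-123"})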
Repos
- create_repo: Clone a Git repository
- update_repo: Update an existing repo
- list_repos: List repos in the workspace
- pull_repo: Pull the latest commit for a Databricks repo
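A hedged example of cloning a repository and pulling its latest commit; the url, provider, path, and repo_id parameter names are assumptions based on the Databricks Repos API:

# Clone a Git repository into /Repos, then pull the latest commit.
await session.call_tool("create_repo", {
    "url": "https://github.com/your-org/your-project.git",   # illustrative URL
    "provider": "gitHub",
    "path": "/Repos/user@domain.com/your-project"
})

await session.call_tool("pull_repo", {"repo_id": 123})  # id from the create response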
Unity Catalog
- list_catalogs: List catalogs
- create_catalog: Create a catalog
- list_schemas: List schemas in a catalog
- create_schema: Create a schema
- list_tables: List tables in a schema
- create_table: Execute a CREATE TABLE statement
- get_table_lineage: Fetch lineage information for a table
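A sketch of a typical catalog walk, assuming the tools accept the obvious catalog_name and schema_name parameters:

# Walk Unity Catalog: catalogs -> schemas -> tables -> lineage.
await session.call_tool("list_catalogs", {})
await session.call_tool("list_schemas", {"catalog_name": "main"})
await session.call_tool("list_tables", {
    "catalog_name": "main",
    "schema_name": "default"
})
await session.call_tool("get_table_lineage", {
    "table_name": "main.default.my_table"  # assumed fully qualified name
})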
Composite
- sync_repo_and_run_notebook: Pull a repo and execute a notebook in one call
SQL Execution
- execute_sql: Execute a SQL statement (warehouse_id optional if DATABRICKS_WAREHOUSE_ID env var is set)
Manual Installation
Prerequisites
- Python 3.10 or higher
- uv package manager (recommended for MCP servers)
Setup
1. Install uv if you don't have it already:

   # MacOS/Linux
   curl -LsSf https://astral.sh/uv/install.sh | sh

   # Windows (in PowerShell)
   irm https://astral.sh/uv/install.ps1 | iex

   Restart your terminal after installation.

2. Clone the repository:

   git clone https://github.com/robkisk/databricks-mcp.git
   cd databricks-mcp

3. Run the setup script:

   # Linux/Mac
   ./scripts/setup.sh

   # Windows (PowerShell)
   .\scripts\setup.ps1

   The setup script will:
   - Install uv if not already installed
   - Create a virtual environment
   - Install all project dependencies
   - Verify the installation works

   Alternative manual setup:

   # Create and activate virtual environment
   uv venv

   # On Windows
   .\.venv\Scripts\activate
   # On Linux/Mac
   source .venv/bin/activate

   # Install dependencies in development mode
   uv pip install -e .

   # Install development dependencies
   uv pip install -e ".[dev]"

4. Set up environment variables:

   # Required variables
   # Windows
   set DATABRICKS_HOST=https://your-databricks-instance.azuredatabricks.net
   set DATABRICKS_TOKEN=your-personal-access-token

   # Linux/Mac
   export DATABRICKS_HOST=https://your-databricks-instance.azuredatabricks.net
   export DATABRICKS_TOKEN=your-personal-access-token

   # Optional: Set default SQL warehouse (makes warehouse_id optional in execute_sql)
   export DATABRICKS_WAREHOUSE_ID=sql_warehouse_12345

   You can also create an .env file based on the .env.example template.
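For reference, a .env along those lines needs nothing more than the same placeholders (keep real tokens out of version control; the warehouse entry is optional):

# .env (placeholder values only)
DATABRICKS_HOST=https://your-databricks-instance.azuredatabricks.net
DATABRICKS_TOKEN=your-personal-access-token
# Optional: default SQL warehouse used by execute_sql when warehouse_id is omitted
DATABRICKS_WAREHOUSE_ID=sql_warehouse_12345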
Running the MCP Server
Standalone
To start the MCP server directly for testing or development, run:
# Activate your virtual environment if not already active
source .venv/bin/activate
# Run the start script (handles finding env vars from .env if needed)
./scripts/start_mcp_server.sh
This is useful for seeing direct output and logs.
Integrating with AI Clients
To use this server with AI clients like Cursor or Claude CLI, you need to register it.
Cursor Setup
1. Open your global MCP configuration file located at ~/.cursor/mcp.json (create it if it doesn't exist).

2. Add the following entry within the mcpServers object, replacing placeholders with your actual values and ensuring the path to start_mcp_server.sh is correct:

   {
     "mcpServers": {
       // ... other servers ...
       "databricks-mcp-local": {
         "command": "/absolute/path/to/your/project/databricks-mcp-server/start_mcp_server.sh",
         "args": [],
         "env": {
           "DATABRICKS_HOST": "https://your-databricks-instance.azuredatabricks.net",
           "DATABRICKS_TOKEN": "dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX",
           "DATABRICKS_WAREHOUSE_ID": "sql_warehouse_12345",
           "RUNNING_VIA_CURSOR_MCP": "true"
         }
       }
       // ... other servers ...
     }
   }

3. Important: Replace /absolute/path/to/your/project/databricks-mcp-server/ with the actual absolute path to this project directory on your machine.

4. Replace the DATABRICKS_HOST and DATABRICKS_TOKEN values with your credentials.

5. Save the file and restart Cursor.

6. You can now invoke tools using databricks-mcp-local:<tool_name> (e.g., databricks-mcp-local:list_jobs).
Claude CLI Setup
1. Use the claude mcp add command to register the server. Provide your credentials using the -e flag for environment variables and point the command to the start_mcp_server.sh script using -- followed by the absolute path:

   claude mcp add databricks-mcp-local \
     -s user \
     -e DATABRICKS_HOST="https://your-databricks-instance.azuredatabricks.net" \
     -e DATABRICKS_TOKEN="dapiXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX" \
     -e DATABRICKS_WAREHOUSE_ID="sql_warehouse_12345" \
     -- /absolute/path/to/your/project/databricks-mcp-server/start_mcp_server.sh

2. Important: Replace /absolute/path/to/your/project/databricks-mcp-server/ with the actual absolute path to this project directory on your machine.

3. Replace the DATABRICKS_HOST and DATABRICKS_TOKEN values with your credentials.

4. You can now invoke tools using databricks-mcp-local:<tool_name> in your Claude interactions.
Querying Databricks Resources
The repository includes utility scripts to quickly view Databricks resources:
# View all clusters
uv run scripts/show_clusters.py
# View all notebooks
uv run scripts/show_notebooks.py
Usage Examples
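The examples below assume an already-initialized MCP client session named session. One way to obtain such a session with the MCP Python SDK is sketched here; the launch command (the start script path) is a placeholder to adapt to your machine, and you can equally point it at whatever command you registered with Cursor or Claude:

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch the server over stdio; adjust the command/path for your setup.
    server_params = StdioServerParameters(
        command="/absolute/path/to/databricks-mcp/scripts/start_mcp_server.sh",
        args=[],
    )
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool("list_clusters", {})
            print(result)

asyncio.run(main())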
SQL Execution with Default Warehouse
# With DATABRICKS_WAREHOUSE_ID set, warehouse_id is optional
await session.call_tool("execute_sql", {
"statement": "SELECT * FROM my_table LIMIT 10"
})
# You can still override the default warehouse
await session.call_tool("execute_sql", {
"statement": "SELECT * FROM my_table LIMIT 10",
"warehouse_id": "sql_warehouse_specific"
})
Workspace File Content Retrieval
# Get JSON file content from workspace
await session.call_tool("get_workspace_file_content", {
"workspace_path": "/Users/user@domain.com/config/settings.json"
})
# Get notebook content in Jupyter format
await session.call_tool("get_workspace_file_content", {
"workspace_path": "/Users/user@domain.com/my_notebook",
"format": "JUPYTER"
})
# Get file metadata without downloading content
await session.call_tool("get_workspace_file_info", {
"workspace_path": "/Users/user@domain.com/large_file.py"
})
Repo Sync and Notebook Execution
await session.call_tool("sync_repo_and_run_notebook", {
"repo_id": 123,
"notebook_path": "/Repos/user/project/run_me"
})
Create Nightly ETL Job
job_conf = {
"name": "Nightly ETL",
"tasks": [
{
"task_key": "etl",
"notebook_task": {"notebook_path": "/Repos/me/etl.py"},
"existing_cluster_id": "abc-123"
}
]
}
await session.call_tool("create_job", job_conf)
Project Structure
databricks-mcp/
├── databricks_mcp/                  # Main package (renamed from src/)
│   ├── __init__.py                  # Package initialization
│   ├── __main__.py                  # Main entry point for the package
│   ├── main.py                      # Entry point for the MCP server
│   ├── api/                         # Databricks API clients
│   │   ├── clusters.py              # Cluster management
│   │   ├── jobs.py                  # Job management
│   │   ├── notebooks.py             # Notebook operations
│   │   ├── sql.py                   # SQL execution
│   │   └── dbfs.py                  # DBFS operations
│   ├── core/                        # Core functionality
│   │   ├── config.py                # Configuration management
│   │   ├── auth.py                  # Authentication
│   │   └── utils.py                 # Utilities
│   ├── server/                      # Server implementation
│   │   ├── __main__.py              # Server entry point
│   │   ├── databricks_mcp_server.py # Main MCP server
│   │   └── app.py                   # FastAPI app for tests
│   └── cli/                         # Command-line interface
│       └── commands.py              # CLI commands
├── tests/                           # Test directory
│   ├── test_clusters.py             # Cluster tests
│   ├── test_mcp_server.py           # Server tests
│   └── test_*.py                    # Other test files
├── scripts/                         # Helper scripts (organized)
│   ├── start_mcp_server.ps1         # Server startup script (Windows)
│   ├── start_mcp_server.sh          # Server startup script (Unix)
│   ├── run_tests.ps1                # Test runner script (Windows)
│   ├── run_tests.sh                 # Test runner script (Unix)
│   ├── setup.ps1                    # Setup script (Windows)
│   ├── setup.sh                     # Setup script (Unix)
│   ├── show_clusters.py             # Script to show clusters
│   ├── show_notebooks.py            # Script to show notebooks
│   ├── setup_codespaces.sh          # Codespaces setup
│   └── test_setup_local.sh          # Local test setup
├── examples/                        # Example usage
│   ├── direct_usage.py              # Direct usage examples
│   └── mcp_client_usage.py          # MCP client examples
├── docs/                            # Documentation (organized)
│   ├── AGENTS.md                    # Agent documentation
│   ├── project_structure.md         # Detailed structure docs
│   ├── new_features.md              # Feature documentation
│   └── phase1.md                    # Development phases
├── .gitignore                       # Git ignore rules
├── .cursor.json                     # Cursor configuration
├── pyproject.toml                   # Package configuration
├── uv.lock                          # Dependency lock file
└── README.md                        # This file
See docs/project_structure.md for a more detailed view of the project structure.
Development
Code Standards
- Python code follows PEP 8 style guide with a maximum line length of 100 characters
- Use 4 spaces for indentation (no tabs)
- Use double quotes for strings
- All classes, methods, and functions should have Google-style docstrings
- Type hints are required for all code except tests
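As a small illustration of those conventions (Google-style docstring, type hints, double quotes, four-space indentation); this is not code taken from the project:

def summarize_cluster(cluster_name: str, num_workers: int) -> str:
    """Build a short, human-readable summary of a cluster.

    Args:
        cluster_name: Display name of the cluster.
        num_workers: Number of worker nodes.

    Returns:
        A one-line description of the cluster.
    """
    return f"{cluster_name} ({num_workers} workers)"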
Linting
The project uses the following linting tools:
# Run all linters
uv run pylint databricks_mcp/ tests/
uv run flake8 databricks_mcp/ tests/
uv run mypy databricks_mcp/
Testing
The project uses pytest for testing. To run the tests:
# Run all tests with our convenient script
.\scripts\run_tests.ps1
# Run with coverage report
.\scripts\run_tests.ps1 -Coverage
# Run specific tests with verbose output
.\scripts\run_tests.ps1 -Verbose -Coverage tests/test_clusters.py
You can also run the tests directly with pytest:
# Run all tests
uv run pytest tests/
# Run with coverage report
uv run pytest --cov=databricks_mcp tests/ --cov-report=term-missing
A minimum code coverage of 80% is the goal for the project.
Documentation
- API documentation is generated using Sphinx and can be found in the docs/api directory
- All code includes Google-style docstrings
- See the examples/ directory for usage examples
Examples
Check the examples/ directory for usage examples. To run examples:
# Run example scripts with uv
uv run examples/direct_usage.py
uv run examples/mcp_client_usage.py
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Ensure your code follows the project's coding standards
- Add tests for any new functionality
- Update documentation as necessary
- Verify all tests pass before submitting