horizon-mcp

horizon-mcp

Enables natural language querying of Snowflake-managed Iceberg tables via Horizon IRC and DuckDB, without using a Snowflake warehouse.

Category
Visit Server

README

horizon-mcp

Python MCP License

Natural language access to Snowflake-managed Iceberg tables via Horizon IRC, using DuckDB as the query engine.

Query your Snowflake Iceberg catalog through Claude Desktop without spinning up a Snowflake warehouse — no compute credits, full RBAC enforcement, vended credentials only.


Architecture

┌─────────────────┐       stdio / MCP        ┌──────────────────────┐
│  Claude Desktop │ ◄────────────────────── ► │    horizon-mcp       │
└─────────────────┘                           │  (this server)       │
                                              └──────────┬───────────┘
                                                         │
                              ┌──────────────────────────▼────────────────────────────┐
                              │          Snowflake Horizon IRC REST Catalog            │
                              │   GET /v1/{database}/namespaces                        │
                              │   GET /v1/{database}/namespaces/{ns}/tables            │
                              │   GET /v1/{database}/namespaces/{ns}/tables/{table}    │
                              └──────────────────────────┬────────────────────────────┘
                                                         │
                                               Vended credentials
                                         (ephemeral STS / SAS tokens)
                                                         │
                              ┌──────────────────────────▼────────────────────────────┐
                              │                     DuckDB                             │
                              │   ATTACH via Iceberg extension + OAuth secret          │
                              │   Reads Parquet files directly from cloud storage      │
                              └──────────────────────────┬────────────────────────────┘
                                                         │
                              ┌──────────────────────────▼────────────────────────────┐
                              │         Iceberg Parquet files (S3 / Azure ADLS)        │
                              │        managed by Snowflake external volumes            │
                              └───────────────────────────────────────────────────────┘

How It Works

1 — PAT Authentication

horizon-mcp authenticates to the Horizon IRC API using a Snowflake Personal Access Token (PAT). On startup (and before token expiry) the server exchanges the PAT for a short-lived OAuth bearer token:

POST /polaris/api/catalog/v1/oauth/tokens
  grant_type=client_credentials
  scope=session:role:<ROLE>
  client_secret=<PAT>
→ { "access_token": "...", "expires_in": 3600 }

The bearer token is cached in memory and refreshed automatically 120 seconds before expiry.

2 — Horizon IRC REST Catalog

All catalog discovery goes through Snowflake's Horizon Iceberg REST Catalog (IRC) API:

Operation HTTP call
List schemas GET /v1/{database}/namespaces
List tables in a schema GET /v1/{database}/namespaces/{namespace}/tables
Load table metadata + credentials GET /v1/{database}/namespaces/{namespace}/tables/{table}

Snowflake enforces your full RBAC posture on every call — the role baked into the OAuth scope determines exactly which namespaces and tables are visible.

3 — Vended Credentials

When load_table is called, Horizon returns the Iceberg table metadata plus ephemeral, scoped storage credentials (AWS STS tokens for S3, SAS tokens for Azure ADLS). These credentials:

  • Are scoped to the specific table being accessed
  • Expire after a short TTL (typically 1 hour)
  • Are never stored to disk — they live only in memory for the duration of the query

4 — DuckDB Query Engine

DuckDB reads Parquet/Iceberg files directly from cloud storage using the vended credentials. The server uses DuckDB's native iceberg extension with an OAuth secret:

CREATE SECRET horizon_secret (
    TYPE iceberg,
    CLIENT_SECRET '<PAT>',
    OAUTH2_SERVER_URI '...',
    OAUTH2_GRANT_TYPE 'client_credentials',
    OAUTH2_SCOPE 'session:role:<ROLE>'
)

ATTACH 'MYDB' AS mydb (
    TYPE iceberg,
    SECRET horizon_secret,
    ENDPOINT 'https://<account>.snowflakecomputing.com/polaris/api/catalog'
)

No Snowflake warehouse is started. No compute credits are charged for reads. Query compute runs entirely in DuckDB inside the MCP server process on your local machine.


Semantic Backends

horizon-mcp supports two backends for the discover_datasets tool, controlled by SEMANTIC_BACKEND in .env:

Backend How it works Best for
none (default) Live Horizon IRC browsing — fetches all namespaces and tables on every search, scores by keyword match against table and column names Demos, smaller catalogs (< 50 tables), always in sync with the catalog
lancedb Local vector search index built by horizon-seed, uses sentence-transformers for embeddings, cosine similarity search Large catalogs with 50+ tables, faster discovery, offline-capable

Note: With lancedb you must run horizon-seed after setup and re-run it whenever the catalog changes to keep the index fresh.


MCP Tools

Five tools are exposed to Claude Desktop:

list_namespaces

Lists all schemas (namespaces) visible to the current role.

"What schemas are in the catalog?"

describe_table(namespace, table)

Returns the full column schema, partition spec, current snapshot ID, metadata location, and an example SQL query for a specific table. Always fetches live from Horizon IRC.

"What columns does the ORDERS table have?"

discover_datasets(intent, max_results?)

Finds tables relevant to a natural language intent. Uses the configured semantic backend (live browse or LanceDB vector search). Always call this first before writing SQL.

"Find me data about customer refunds"

execute_sql(sql)

Executes raw DuckDB SQL against your Iceberg tables. Table references must be fully qualified:

SELECT * FROM horizonmcpdb.NORTHMART.ORDERS LIMIT 10

"Run this SQL: SELECT region, SUM(revenue) FROM ..."

ask_data(question, namespace?, table?)

Full NL → SQL → result pipeline. Discovers relevant tables, generates SQL via an LLM, executes via DuckDB, and returns results. Requires OPENAI_API_KEY in .env.

"What's the total revenue by region for this quarter?"


Prerequisites

  • OS: WSL2 Ubuntu 24.04 (or any Linux/macOS with Python 3.11+)
  • Python: 3.11 or later
  • Claude Desktop: Latest version with MCP support enabled
  • Snowflake account with:
    • Iceberg tables managed by Snowflake (not externally managed)
    • Polaris/Horizon IRC enabled on the account
    • An external volume configured for your Iceberg tables

Snowflake Setup

Run the following as ACCOUNTADMIN in a Snowflake worksheet:

-- ── 1. Create the MCP role ──────────────────────────────────────────────────
CREATE ROLE IF NOT EXISTS HORIZON_MCP_ROLE;

-- ── 2. Grant catalog access ─────────────────────────────────────────────────
-- Replace MYDB and MYSCHEMA with your actual database/schema names
GRANT USAGE ON DATABASE MYDB TO ROLE HORIZON_MCP_ROLE;
GRANT USAGE ON SCHEMA MYDB.MYSCHEMA TO ROLE HORIZON_MCP_ROLE;

-- ── 3. Grant read access to Iceberg tables ──────────────────────────────────
-- Grant on all existing tables in the schema
GRANT SELECT ON ALL TABLES IN SCHEMA MYDB.MYSCHEMA TO ROLE HORIZON_MCP_ROLE;
-- Grant on future tables automatically
GRANT SELECT ON FUTURE TABLES IN SCHEMA MYDB.MYSCHEMA TO ROLE HORIZON_MCP_ROLE;

-- ── 4. Grant access to the external volume ──────────────────────────────────
-- Required for Horizon to vend storage credentials
GRANT USAGE ON EXTERNAL VOLUME <your_external_volume_name> TO ROLE HORIZON_MCP_ROLE;

-- ── 5. Create a service user ────────────────────────────────────────────────
CREATE USER IF NOT EXISTS MCP_SERVICE_USER
    TYPE = SERVICE
    DEFAULT_ROLE = HORIZON_MCP_ROLE
    COMMENT = 'Service user for horizon-mcp MCP server';

GRANT ROLE HORIZON_MCP_ROLE TO USER MCP_SERVICE_USER;

-- ── 6. Create an authentication policy scoped to Horizon IRC ────────────────
CREATE AUTHENTICATION POLICY IF NOT EXISTS HORIZON_MCP_AUTH_POLICY
    AUTHENTICATION_METHODS = ('PROGRAMMATIC_ACCESS_TOKEN')
    COMMENT = 'Allow only PAT auth for horizon-mcp service user';

ALTER USER MCP_SERVICE_USER
    SET AUTHENTICATION POLICY HORIZON_MCP_AUTH_POLICY;

-- ── 7. Generate a PAT ───────────────────────────────────────────────────────
-- Log in as MCP_SERVICE_USER and run:
ALTER USER MCP_SERVICE_USER ADD PROGRAMMATIC ACCESS TOKEN HORIZON_MCP_PAT
    ROLE_RESTRICTION = HORIZON_MCP_ROLE
    DAYS_TO_EXPIRY = 31
    COMMENT = 'PAT for horizon-mcp MCP server';

-- Copy the token value — it is shown only once.

Tip: Set a calendar reminder to rotate the PAT before DAYS_TO_EXPIRY lapses. Update SNOWFLAKE_PAT in .env and restart the MCP server.


Installation

# 1. Clone the repository
git clone https://github.com/kshash919/snowflake-horizon-irc-mcp.git
cd snowflake-horizon-irc-mcp

# 2. Create and activate a Python virtual environment
python3 -m venv .venv
source .venv/bin/activate   # Windows WSL2: same command

# 3. Install the package and all dependencies
pip install -e .

# 4. Configure credentials
cp .env.example .env
# Edit .env and fill in all values (see .env.example for field descriptions)

# 5. Verify the connection
python scripts/test_connection.py

A successful run looks like:

── Horizon IRC MCP Connection Test ─────────────────────

  OK  Config loaded
       account  = MYACCOUNT-MYLOCATOR
       database = MYDB
       role     = HORIZON_MCP_ROLE
       backend  = none
  OK  PAT -> bearer token  (length=640)
  OK  Namespaces: ['NORTHMART', 'ANALYTICS']
  OK  Tables in 'NORTHMART': ['ORDERS', 'CUSTOMERS', 'PRODUCTS']
  OK  Schema of NORTHMART.ORDERS: ['order_id', 'customer_id', 'order_date', ...]
  OK  Vended credentials present  keys=['s3.access-key-id', 's3.secret-access-key', ...]
  OK  DuckDB Iceberg secret configured
  OK  Semantic backend: none (live Horizon IRC browsing)

── All checks passed ────────────────────────────────────

Claude Desktop Configuration

Add the following to your Claude Desktop config file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows (WSL2): %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "horizon-iceberg": {
      "command": "wsl",
      "args": [
        "--",
        "/home/<your-wsl-username>/openclaw-workspace/mcp-servers/snowflake/horizon-mcp/.venv/bin/python",
        "-m",
        "horizon_mcp.server"
      ],
      "cwd": "/home/<your-wsl-username>/openclaw-workspace/mcp-servers/snowflake/horizon-mcp"
    }
  }
}

WSL2 note: The "command": "wsl" entry tells Claude Desktop to launch the server inside WSL2. The path to python must be the full absolute path to the venv interpreter. Replace <your-wsl-username> with your actual WSL username (run whoami in WSL to check).

After editing the config, fully quit and relaunch Claude Desktop. The horizon-iceberg tool should appear in the MCP tools panel.


Example Queries

Once connected, try these in Claude Desktop:

Catalog Discovery

What data do you have access to?
List all the schemas and tables in the catalog.

Simple Queries

Show me the first 5 orders.
How many customers are there per country?
What are the 10 most expensive products?

Analytical

What is total revenue by region for the last 12 months?
What's the refund rate by product category?
Which day of the week has the highest order volume?

Multi-Table Joins

Which customer segment has the highest refund rate?
What's the average order value per customer cohort (by signup month)?

Raw SQL via execute_sql

Run this SQL:
SELECT
    customer_id,
    COUNT(*) AS order_count,
    SUM(total_amount) AS lifetime_value,
    RANK() OVER (ORDER BY SUM(total_amount) DESC) AS ltv_rank
FROM horizonmcpdb.northmart.orders
GROUP BY customer_id
ORDER BY ltv_rank
LIMIT 20

The Governance Question

What credentials are you using to read this data? Are the storage keys long-lived?

Claude will explain the PAT → OAuth → vended ephemeral credentials chain accurately, because the MCP server instructions describe it.


Switching Semantic Backends

Edit SEMANTIC_BACKEND in your .env:

# Live Horizon IRC browsing (default, no index needed)
SEMANTIC_BACKEND=none

# Local vector search (faster for large catalogs)
SEMANTIC_BACKEND=lancedb

When switching to lancedb, build the index first:

source .venv/bin/activate
horizon-seed

horizon-seed crawls all namespaces and tables visible to your role, generates sentence embeddings for each table (using all-MiniLM-L6-v2 locally — no API key needed), and stores them in a local LanceDB database at ~/.horizon-mcp/lancedb/.

Re-run horizon-seed after schema changes or when new tables are added.

none lancedb
Index required No Yes (horizon-seed)
Always in sync ❌ (re-seed required)
Works offline ✅ (after seeding)
Best for < 50 tables, demos 50+ tables, production
Extra dependencies None ~500 MB (torch, transformers)

Troubleshooting

HTTP 401 — Token exchange failed

The PAT has expired or is invalid.

-- Regenerate in Snowflake as MCP_SERVICE_USER
ALTER USER MCP_SERVICE_USER ADD PROGRAMMATIC ACCESS TOKEN HORIZON_MCP_PAT
    ROLE_RESTRICTION = HORIZON_MCP_ROLE
    DAYS_TO_EXPIRY = 31;

Update SNOWFLAKE_PAT in .env and rerun python scripts/test_connection.py.


HTTP 403 — Missing role grants

HORIZON_MCP_ROLE is missing USAGE or SELECT on the relevant object.

-- Check grants
SHOW GRANTS TO ROLE HORIZON_MCP_ROLE;

-- Fix missing grants
GRANT USAGE ON DATABASE MYDB TO ROLE HORIZON_MCP_ROLE;
GRANT USAGE ON SCHEMA MYDB.MYSCHEMA TO ROLE HORIZON_MCP_ROLE;
GRANT SELECT ON ALL TABLES IN SCHEMA MYDB.MYSCHEMA TO ROLE HORIZON_MCP_ROLE;

HTTP 404 — Unable to find matching target resource method

The account identifier format in SNOWFLAKE_ACCOUNT is wrong. It must be in ORGNAME-ACCOUNTNAME format, not the legacy ACCOUNT.REGION.cloud format.

# Wrong
SNOWFLAKE_ACCOUNT=myaccount.us-east-1.aws

# Correct
SNOWFLAKE_ACCOUNT=MYORG-MYACCOUNT

Find your org + account name in Snowflake: Admin → Accounts.


NoSuchNamespaceException: Namespace does not exist: ""

Horizon IRC returned an empty namespace. This usually means:

  1. HORIZON_MCP_ROLE has no USAGE on the database or external volume — check grants above.
  2. The catalog name in SNOWFLAKE_DATABASE doesn't match the Polaris catalog name exactly (case-sensitive).
  3. Polaris/Horizon IRC is not fully enabled for the account — contact Snowflake support.

MCP server not showing up in Claude Desktop

  1. Validate the JSON syntax of claude_desktop_config.json — a trailing comma or missing brace will silently prevent the server from loading. Use a JSON linter.
  2. Fully quit Claude Desktop (not just close the window) and relaunch.
  3. Check Claude Desktop logs for MCP errors:
    • macOS: ~/Library/Logs/Claude/mcp*.log
    • Windows: %APPDATA%\Claude\logs\mcp*.log
  4. Test the server manually from the WSL terminal:
    cd ~/openclaw-workspace/mcp-servers/snowflake/horizon-mcp
    source .venv/bin/activate
    python -m horizon_mcp.server
    
    It should start without errors (press Ctrl+C to stop).

Empty columns array in describe_table

Known limitation: Some versions of the Horizon IRC load_table response nest the Iceberg schema differently. horizon-mcp looks for metadata.schema.fields — if Snowflake returns the schema under a different key in your account version, the columns list will be empty. The table is still queryable via DuckDB; use DESCRIBE <table> in execute_sql as a workaround:

DESCRIBE SELECT * FROM horizonmcpdb.MYSCHEMA.MYTABLE LIMIT 0;

Project Structure

horizon-mcp/
├── pyproject.toml                  # Package metadata and dependencies
├── .env.example                    # Environment variable template
├── .gitignore                      # Excludes .env, .venv, __pycache__
├── README.md                       # This file
│
├── scripts/
│   └── test_connection.py          # End-to-end smoke test (run before first use)
│
└── src/
    └── horizon_mcp/
        ├── __init__.py             # Package init
        ├── config.py               # Loads and validates .env into a Config dataclass
        ├── horizon_client.py       # Horizon IRC REST API client (auth + catalog calls)
        ├── duckdb_engine.py        # DuckDB + Iceberg extension query engine
        ├── server.py               # MCP server — 5 tools, FastMCP transport
        ├── seed.py                 # horizon-seed CLI — builds LanceDB index
        ├── nl_to_sql.py            # NL → SQL via OpenAI-compatible API
        ├── semantic.py             # Backend factory (returns none or lancedb backend)
        └── semantic/
            ├── __init__.py         # Semantic subpackage init
            ├── base.py             # Abstract SemanticBackend base class
            ├── none_backend.py     # Live IRC browse + keyword scoring
            └── lancedb_backend.py  # LanceDB vector search backend

Billing Note

Horizon IRC API calls will be billed as Cloud Services at 0.5 credits per million API calls — negligible for typical MCP usage patterns. No Snowflake warehouse compute credits are charged for data reads; DuckDB reads Parquet files directly from cloud storage.


Security

Concern How it's handled
Credentials never exposed through MCP The MCP tools return query results only — PAT, bearer tokens, and vended storage keys are never included in tool responses
.env is gitignored .gitignore excludes .env — credentials are never committed to source control
PAT scoped to a single role ROLE_RESTRICTION = HORIZON_MCP_ROLE in the PAT definition prevents privilege escalation
Vended credentials are ephemeral Horizon-vended STS/SAS tokens expire after ~1 hour and are scoped to the specific table being queried
No long-lived storage keys DuckDB never holds permanent cloud storage credentials — only ephemeral tokens vended per-query

License

MIT

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
E2B

E2B

Using MCP to run code via e2b.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured