horizon-mcp
Enables natural language querying of Snowflake-managed Iceberg tables via Horizon IRC and DuckDB, without using a Snowflake warehouse.
README
horizon-mcp
Natural language access to Snowflake-managed Iceberg tables via Horizon IRC, using DuckDB as the query engine.
Query your Snowflake Iceberg catalog through Claude Desktop without spinning up a Snowflake warehouse — no compute credits, full RBAC enforcement, vended credentials only.
Architecture
┌─────────────────┐ stdio / MCP ┌──────────────────────┐
│ Claude Desktop │ ◄────────────────────── ► │ horizon-mcp │
└─────────────────┘ │ (this server) │
└──────────┬───────────┘
│
┌──────────────────────────▼────────────────────────────┐
│ Snowflake Horizon IRC REST Catalog │
│ GET /v1/{database}/namespaces │
│ GET /v1/{database}/namespaces/{ns}/tables │
│ GET /v1/{database}/namespaces/{ns}/tables/{table} │
└──────────────────────────┬────────────────────────────┘
│
Vended credentials
(ephemeral STS / SAS tokens)
│
┌──────────────────────────▼────────────────────────────┐
│ DuckDB │
│ ATTACH via Iceberg extension + OAuth secret │
│ Reads Parquet files directly from cloud storage │
└──────────────────────────┬────────────────────────────┘
│
┌──────────────────────────▼────────────────────────────┐
│ Iceberg Parquet files (S3 / Azure ADLS) │
│ managed by Snowflake external volumes │
└───────────────────────────────────────────────────────┘
How It Works
1 — PAT Authentication
horizon-mcp authenticates to the Horizon IRC API using a Snowflake Personal Access Token (PAT). On startup (and before token expiry) the server exchanges the PAT for a short-lived OAuth bearer token:
POST /polaris/api/catalog/v1/oauth/tokens
grant_type=client_credentials
scope=session:role:<ROLE>
client_secret=<PAT>
→ { "access_token": "...", "expires_in": 3600 }
The bearer token is cached in memory and refreshed automatically 120 seconds before expiry.
2 — Horizon IRC REST Catalog
All catalog discovery goes through Snowflake's Horizon Iceberg REST Catalog (IRC) API:
| Operation | HTTP call |
|---|---|
| List schemas | GET /v1/{database}/namespaces |
| List tables in a schema | GET /v1/{database}/namespaces/{namespace}/tables |
| Load table metadata + credentials | GET /v1/{database}/namespaces/{namespace}/tables/{table} |
Snowflake enforces your full RBAC posture on every call — the role baked into the OAuth scope determines exactly which namespaces and tables are visible.
3 — Vended Credentials
When load_table is called, Horizon returns the Iceberg table metadata plus ephemeral, scoped storage credentials (AWS STS tokens for S3, SAS tokens for Azure ADLS). These credentials:
- Are scoped to the specific table being accessed
- Expire after a short TTL (typically 1 hour)
- Are never stored to disk — they live only in memory for the duration of the query
4 — DuckDB Query Engine
DuckDB reads Parquet/Iceberg files directly from cloud storage using the vended credentials. The server uses DuckDB's native iceberg extension with an OAuth secret:
CREATE SECRET horizon_secret (
TYPE iceberg,
CLIENT_SECRET '<PAT>',
OAUTH2_SERVER_URI '...',
OAUTH2_GRANT_TYPE 'client_credentials',
OAUTH2_SCOPE 'session:role:<ROLE>'
)
ATTACH 'MYDB' AS mydb (
TYPE iceberg,
SECRET horizon_secret,
ENDPOINT 'https://<account>.snowflakecomputing.com/polaris/api/catalog'
)
No Snowflake warehouse is started. No compute credits are charged for reads. Query compute runs entirely in DuckDB inside the MCP server process on your local machine.
Semantic Backends
horizon-mcp supports two backends for the discover_datasets tool, controlled by SEMANTIC_BACKEND in .env:
| Backend | How it works | Best for |
|---|---|---|
none (default) |
Live Horizon IRC browsing — fetches all namespaces and tables on every search, scores by keyword match against table and column names | Demos, smaller catalogs (< 50 tables), always in sync with the catalog |
lancedb |
Local vector search index built by horizon-seed, uses sentence-transformers for embeddings, cosine similarity search |
Large catalogs with 50+ tables, faster discovery, offline-capable |
Note: With
lancedbyou must runhorizon-seedafter setup and re-run it whenever the catalog changes to keep the index fresh.
MCP Tools
Five tools are exposed to Claude Desktop:
list_namespaces
Lists all schemas (namespaces) visible to the current role.
"What schemas are in the catalog?"
describe_table(namespace, table)
Returns the full column schema, partition spec, current snapshot ID, metadata location, and an example SQL query for a specific table. Always fetches live from Horizon IRC.
"What columns does the ORDERS table have?"
discover_datasets(intent, max_results?)
Finds tables relevant to a natural language intent. Uses the configured semantic backend (live browse or LanceDB vector search). Always call this first before writing SQL.
"Find me data about customer refunds"
execute_sql(sql)
Executes raw DuckDB SQL against your Iceberg tables. Table references must be fully qualified:
SELECT * FROM horizonmcpdb.NORTHMART.ORDERS LIMIT 10
"Run this SQL: SELECT region, SUM(revenue) FROM ..."
ask_data(question, namespace?, table?)
Full NL → SQL → result pipeline. Discovers relevant tables, generates SQL via an LLM, executes via DuckDB, and returns results. Requires OPENAI_API_KEY in .env.
"What's the total revenue by region for this quarter?"
Prerequisites
- OS: WSL2 Ubuntu 24.04 (or any Linux/macOS with Python 3.11+)
- Python: 3.11 or later
- Claude Desktop: Latest version with MCP support enabled
- Snowflake account with:
- Iceberg tables managed by Snowflake (not externally managed)
- Polaris/Horizon IRC enabled on the account
- An external volume configured for your Iceberg tables
Snowflake Setup
Run the following as ACCOUNTADMIN in a Snowflake worksheet:
-- ── 1. Create the MCP role ──────────────────────────────────────────────────
CREATE ROLE IF NOT EXISTS HORIZON_MCP_ROLE;
-- ── 2. Grant catalog access ─────────────────────────────────────────────────
-- Replace MYDB and MYSCHEMA with your actual database/schema names
GRANT USAGE ON DATABASE MYDB TO ROLE HORIZON_MCP_ROLE;
GRANT USAGE ON SCHEMA MYDB.MYSCHEMA TO ROLE HORIZON_MCP_ROLE;
-- ── 3. Grant read access to Iceberg tables ──────────────────────────────────
-- Grant on all existing tables in the schema
GRANT SELECT ON ALL TABLES IN SCHEMA MYDB.MYSCHEMA TO ROLE HORIZON_MCP_ROLE;
-- Grant on future tables automatically
GRANT SELECT ON FUTURE TABLES IN SCHEMA MYDB.MYSCHEMA TO ROLE HORIZON_MCP_ROLE;
-- ── 4. Grant access to the external volume ──────────────────────────────────
-- Required for Horizon to vend storage credentials
GRANT USAGE ON EXTERNAL VOLUME <your_external_volume_name> TO ROLE HORIZON_MCP_ROLE;
-- ── 5. Create a service user ────────────────────────────────────────────────
CREATE USER IF NOT EXISTS MCP_SERVICE_USER
TYPE = SERVICE
DEFAULT_ROLE = HORIZON_MCP_ROLE
COMMENT = 'Service user for horizon-mcp MCP server';
GRANT ROLE HORIZON_MCP_ROLE TO USER MCP_SERVICE_USER;
-- ── 6. Create an authentication policy scoped to Horizon IRC ────────────────
CREATE AUTHENTICATION POLICY IF NOT EXISTS HORIZON_MCP_AUTH_POLICY
AUTHENTICATION_METHODS = ('PROGRAMMATIC_ACCESS_TOKEN')
COMMENT = 'Allow only PAT auth for horizon-mcp service user';
ALTER USER MCP_SERVICE_USER
SET AUTHENTICATION POLICY HORIZON_MCP_AUTH_POLICY;
-- ── 7. Generate a PAT ───────────────────────────────────────────────────────
-- Log in as MCP_SERVICE_USER and run:
ALTER USER MCP_SERVICE_USER ADD PROGRAMMATIC ACCESS TOKEN HORIZON_MCP_PAT
ROLE_RESTRICTION = HORIZON_MCP_ROLE
DAYS_TO_EXPIRY = 31
COMMENT = 'PAT for horizon-mcp MCP server';
-- Copy the token value — it is shown only once.
Tip: Set a calendar reminder to rotate the PAT before
DAYS_TO_EXPIRYlapses. UpdateSNOWFLAKE_PATin.envand restart the MCP server.
Installation
# 1. Clone the repository
git clone https://github.com/kshash919/snowflake-horizon-irc-mcp.git
cd snowflake-horizon-irc-mcp
# 2. Create and activate a Python virtual environment
python3 -m venv .venv
source .venv/bin/activate # Windows WSL2: same command
# 3. Install the package and all dependencies
pip install -e .
# 4. Configure credentials
cp .env.example .env
# Edit .env and fill in all values (see .env.example for field descriptions)
# 5. Verify the connection
python scripts/test_connection.py
A successful run looks like:
── Horizon IRC MCP Connection Test ─────────────────────
OK Config loaded
account = MYACCOUNT-MYLOCATOR
database = MYDB
role = HORIZON_MCP_ROLE
backend = none
OK PAT -> bearer token (length=640)
OK Namespaces: ['NORTHMART', 'ANALYTICS']
OK Tables in 'NORTHMART': ['ORDERS', 'CUSTOMERS', 'PRODUCTS']
OK Schema of NORTHMART.ORDERS: ['order_id', 'customer_id', 'order_date', ...]
OK Vended credentials present keys=['s3.access-key-id', 's3.secret-access-key', ...]
OK DuckDB Iceberg secret configured
OK Semantic backend: none (live Horizon IRC browsing)
── All checks passed ────────────────────────────────────
Claude Desktop Configuration
Add the following to your Claude Desktop config file:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows (WSL2): %APPDATA%\Claude\claude_desktop_config.json
{
"mcpServers": {
"horizon-iceberg": {
"command": "wsl",
"args": [
"--",
"/home/<your-wsl-username>/openclaw-workspace/mcp-servers/snowflake/horizon-mcp/.venv/bin/python",
"-m",
"horizon_mcp.server"
],
"cwd": "/home/<your-wsl-username>/openclaw-workspace/mcp-servers/snowflake/horizon-mcp"
}
}
}
WSL2 note: The
"command": "wsl"entry tells Claude Desktop to launch the server inside WSL2. The path topythonmust be the full absolute path to the venv interpreter. Replace<your-wsl-username>with your actual WSL username (runwhoamiin WSL to check).
After editing the config, fully quit and relaunch Claude Desktop. The horizon-iceberg tool should appear in the MCP tools panel.
Example Queries
Once connected, try these in Claude Desktop:
Catalog Discovery
What data do you have access to?
List all the schemas and tables in the catalog.
Simple Queries
Show me the first 5 orders.
How many customers are there per country?
What are the 10 most expensive products?
Analytical
What is total revenue by region for the last 12 months?
What's the refund rate by product category?
Which day of the week has the highest order volume?
Multi-Table Joins
Which customer segment has the highest refund rate?
What's the average order value per customer cohort (by signup month)?
Raw SQL via execute_sql
Run this SQL:
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(total_amount) AS lifetime_value,
RANK() OVER (ORDER BY SUM(total_amount) DESC) AS ltv_rank
FROM horizonmcpdb.northmart.orders
GROUP BY customer_id
ORDER BY ltv_rank
LIMIT 20
The Governance Question
What credentials are you using to read this data? Are the storage keys long-lived?
Claude will explain the PAT → OAuth → vended ephemeral credentials chain accurately, because the MCP server instructions describe it.
Switching Semantic Backends
Edit SEMANTIC_BACKEND in your .env:
# Live Horizon IRC browsing (default, no index needed)
SEMANTIC_BACKEND=none
# Local vector search (faster for large catalogs)
SEMANTIC_BACKEND=lancedb
When switching to lancedb, build the index first:
source .venv/bin/activate
horizon-seed
horizon-seed crawls all namespaces and tables visible to your role, generates sentence embeddings for each table (using all-MiniLM-L6-v2 locally — no API key needed), and stores them in a local LanceDB database at ~/.horizon-mcp/lancedb/.
Re-run horizon-seed after schema changes or when new tables are added.
none |
lancedb |
|
|---|---|---|
| Index required | No | Yes (horizon-seed) |
| Always in sync | ✅ | ❌ (re-seed required) |
| Works offline | ❌ | ✅ (after seeding) |
| Best for | < 50 tables, demos | 50+ tables, production |
| Extra dependencies | None | ~500 MB (torch, transformers) |
Troubleshooting
HTTP 401 — Token exchange failed
The PAT has expired or is invalid.
-- Regenerate in Snowflake as MCP_SERVICE_USER
ALTER USER MCP_SERVICE_USER ADD PROGRAMMATIC ACCESS TOKEN HORIZON_MCP_PAT
ROLE_RESTRICTION = HORIZON_MCP_ROLE
DAYS_TO_EXPIRY = 31;
Update SNOWFLAKE_PAT in .env and rerun python scripts/test_connection.py.
HTTP 403 — Missing role grants
HORIZON_MCP_ROLE is missing USAGE or SELECT on the relevant object.
-- Check grants
SHOW GRANTS TO ROLE HORIZON_MCP_ROLE;
-- Fix missing grants
GRANT USAGE ON DATABASE MYDB TO ROLE HORIZON_MCP_ROLE;
GRANT USAGE ON SCHEMA MYDB.MYSCHEMA TO ROLE HORIZON_MCP_ROLE;
GRANT SELECT ON ALL TABLES IN SCHEMA MYDB.MYSCHEMA TO ROLE HORIZON_MCP_ROLE;
HTTP 404 — Unable to find matching target resource method
The account identifier format in SNOWFLAKE_ACCOUNT is wrong. It must be in ORGNAME-ACCOUNTNAME format, not the legacy ACCOUNT.REGION.cloud format.
# Wrong
SNOWFLAKE_ACCOUNT=myaccount.us-east-1.aws
# Correct
SNOWFLAKE_ACCOUNT=MYORG-MYACCOUNT
Find your org + account name in Snowflake: Admin → Accounts.
NoSuchNamespaceException: Namespace does not exist: ""
Horizon IRC returned an empty namespace. This usually means:
HORIZON_MCP_ROLEhas noUSAGEon the database or external volume — check grants above.- The catalog name in
SNOWFLAKE_DATABASEdoesn't match the Polaris catalog name exactly (case-sensitive). - Polaris/Horizon IRC is not fully enabled for the account — contact Snowflake support.
MCP server not showing up in Claude Desktop
- Validate the JSON syntax of
claude_desktop_config.json— a trailing comma or missing brace will silently prevent the server from loading. Use a JSON linter. - Fully quit Claude Desktop (not just close the window) and relaunch.
- Check Claude Desktop logs for MCP errors:
- macOS:
~/Library/Logs/Claude/mcp*.log - Windows:
%APPDATA%\Claude\logs\mcp*.log
- macOS:
- Test the server manually from the WSL terminal:
It should start without errors (press Ctrl+C to stop).cd ~/openclaw-workspace/mcp-servers/snowflake/horizon-mcp source .venv/bin/activate python -m horizon_mcp.server
Empty columns array in describe_table
Known limitation: Some versions of the Horizon IRC load_table response nest the Iceberg schema differently. horizon-mcp looks for metadata.schema.fields — if Snowflake returns the schema under a different key in your account version, the columns list will be empty. The table is still queryable via DuckDB; use DESCRIBE <table> in execute_sql as a workaround:
DESCRIBE SELECT * FROM horizonmcpdb.MYSCHEMA.MYTABLE LIMIT 0;
Project Structure
horizon-mcp/
├── pyproject.toml # Package metadata and dependencies
├── .env.example # Environment variable template
├── .gitignore # Excludes .env, .venv, __pycache__
├── README.md # This file
│
├── scripts/
│ └── test_connection.py # End-to-end smoke test (run before first use)
│
└── src/
└── horizon_mcp/
├── __init__.py # Package init
├── config.py # Loads and validates .env into a Config dataclass
├── horizon_client.py # Horizon IRC REST API client (auth + catalog calls)
├── duckdb_engine.py # DuckDB + Iceberg extension query engine
├── server.py # MCP server — 5 tools, FastMCP transport
├── seed.py # horizon-seed CLI — builds LanceDB index
├── nl_to_sql.py # NL → SQL via OpenAI-compatible API
├── semantic.py # Backend factory (returns none or lancedb backend)
└── semantic/
├── __init__.py # Semantic subpackage init
├── base.py # Abstract SemanticBackend base class
├── none_backend.py # Live IRC browse + keyword scoring
└── lancedb_backend.py # LanceDB vector search backend
Billing Note
Horizon IRC API calls will be billed as Cloud Services at 0.5 credits per million API calls — negligible for typical MCP usage patterns. No Snowflake warehouse compute credits are charged for data reads; DuckDB reads Parquet files directly from cloud storage.
Security
| Concern | How it's handled |
|---|---|
| Credentials never exposed through MCP | The MCP tools return query results only — PAT, bearer tokens, and vended storage keys are never included in tool responses |
.env is gitignored |
.gitignore excludes .env — credentials are never committed to source control |
| PAT scoped to a single role | ROLE_RESTRICTION = HORIZON_MCP_ROLE in the PAT definition prevents privilege escalation |
| Vended credentials are ephemeral | Horizon-vended STS/SAS tokens expire after ~1 hour and are scoped to the specific table being queried |
| No long-lived storage keys | DuckDB never holds permanent cloud storage credentials — only ephemeral tokens vended per-query |
License
MIT
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.