MCP Servers

ODPS MCP Server

Read-only MCP server for Alibaba Cloud MaxCompute, enabling safe querying of allowlisted tables via tools like list_allowed_tables, describe_table, preview_table, and run_select_sql.

README

ODPS MCP Server

Read-only MCP server for Alibaba Cloud MaxCompute with enterprise-grade security.

Features

Read-only SQL Guard: Only SELECT queries are allowed, with automatic LIMIT injection
Table/Column-level Access Control: Fine-grained policy via YAML allowlist
Policy Registry: Domain-based policy management with sensitivity levels, row policies, and partition filters
Authentication: Bearer Token and Gateway Header authentication
Audit Logging: JSONL-format audit logs with request tracking
Unified Response Structure: Consistent ok/data/error/metadata/audit_id format
Join Path Catalog: Pre-approved table join relationships with business context

Quick Start

1. Install Dependencies

python -m venv .venv
.venv\Scripts\activate  # Windows
# source .venv/bin/activate  # Linux/Mac
pip install -e .

2. Configure Environment

# Copy example config
copy .env.example .env.local  # Windows
# cp .env.example .env.local  # Linux/Mac

# Edit .env.local with your ODPS credentials

Required variables:

ALIBABA_CLOUD_ACCESS_KEY_ID
ALIBABA_CLOUD_ACCESS_KEY_SECRET
ODPS_ENDPOINT
ODPS_PROJECT
ODPS_ALLOWLIST_PATH

3. Run Server

# Streamable HTTP mode (recommended)
python -m odps_mcp_server.server

# Or using uvicorn
uvicorn odps_mcp_server.server:create_app --factory --host 0.0.0.0 --port 8000

Endpoints

Endpoint	Method	Description
`/mcp`	POST	Standard MCP Streamable HTTP endpoint
`/`	POST	Compatibility endpoint (redirects to `/mcp`)
`/health`	GET	Health check

MCP Tools

Discovery Tools

Tool	Description
`list_allowed_tables`	List tables accessible by the current policy
`describe_table`	Get table schema, columns, and partition info
`get_join_paths`	Get pre-approved join relationships between tables

Query Tools

Tool	Description
`preview_table`	Preview rows from a table with automatic partition pruning
`run_select_sql`	Execute a read-only SELECT query

Semantic Layer Tools (P2)

Tool	Description
`search_business_terms`	Search business terms by keyword
`get_metric_definition`	Get metric definition, expression, and dimensions
`list_metrics`	List available metrics, optionally by domain
`get_dimensions`	Get available dimensions for a metric
`generate_metric_sql`	Generate SQL for a metric with dimensions and filters
`validate_metric_query`	Validate a metric query before execution
`list_domains`	List all business domains

Query Planner Tools (P3)

Tool	Description
`plan_query`	Convert natural language question to query plan
`explain_query_plan`	Get detailed explanation of a query plan
`run_approved_query`	Execute an approved query plan

The Query Planner enables agents to query data without writing SQL directly:

plan_query("3月华南区发货金额是多少？") → Returns a query plan with metrics, dimensions, filters
explain_query_plan(plan_id) → Get human-readable explanation
run_approved_query(plan_id) → Execute the plan and get results

This approach ensures:

LLM understands the question and generates a candidate plan
Program handles permissions, table selection, column selection, join checks, limits
Risk flags are identified before execution

Cost Control (P4)

The server includes built-in cost control mechanisms to prevent expensive queries:

Risk Assessment

Every query is evaluated for risk based on:

Missing WHERE clause
Missing partition filter
Missing LIMIT
Large LIMIT values
High JOIN count
SELECT * usage

Risk levels: low → medium → high → critical

Policy Enforcement

Configured in config/policy.yaml under security:

security:
  deny_unapproved_join: true
  max_join_tables: 3
  max_estimated_scan_bytes: 10737418240  # 10GB
  query_timeout_seconds: 300
  max_concurrent_queries_per_user: 5

Response with Cost Estimate

When cost control is enabled, responses include:

{
  "ok": true,
  "data": { ... },
  "cost_estimate": {
    "policy_decision": "allowed",
    "risk_level": "low",
    "risk_flags": [],
    "tables_used": ["project.table"],
    "columns_used": ["col1", "col2"],
    "partitions_used": ["pt"],
    "has_partition_filter": true,
    "join_count": 0
  }
}

Concurrency Control

Per-user concurrent query limits are enforced. Queries exceeding the limit receive a CONCURRENCY_LIMIT_EXCEEDED error.

Enterprise Integration (P5)

Supports enterprise identity providers and role-based access control.

Identity Providers

Configure in config/enterprise.yaml:

identity_providers:
  wecom:
    name: 企业微信
    type: wecom
    enabled: true
    corp_id: "your-corp-id"
    user_id_field: "X-Wecom-User-Id"

  dingtalk:
    name: 钉钉
    type: dingtalk
    enabled: true
    app_key: "your-app-key"
    user_id_field: "X-Dingtalk-User-Id"

Role-Based Access Control

roles:
  data_analyst:
    name: 数据分析师
    allowed_tools:
      - list_allowed_tables
      - describe_table
      - preview_table
      - run_select_sql
    allowed_domains:
      - sales
      - product
    rate_limit: "100/minute"
    query_quota_daily: 1000

Client Configuration

clients:
  sales_bot:
    name: 销售数据机器人
    client_id: sales_bot
    allowed_tools:
      - list_allowed_tables
      - describe_table
    allowed_domains:
      - sales
    rate_limit: "50/minute"

Permission Checks

Tool permission: user + client must both allow the tool
Domain permission: user + client must both allow the domain
Rate limit: stricter limit between user and client applies

Evaluation Framework (P6)

Golden dataset for evaluating Agent query accuracy.

Evaluation Dimensions

Dimension	Weight	Description
Table Selection	20%	Correct tables identified
Column Selection	15%	Correct columns/dimensions
Metric Accuracy	20%	Correct metrics calculated
Join Path	10%	Correct join relationships
Partition Filter	15%	Partition predicates present
Permission Check	10%	Access control enforced
SQL Executable	5%	Valid SQL generated
Answer Traceable	5%	SQL evidence provided

Golden Dataset

Located at config/evaluation/golden_dataset.yaml:

test_cases:
  - id: sales_001
    question: "2026年3月华南区发货金额是多少？"
    domain: sales
    expected_tables:
      - saky_dw_ods.ods_sap_delivery_order_details_df
    expected_metrics:
      - delivery_amount
    required_filters:
      - field: region_code
        value: "华南"
    forbidden_behavior:
      - full_table_scan
    should_be_blocked: false

Running Evaluation

Use the Evaluator service to run evaluations:

from odps_mcp_server.evaluator import Evaluator, load_evaluation_config

config = load_evaluation_config("config/evaluation/golden_dataset.yaml")
evaluator = Evaluator(config)

# Evaluate a plan
result = evaluator.evaluate_plan(
    test_id="sales_001",
    plan={"metrics": ["delivery_amount"], "tables": [...]},
    tables_used=["saky_dw_ods.ods_sap_delivery_order_details_df"],
    sql="SELECT ... LIMIT 100",
)

# Generate report
report = evaluator.generate_report()
print(f"Pass rate: {report.pass_rate:.2%}")

Evaluation Report

{
  "summary": {
    "total_cases": 11,
    "passed_cases": 9,
    "failed_cases": 2,
    "pass_rate": 0.8182
  },
  "dimension_scores": {
    "table_selection": 0.95,
    "metric_accuracy": 0.90,
    "partition_filter": 0.85,
    "permission_check": 1.0
  },
  "critical_failures": ["security_001", "cross_001"]
}

MCP Cockpit (P7)

Web-based admin console for managing the MCP Server.

Access

When running the server, access the cockpit at:

http://localhost:8000/admin

Features

Page	Description
Dashboard	Server status, component health, quick stats
Domains	Business domain management
Tables	Table permissions and metadata
Metrics	Metric definitions and configurations
Joins	Join path management (approved/candidate)
Audit	Query audit logs
Evaluation	Golden dataset test cases

Admin API

Endpoint	Method	Description
`/admin/api/status`	GET	Server status
`/admin/api/domains`	GET	List domains
`/admin/api/tables`	GET	List tables
`/admin/api/metrics`	GET	List metrics
`/admin/api/joins`	GET	List join paths
`/admin/api/audit`	GET	Audit logs
`/admin/api/evaluation/cases`	GET	Evaluation test cases

Screenshots

The cockpit provides:

Dashboard: Real-time server status, component health, query statistics
Domain Management: View business domains, owners, and table/metric counts
Table Management: Browse table permissions, sensitivity levels, column restrictions
Metric Management: View metric definitions, expressions, dimensions
Join Management: Review approved and candidate join paths
Audit: Search and filter query audit logs
Evaluation: View and run golden dataset test cases

Tool Response Structure

All tools return a unified response format:

{
  "ok": true,
  "data": { ... },
  "error": null,
  "metadata": {
    "request_id": "uuid",
    "tool_name": "describe_table",
    "timestamp": "2026-05-07T15:30:00Z",
    "latency_ms": 45.2,
    "tables_used": ["saky_dw_cdm.dim_pub_product_df"],
    "columns_used": ["item_code", "standard_name"],
    "row_count": 1,
    "truncated": false
  },
  "audit_id": "uuid"
}

Authentication

Bearer Token

Include the token in the Authorization header:

curl -H "Authorization: Bearer agent-sales-001" \
  http://localhost:8000/mcp \
  -d '{"tool": "list_allowed_tables"}'

Gateway Header

When behind an API gateway, the gateway can inject identity headers:

curl -H "X-User-Id: sales_agent" \
  -H "X-Client-Id: sales_bot" \
  -H "X-Session-Id: session-xxx" \
  http://localhost:8000/mcp \
  -d '{"tool": "list_allowed_tables"}'

Token Configuration

Define tokens in config/auth.yaml:

tokens:
  agent-sales-001:
    user_id: sales_agent
    client_id: sales_bot
    allowed_tools:
      - list_allowed_tables
      - describe_table
      - preview_table
    allowed_projects:
      - saky_dw_ods
      - saky_dw_cdm
    rate_limit: "100/minute"

Policy Configuration

Policy Registry (`config/policy.yaml`) - Recommended

The Policy Registry provides enterprise-grade policy management with:

Domain-based organization: Group tables by business domain (sales, supply_chain, product)
Sensitivity levels: public, internal, confidential, restricted
Column-level control: allowed_columns and denied_columns
Row-level policies: Role-based row filtering
Partition filter requirements: Enforce partition predicates
Per-table limits: Override global limit per table

domains:
  sales:
    name: 销售域
    owner: 销售运营部
    default_project: saky_dw_ods

tables:
  saky_dw_ods.ods_sap_delivery_order_details_df:
    domain: sales
    owner: supply_chain_data_owner
    sensitivity: internal
    default_partition:
      field: pt
      strategy: max_pt
    allowed_columns:
      - delivery_ord_code
      - product_code
      - sales_amount
      - pt
    denied_columns:
      - customer_phone
      - customer_address
    row_policies:
      - role: region_manager
        predicate: region_code IN (${user.region_codes})
    max_limit: 1000
    require_partition_filter: true

default_limit: 100
max_limit: 200

security:
  deny_unapproved_join: true
  max_join_tables: 3
  query_timeout_seconds: 300

Legacy Allowlist (`config/allowlist.yaml`)

Simple table/column whitelist for basic use cases: saky_dw_cdm: tables: dim_pub_product_df: {} # Allow all columns saky_dw_ods: tables: ods_sap_delivery_order_details_df: allowed_columns: # Column-level restriction - delivery_ord_code - product_code - sales_amount - pt

default_limit: 100 max_limit: 200


### Join Catalog (`config/ai_catalog/joins.yaml`)

Pre-approved table join relationships:

```yaml
joins:
  - id: ods_delivery_to_dim_product
    left_table: saky_dw_ods.ods_sap_delivery_order_details_df
    right_table: saky_dw_cdm.dim_pub_product_df
    business_purpose: 为交货单明细补齐产品主数据属性
    approved: true
    confidence: high
    cardinality: N:1
    on:
      - left_field: product_code
        right_field: item_code
    example_sql: |
      SELECT o.delivery_ord_code, p.standard_name
      FROM saky_dw_ods.ods_sap_delivery_order_details_df o
      LEFT JOIN saky_dw_cdm.dim_pub_product_df p
        ON o.product_code = p.item_code
      WHERE o.pt = MAX_PT("saky_dw_ods.ods_sap_delivery_order_details_df")
      LIMIT 100;

Audit Logging

Audit logs are written to logs/audit.jsonl in JSONL format, rotated daily.

Each record contains:

{
  "audit_id": "uuid",
  "request_id": "uuid",
  "timestamp": "2026-05-07T15:30:00.123Z",
  "user_id": "sales_agent",
  "client_id": "sales_bot",
  "session_id": "session-xxx",
  "source_ip": "10.0.1.55",
  "auth_method": "bearer_token",
  "token_id": "agent-sales-001",
  "tool_name": "run_select_sql",
  "outcome": "success",
  "latency_ms": 234.5,
  "request_params": {
    "project": "saky_dw_ods",
    "sql": "SELECT ..."
  },
  "response_summary": {
    "tables_used": ["saky_dw_ods.ods_sap_delivery_order_details_df"],
    "columns_used": ["delivery_ord_code", "sales_amount"],
    "row_count": 15,
    "truncated": false
  },
  "policy_decision": "allowed",
  "deny_reason": null,
  "error_code": null,
  "error_message": null
}

Error Codes

Code	Description
`AUTH_MISSING`	Authentication required but not provided
`AUTH_INVALID_TOKEN`	Bearer token not recognized
`AUTH_FORBIDDEN_ORIGIN`	Origin not in allowed list
`AUTH_FORBIDDEN_HOST`	Host not in allowed list
`AUTH_FORBIDDEN_TOOL`	Tool not allowed for this token
`AUTH_FORBIDDEN_PROJECT`	Project not allowed for this token
`TABLE_NOT_ALLOWED`	Table not in allowlist
`COLUMN_NOT_ALLOWED`	Column not in allowed columns
`SELECT_STAR_NOT_ALLOWED`	SELECT * not allowed for restricted tables
`SQL_PARSE_ERROR`	SQL syntax error
`SQL_MULTIPLE_STATEMENTS`	Multiple SQL statements not allowed
`SQL_READONLY_ONLY`	Only SELECT queries allowed
`TABLE_REFERENCE_REQUIRED`	SQL must reference at least one table
`LIMIT_MUST_BE_LITERAL`	LIMIT must be a literal integer
`LIMIT_MUST_BE_POSITIVE`	LIMIT must be >= 1
`RATE_LIMIT_EXCEEDED`	Too many requests
`INTERNAL_ERROR`	Internal server error (details hidden)

Project Structure

数仓MCP/
├── src/odps_mcp_server/
│   ├── __init__.py          # Package version
│   ├── __main__.py          # Entry point
│   ├── server.py            # MCP server and ASGI app
│   ├── service.py           # Query service (business logic)
│   ├── config.py            # Configuration models
│   ├── policy.py            # Policy service (access control)
│   ├── odps_client.py       # MaxCompute client wrapper
│   ├── sql_guard.py         # SQL validation (sqlglot)
│   ├── auth.py              # Authentication module
│   ├── context.py           # Request context
│   ├── audit.py             # Audit logging
│   ├── response.py          # Unified response builder
│   ├── middleware.py         # Starlette middleware
│   ├── errors.py            # Error codes
│   └── logging_utils.py     # Logging utilities
├── config/
│   ├── allowlist.yaml       # Table access policy
│   ├── auth.yaml            # Token configuration
│   ├── security.yaml        # Origin/Host whitelist
│   └── ai_catalog/          # Table metadata and join paths
├── tests/                   # Unit tests
├── scripts/                 # Utility scripts
├── docs/                    # Documentation
├── .env.example             # Environment variables template
├── .env.local               # Local config (git-ignored)
└── pyproject.toml           # Project metadata

Manual Smoke Test

python scripts/manual_smoke_test.py list-tables --project saky_dw_ods
python scripts/manual_smoke_test.py describe-table --project saky_dw_cdm --table dim_pub_product_df
python scripts/manual_smoke_test.py run-sql --project saky_dw_ods --sql "SELECT delivery_ord_code FROM saky_dw_ods.ods_sap_delivery_order_details_df LIMIT 10"

Security Notes

Never commit .env.local or files containing credentials
Review config/allowlist.yaml before production deployment
Enable authentication (MCP_REQUIRE_AUTH=true) for production
Monitor logs/audit.jsonl for suspicious activity
Rotate Bearer tokens periodically

Recommended Servers

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official

Featured

TypeScript

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official

Featured

Local

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official

Featured

TypeScript

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official

Featured

Python

E2B

Using MCP to run code via e2b.

Official

Featured

Neon Database

MCP server for interacting with Neon Management API and databases

Official

Featured

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official

Featured

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official

Featured

ODPS MCP Server

README

ODPS MCP Server

Features

Quick Start

1. Install Dependencies

2. Configure Environment

3. Run Server

Endpoints

MCP Tools

Discovery Tools

Query Tools

Semantic Layer Tools (P2)

Query Planner Tools (P3)

Cost Control (P4)

Risk Assessment

Policy Enforcement

Response with Cost Estimate

Concurrency Control

Enterprise Integration (P5)

Identity Providers

Role-Based Access Control

Client Configuration

Permission Checks

Evaluation Framework (P6)

Evaluation Dimensions

Golden Dataset

Running Evaluation

Evaluation Report

MCP Cockpit (P7)

Access

Features

Admin API

Screenshots

Tool Response Structure

Authentication

Bearer Token

Gateway Header

Token Configuration

Policy Configuration

Policy Registry (config/policy.yaml) - Recommended

Legacy Allowlist (config/allowlist.yaml)

Audit Logging

Error Codes

Project Structure

Manual Smoke Test

Security Notes

Recommended Servers

Policy Registry (`config/policy.yaml`) - Recommended

Legacy Allowlist (`config/allowlist.yaml`)