ODPS MCP Server

ODPS MCP Server

Read-only MCP server for Alibaba Cloud MaxCompute, enabling safe querying of allowlisted tables via tools like list_allowed_tables, describe_table, preview_table, and run_select_sql.

Category
Visit Server

README

ODPS MCP Server

Read-only MCP server for Alibaba Cloud MaxCompute with enterprise-grade security.

Features

  • Read-only SQL Guard: Only SELECT queries are allowed, with automatic LIMIT injection
  • Table/Column-level Access Control: Fine-grained policy via YAML allowlist
  • Policy Registry: Domain-based policy management with sensitivity levels, row policies, and partition filters
  • Authentication: Bearer Token and Gateway Header authentication
  • Audit Logging: JSONL-format audit logs with request tracking
  • Unified Response Structure: Consistent ok/data/error/metadata/audit_id format
  • Join Path Catalog: Pre-approved table join relationships with business context

Quick Start

1. Install Dependencies

python -m venv .venv
.venv\Scripts\activate  # Windows
# source .venv/bin/activate  # Linux/Mac
pip install -e .

2. Configure Environment

# Copy example config
copy .env.example .env.local  # Windows
# cp .env.example .env.local  # Linux/Mac

# Edit .env.local with your ODPS credentials

Required variables:

  • ALIBABA_CLOUD_ACCESS_KEY_ID
  • ALIBABA_CLOUD_ACCESS_KEY_SECRET
  • ODPS_ENDPOINT
  • ODPS_PROJECT
  • ODPS_ALLOWLIST_PATH

3. Run Server

# Streamable HTTP mode (recommended)
python -m odps_mcp_server.server

# Or using uvicorn
uvicorn odps_mcp_server.server:create_app --factory --host 0.0.0.0 --port 8000

Endpoints

Endpoint Method Description
/mcp POST Standard MCP Streamable HTTP endpoint
/ POST Compatibility endpoint (redirects to /mcp)
/health GET Health check

MCP Tools

Discovery Tools

Tool Description
list_allowed_tables List tables accessible by the current policy
describe_table Get table schema, columns, and partition info
get_join_paths Get pre-approved join relationships between tables

Query Tools

Tool Description
preview_table Preview rows from a table with automatic partition pruning
run_select_sql Execute a read-only SELECT query

Semantic Layer Tools (P2)

Tool Description
search_business_terms Search business terms by keyword
get_metric_definition Get metric definition, expression, and dimensions
list_metrics List available metrics, optionally by domain
get_dimensions Get available dimensions for a metric
generate_metric_sql Generate SQL for a metric with dimensions and filters
validate_metric_query Validate a metric query before execution
list_domains List all business domains

Query Planner Tools (P3)

Tool Description
plan_query Convert natural language question to query plan
explain_query_plan Get detailed explanation of a query plan
run_approved_query Execute an approved query plan

The Query Planner enables agents to query data without writing SQL directly:

  1. plan_query("3月华南区发货金额是多少?") → Returns a query plan with metrics, dimensions, filters
  2. explain_query_plan(plan_id) → Get human-readable explanation
  3. run_approved_query(plan_id) → Execute the plan and get results

This approach ensures:

  • LLM understands the question and generates a candidate plan
  • Program handles permissions, table selection, column selection, join checks, limits
  • Risk flags are identified before execution

Cost Control (P4)

The server includes built-in cost control mechanisms to prevent expensive queries:

Risk Assessment

Every query is evaluated for risk based on:

  • Missing WHERE clause
  • Missing partition filter
  • Missing LIMIT
  • Large LIMIT values
  • High JOIN count
  • SELECT * usage

Risk levels: lowmediumhighcritical

Policy Enforcement

Configured in config/policy.yaml under security:

security:
  deny_unapproved_join: true
  max_join_tables: 3
  max_estimated_scan_bytes: 10737418240  # 10GB
  query_timeout_seconds: 300
  max_concurrent_queries_per_user: 5

Response with Cost Estimate

When cost control is enabled, responses include:

{
  "ok": true,
  "data": { ... },
  "cost_estimate": {
    "policy_decision": "allowed",
    "risk_level": "low",
    "risk_flags": [],
    "tables_used": ["project.table"],
    "columns_used": ["col1", "col2"],
    "partitions_used": ["pt"],
    "has_partition_filter": true,
    "join_count": 0
  }
}

Concurrency Control

Per-user concurrent query limits are enforced. Queries exceeding the limit receive a CONCURRENCY_LIMIT_EXCEEDED error.

Enterprise Integration (P5)

Supports enterprise identity providers and role-based access control.

Identity Providers

Configure in config/enterprise.yaml:

identity_providers:
  wecom:
    name: 企业微信
    type: wecom
    enabled: true
    corp_id: "your-corp-id"
    user_id_field: "X-Wecom-User-Id"

  dingtalk:
    name: 钉钉
    type: dingtalk
    enabled: true
    app_key: "your-app-key"
    user_id_field: "X-Dingtalk-User-Id"

Role-Based Access Control

roles:
  data_analyst:
    name: 数据分析师
    allowed_tools:
      - list_allowed_tables
      - describe_table
      - preview_table
      - run_select_sql
    allowed_domains:
      - sales
      - product
    rate_limit: "100/minute"
    query_quota_daily: 1000

Client Configuration

clients:
  sales_bot:
    name: 销售数据机器人
    client_id: sales_bot
    allowed_tools:
      - list_allowed_tables
      - describe_table
    allowed_domains:
      - sales
    rate_limit: "50/minute"

Permission Checks

  • Tool permission: user + client must both allow the tool
  • Domain permission: user + client must both allow the domain
  • Rate limit: stricter limit between user and client applies

Evaluation Framework (P6)

Golden dataset for evaluating Agent query accuracy.

Evaluation Dimensions

Dimension Weight Description
Table Selection 20% Correct tables identified
Column Selection 15% Correct columns/dimensions
Metric Accuracy 20% Correct metrics calculated
Join Path 10% Correct join relationships
Partition Filter 15% Partition predicates present
Permission Check 10% Access control enforced
SQL Executable 5% Valid SQL generated
Answer Traceable 5% SQL evidence provided

Golden Dataset

Located at config/evaluation/golden_dataset.yaml:

test_cases:
  - id: sales_001
    question: "2026年3月华南区发货金额是多少?"
    domain: sales
    expected_tables:
      - saky_dw_ods.ods_sap_delivery_order_details_df
    expected_metrics:
      - delivery_amount
    required_filters:
      - field: region_code
        value: "华南"
    forbidden_behavior:
      - full_table_scan
    should_be_blocked: false

Running Evaluation

Use the Evaluator service to run evaluations:

from odps_mcp_server.evaluator import Evaluator, load_evaluation_config

config = load_evaluation_config("config/evaluation/golden_dataset.yaml")
evaluator = Evaluator(config)

# Evaluate a plan
result = evaluator.evaluate_plan(
    test_id="sales_001",
    plan={"metrics": ["delivery_amount"], "tables": [...]},
    tables_used=["saky_dw_ods.ods_sap_delivery_order_details_df"],
    sql="SELECT ... LIMIT 100",
)

# Generate report
report = evaluator.generate_report()
print(f"Pass rate: {report.pass_rate:.2%}")

Evaluation Report

{
  "summary": {
    "total_cases": 11,
    "passed_cases": 9,
    "failed_cases": 2,
    "pass_rate": 0.8182
  },
  "dimension_scores": {
    "table_selection": 0.95,
    "metric_accuracy": 0.90,
    "partition_filter": 0.85,
    "permission_check": 1.0
  },
  "critical_failures": ["security_001", "cross_001"]
}

MCP Cockpit (P7)

Web-based admin console for managing the MCP Server.

Access

When running the server, access the cockpit at:

http://localhost:8000/admin

Features

Page Description
Dashboard Server status, component health, quick stats
Domains Business domain management
Tables Table permissions and metadata
Metrics Metric definitions and configurations
Joins Join path management (approved/candidate)
Audit Query audit logs
Evaluation Golden dataset test cases

Admin API

Endpoint Method Description
/admin/api/status GET Server status
/admin/api/domains GET List domains
/admin/api/tables GET List tables
/admin/api/metrics GET List metrics
/admin/api/joins GET List join paths
/admin/api/audit GET Audit logs
/admin/api/evaluation/cases GET Evaluation test cases

Screenshots

The cockpit provides:

  • Dashboard: Real-time server status, component health, query statistics
  • Domain Management: View business domains, owners, and table/metric counts
  • Table Management: Browse table permissions, sensitivity levels, column restrictions
  • Metric Management: View metric definitions, expressions, dimensions
  • Join Management: Review approved and candidate join paths
  • Audit: Search and filter query audit logs
  • Evaluation: View and run golden dataset test cases

Tool Response Structure

All tools return a unified response format:

{
  "ok": true,
  "data": { ... },
  "error": null,
  "metadata": {
    "request_id": "uuid",
    "tool_name": "describe_table",
    "timestamp": "2026-05-07T15:30:00Z",
    "latency_ms": 45.2,
    "tables_used": ["saky_dw_cdm.dim_pub_product_df"],
    "columns_used": ["item_code", "standard_name"],
    "row_count": 1,
    "truncated": false
  },
  "audit_id": "uuid"
}

Authentication

Bearer Token

Include the token in the Authorization header:

curl -H "Authorization: Bearer agent-sales-001" \
  http://localhost:8000/mcp \
  -d '{"tool": "list_allowed_tables"}'

Gateway Header

When behind an API gateway, the gateway can inject identity headers:

curl -H "X-User-Id: sales_agent" \
  -H "X-Client-Id: sales_bot" \
  -H "X-Session-Id: session-xxx" \
  http://localhost:8000/mcp \
  -d '{"tool": "list_allowed_tables"}'

Token Configuration

Define tokens in config/auth.yaml:

tokens:
  agent-sales-001:
    user_id: sales_agent
    client_id: sales_bot
    allowed_tools:
      - list_allowed_tables
      - describe_table
      - preview_table
    allowed_projects:
      - saky_dw_ods
      - saky_dw_cdm
    rate_limit: "100/minute"

Policy Configuration

Policy Registry (config/policy.yaml) - Recommended

The Policy Registry provides enterprise-grade policy management with:

  • Domain-based organization: Group tables by business domain (sales, supply_chain, product)
  • Sensitivity levels: public, internal, confidential, restricted
  • Column-level control: allowed_columns and denied_columns
  • Row-level policies: Role-based row filtering
  • Partition filter requirements: Enforce partition predicates
  • Per-table limits: Override global limit per table
domains:
  sales:
    name: 销售域
    owner: 销售运营部
    default_project: saky_dw_ods

tables:
  saky_dw_ods.ods_sap_delivery_order_details_df:
    domain: sales
    owner: supply_chain_data_owner
    sensitivity: internal
    default_partition:
      field: pt
      strategy: max_pt
    allowed_columns:
      - delivery_ord_code
      - product_code
      - sales_amount
      - pt
    denied_columns:
      - customer_phone
      - customer_address
    row_policies:
      - role: region_manager
        predicate: region_code IN (${user.region_codes})
    max_limit: 1000
    require_partition_filter: true

default_limit: 100
max_limit: 200

security:
  deny_unapproved_join: true
  max_join_tables: 3
  query_timeout_seconds: 300

Legacy Allowlist (config/allowlist.yaml)

Simple table/column whitelist for basic use cases: saky_dw_cdm: tables: dim_pub_product_df: {} # Allow all columns saky_dw_ods: tables: ods_sap_delivery_order_details_df: allowed_columns: # Column-level restriction - delivery_ord_code - product_code - sales_amount - pt

default_limit: 100 max_limit: 200


### Join Catalog (`config/ai_catalog/joins.yaml`)

Pre-approved table join relationships:

```yaml
joins:
  - id: ods_delivery_to_dim_product
    left_table: saky_dw_ods.ods_sap_delivery_order_details_df
    right_table: saky_dw_cdm.dim_pub_product_df
    business_purpose: 为交货单明细补齐产品主数据属性
    approved: true
    confidence: high
    cardinality: N:1
    on:
      - left_field: product_code
        right_field: item_code
    example_sql: |
      SELECT o.delivery_ord_code, p.standard_name
      FROM saky_dw_ods.ods_sap_delivery_order_details_df o
      LEFT JOIN saky_dw_cdm.dim_pub_product_df p
        ON o.product_code = p.item_code
      WHERE o.pt = MAX_PT("saky_dw_ods.ods_sap_delivery_order_details_df")
      LIMIT 100;

Audit Logging

Audit logs are written to logs/audit.jsonl in JSONL format, rotated daily.

Each record contains:

{
  "audit_id": "uuid",
  "request_id": "uuid",
  "timestamp": "2026-05-07T15:30:00.123Z",
  "user_id": "sales_agent",
  "client_id": "sales_bot",
  "session_id": "session-xxx",
  "source_ip": "10.0.1.55",
  "auth_method": "bearer_token",
  "token_id": "agent-sales-001",
  "tool_name": "run_select_sql",
  "outcome": "success",
  "latency_ms": 234.5,
  "request_params": {
    "project": "saky_dw_ods",
    "sql": "SELECT ..."
  },
  "response_summary": {
    "tables_used": ["saky_dw_ods.ods_sap_delivery_order_details_df"],
    "columns_used": ["delivery_ord_code", "sales_amount"],
    "row_count": 15,
    "truncated": false
  },
  "policy_decision": "allowed",
  "deny_reason": null,
  "error_code": null,
  "error_message": null
}

Error Codes

Code Description
AUTH_MISSING Authentication required but not provided
AUTH_INVALID_TOKEN Bearer token not recognized
AUTH_FORBIDDEN_ORIGIN Origin not in allowed list
AUTH_FORBIDDEN_HOST Host not in allowed list
AUTH_FORBIDDEN_TOOL Tool not allowed for this token
AUTH_FORBIDDEN_PROJECT Project not allowed for this token
TABLE_NOT_ALLOWED Table not in allowlist
COLUMN_NOT_ALLOWED Column not in allowed columns
SELECT_STAR_NOT_ALLOWED SELECT * not allowed for restricted tables
SQL_PARSE_ERROR SQL syntax error
SQL_MULTIPLE_STATEMENTS Multiple SQL statements not allowed
SQL_READONLY_ONLY Only SELECT queries allowed
TABLE_REFERENCE_REQUIRED SQL must reference at least one table
LIMIT_MUST_BE_LITERAL LIMIT must be a literal integer
LIMIT_MUST_BE_POSITIVE LIMIT must be >= 1
RATE_LIMIT_EXCEEDED Too many requests
INTERNAL_ERROR Internal server error (details hidden)

Project Structure

数仓MCP/
├── src/odps_mcp_server/
│   ├── __init__.py          # Package version
│   ├── __main__.py          # Entry point
│   ├── server.py            # MCP server and ASGI app
│   ├── service.py           # Query service (business logic)
│   ├── config.py            # Configuration models
│   ├── policy.py            # Policy service (access control)
│   ├── odps_client.py       # MaxCompute client wrapper
│   ├── sql_guard.py         # SQL validation (sqlglot)
│   ├── auth.py              # Authentication module
│   ├── context.py           # Request context
│   ├── audit.py             # Audit logging
│   ├── response.py          # Unified response builder
│   ├── middleware.py         # Starlette middleware
│   ├── errors.py            # Error codes
│   └── logging_utils.py     # Logging utilities
├── config/
│   ├── allowlist.yaml       # Table access policy
│   ├── auth.yaml            # Token configuration
│   ├── security.yaml        # Origin/Host whitelist
│   └── ai_catalog/          # Table metadata and join paths
├── tests/                   # Unit tests
├── scripts/                 # Utility scripts
├── docs/                    # Documentation
├── .env.example             # Environment variables template
├── .env.local               # Local config (git-ignored)
└── pyproject.toml           # Project metadata

Manual Smoke Test

python scripts/manual_smoke_test.py list-tables --project saky_dw_ods
python scripts/manual_smoke_test.py describe-table --project saky_dw_cdm --table dim_pub_product_df
python scripts/manual_smoke_test.py run-sql --project saky_dw_ods --sql "SELECT delivery_ord_code FROM saky_dw_ods.ods_sap_delivery_order_details_df LIMIT 10"

Security Notes

  • Never commit .env.local or files containing credentials
  • Review config/allowlist.yaml before production deployment
  • Enable authentication (MCP_REQUIRE_AUTH=true) for production
  • Monitor logs/audit.jsonl for suspicious activity
  • Rotate Bearer tokens periodically

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
E2B

E2B

Using MCP to run code via e2b.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured