nl2sql-mcp

nl2sql-mcp

A production-ready MCP server that transforms natural language into safe, executable SQL queries with multi-database support and intelligent schema analysis.

Category
Visit Server

README

nl2sql-mcp

Python 3.13+ MCP Protocol Type Checked Code Style: Ruff

A production-ready Model Context Protocol (MCP) server that transforms natural language into safe, executable SQL queries. Built for LLMs with comprehensive type safety, multi-database support, and intelligent schema analysis.

Database Support

This project provides an "all drivers" container image that can connect to the major databases without post-build steps. Drivers are modeled as optional dependencies and the Dockerfile installs the minimal OS libraries required at runtime.

  • PostgreSQL: Python driver psycopg[binary] (v3); OS libs: libpq5.
  • MySQL/MariaDB: Python driver mysqlclient; OS libs: libmariadb3.
  • SQL Server: Python driver pyodbc; OS libs: unixodbc, Microsoft msodbcsql18 (+ optional mssql-tools18).
  • SQLite: stdlib sqlite3 (no extra OS libs).

Local dev installs:

  • All drivers: uv sync --extra drivers-all
  • Specific backends: uv sync --extra postgres --extra mysql (as needed)

Build the all-drivers image:

docker build -t nl2sql-mcp:drivers-all .
docker run --rm -p 8000:8000 nl2sql-mcp:drivers-all

Notes:

  • Debian slim base maximizes prebuilt wheel compatibility. Avoid Alpine for DB stacks.
  • To switch MySQL to a pure-Python driver, replace mysqlclient with mysql-connector-python in pyproject.toml and re-lock with uv lock --extra drivers-all.

๐Ÿš€ Key Features

๐ŸŽฏ LLM-Optimized Intelligence

  • Intent-first query planning with structured schema context and join recommendations
  • Semantic schema analysis with automatic table classification (fact, dimension, bridge, reference)
  • Multi-modal table discovery using lexical matching, embeddings, and graph traversal
  • Column role detection (key, date, metric, category) with semantic tagging

๐Ÿ›ก๏ธ Production-Ready Safety

  • SELECT-only execution with comprehensive SQL validation
  • Dialect normalization and transpilation via SQLGlot
  • Row limits and cell truncation to prevent resource exhaustion
  • Comprehensive error handling with actionable feedback

๐Ÿ”ง Multi-Database Support

  • SQLAlchemy-powered compatibility with PostgreSQL, MySQL, SQL Server, SQLite, Oracle, Snowflake
  • Dialect-aware SQL generation and optimization
  • Cross-platform schema reflection and analysis

๐Ÿ—๏ธ Enterprise Architecture

  • Type-safe Pydantic models throughout
  • Dependency injection for testability
  • Background initialization with graceful degradation
  • Zero hardcoded assumptions - adapts to any database schema

๐Ÿ“ฆ Quick Start

Prerequisites

  • Python 3.13+
  • uv package manager
  • Database with appropriate drivers installed

Installation

# Clone the repository
git clone https://github.com/jb3cloud/nl2sql-mcp.git
cd nl2sql-mcp

# Install dependencies
uv sync

# Configure your database
cp .env.example .env
# Edit .env with your database connection details

Basic Usage

# Start the MCP server
uv run nl2sql-mcp

# Or run directly
uv run python -m nl2sql_mcp.server

# Or run using uvx
uvx --from git+https://github.com/jb3cloud/nl2sql-mcp nl2sql-mcp

๐Ÿ› ๏ธ Makefile Workflow

The repository includes a Makefile that standardizes common tasks. All targets invoke tooling via uv run to ensure a consistent environment.

# Discover available targets and variables
make help

# Run the full Quality Gauntlet (format โ†’ lint โ†’ typecheck โ†’ test)
make quality

# Individual steps (as needed)
make format     # Ruff format
make lint       # Ruff check --fix
make typecheck  # basedpyright (strict)
make test       # pytest
make clean      # Remove caches and build artifacts

Deployment helpers for Azure Container Apps are also provided:

# Create docker container
make docker

# Run docker container
make docker-run

# Verify Azure CLI login and (optionally) set subscription
make preflight AZ_SUBSCRIPTION="00000000-0000-0000-0000-000000000000"

# Build from source and deploy
make publish \
  APP_NAME=my-nl2sql-mcp \
  RESOURCE_GROUP=my-rg \
  ENVIRONMENT=my-aca-env \
  LOCATION=eastus \
  AZ_SUBSCRIPTION="00000000-0000-0000-0000-000000000000" \
  INGRESS=external \
  TARGET_PORT=8000 \
  ENV_ARGS="NL2SQL_MCP_DATABASE_URL=postgresql://user:pass@host:5432/db"

Supported variables (can be provided via environment or CLI): APP_NAME, RESOURCE_GROUP, LOCATION, ENVIRONMENT, AZ_SUBSCRIPTION, INGRESS (default external), TARGET_PORT (default 8000).

Environment Configuration

Create a .env file with your database connection:

# Required: Database connection
NL2SQL_MCP_DATABASE_URL=postgresql://user:pass@localhost:5432/dbname

# Optional: Result limits and debugging
NL2SQL_MCP_ROW_LIMIT=1000
NL2SQL_MCP_MAX_CELL_CHARS=500
NL2SQL_MCP_DEBUG_TOOLS=1  # Enable find_tables/find_columns tools

๐Ÿ›๏ธ Architecture

graph TD
    A[LLM Client] -->|MCP Protocol| B[FastMCP Server]

    subgraph "Core Services"
        B --> C[Schema Service]
        B --> D[SQLGlot Service]
        B --> E[Execution Engine]
    end

    subgraph "Intelligence Layer"
        C --> F[Schema Explorer]
        C --> G[Query Engine]
        F --> H[Reflection & Profiling]
        F --> I[Graph Analysis]
        G --> J[Multi-Modal Retrieval]
        G --> K[Graph Expansion]
    end

    subgraph "Data Layer"
        E --> L[SQLAlchemy Engine]
        H --> L
        L --> M[(Your Database)]
    end

    style A fill:#e1f5fe
    style B fill:#f3e5f5
    style M fill:#e8f5e8

The system follows a two-phase architecture:

  1. Schema Building Phase: Comprehensive database analysis, relationship mapping, and semantic understanding
  2. Query Processing Phase: Real-time natural language to SQL conversion with context-aware planning

๐Ÿ”Œ MCP Tools API

Core Workflow Tools

get_init_status()

Check server readiness and initialization progress.

// Response
{
  "phase": "READY",
  "attempts": 1,
  "started_at": "2024-01-15T10:30:00Z",
  "completed_at": "2024-01-15T10:30:45Z"
}

get_database_overview(req: DatabaseOverviewRequest)

High-level database summary with subject areas.

// Request
{
  "include_subject_areas": true,
  "area_limit": 8
}

// Response
{
  "total_tables": 45,
  "total_schemas": 3,
  "subject_areas": [
    {
      "name": "Sales Analytics",
      "tables": ["orders", "customers", "products"],
      "summary": "Customer orders and product sales data"
    }
  ]
}

plan_query_for_intent(req: PlanQueryRequest)

Intent-first SQL planning with structured guidance.

// Request
{
  "request": "Show monthly revenue by region for 2024",
  "constraints": {
    "time_range": "2024-01-01..2024-12-31",
    "metric": "revenue"
  },
  "detail_level": "standard"
}

// Response
{
  "relevant_tables": [
    {
      "table_key": "sales.orders",
      "relevance_score": 0.95,
      "why_relevant": "Contains revenue data and date columns"
    }
  ],
  "join_plan": [
    ["sales.orders.customer_id", "customers.id"]
  ],
  "main_table": "sales.orders",
  "draft_sql": "SELECT DATE_TRUNC('month', order_date) as month...",
  "confidence": 0.92
}

execute_query(req: ExecuteQueryRequest)

Safe SQL execution with validation and results.

// Request
{
  "sql": "SELECT region, SUM(amount) as revenue FROM sales.orders WHERE order_date >= '2024-01-01' GROUP BY region"
}

// Response
{
  "success": true,
  "rows": [
    {"region": "North", "revenue": 125000.50},
    {"region": "South", "revenue": 98750.25}
  ],
  "row_count": 2,
  "columns": [
    {"name": "region", "type": "VARCHAR"},
    {"name": "revenue", "type": "DECIMAL"}
  ],
  "execution_time_ms": 45,
  "next_action": "success"
}

Discovery Tools

get_table_info(req: TableInfoRequest)

Detailed table metadata optimized for SQL generation.

// Request
{
  "table_key": "sales.orders",
  "include_samples": true,
  "column_role_filter": ["key", "date", "metric"]
}

find_tables(req: FindTablesRequest) (Debug Mode)

Fast table discovery by natural language intent.

find_columns(req: FindColumnsRequest) (Debug Mode)

Column search for SELECT and WHERE clause building.

SQL Assistance Tools

sql_validate(sql: str)

Validate SQL syntax and structure.

sql_auto_transpile_for_database(sql: str)

Automatically detect and convert SQL dialects.

sql_optimize_for_database(sql: str)

Optimize SQL for your database engine.

โš™๏ธ Configuration

Database Support

Database SQLAlchemy Driver Connection String Example
PostgreSQL psycopg2 postgresql://user:pass@host:5432/db
MySQL pymysql mysql+pymysql://user:pass@host:3306/db
SQL Server pyodbc mssql+pyodbc://user:pass@host/db?driver=ODBC+Driver+17
SQLite Built-in sqlite:///path/to/database.db
Oracle cx_oracle oracle://user:pass@host:1521/service
Snowflake snowflake-sqlalchemy snowflake://user:pass@account/db/schema

Schema Configuration

Control schema analysis behavior:

# Via environment or configuration
NL2SQL_MCP_INCLUDE_SCHEMAS=public,analytics
NL2SQL_MCP_EXCLUDE_SCHEMAS=temp,audit
NL2SQL_MCP_SAMPLE_SIZE=100  # Rows per table for profiling

Performance Tuning

# Memory and processing limits
NL2SQL_MCP_ROW_LIMIT=1000           # Max rows returned
NL2SQL_MCP_MAX_CELL_CHARS=500       # Truncate long text
NL2SQL_MCP_SAMPLE_TIMEOUT=5         # Sampling timeout (seconds)
NL2SQL_MCP_MAX_COLS_FOR_EMBEDDINGS=20  # Column embedding limit
NL2SQL_MCP_REFLECT_TIMEOUT=15       # Reflection timeout per statement (seconds)
NL2SQL_MCP_ENABLE_LIGHTWEIGHT_NER=1 # Toggle NER enrichment during profiling (0 disables)

๐Ÿงช Testing and Development

Local Development Setup

# Install development dependencies
uv sync --dev

# Run type checking
uv run basedpyright

# Format and lint code
uv run ruff format .
uv run ruff check --fix .

# Run tests
uv run pytest -v

Testing with Live Database

# Schema intelligence testing
uv run python scripts/test_intelligence_harness.py "show sales by region"

# SQL tools testing
uv run python scripts/test_sqlglot_harness.py "SELECT TOP 10 * FROM customers"

Quality Assurance

  • 100% type coverage with basedpyright strict mode
  • Comprehensive test suite with pytest
  • Zero linting violations with ruff
  • Dependency injection for testability
  • Pure functions where possible

๐Ÿ“š Advanced Usage

Custom Schema Analysis

from nl2sql_mcp.services import ConfigService, SchemaServiceManager

# Initialize with custom configuration
config = ConfigService()
manager = SchemaServiceManager.get_instance()

# Access schema service after initialization
schema_service = manager.get_schema_service()
result = schema_service.analyze_query_schema(
    "Find customers with high lifetime value",
    max_tables=10,
    expand_strategy="fk_following"
)

Multi-Agent Integration

The MCP server integrates seamlessly with multi-agent frameworks:

# Example with your LLM framework
async def query_database(natural_language_query: str):
    # 1. Plan the query
    plan_result = await mcp_client.call_tool(
        "plan_query_for_intent",
        {"request": natural_language_query}
    )

    # 2. Execute the draft SQL
    if plan_result.draft_sql:
        execution_result = await mcp_client.call_tool(
            "execute_query",
            {"sql": plan_result.draft_sql}
        )
        return execution_result.rows

Code Standards

  • Python 3.13+ with strict type checking
  • Pydantic models for all data structures
  • Pure functions and dependency injection for testability
  • Comprehensive docstrings for public APIs
  • Zero tolerance for type errors, lint violations, or test failures

๐Ÿข Production Deployment

Environment Variables

# Required
NL2SQL_MCP_DATABASE_URL=postgresql://...

# Optional performance tuning
NL2SQL_MCP_ROW_LIMIT=5000
NL2SQL_MCP_MAX_CELL_CHARS=1000
NL2SQL_MCP_SAMPLE_SIZE=200

Health Monitoring

Monitor server health via the get_init_status() tool:

  • READY: Server fully operational
  • STARTING: Initialization in progress
  • FAILED: Initialization failed, check logs

๐Ÿ“– Related Research

This implementation incorporates state-of-the-art research in text-to-SQL:

  • Multi-agent architectures for complex query decomposition
  • Schema linking with bidirectional context understanding
  • Error correction frameworks with multi-turn refinement
  • Semantic understanding via lightweight NER and role classification

For detailed research context, see NL2SQL_RESEARCH.md.

๐Ÿ™ Acknowledgments

  • FastMCP for the excellent MCP server framework
  • SQLGlot for multi-dialect SQL parsing and transpilation
  • SQLAlchemy for robust database abstraction
  • The text-to-SQL research community for advancing the field

Recommended Servers

playwright-mcp

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official
Featured
TypeScript
Magic Component Platform (MCP)

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Official
Featured
Local
TypeScript
Audiense Insights MCP Server

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

Official
Featured
Local
TypeScript
VeyraX MCP

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official
Featured
Local
graphlit-mcp-server

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official
Featured
TypeScript
Kagi MCP Server

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official
Featured
Python
E2B

E2B

Using MCP to run code via e2b.

Official
Featured
Neon Database

Neon Database

MCP server for interacting with Neon Management API and databases

Official
Featured
Exa Search

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official
Featured
Qdrant Server

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official
Featured