nl2sql-mcp
A production-ready MCP server that transforms natural language into safe, executable SQL queries with multi-database support and intelligent schema analysis.
README
nl2sql-mcp
A production-ready Model Context Protocol (MCP) server that transforms natural language into safe, executable SQL queries. Built for LLMs with comprehensive type safety, multi-database support, and intelligent schema analysis.
Database Support
This project provides an "all drivers" container image that can connect to the major databases without post-build steps. Drivers are modeled as optional dependencies and the Dockerfile installs the minimal OS libraries required at runtime.
- PostgreSQL: Python driver
psycopg[binary](v3); OS libs:libpq5. - MySQL/MariaDB: Python driver
mysqlclient; OS libs:libmariadb3. - SQL Server: Python driver
pyodbc; OS libs:unixodbc, Microsoftmsodbcsql18(+ optionalmssql-tools18). - SQLite: stdlib
sqlite3(no extra OS libs).
Local dev installs:
- All drivers:
uv sync --extra drivers-all - Specific backends:
uv sync --extra postgres --extra mysql(as needed)
Build the all-drivers image:
docker build -t nl2sql-mcp:drivers-all .
docker run --rm -p 8000:8000 nl2sql-mcp:drivers-all
Notes:
- Debian slim base maximizes prebuilt wheel compatibility. Avoid Alpine for DB stacks.
- To switch MySQL to a pure-Python driver, replace
mysqlclientwithmysql-connector-pythoninpyproject.tomland re-lock withuv lock --extra drivers-all.
๐ Key Features
๐ฏ LLM-Optimized Intelligence
- Intent-first query planning with structured schema context and join recommendations
- Semantic schema analysis with automatic table classification (fact, dimension, bridge, reference)
- Multi-modal table discovery using lexical matching, embeddings, and graph traversal
- Column role detection (key, date, metric, category) with semantic tagging
๐ก๏ธ Production-Ready Safety
- SELECT-only execution with comprehensive SQL validation
- Dialect normalization and transpilation via SQLGlot
- Row limits and cell truncation to prevent resource exhaustion
- Comprehensive error handling with actionable feedback
๐ง Multi-Database Support
- SQLAlchemy-powered compatibility with PostgreSQL, MySQL, SQL Server, SQLite, Oracle, Snowflake
- Dialect-aware SQL generation and optimization
- Cross-platform schema reflection and analysis
๐๏ธ Enterprise Architecture
- Type-safe Pydantic models throughout
- Dependency injection for testability
- Background initialization with graceful degradation
- Zero hardcoded assumptions - adapts to any database schema
๐ฆ Quick Start
Prerequisites
- Python 3.13+
- uv package manager
- Database with appropriate drivers installed
Installation
# Clone the repository
git clone https://github.com/jb3cloud/nl2sql-mcp.git
cd nl2sql-mcp
# Install dependencies
uv sync
# Configure your database
cp .env.example .env
# Edit .env with your database connection details
Basic Usage
# Start the MCP server
uv run nl2sql-mcp
# Or run directly
uv run python -m nl2sql_mcp.server
# Or run using uvx
uvx --from git+https://github.com/jb3cloud/nl2sql-mcp nl2sql-mcp
๐ ๏ธ Makefile Workflow
The repository includes a Makefile that standardizes common tasks. All targets invoke tooling via uv run to ensure a consistent environment.
# Discover available targets and variables
make help
# Run the full Quality Gauntlet (format โ lint โ typecheck โ test)
make quality
# Individual steps (as needed)
make format # Ruff format
make lint # Ruff check --fix
make typecheck # basedpyright (strict)
make test # pytest
make clean # Remove caches and build artifacts
Deployment helpers for Azure Container Apps are also provided:
# Create docker container
make docker
# Run docker container
make docker-run
# Verify Azure CLI login and (optionally) set subscription
make preflight AZ_SUBSCRIPTION="00000000-0000-0000-0000-000000000000"
# Build from source and deploy
make publish \
APP_NAME=my-nl2sql-mcp \
RESOURCE_GROUP=my-rg \
ENVIRONMENT=my-aca-env \
LOCATION=eastus \
AZ_SUBSCRIPTION="00000000-0000-0000-0000-000000000000" \
INGRESS=external \
TARGET_PORT=8000 \
ENV_ARGS="NL2SQL_MCP_DATABASE_URL=postgresql://user:pass@host:5432/db"
Supported variables (can be provided via environment or CLI): APP_NAME, RESOURCE_GROUP, LOCATION, ENVIRONMENT, AZ_SUBSCRIPTION, INGRESS (default external), TARGET_PORT (default 8000).
Environment Configuration
Create a .env file with your database connection:
# Required: Database connection
NL2SQL_MCP_DATABASE_URL=postgresql://user:pass@localhost:5432/dbname
# Optional: Result limits and debugging
NL2SQL_MCP_ROW_LIMIT=1000
NL2SQL_MCP_MAX_CELL_CHARS=500
NL2SQL_MCP_DEBUG_TOOLS=1 # Enable find_tables/find_columns tools
๐๏ธ Architecture
graph TD
A[LLM Client] -->|MCP Protocol| B[FastMCP Server]
subgraph "Core Services"
B --> C[Schema Service]
B --> D[SQLGlot Service]
B --> E[Execution Engine]
end
subgraph "Intelligence Layer"
C --> F[Schema Explorer]
C --> G[Query Engine]
F --> H[Reflection & Profiling]
F --> I[Graph Analysis]
G --> J[Multi-Modal Retrieval]
G --> K[Graph Expansion]
end
subgraph "Data Layer"
E --> L[SQLAlchemy Engine]
H --> L
L --> M[(Your Database)]
end
style A fill:#e1f5fe
style B fill:#f3e5f5
style M fill:#e8f5e8
The system follows a two-phase architecture:
- Schema Building Phase: Comprehensive database analysis, relationship mapping, and semantic understanding
- Query Processing Phase: Real-time natural language to SQL conversion with context-aware planning
๐ MCP Tools API
Core Workflow Tools
get_init_status()
Check server readiness and initialization progress.
// Response
{
"phase": "READY",
"attempts": 1,
"started_at": "2024-01-15T10:30:00Z",
"completed_at": "2024-01-15T10:30:45Z"
}
get_database_overview(req: DatabaseOverviewRequest)
High-level database summary with subject areas.
// Request
{
"include_subject_areas": true,
"area_limit": 8
}
// Response
{
"total_tables": 45,
"total_schemas": 3,
"subject_areas": [
{
"name": "Sales Analytics",
"tables": ["orders", "customers", "products"],
"summary": "Customer orders and product sales data"
}
]
}
plan_query_for_intent(req: PlanQueryRequest)
Intent-first SQL planning with structured guidance.
// Request
{
"request": "Show monthly revenue by region for 2024",
"constraints": {
"time_range": "2024-01-01..2024-12-31",
"metric": "revenue"
},
"detail_level": "standard"
}
// Response
{
"relevant_tables": [
{
"table_key": "sales.orders",
"relevance_score": 0.95,
"why_relevant": "Contains revenue data and date columns"
}
],
"join_plan": [
["sales.orders.customer_id", "customers.id"]
],
"main_table": "sales.orders",
"draft_sql": "SELECT DATE_TRUNC('month', order_date) as month...",
"confidence": 0.92
}
execute_query(req: ExecuteQueryRequest)
Safe SQL execution with validation and results.
// Request
{
"sql": "SELECT region, SUM(amount) as revenue FROM sales.orders WHERE order_date >= '2024-01-01' GROUP BY region"
}
// Response
{
"success": true,
"rows": [
{"region": "North", "revenue": 125000.50},
{"region": "South", "revenue": 98750.25}
],
"row_count": 2,
"columns": [
{"name": "region", "type": "VARCHAR"},
{"name": "revenue", "type": "DECIMAL"}
],
"execution_time_ms": 45,
"next_action": "success"
}
Discovery Tools
get_table_info(req: TableInfoRequest)
Detailed table metadata optimized for SQL generation.
// Request
{
"table_key": "sales.orders",
"include_samples": true,
"column_role_filter": ["key", "date", "metric"]
}
find_tables(req: FindTablesRequest) (Debug Mode)
Fast table discovery by natural language intent.
find_columns(req: FindColumnsRequest) (Debug Mode)
Column search for SELECT and WHERE clause building.
SQL Assistance Tools
sql_validate(sql: str)
Validate SQL syntax and structure.
sql_auto_transpile_for_database(sql: str)
Automatically detect and convert SQL dialects.
sql_optimize_for_database(sql: str)
Optimize SQL for your database engine.
โ๏ธ Configuration
Database Support
| Database | SQLAlchemy Driver | Connection String Example |
|---|---|---|
| PostgreSQL | psycopg2 |
postgresql://user:pass@host:5432/db |
| MySQL | pymysql |
mysql+pymysql://user:pass@host:3306/db |
| SQL Server | pyodbc |
mssql+pyodbc://user:pass@host/db?driver=ODBC+Driver+17 |
| SQLite | Built-in | sqlite:///path/to/database.db |
| Oracle | cx_oracle |
oracle://user:pass@host:1521/service |
| Snowflake | snowflake-sqlalchemy |
snowflake://user:pass@account/db/schema |
Schema Configuration
Control schema analysis behavior:
# Via environment or configuration
NL2SQL_MCP_INCLUDE_SCHEMAS=public,analytics
NL2SQL_MCP_EXCLUDE_SCHEMAS=temp,audit
NL2SQL_MCP_SAMPLE_SIZE=100 # Rows per table for profiling
Performance Tuning
# Memory and processing limits
NL2SQL_MCP_ROW_LIMIT=1000 # Max rows returned
NL2SQL_MCP_MAX_CELL_CHARS=500 # Truncate long text
NL2SQL_MCP_SAMPLE_TIMEOUT=5 # Sampling timeout (seconds)
NL2SQL_MCP_MAX_COLS_FOR_EMBEDDINGS=20 # Column embedding limit
NL2SQL_MCP_REFLECT_TIMEOUT=15 # Reflection timeout per statement (seconds)
NL2SQL_MCP_ENABLE_LIGHTWEIGHT_NER=1 # Toggle NER enrichment during profiling (0 disables)
๐งช Testing and Development
Local Development Setup
# Install development dependencies
uv sync --dev
# Run type checking
uv run basedpyright
# Format and lint code
uv run ruff format .
uv run ruff check --fix .
# Run tests
uv run pytest -v
Testing with Live Database
# Schema intelligence testing
uv run python scripts/test_intelligence_harness.py "show sales by region"
# SQL tools testing
uv run python scripts/test_sqlglot_harness.py "SELECT TOP 10 * FROM customers"
Quality Assurance
- 100% type coverage with basedpyright strict mode
- Comprehensive test suite with pytest
- Zero linting violations with ruff
- Dependency injection for testability
- Pure functions where possible
๐ Advanced Usage
Custom Schema Analysis
from nl2sql_mcp.services import ConfigService, SchemaServiceManager
# Initialize with custom configuration
config = ConfigService()
manager = SchemaServiceManager.get_instance()
# Access schema service after initialization
schema_service = manager.get_schema_service()
result = schema_service.analyze_query_schema(
"Find customers with high lifetime value",
max_tables=10,
expand_strategy="fk_following"
)
Multi-Agent Integration
The MCP server integrates seamlessly with multi-agent frameworks:
# Example with your LLM framework
async def query_database(natural_language_query: str):
# 1. Plan the query
plan_result = await mcp_client.call_tool(
"plan_query_for_intent",
{"request": natural_language_query}
)
# 2. Execute the draft SQL
if plan_result.draft_sql:
execution_result = await mcp_client.call_tool(
"execute_query",
{"sql": plan_result.draft_sql}
)
return execution_result.rows
Code Standards
- Python 3.13+ with strict type checking
- Pydantic models for all data structures
- Pure functions and dependency injection for testability
- Comprehensive docstrings for public APIs
- Zero tolerance for type errors, lint violations, or test failures
๐ข Production Deployment
Environment Variables
# Required
NL2SQL_MCP_DATABASE_URL=postgresql://...
# Optional performance tuning
NL2SQL_MCP_ROW_LIMIT=5000
NL2SQL_MCP_MAX_CELL_CHARS=1000
NL2SQL_MCP_SAMPLE_SIZE=200
Health Monitoring
Monitor server health via the get_init_status() tool:
READY: Server fully operationalSTARTING: Initialization in progressFAILED: Initialization failed, check logs
๐ Related Research
This implementation incorporates state-of-the-art research in text-to-SQL:
- Multi-agent architectures for complex query decomposition
- Schema linking with bidirectional context understanding
- Error correction frameworks with multi-turn refinement
- Semantic understanding via lightweight NER and role classification
For detailed research context, see NL2SQL_RESEARCH.md.
๐ Acknowledgments
- FastMCP for the excellent MCP server framework
- SQLGlot for multi-dialect SQL parsing and transpilation
- SQLAlchemy for robust database abstraction
- The text-to-SQL research community for advancing the field
Recommended Servers
playwright-mcp
A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.
Magic Component Platform (MCP)
An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.
Audiense Insights MCP Server
Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.
VeyraX MCP
Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.
graphlit-mcp-server
The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.
Kagi MCP Server
An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.
E2B
Using MCP to run code via e2b.
Neon Database
MCP server for interacting with Neon Management API and databases
Exa Search
A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.
Qdrant Server
This repository is an example of how to create a MCP server for Qdrant, a vector search engine.