mcp-bigquery-evals
A BigQuery MCP server with mandatory cost guardrails that dry-run every query before execution, and a measurable accuracy badge from an eval harness.
<div align="center">
mcp-bigquery-evals
The BigQuery MCP server with mandatory cost guardrails and a measurable accuracy number.
uvx mcp-bigquery-evals · works with any MCP-compatible client · v0.1.0
</div>
Why use this over the other BigQuery MCPs
| | Most BQ MCPs | mcp-bigquery-evals |
|---|---|---|
| Cost guardrails | none | mandatory dry-run before every query, refuses if over cap |
| Quality signal | "trust me" | live accuracy badge, recomputed every release |
| Write operations | usually enabled | disabled by design (read-only) |
| Errors when things break | raw API exceptions | 7 stable error codes an agent can switch on |
| Local dev without GCP | impossible | in-memory sqlite-backed fake ships in the box |
What ships in the box
- 7 read-only MCP tools for warehouse discovery and querying
- Mandatory dry-run cost cap on every `run_query` (default 100 MB scanned, about $0.0005 per query)
- Result-set-equivalence eval harness (Spider/BIRD methodology) with a live accuracy badge in this README
- Structured BigQuery errors with 7 stable codes (`invalid_sql`, `table_not_found`, `permission_denied`, `unauthenticated`, `rate_limited`, `query_timeout`, `unknown`)
- Two BigQueryClient implementations: `RealBigQueryClient` (production, wraps `google-cloud-bigquery`) and `FakeBigQueryClient` (in-memory, sqlite-backed, for dev and CI without GCP credentials)
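To illustrate what "stable codes an agent can switch on" can look like on the client side, here is a minimal sketch. It assumes every error payload carries its code under an `"error"` key, as the `cost_cap_exceeded` example in this README does. The `next_action` function and the action names it returns are hypothetical and not part of this package.

```python
# Hypothetical agent-side dispatcher; not part of mcp-bigquery-evals.
# Assumption: the error code is stored under the "error" key of the payload.

KNOWN_CODES = {
    "invalid_sql", "table_not_found", "permission_denied",
    "unauthenticated", "rate_limited", "query_timeout", "unknown",
}


def next_action(payload: dict) -> str:
    """Map a structured error payload to a coarse recovery strategy."""
    code = payload.get("error", "unknown")
    if code not in KNOWN_CODES:
        code = "unknown"  # treat unrecognised codes conservatively

    if code in ("invalid_sql", "table_not_found"):
        return "fix_query"      # the agent can rewrite the SQL itself
    if code in ("permission_denied", "unauthenticated"):
        return "ask_user"       # needs a human to fix credentials or IAM
    if code in ("rate_limited", "query_timeout"):
        return "retry_later"    # transient; back off and retry
    return "give_up"


print(next_action({"error": "table_not_found"}))
```

Because the codes are a fixed set, the branch logic stays a plain lookup instead of string-matching raw API exception messages.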
Quickstart (5 minutes)
1. Install
uvx mcp-bigquery-evals --help
First run takes about 30s while uv fetches dependencies; subsequent runs are instant from the local cache. Plain pip install mcp-bigquery-evals also works.
2. Authenticate to GCP
gcloud auth application-default login
3. Wire into your MCP client
Open your MCP client's server config (developer settings) and add:
{
  "mcpServers": {
    "bigquery": {
      "command": "uvx",
      "args": ["mcp-bigquery-evals", "serve"],
      "env": {
        "BIGQUERY_PROJECT": "YOUR_GCP_PROJECT_ID_HERE"
      }
    }
  }
}
Restart your client. The MCP indicator should show "bigquery" with 7 tools.
4. Try it
Using the bigquery tool, find the top 5 most-viewed Stack Overflow questions tagged 'python'.
The agent chains list_datasets, list_tables, describe_table, and run_query to answer. Every run_query call is dry-run and checked against the cost cap before it executes.
Detailed setup, troubleshooting, and the alternative pip install path live in docs/mcp_client_setup.md.
The 7 tools
| Tool | Purpose |
|---|---|
| `list_datasets()` | List all datasets in your GCP project |
| `list_tables(dataset_id)` | List tables in a dataset |
| `describe_table(table_id)` | Schema, row count, size |
| `sample_table(table_id, n=5)` | Up to n sample rows |
| `search_schema(term)` | Fuzzy-match a term against all column names |
| `estimate_cost(sql)` | Free dry-run; returns bytes_scanned and estimated USD |
| `run_query(sql, max_bytes_scanned=100MB)` | Dry-run, refuse if over cap, then execute |
All tools are read-only. There are no write operations in v1 by design. See docs/architecture.md for the design rationale.
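This README does not describe how search_schema scores matches. As an illustration of fuzzy-matching a term against column names, the sketch below uses the standard library's `difflib.get_close_matches`. The `search_columns` function, the `table.column` input format, and the 0.6 cutoff are assumptions made for this example, not the server's actual implementation.

```python
import difflib

# Hypothetical helper; the real search_schema tool may work differently.


def search_columns(term, columns, limit=5, cutoff=0.6):
    """Return qualified 'table.column' names whose column part resembles term."""
    by_name = {}
    for qualified in columns:
        # Compare only the column part, case-insensitively.
        name = qualified.rsplit(".", 1)[-1].lower()
        by_name.setdefault(name, []).append(qualified)

    # get_close_matches returns the best matches first.
    close = difflib.get_close_matches(
        term.lower(), list(by_name), n=limit, cutoff=cutoff
    )
    return [q for name in close for q in by_name[name]]


cols = ["events.userid", "users.user_id", "events.created_at"]
print(search_columns("user_id", cols))
```

An exact column name ranks first, near-misses such as `userid` follow, and unrelated names fall below the cutoff.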
Cost guardrails
Every run_query call performs a free dry-run before execution. If the dry-run estimate exceeds max_bytes_scanned, the call returns a structured error instead of scanning any bytes:
{
  "error": "cost_cap_exceeded",
  "would_scan": "1.4 GB",
  "cap": "100.0 MB",
  "estimated_usd": 0.007,
  "hint": "narrow your WHERE clause or pass max_bytes_scanned=1500000000 to override"
}
The agent reads the structured error and self-corrects: it narrows the WHERE clause, raises the cap explicitly, or picks a different table.
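The cap check and the arithmetic behind the example error can be sketched as follows. This is not the server's code. It assumes decimal units (1 MB = 10^6 bytes) and a rate of $5 per TB scanned, which is inferred from the figures in this README (100 MB is about $0.0005, 1.4 GB is about $0.007) and may not match current BigQuery pricing. The `check_cost_cap` and `human_bytes` functions are hypothetical. In a real server the estimate would presumably come from a BigQuery dry run (google-cloud-bigquery exposes `QueryJobConfig(dry_run=True)` and the job's `total_bytes_processed`). Here it is passed in directly so the sketch runs without GCP.

```python
from typing import Optional

USD_PER_TB = 5.0                 # assumption inferred from this README's figures
DEFAULT_CAP_BYTES = 100_000_000  # assumption: "100 MB" read as decimal megabytes


def human_bytes(n: int) -> str:
    """Format a byte count with decimal units, e.g. 1_400_000_000 -> '1.4 GB'."""
    for unit, size in (("TB", 10**12), ("GB", 10**9), ("MB", 10**6), ("KB", 10**3)):
        if n >= size:
            return f"{n / size:.1f} {unit}"
    return f"{n} B"


def check_cost_cap(estimated_bytes: int,
                   cap_bytes: int = DEFAULT_CAP_BYTES) -> Optional[dict]:
    """Return None if the query may run, else a structured refusal."""
    if estimated_bytes <= cap_bytes:
        return None
    return {
        "error": "cost_cap_exceeded",
        "would_scan": human_bytes(estimated_bytes),
        "cap": human_bytes(cap_bytes),
        "estimated_usd": round(estimated_bytes / 10**12 * USD_PER_TB, 4),
        "hint": ("narrow your WHERE clause or pass "
                 f"max_bytes_scanned={estimated_bytes} to override"),
    }


print(check_cost_cap(1_400_000_000))
```

Returning a dictionary rather than raising keeps the refusal machine-readable, so the calling agent can inspect `would_scan` and `cap` before deciding how to retry.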
Eval harness
Every release runs a result-set-equivalence eval suite against bigquery-public-data and updates the accuracy badge above. The methodology follows the Spider and BIRD academic benchmarks: execute both the gold and the predicted SQL, then compare the result sets as multisets of rows (order-independent, with float tolerance, Decimal handling, NULL equality, NaN equality, ARRAY/STRUCT recursion, and bool/int distinction).
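A minimal sketch of that comparison, under stated assumptions: rows arrive as Python tuples or lists, STRUCTs as dicts, and ARRAYs as lists. Float tolerance is approximated by rounding to 6 decimal places, which is cruder than a true epsilon comparison because two very close values that straddle a rounding boundary compare unequal. The `normalize` and `result_sets_equal` functions are hypothetical and not the harness's API.

```python
import math
from collections import Counter
from decimal import Decimal


def normalize(value, ndigits=6):
    """Map a cell to a hashable key so rows can be counted in a multiset."""
    if value is None:
        return ("null",)                    # NULL equals NULL
    if isinstance(value, bool):             # test bool first: bool subclasses int
        return ("bool", value)              # keeps True distinct from 1
    if isinstance(value, int):
        return ("num", value)
    if isinstance(value, (float, Decimal)):
        f = float(value)
        if math.isnan(f):
            return ("nan",)                 # NaN equals NaN
        return ("num", round(f, ndigits))   # rounding as a crude tolerance
    if isinstance(value, (list, tuple)):    # ARRAY: recurse, order preserved
        return ("array", tuple(normalize(v, ndigits) for v in value))
    if isinstance(value, dict):             # STRUCT: recurse, field order ignored
        return ("struct", tuple(sorted(
            (k, normalize(v, ndigits)) for k, v in value.items())))
    return ("value", value)


def result_sets_equal(gold_rows, predicted_rows, ndigits=6):
    """Compare two result sets as multisets of rows, ignoring row order."""
    def as_multiset(rows):
        return Counter(tuple(normalize(c, ndigits) for c in row) for row in rows)
    return as_multiset(gold_rows) == as_multiset(predicted_rows)


print(result_sets_equal([(1, 0.1 + 0.2)], [(1, 0.3)]))
```

Using a `Counter` rather than a `set` means duplicate rows matter: a prediction that returns a row once when the gold query returns it twice is counted as wrong.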
Run locally:
mcp-bigquery-evals evals run --model <your-model-id>
Full methodology, golden-pairs YAML format, and how to add your own pairs: docs/how_evals_work.md.
Development
git clone https://github.com/Umarfarook1/mcp-bigquery-evals
cd mcp-bigquery-evals
python -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -e ".[dev]"
pytest # unit tests (no GCP needed; ~160 tests)
pytest -m bq # real-BQ integration tests (needs GCP creds)
pytest -m live # end-to-end with real model + real BQ
Contributing
Issues and PRs welcome. Highest-leverage contributions:
- More verified golden NL-to-SQL pairs against `bigquery-public-data`
- Prompt improvements with before/after eval numbers showing the accuracy badge moved
- Bug reports with minimal reproductions
License
MIT, see LICENSE.