data.gouv.fr MCP Server
Model Context Protocol (MCP) server that allows AI chatbots to search, explore, and analyze datasets from data.gouv.fr, the French national Open Data platform, directly through conversation.
What is this?
The data.gouv.fr MCP server is a tool that allows AI chatbots (like Claude, Gemini, or Cursor) to interact with datasets from data.gouv.fr. Instead of manually browsing the website, you can simply ask questions like "What datasets are available on real estate prices?" or "Show me the latest population data for Paris" and get instant answers. This is currently a proof of concept (POC) meant to be run locally on your machine until it is put into production later. Since it runs locally, you'll need a few basic technical skills to set it up, but Docker makes the process straightforward.
The server is built using the official Python SDK for MCP servers and clients and uses the Streamable HTTP transport protocol.
1. Run the MCP server
Before starting, clone this repository and browse into it:
git clone git@github.com:datagouv/datagouv-mcp.git
cd datagouv-mcp
Docker is required for the recommended setup. Install it via Docker Desktop or any compatible Docker Engine before continuing.
🐳 With Docker (Recommended)
# With default settings (port 8000, prod environment)
docker compose up -d
# With custom environment variables
MCP_PORT=8007 DATAGOUV_ENV=demo docker compose up -d
# Stop
docker compose down
Environment variables:
MCP_PORT: port for the MCP HTTP server (defaults to 8000 when unset).
DATAGOUV_ENV: prod (default) or demo. This controls which data.gouv.fr environment the server pulls data from (https://www.data.gouv.fr or https://demo.data.gouv.fr). By default the MCP server talks to the production data.gouv.fr. Set DATAGOUV_ENV=demo if you specifically need the demo environment.
Manual Installation
You will need uv to install dependencies and run the server.
- Install dependencies
uv sync
- Prepare the environment file
Copy the example environment file to create your own .env file:
cp .env.example .env
Then optionally edit .env and set the variables that matter for your run:
MCP_PORT=8007 # (defaults to 8000 when unset)
DATAGOUV_ENV=prod # Allowed values: demo | prod (defaults to prod when unset)
Load the variables with your preferred method, e.g.:
set -a && source .env && set +a
- Start the HTTP MCP server
uv run main.py
2. Connect your chatbot to the MCP server
The MCP server configuration depends on your client. Use the appropriate configuration format for your client:
Cursor
Cursor supports MCP servers through its settings. To configure the server:
- Open Cursor Settings
- Search for "MCP" or "Model Context Protocol"
- Add a new MCP server with the following configuration:
{
"mcpServers": {
"datagouv": {
"url": "http://127.0.0.1:8000/mcp",
"transport": "http"
}
}
}
Gemini CLI
Add the following to your ~/.gemini/settings.json file:
{
"mcpServers": {
"datagouv": {
"transport": "http",
"httpUrl": "http://127.0.0.1:8000/mcp"
}
}
}
Claude Desktop
Add the following to your Claude Desktop configuration file (typically ~/Library/Application Support/Claude/claude_desktop_config.json on macOS, or %APPDATA%\Claude\claude_desktop_config.json on Windows):
{
"mcpServers": {
"datagouv": {
"command": "npx",
"args": [
"mcp-remote",
"http://127.0.0.1:8000/mcp"
]
}
}
}
Claude Code
Use the claude mcp command to add the MCP server:
claude mcp add --transport http datagouv http://127.0.0.1:8000/mcp
VS Code
Add the following to your VS Code settings.json:
{
"servers": {
"datagouv": {
"url": "http://127.0.0.1:8000/mcp",
"type": "http"
}
}
}
Windsurf
Add the following to your ~/.codeium/mcp_config.json:
{
"mcpServers": {
"datagouv": {
"command": "npx",
"args": [
"-y",
"mcp-remote",
"http://127.0.0.1:8000/mcp"
]
}
}
}
Note:
- Replace http://127.0.0.1:8000/mcp with your actual server URL if running on a different host or port. For production deployments, use https:// and configure the appropriate hostname.
- This MCP server only exposes read-only tools for now, so no API key is required.
🚚 Transport support
This MCP server uses FastMCP and implements the Streamable HTTP transport only.
STDIO and SSE are not supported.
📋 Available Endpoints
Streamable HTTP transport (standards-compliant):
POST /mcp - JSON-RPC messages (client → server)
GET /health - Simple JSON health probe ({"status": "ok", "timestamp": "..."})
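The health probe makes a convenient readiness check for deployments. This small sketch validates a response body against the documented shape; `check_health` is a hypothetical helper, not part of the server:

```python
import json

def check_health(body: str) -> bool:
    """Return True when a /health response body reports a healthy server."""
    payload = json.loads(body)
    return payload.get("status") == "ok" and "timestamp" in payload

# Example body matching the documented shape:
sample = '{"status": "ok", "timestamp": "2024-01-01T00:00:00Z"}'
print(check_health(sample))  # True
```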
🛠️ Available Tools
The MCP server provides tools to interact with data.gouv.fr datasets:
- search_datasets - Search for datasets by keywords. Returns datasets with metadata (title, description, organization, tags, resource count).
  Parameters: query (required), page (optional, default: 1), page_size (optional, default: 20, max: 100)
- get_dataset_info - Get detailed information about a specific dataset (metadata, organization, tags, dates, license, etc.).
  Parameters: dataset_id (required)
- list_dataset_resources - List all resources (files) in a dataset with their metadata (format, size, type, URL).
  Parameters: dataset_id (required)
- get_resource_info - Get detailed information about a specific resource (format, size, MIME type, URL, dataset association, Tabular API availability).
  Parameters: resource_id (required)
- query_dataset_data - Query data from a dataset via the Tabular API. Finds a dataset, retrieves its resources, and fetches rows to answer questions.
  Parameters: question (required), dataset_id (optional), dataset_query (optional), limit_per_resource (optional, default: 100)
  Note: Either dataset_id or dataset_query must be provided. Works for CSV/XLS resources within Tabular API size limits (CSV ≤ 100 MB, XLSX ≤ 12.5 MB).
- download_and_parse_resource - Download and parse a resource that is not accessible via the Tabular API (files too large, unsupported formats, external URLs).
  Parameters: resource_id (required), max_rows (optional, default: 1000), max_size_mb (optional, default: 500)
  Supported formats: CSV, CSV.GZ, JSON, JSONL. Useful for files exceeding Tabular API limits or formats the Tabular API does not support.
- get_metrics - Get metrics (visits, downloads) for a dataset and/or a resource.
  Parameters: dataset_id (optional), resource_id (optional), limit (optional, default: 12, max: 100)
  Returns monthly statistics including visits and downloads, sorted by month in descending order (most recent first). At least one of dataset_id or resource_id must be provided.
  Note: This tool only works with the production environment (DATAGOUV_ENV=prod). The Metrics API does not have a demo/preprod environment.
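Under the hood, MCP clients invoke these tools by POSTing JSON-RPC 2.0 `tools/call` messages to /mcp. This sketch only builds the message body to show the wire format (a real client must first complete the MCP initialize handshake, which is omitted here; `make_tool_call` is a hypothetical helper):

```python
import json

def make_tool_call(tool: str, arguments: dict, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 `tools/call` message body, as POSTed to /mcp."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# A search for real-estate price datasets, capped at 5 results:
body = make_tool_call("search_datasets", {"query": "prix immobilier", "page_size": 5})
print(body)
```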
🧪 Tests
Automated Tests with pytest
Run the tests with pytest (these cover helper modules; the MCP server wiring is best exercised via the MCP Inspector):
# Run all tests
uv run pytest
# Run with verbose output
uv run pytest -v
# Run specific test file
uv run pytest tests/test_tabular_api.py
# Run with custom resource ID
RESOURCE_ID=3b6b2281-b9d9-4959-ae9d-c2c166dff118 uv run pytest tests/test_tabular_api.py
# Run with prod environment
DATAGOUV_ENV=prod uv run pytest
Interactive Testing with MCP Inspector
Use the official MCP Inspector to interactively test the server tools and resources.
Prerequisites:
- Node.js with
npxavailable
Steps:
- Start the MCP server (see above)
- In another terminal, launch the inspector:
npx @modelcontextprotocol/inspector --http-url "http://127.0.0.1:${MCP_PORT}/mcp"
Adjust the URL if you exposed the server on another host/port.
🤝 Contributing
🧹 Code Linting and Formatting
This project follows PEP 8 style guidelines using Ruff for linting and formatting.
Either running these commands manually or installing the pre-commit hook is required before submitting contributions.
# Lint and sort imports, and format code
uv run ruff check --select I --fix && uv run ruff format
🔗 Pre-commit Hooks
This repository uses pre-commit hooks that lint and format code before each commit. Installing them is required for contributions.
Install pre-commit hooks:
uv run pre-commit install
The pre-commit hooks automatically:
- Check YAML syntax
- Fix end-of-file issues
- Remove trailing whitespace
- Check for large files
- Run Ruff linting and formatting
🏷️ Releases and versioning
The release process uses the tag_version.sh script to create git tags, GitHub releases and update CHANGELOG.md automatically. Package version numbers are automatically derived from git tags using setuptools_scm, so no manual version updates are needed in pyproject.toml.
Prerequisites: GitHub CLI must be installed and authenticated, and you must be on the main branch with a clean working directory.
# Create a new release
./tag_version.sh <version>
# Example
./tag_version.sh 2.5.0
# Dry run to see what would happen
./tag_version.sh 2.5.0 --dry-run
The script automatically:
- Extracts commits since the last tag and formats them for CHANGELOG.md
- Identifies breaking changes (commits with !: in the subject)
- Creates a git tag and pushes it to the remote repository
- Creates a GitHub release with the changelog content
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.