CKAN MCP Server (Python)

A Model Context Protocol (MCP) server that exposes the 600+ global CKAN open-data portals[^1][^2] to AI assistants, CLI tools, and other MCP-aware clients. It bundles curated portal presets, strict Pydantic models, and a batteries-included tool suite so analysts, application developers, and operators can explore public datasets without writing bespoke CKAN integrations.

Persona-centric Guidance

| Persona | Primary questions | Suggested section |
|---|---|---|
| Curious evaluators | "What can this server do?" "Which CKAN actions are covered?" | Potential users |
| Data analysts / MCP end-users | "How do I set it up locally?" "How do I connect to a remote MCP server?" | Data analysts |
| Contributors / maintainers | "How is the code organized?" "How do I run tests?" | Developers |
| Platform / infra teams | "Can I deploy this to Cloud Run?" | Production deployment |

Potential Users: What This Server Does

Why it exists

  • Purpose-built CKAN interface: Wraps the CKAN Action & Datastore APIs behind MCP tools that AI agents and CLI clients understand.
  • Consistent insights: Presents dataset summaries, freshness analysis, schemas, and download helpers so exploratory conversations stay grounded in real CKAN metadata.
  • Portal-aware behavior: Curated overrides (transport method, dataset URL templates, helper prompts) keep the experience consistent across CKAN portals that deviate from defaults.

Tool catalog (14 tools)

| Category | Tool | What it returns |
|---|---|---|
| Session configuration | ckan_api_initialise | Selects a portal (country/location + overrides) and stores API keys/session metadata. |
| Session configuration | ckan_api_availability | Lists the configured CKAN portals and reports the current session's selection (when set). |
| Session configuration | audit_ckan_api | Probes GET/POST behavior, datastore aliases, and helper metadata; emits recommended overrides for future sessions. |
| Dataset retrieval | get_package | Full CKAN dataset metadata (resources, organization, extras). |
| Dataset retrieval | list_datasets | Paginated package list with optional total counts. |
| Dataset retrieval | search_datasets | Action API package_search wrapper with passthrough Solr parameters. |
| Dataset retrieval | get_data_categories | Organizations and groups for navigation. |
| Datastore access | get_first_datastore_resource_records | Pulls preview rows from the first active datastore resource. |
| Datastore access | get_resource_records | Targeted datastore search with filters, sorts, distinct, etc. |
| Datastore access | download_dataset_locally | Metadata-rich archive/download helper with MIME detection, extraction, and how-to snippets. |
| Analysis | find_relevant_datasets | Weighted scoring across title/description/tags/org/resource metadata. |
| Analysis | analyze_dataset_updates | Frequency heuristics plus CKAN update timestamps. |
| Analysis | analyze_dataset_structure | Schema summaries, record counts, sample fields. |
| Analysis | get_dataset_insights | Combines discovery, updates, structure, and helper prompts into one rich response. |
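
To make the analysis tools more concrete, here is a minimal sketch of the kind of weighted relevance scoring that find_relevant_datasets describes. The field weights, the score_dataset name, and the dataset dictionary shape are illustrative assumptions, not the shipped helpers.py implementation.

    # Illustrative only: hypothetical weights and dataset shape, not the shipped helpers.py logic.
    from typing import Any

    FIELD_WEIGHTS = {"title": 3.0, "notes": 2.0, "tags": 1.5, "organization": 1.0, "resources": 0.5}

    def score_dataset(dataset: dict[str, Any], query: str) -> float:
        """Score one CKAN package against a natural-language query."""
        terms = {t for t in query.lower().split() if len(t) > 2}
        haystacks = {
            "title": dataset.get("title", ""),
            "notes": dataset.get("notes", ""),
            "tags": " ".join(t.get("name", "") for t in dataset.get("tags", [])),
            "organization": (dataset.get("organization") or {}).get("title", ""),
            "resources": " ".join(r.get("name", "") or "" for r in dataset.get("resources", [])),
        }
        score = 0.0
        for field, text in haystacks.items():
            hits = sum(1 for term in terms if term in text.lower())
            score += FIELD_WEIGHTS[field] * hits
        return score

In practice you call find_relevant_datasets through your MCP client rather than re-implementing anything, but the sketch shows why matches in a dataset title outrank matches buried in resource names.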

Architecture at a glance

  • src/ckan_mcp/main.py – MCP server entry point supporting stdio and HTTP (SSE) transports.
  • src/ckan_mcp/ckan_tools.py – Tool implementations, transport probes, download helpers, and archive extraction.
  • src/ckan_mcp/helpers.py – Relevance scoring, update frequency analysis, summary builders.
  • src/ckan_mcp/types.py – Strict Pydantic models with extra="allow" for portal-specific metadata.
  • src/ckan_mcp/config_selection.py & src/ckan_mcp/data/ckan_config_selection.json – Curated CKAN portal catalog and overrides consumed by ckan_api_initialise.
  • tests/ & test_runner.py – Pytest suite plus quick smoke runner mirroring production behaviors.

Tip: start with ckan_api_initialise to choose a portal, then call the analysis tools to see the depth of insights returned.
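
For a rough mental model of how an entry point like src/ckan_mcp/main.py can serve both transports, the sketch below uses the FastMCP helper from the official MCP Python SDK. Whether main.py actually uses FastMCP is not documented here; the tool body and portal names are placeholders, and only the CKAN_MCP_MODE, CKAN_MCP_HOST, and CKAN_MCP_PORT variables come from this README.

    # Simplified sketch of a dual-transport MCP entry point; not the project's actual main.py.
    import os
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP(
        "ckan-mcp",
        host=os.getenv("CKAN_MCP_HOST", "0.0.0.0"),
        port=int(os.getenv("CKAN_MCP_PORT", "8000")),
    )

    @mcp.tool()
    def ckan_api_availability() -> dict:
        """List configured CKAN portals (placeholder body for illustration)."""
        return {"portals": ["Canada/Toronto", "USA/NYC"], "selected": None}

    if __name__ == "__main__":
        # CKAN_MCP_MODE=http switches from stdio to the HTTP transport.
        transport = "streamable-http" if os.getenv("CKAN_MCP_MODE", "stdio") == "http" else "stdio"
        mcp.run(transport=transport)

Seen this way, the transport variables in the configuration reference are just a thin switch in front of the same tool registry.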


Data Analysts: Getting Insights Fast

Shared prerequisites

  • CKAN portal URL (or use the curated list during initialization).
  • Python 3.11+ and uv or pip for installing dependencies.
  • curl and the POSIX file command on your PATH (the download_dataset_locally tool shells out to both binaries).
  • An MCP-compatible client (Claude CLI, Gemini CLI, etc.).

Option A – Run the MCP server locally

  1. Clone & create a virtual environment
    git clone https://github.com/<org>/ckan-mcp.git
    cd ckan-mcp
    uv venv venv
    source venv/bin/activate
    
  2. Install runtime dependencies
    uv pip install -e .
    
    (Add ".[dev]" for development tooling and ".[examples]" if you want to run the sample scripts that load .env files.)
  3. Optional defaults – export CKAN env vars if you always talk to the same portal:
    export CKAN_BASE_URL="https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action"
    export CKAN_SITE_URL="https://ckan0.cf.opendata.inter.prod-toronto.ca"
    export CKAN_DATASET_URL_TEMPLATE="https://ckan0.cf.opendata.inter.prod-toronto.ca/dataset/{name}"
    
    These are fallback values; interactive sessions normally rely on ckan_api_initialise to pick a portal.
  4. Launch in stdio mode (best for desktop MCP clients):
    python -m ckan_mcp.main
    
  5. Connect your MCP client – example Claude CLI snippet (see core environment variables for transport overrides):
    {
      "mcpServers": {
        "ckan-mcp": {
          "command": "python",
          "args": ["-m", "ckan_mcp.main"],
          "env": {
            "CKAN_MCP_LOCAL_DATASTORE": "~/dataset-store/",
          }
        }
      }
    }
    
  6. Start a session – ask your assistant to "Initialize a CKAN connection"; it will call ckan_api_initialise and then the discovery tools.
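
If you prefer scripting to a chat client, a small Python program can exercise the same flow. This sketch assumes the official mcp Python SDK is installed and launches the server with the same python -m ckan_mcp.main command as above; the argument names passed to the tools are illustrative assumptions.

    # Minimal scripted session using the MCP Python SDK's stdio client (tool arguments are assumptions).
    import asyncio
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    async def main() -> None:
        server = StdioServerParameters(command="python", args=["-m", "ckan_mcp.main"])
        async with stdio_client(server) as (read, write):
            async with ClientSession(read, write) as session:
                await session.initialize()
                # Pick a portal first, then explore (argument names are assumptions).
                await session.call_tool("ckan_api_initialise", {"country": "Canada", "location": "Toronto"})
                result = await session.call_tool("find_relevant_datasets", {"query": "bicycle counts"})
                print(result.content)

    asyncio.run(main())

The printed result.content mirrors what a chat assistant sees when it calls the same tools.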

Option B – Use a remote MCP server with a local client

This flow is perfect when someone else operates the MCP server on shared infrastructure and you only need local MCP tooling.

  1. Server operator sets up HTTP transport:
    export CKAN_MCP_MODE=http
    export CKAN_MCP_HOST=0.0.0.0
    export CKAN_MCP_PORT=8000
    python -m ckan_mcp.main
    
    or run docker compose up --build to expose http://localhost:8000/mcp and front it with your preferred reverse proxy.
  2. Expose the /mcp endpoint via HTTPS (Cloud Run, Fly.io, Tailscale, etc.) and share the URL with analysts.
  3. Analyst registers the remote MCP server (Claude CLI example):
    claude mcp add --transport http ckan-mcp https://mcp.example.com/mcp
    claude mcp list
    
    Gemini CLI uses gemini mcp add --transport http ckan-mcp https://mcp.example.com/mcp with the same URL.
  4. Use normally – all CLI/desktop prompts now tunnel through the remote MCP server. The analyst still decides which CKAN portal to inspect via ckan_api_initialise.
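
Analysts who script their own tooling can also reach the remote endpoint directly from Python. This sketch assumes the official mcp SDK's streamable HTTP client and reuses the placeholder URL from the CLI example above.

    # Connect to a remotely hosted ckan-mcp endpoint over streamable HTTP (placeholder URL).
    import asyncio
    from mcp import ClientSession
    from mcp.client.streamable_http import streamablehttp_client

    async def main() -> None:
        async with streamablehttp_client("https://mcp.example.com/mcp") as (read, write, _):
            async with ClientSession(read, write) as session:
                await session.initialize()
                tools = await session.list_tools()
                print([t.name for t in tools.tools])

    asyncio.run(main())

Apart from the transport, the flow matches Option A: the analyst still picks a portal with ckan_api_initialise.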

Day-to-day usage tips

  • Call ckan_api_availability before issuing expensive searches: it lists every CKAN portal packaged with this MCP build and confirms which portal is currently selected (if any).
  • find_relevant_datasets quickly surfaces top matches for natural-language prompts; follow up with get_dataset_insights for a detailed brief.
  • download_dataset_locally writes metadata, datastore previews, and shell instructions to ~/.cache/ckan-mcp/... so you can pivot to pandas immediately.
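
Once download_dataset_locally has written an archive plus metadata, pivoting into pandas is a one-liner. The folder layout and file glob below are illustrative, so check the path and the instructions the tool actually returns.

    # Load a previously downloaded CSV resource into pandas (path and file name are examples).
    from pathlib import Path
    import pandas as pd

    download_dir = Path.home() / ".cache" / "ckan-mcp"   # or wherever CKAN_MCP_LOCAL_DATASTORE points
    csv_path = next(download_dir.rglob("*.csv"))         # grab the first extracted CSV (raises if none exist)
    df = pd.read_csv(csv_path)
    print(df.shape, list(df.columns)[:5])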

Developers: Extend and Contribute

Repository map

src/ckan_mcp/
├── main.py          # MCP entry point + HTTP transport
├── ckan_tools.py    # Tool implementations & download helpers
├── helpers.py       # Scoring, frequency, and summary helpers
├── types.py         # Pydantic models
├── config_selection.py # Catalog loader & helper utilities
├── data/
│   └── ckan_config_selection.json # Curated CKAN catalog & overrides
└── __init__.py

Supporting files: pyproject.toml (uv/poetry style metadata), tests/, test_runner.py, examples/ for fixtures, and Docker/Make targets for container workflows.

Local development workflow

  1. Activate the virtualenv and install dev dependencies:
    source venv/bin/activate
    uv pip install -e ".[dev]"
    
  2. Run formatters and linters (Black first, then Ruff as required by the project guidelines):
    black src/ tests/
    ruff check src/ tests/ --fix
    
  3. Type checking:
    mypy src/
    
  4. Tests:
    pytest tests/ -v
    python test_runner.py  # lightweight smoke run
    # Live integration tests against the curated CKAN portals
    CKAN_RUN_INTEGRATION_TESTS=1 pytest tests/ -m integration -v
    
    Integration tests talk to the public CKAN portal configured via CKAN_TEST_COUNTRY/CKAN_TEST_LOCATION (defaults to Canada/Toronto) and accept overrides such as CKAN_TEST_BASE_URL, CKAN_TEST_SITE_URL, CKAN_TEST_DATASET_URL_TEMPLATE, or CKAN_TEST_SEARCH_TERMS for custom portals.
  5. GitHub Actions verification (optional, requires the GitHub CLI authenticated against openascot/ckan-mcp-private):
    # trigger the full workflow (lint/unit + integration jobs) for your current branch
    gh workflow run ci.yml --ref "$(git rev-parse --abbrev-ref HEAD)"
    # tail the logs for the most recent run
    gh run watch
    gh run view --log-failed
    
    The workflow only runs when triggered manually; the quality-checks job runs Black/Ruff/mypy, and the dependent pytest-suite job reuses .github/workflows/pytest.yml to execute the standard pytest run plus the integration suite (with CKAN_RUN_INTEGRATION_TESTS=1). Trigger the standalone Pytest workflow directly if you only need the testing jobs.
  6. Docker-based workflow (optional, HTTP transport exposed at http://localhost:8000/mcp):
    make dev                 # foreground dev stack with reload
    make quick-start         # background stack, uses docker-compose.dev.yml
    make dev-tools           # helper container via the tools profile
    make shell               # attach to the running dev app container
    make test-production     # builds docker-compose.yml and curls /mcp
    

Follow the AGENTS.md guidance for naming, docstrings, and fixture placement under examples/. Always update or add pytest coverage alongside new tools or helper behaviors.
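
When adding coverage for a new helper, a test along these lines is usually enough to start. The score_dataset import and its expected behavior are hypothetical; mirror whatever function you actually added in src/ckan_mcp/helpers.py.

    # tests/test_helpers.py -- illustrative test for a hypothetical relevance helper.
    import pytest

    from ckan_mcp.helpers import score_dataset  # hypothetical helper name

    @pytest.fixture
    def bike_dataset() -> dict:
        return {
            "title": "Bicycle count stations",
            "notes": "Hourly bicycle counts collected by the city.",
            "tags": [{"name": "cycling"}],
            "organization": {"title": "Transportation Services"},
            "resources": [{"name": "counts.csv"}],
        }

    def test_relevant_query_scores_higher(bike_dataset: dict) -> None:
        assert score_dataset(bike_dataset, "bicycle counts") > score_dataset(bike_dataset, "zoning permits")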

See CHANGELOG.md for release history and public milestone notes.

Contributing checklist

  • Create a feature branch and keep commits focused.
  • Add or update tests under tests/ mirroring the target module name (e.g., tests/test_ckan_tools.py).
  • Run pytest, black, ruff, and mypy locally (or via the docker helpers) before opening a PR.
  • Document new environment variables or tool behaviors in this README or EVALUATION_GUIDE.md as appropriate.

Production Deployment (Google Cloud Run Example)

Cloud Run pairs nicely with the built-in HTTP transport. The following example assumes you have the Google Cloud CLI configured and Artifact Registry enabled.

  1. Build and push the container (uses the included multi-stage Dockerfile):
    export PROJECT_ID="my-gcp-project"
    gcloud auth configure-docker
    gcloud builds submit --tag gcr.io/$PROJECT_ID/ckan-mcp
    
  2. Deploy to Cloud Run:
    gcloud run deploy ckan-mcp \
      --image gcr.io/$PROJECT_ID/ckan-mcp \
      --region us-central1 \
      --platform managed \
      --allow-unauthenticated \
      --port 8000 \
      --set-env-vars CKAN_MCP_MODE=http,CKAN_MCP_HTTP_PATH=/mcp,CKAN_MCP_HTTP_ALLOW_ORIGINS=* \
      --set-env-vars CKAN_BASE_URL=https://ckan0.cf.opendata.inter.prod-toronto.ca/api/3/action,CKAN_SITE_URL=https://ckan0.cf.opendata.inter.prod-toronto.ca
    
    Adjust env vars for your preferred portal or omit them so analysts always call ckan_api_initialise.
  3. Share the endpoint – Cloud Run will emit a URL such as https://ckan-mcp-12345-uc.a.run.app. Provide the /mcp path to clients (https://ckan-mcp-12345-uc.a.run.app/mcp).
  4. Register with MCP clients – same claude mcp add --transport http ... flow as in the analyst section.
  5. Operational tips:
    • Set CKAN_MCP_HTTP_JSON_RESPONSE=true if your proxy expects JSON instead of SSE.
    • Use Secret Manager to supply CKAN_API_KEY for locked-down portals.
    • Monitor Cloud Run metrics; the server makes outbound HTTPS calls to CKAN only when tools are invoked.

Configuration Reference

Core environment variables

| Variable | Default | Purpose |
|---|---|---|
| CKAN_BASE_URL | none | Optional default Action API base; sessions can override via ckan_api_initialise. |
| CKAN_SITE_URL | none | Root site URL used for dataset links. |
| CKAN_DATASET_URL_TEMPLATE | none | Overrides dataset page URL format ({name} and {id} supported). |
| CKAN_API_KEY | none | API key used when the selected portal requires authentication. |
| CKAN_MCP_MODE | stdio | stdio for CLI integrations, http for streamable HTTP transport. |
| CKAN_MCP_HOST | 0.0.0.0 (HTTP mode) | Bind host when CKAN_MCP_MODE=http. |
| CKAN_MCP_PORT | 8000 | Bind port for HTTP mode. |
| CKAN_MCP_HTTP_PATH | /mcp | Mount path for HTTP transport (used by both the built-in HTTP server and Cloud Run deployments). |
| CKAN_MCP_HTTP_ALLOW_ORIGINS | * | CORS allowlist for HTTP mode. |
| CKAN_MCP_HTTP_JSON_RESPONSE | false | Emit JSON responses instead of SSE when true. |
| CKAN_MCP_HTTP_LOG_LEVEL | info | Log verbosity for HTTP transport. |
| CKAN_MCP_LOCAL_DATASTORE | ./ (current working directory) | Local directory where downloaded datasets are stored (defaults to the current working directory if not set). |

CKAN portal overrides

The curated catalog in src/ckan_mcp/data/ckan_config_selection.json contains entries such as Toronto, NYC, etc. Each location can provide overrides like:

  • action_transport: force GET vs POST for /api/3/action calls.
  • datastore_id_alias: whether datastore_search accepts id instead of resource_id.
  • requires_api_key: block initialization until an API key is supplied.
  • helper_prompt: user-facing reminder echoed in tool responses.
  • Pagination settings (default_search_rows, max_search_rows, default_preview_limit).

Call audit_ckan_api after selecting a portal to get automatically generated override recommendations, helper prompt text, and config snippets that can be pasted back into the catalog or used ad hoc via ckan_api_initialise(overrides={...}).
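
As a concrete illustration of the ad-hoc path, the overrides payload below uses only keys documented above; the specific values and the portal choice in the usage comment are made up, so treat it as a template rather than settings for a real portal.

    # Example overrides payload for ckan_api_initialise (keys come from this README; values are illustrative).
    overrides = {
        "action_transport": "POST",        # force POST for /api/3/action calls
        "datastore_id_alias": True,        # portal's datastore_search accepts id instead of resource_id
        "requires_api_key": False,
        "helper_prompt": "Remember that this portal throttles datastore queries.",
        "default_search_rows": 20,
        "max_search_rows": 100,
        "default_preview_limit": 5,
    }
    # Passed as tool arguments by your MCP client, e.g.
    # session.call_tool("ckan_api_initialise", {"country": "Canada", "location": "Toronto", "overrides": overrides})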

Sources / References

[^1]: DataShades, CKAN Instances, accessed November 30, 2025, https://datashades.info/.

[^2]: commondataio/dataportals-registry, accessed November 30, 2025, https://raw.githubusercontent.com/commondataio/dataportals-registry/refs/heads/main/data/datasets/bysoftware/ckan.jsonl.
