clingen-link
Enables querying ClinGen curated evidence for gene-disease validity, dosage, actionability, and variant pathogenicity via MCP tools.
An MCP server grounding gene/disease/variant questions in ClinGen (the Clinical Genome Resource) curated evidence, across all four of ClinGen's data domains.
Part of the *-link family of MCP servers. Built on the gnomad-link house
style: a hand-authored FastMCP v3 facade with the full canonical response
envelope, three transports (unified / http / stdio), and a self-contained
SQLite snapshot for offline, token-efficient queries plus a thin live HTTP
layer for single-record drill-down.
Research use only; not for clinical decision support. Every response carries
_meta.unsafe_for_clinical_use: true. ClinGen data is licensed CC BY 4.0 (© ClinGen). See License & citation.
Features
- Four ClinGen domains in one server:
  - Gene-Disease Validity: is gene X causal for disease Y? (Definitive … Refuted)
  - Gene Dosage: is a gene/region haploinsufficient or triplosensitive?
  - Clinical Actionability: is a gene-condition pair medically actionable (adult / pediatric)?
  - Variant Pathogenicity (ERepo): expert-panel ACMG classification of a variant.
- Snapshot + live hybrid. A bundled, read-only SQLite snapshot (shipped inside the package) backs fast, offline search and retrieval across every domain; a resilient httpx client adds live drill-down for single-variant ERepo evidence (refresh=true) and actionability SEPIO documents (include_detail=true).
- Gene-centric hub. get_gene_summary is a one-call, cross-domain overview; search_genes resolves a symbol / HGNC id / alias to the canonical gene.
- Canonical MCP envelope. Every tool returns success, a one-line headline, _meta.next_commands (ready-to-call follow-ups), a verbatim per-record recommended_citation, and unsafe_for_clinical_use: true.
- Freshness tracking + refresh CLI. Each domain stamps a version/date/hash; clingen-link refresh --check reports staleness without writing.
- Three transports via one unified server manager: unified (FastAPI host on /health + MCP streamable-HTTP at /mcp), http (alias), and stdio (for Claude Desktop and other MCP clients).
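A client can check that a tool result actually carries the canonical envelope before relying on it. The Python sketch below is a hypothetical helper, not part of clingen-link: it checks the top-level fields and the two _meta fields named above. Treating next_commands as a list is an assumption, and the per-record recommended_citation is not checked here.

```python
from typing import Any

# Hypothetical client-side check; field names follow the README,
# the assumed types (e.g. next_commands being a list) are illustrative.
REQUIRED_TOP_LEVEL = ("success", "headline", "_meta")


def envelope_problems(response: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the envelope looks complete."""
    problems = [f"missing {key}" for key in REQUIRED_TOP_LEVEL if key not in response]
    meta = response.get("_meta")
    if not isinstance(meta, dict):
        meta = {}
    if not isinstance(meta.get("next_commands"), list):
        problems.append("_meta.next_commands is not a list")
    if meta.get("unsafe_for_clinical_use") is not True:
        problems.append("_meta.unsafe_for_clinical_use is not true")
    return problems


example = {
    "success": True,
    "headline": "example headline",
    "_meta": {"next_commands": [], "unsafe_for_clinical_use": True},
}
print(envelope_problems(example))  # []
```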
Quick start
Uses uv exclusively (never pip).
make install # uv sync --group dev
make ci-local # format-check, lint, lint-loc, typecheck, test (the gate)
Run the server
# Unified HTTP host (FastAPI /health + MCP streamable-HTTP at /mcp) on port 8000
make dev
# equivalently:
uv run clingen-link --transport unified --host 127.0.0.1 --port 8000
# stdio MCP server (the Claude Desktop / MCP-client target)
uv run clingen-link-mcp
# equivalently:
make mcp-serve
Once the unified server is up, check health and the MCP endpoint:
curl http://127.0.0.1:8000/health
uv run clingen-link health --url http://127.0.0.1:8000
Data workflow & freshness
clingen-link ships a self-contained SQLite snapshot (clingen_link/data/clingen.sqlite.zst)
that is opened read-only at serve time; the snapshot is never built at
request time. An offline ETL builds it from ClinGen's bulk endpoints.
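Read-only access can be enforced by SQLite itself rather than by convention. The sketch below assumes the snapshot has already been decompressed to a plain .sqlite file (the zstd step is not shown, and the stand-in database is built on the fly). It opens the file with SQLite's URI mode=ro, so reads succeed and any write raises sqlite3.OperationalError. This illustrates the idea and is not clingen-link's actual store code.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(path: Path) -> sqlite3.Connection:
    """Open an existing SQLite file read-only; writes raise sqlite3.OperationalError."""
    uri = path.resolve().as_uri() + "?mode=ro"  # as_uri() percent-encodes the path
    return sqlite3.connect(uri, uri=True)


def demo() -> tuple[int, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        db_path = Path(tmp) / "clingen.sqlite"
        # Stand-in for the offline ETL: build the file once, then close it.
        rw = sqlite3.connect(db_path)
        rw.execute("CREATE TABLE meta (domain TEXT PRIMARY KEY, record_count INTEGER)")
        rw.execute("INSERT INTO meta VALUES ('dosage', 0)")
        rw.commit()
        rw.close()

        # Serve-time access: reads work, writes are rejected by SQLite.
        ro = open_read_only(db_path)
        try:
            rows = ro.execute("SELECT COUNT(*) FROM meta").fetchone()[0]
            try:
                ro.execute("INSERT INTO meta VALUES ('validity', 0)")
                write_blocked = False
            except sqlite3.OperationalError:
                write_blocked = True
        finally:
            ro.close()
    return rows, write_blocked


print(demo())  # (1, True)
```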
# Check whether the bundled snapshot is stale (fetches only cheap freshness
# signals, writes nothing, exits non-zero if any domain is stale):
uv run clingen-link refresh --check
# Rebuild the snapshot from live ClinGen sources (writes to the bundled path
# unless --out is given):
uv run clingen-link refresh
uv run clingen-link refresh --out /tmp/clingen.sqlite
# Same ETL via the standalone console script:
uv run clingen-link-refresh --check
Freshness model. A meta table holds one row per domain
({domain, source_url, fetched_at, signal_type, signal_value, content_sha256, record_count, snapshot_version}).
Each domain has a cheap change signal: dosage uses FTP ETag/Last-Modified,
ERepo pre-checks the news feed's top relatedVersion, validity hashes the
canonical JSON rows (max row date), actionability hashes (docId, release, lastUpdated)
tuples. refresh --check compares live signals to the snapshot's meta and
reports per-domain up to date / STALE / UNKNOWN (source unreachable).
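The comparison that refresh --check performs can be sketched as follows. Everything here is hypothetical: the signal values are placeholders, sorting rows before hashing is an assumption about what "canonical" means, and so is treating UNKNOWN as non-failing for the exit code. The sketch shows how a canonical-JSON hash ignores key order and how each domain maps to up to date / STALE / UNKNOWN.

```python
import hashlib
import json
from typing import Any, Optional

# Hypothetical sketch of the per-domain freshness comparison; not clingen-link's code.


def canonical_sha256(rows: list[dict[str, Any]]) -> str:
    """Hash rows as canonical JSON: sorted keys, compact separators, and
    (an assumption) sorted row order, so cosmetic differences do not count as drift."""
    lines = sorted(json.dumps(row, sort_keys=True, separators=(",", ":")) for row in rows)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def domain_status(snapshot_signal: str, live_signal: Optional[str]) -> str:
    """Compare the snapshot's stored signal with the live one."""
    if live_signal is None:  # source unreachable
        return "UNKNOWN"
    return "up to date" if live_signal == snapshot_signal else "STALE"


def check_exit_code(statuses: dict[str, str]) -> int:
    """Non-zero if any domain is STALE; treating UNKNOWN as non-failing is an assumption."""
    return 1 if "STALE" in statuses.values() else 0


# Placeholder signals, not real ClinGen values.
snapshot = {
    "validity": canonical_sha256([{"gene": "GENE1", "classification": "Definitive"}]),
    "dosage": '"etag-1"',
    "actionability": "hash-of-docId-release-lastUpdated-v1",
}
live = {
    # Same row with keys in a different order: the canonical hash is unchanged.
    "validity": canonical_sha256([{"classification": "Definitive", "gene": "GENE1"}]),
    "dosage": None,  # e.g. the FTP server did not answer
    "actionability": "hash-of-docId-release-lastUpdated-v2",
}
statuses = {domain: domain_status(snapshot[domain], live[domain]) for domain in snapshot}
print(statuses)  # {'validity': 'up to date', 'dosage': 'UNKNOWN', 'actionability': 'STALE'}
print(check_exit_code(statuses))  # 1
```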
Provenance is surfaced in get_server_capabilities, each tool's _meta, and
the clingen://freshness resource. A weekly GitHub Action
(.github/workflows/data-refresh.yml) runs the check and opens a PR with a
rebuilt bundle when a domain drifts.
MCP tools
13 tools, all with names matching ^[a-zA-Z0-9_-]{1,64}$. Every tool accepts a
response_mode (minimal | compact | standard | full; default compact),
returns a dict instead of raising, and carries _meta.next_commands.
| Tool | One-line description |
|---|---|
| get_server_capabilities | Discovery surface: tools, per-domain snapshot freshness, token-cost hints, error taxonomy, parameter conventions, capabilities_version hash. |
| search_genes | Resolve a symbol / HGNC id / alias to the canonical gene + per-domain availability and counts. |
| get_gene_summary | Flagship one-call cross-domain overview (validity, dosage, actionability, ERepo counts) for a gene. |
| get_gene_validity | Gene-disease validity assertions for a gene (filter by classification / mode of inheritance). |
| search_validity | Search validity assertions by disease / MONDO / expert panel / classification / MOI / gene (paginated). |
| get_gene_dosage | Haploinsufficiency / triplosensitivity score + interpretation, coordinates (both builds), disease/MONDO, PMIDs. |
| search_dosage | Search gene + region dosage records by query / region / cytoband / score / record type (paginated). |
| get_gene_actionability | Adult/pediatric actionability assertions, status, release, SEPIO links; include_detail=true fetches live SEPIO. |
| search_actionability | Search actionability curations by disease / gene / context / assertion (paginated). |
| get_variant_interpretations | List ERepo variant interpretations by gene / condition / expert panel (CAID, HGVS, MONDO, classification, VCEP, dates, permalink). |
| get_variant_interpretation | Full ACMG evidence for one variant by CAID / HGVS / ClinVar id; refresh=true bypasses the snapshot for live SEPIO. |
| list_expert_panels | GCEP/VCEP affiliates and their curation counts. |
| get_clingen_diagnostics | Recent-errors ring buffer, snapshot freshness, and upstream reachability. |
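The naming and response_mode conventions stated above can be checked mechanically. This Python sketch validates the 13 tool names from the table against the stated pattern and models the four response modes, with compact as the default. The normalize_response_mode helper is hypothetical client-side code, not part of clingen-link.

```python
import re
from typing import Optional

# Tool names copied from the table above; the pattern is the one the README states.
TOOL_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9_-]{1,64}$")
TOOLS = [
    "get_server_capabilities", "search_genes", "get_gene_summary",
    "get_gene_validity", "search_validity", "get_gene_dosage", "search_dosage",
    "get_gene_actionability", "search_actionability",
    "get_variant_interpretations", "get_variant_interpretation",
    "list_expert_panels", "get_clingen_diagnostics",
]
RESPONSE_MODES = ("minimal", "compact", "standard", "full")
DEFAULT_RESPONSE_MODE = "compact"


def normalize_response_mode(mode: Optional[str] = None) -> str:
    """Hypothetical client-side guard: fall back to the documented default."""
    if mode is None:
        return DEFAULT_RESPONSE_MODE
    if mode not in RESPONSE_MODES:
        raise ValueError(f"response_mode must be one of {RESPONSE_MODES}, got {mode!r}")
    return mode


bad = [name for name in TOOLS if not TOOL_NAME_PATTERN.fullmatch(name)]
print(len(TOOLS), bad)  # 13 []
```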
Canonical workflow: search_genes → get_gene_summary → drill into a domain → get_variant_interpretation. See docs/usage.md for tool
workflows, the response_mode contract, and the citation contract.
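On the wire, each step of that workflow is an MCP JSON-RPC 2.0 tools/call request. The sketch below only builds the request bodies; a real session first performs the MCP initialize handshake, which an MCP client library normally handles. Apart from response_mode, the argument names (query, gene, variant) and their values are hypothetical placeholders; see docs/usage.md for the actual parameters. In practice the follow-up calls would come from _meta.next_commands rather than being hard-coded.

```python
import json
from itertools import count
from typing import Any

_ids = count(1)  # JSON-RPC request ids must be unique within a session


def tools_call(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build an MCP JSON-RPC 2.0 tools/call request body."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument names other than response_mode are hypothetical placeholders.
workflow = [
    tools_call("search_genes", {"query": "SYMBOL", "response_mode": "compact"}),
    tools_call("get_gene_summary", {"gene": "SYMBOL", "response_mode": "compact"}),
    tools_call("get_gene_dosage", {"gene": "SYMBOL", "response_mode": "standard"}),
    tools_call("get_variant_interpretation", {"variant": "CAID_OR_HGVS", "response_mode": "full"}),
]
for request in workflow:
    print(json.dumps(request))
```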
Claude Desktop configuration
Add clingen-link as a stdio MCP server. Replace /abs/path/to/clingen-link
with the absolute path to your checkout:
{
"mcpServers": {
"clingen-link": {
"command": "uv",
"args": [
"--project",
"/abs/path/to/clingen-link",
"run",
"clingen-link-mcp"
],
"env": {
"CLINGEN_LINK_LOG_LEVEL": "WARNING"
}
}
}
}
The stdio entry point keeps stdout clean (banners/color suppressed, logging to stderr) so JSON-RPC framing stays intact.
Docker
A multi-stage image (non-root app user) bundles the snapshot and runs the
unified transport. See docker/README.md.
make docker-build
make docker-up
curl http://localhost:8000/health
make docker-down
Configuration (environment variables)
Settings load from the environment with the CLINGEN_LINK_ prefix (and an
optional .env; see .env.example).
| Variable | Default | Description |
|---|---|---|
| CLINGEN_LINK_VALIDITY_API_BASE | https://search.clinicalgenome.org/api | Gene-disease validity API base (ETL + affiliates). |
| CLINGEN_LINK_DOSAGE_FTP_BASE | https://ftp.clinicalgenome.org | Dosage TSV source (ETL). |
| CLINGEN_LINK_ACTIONABILITY_API_BASE | https://actionability.clinicalgenome.org/ac | Actionability API base (ETL + live SEPIO). |
| CLINGEN_LINK_EREPO_API_BASE | https://erepo.clinicalgenome.org/evrepo | ERepo API base (ETL + live drill-down). |
| CLINGEN_LINK_SNAPSHOT_PATH | Bundled clingen_link/data/clingen.sqlite.zst | Read-only snapshot location. |
| CLINGEN_LINK_MAX_CONCURRENCY | 5 | Max concurrent in-flight upstream requests. |
| CLINGEN_LINK_REQUEST_TIMEOUT_S | 30 | Per-request upstream timeout (seconds). |
| CLINGEN_LINK_QUEUE_WAIT_TIMEOUT_S | 20 | Max wait (seconds) for a concurrency slot before failing fast with rate_limited. |
| CLINGEN_LINK_CACHE_SIZE | 512 | Service-layer LRU cache size. |
| CLINGEN_LINK_CACHE_TTL_MINUTES | 60 | General service cache TTL (minutes). |
| CLINGEN_LINK_EREPO_CACHE_TTL_MINUTES | 720 | ERepo live drill-down cache TTL (minutes; keyed to news version). |
| CLINGEN_LINK_MCP_TRANSPORT | unified | Transport: unified / http / stdio. |
| CLINGEN_LINK_MCP_HOST | 127.0.0.1 | Bind host. |
| CLINGEN_LINK_MCP_PORT | 8000 | Bind port. |
| CLINGEN_LINK_MCP_PATH | /mcp | MCP endpoint path. |
| CLINGEN_LINK_LOG_LEVEL | INFO | Log level. |
| CLINGEN_LINK_STDIO_LOG_LEVEL | WARNING | Reduced log level for the stdio transport. |
| CLINGEN_LINK_CORS_ORIGINS | * | Comma-separated allowed CORS origins. |
| CLINGEN_LINK_MAX_PAGE_SIZE | 100 | Maximum page size for search tools. |
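MAX_CONCURRENCY and QUEUE_WAIT_TIMEOUT_S together describe a bounded queue: at most N upstream calls are in flight, and a caller that cannot get a slot in time fails fast instead of hanging. The asyncio sketch below illustrates that pattern; UpstreamLimiter and the error dict shape are hypothetical and not clingen-link's actual classes or error taxonomy.

```python
import asyncio
from typing import Any, Awaitable, Callable


class UpstreamLimiter:
    """Hypothetical limiter: a semaphore caps in-flight calls; waiting too long yields rate_limited."""

    def __init__(self, max_concurrency: int = 5, queue_wait_timeout_s: float = 20.0) -> None:
        self._slots = asyncio.Semaphore(max_concurrency)
        self._queue_wait_timeout_s = queue_wait_timeout_s

    async def run(self, call: Callable[[], Awaitable[Any]]) -> dict[str, Any]:
        try:
            # Wait for a free slot, but only up to the queue-wait timeout.
            await asyncio.wait_for(self._slots.acquire(), timeout=self._queue_wait_timeout_s)
        except asyncio.TimeoutError:
            # Return an error dict instead of raising (the error shape is an assumption).
            return {"success": False, "error": {"code": "rate_limited"}}
        try:
            return {"success": True, "data": await call()}
        finally:
            self._slots.release()


async def demo() -> list[dict[str, Any]]:
    # Created inside the running loop; tiny limits so the demo finishes quickly.
    limiter = UpstreamLimiter(max_concurrency=1, queue_wait_timeout_s=0.05)

    async def slow_upstream() -> str:
        await asyncio.sleep(0.2)
        return "payload"

    # Two concurrent calls, one slot: the second waits 0.05 s, then fails fast.
    return list(await asyncio.gather(limiter.run(slow_upstream), limiter.run(slow_upstream)))


results = asyncio.run(demo())
print([r["success"] for r in results])  # [True, False]
```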
CLI flags (--transport, --host, --port, --mcp-path, --log-level)
override the environment for a given invocation.
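The precedence is defaults, then CLINGEN_LINK_* environment variables, then CLI flags. The stdlib-only sketch below models that layering for the transport settings. TransportSettings, from_env, and apply_cli are hypothetical names, and .env loading is not shown. The flags are the ones listed above; only flags actually passed override the environment.

```python
import argparse
from dataclasses import dataclass, replace
from typing import Mapping, Sequence

PREFIX = "CLINGEN_LINK_"


@dataclass(frozen=True)
class TransportSettings:
    # Defaults taken from the table above.
    mcp_transport: str = "unified"
    mcp_host: str = "127.0.0.1"
    mcp_port: int = 8000
    mcp_path: str = "/mcp"
    log_level: str = "INFO"


def from_env(env: Mapping[str, str]) -> TransportSettings:
    """Layer CLINGEN_LINK_* variables over the defaults (in practice: from_env(os.environ))."""
    d = TransportSettings()
    return TransportSettings(
        mcp_transport=env.get(PREFIX + "MCP_TRANSPORT", d.mcp_transport),
        mcp_host=env.get(PREFIX + "MCP_HOST", d.mcp_host),
        mcp_port=int(env.get(PREFIX + "MCP_PORT", d.mcp_port)),
        mcp_path=env.get(PREFIX + "MCP_PATH", d.mcp_path),
        log_level=env.get(PREFIX + "LOG_LEVEL", d.log_level),
    )


def apply_cli(settings: TransportSettings, argv: Sequence[str]) -> TransportSettings:
    """Override only the settings whose flags were actually passed."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--transport", choices=["unified", "http", "stdio"])
    parser.add_argument("--host")
    parser.add_argument("--port", type=int)
    parser.add_argument("--mcp-path")
    parser.add_argument("--log-level")
    args = parser.parse_args(argv)
    overrides = {
        "mcp_transport": args.transport,
        "mcp_host": args.host,
        "mcp_port": args.port,
        "mcp_path": args.mcp_path,
        "log_level": args.log_level,
    }
    return replace(settings, **{k: v for k, v in overrides.items() if v is not None})


env = {"CLINGEN_LINK_MCP_PORT": "9000", "CLINGEN_LINK_MCP_TRANSPORT": "http"}
settings = apply_cli(from_env(env), ["--port", "8080"])
print(settings.mcp_transport, settings.mcp_port)  # http 8080
```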
Documentation
- docs/architecture.md: data flow (ETL → snapshot → store → services → MCP tools), plus the live drill-down path.
- docs/usage.md: tool workflows, response_mode, and the citation contract.
- AGENTS.md: source-of-truth guide for agentic coding tools.
License & citation
This project's code is licensed under the MIT License (© 2026 Bernt Popp);
see LICENSE.
ClinGen data is licensed CC BY 4.0 (© ClinGen / Clinical Genome Resource). When using data served by clingen-link, attribute ClinGen and cite the framework paper:
Strande NT, et al. Evaluating the Clinical Validity of Gene-Disease Associations: An Evidence-Based Framework Developed by the Clinical Genome Resource. Am J Hum Genet. 2017;100(6):895-906. PMID: 28552198.
Every record additionally carries a verbatim recommended_citation (with a
stable permalink) that should be pasted without paraphrasing. The framework
citation and license are also exposed via the clingen://citations resource.
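When a downstream report draws on several records, their citations can be assembled without paraphrasing. The sketch below is a hypothetical helper. It assumes each record dict exposes a recommended_citation string, keeps each citation verbatim (deduplicated, first-seen order), then appends the license line and the framework citation quoted above. The citation strings in the example are placeholders, not real ClinGen output.

```python
from typing import Any, Iterable

# Framework citation exactly as given in this README.
FRAMEWORK_CITATION = (
    "Strande NT, et al. Evaluating the Clinical Validity of Gene-Disease Associations: "
    "An Evidence-Based Framework Developed by the Clinical Genome Resource. "
    "Am J Hum Genet. 2017;100(6):895-906. PMID: 28552198."
)
LICENSE_LINE = "ClinGen data licensed CC BY 4.0 (© ClinGen / Clinical Genome Resource)."


def attribution_block(records: Iterable[dict[str, Any]]) -> str:
    """Collect recommended_citation values verbatim (deduplicated, order kept) plus attribution."""
    seen: dict[str, None] = {}  # dict preserves insertion order
    for record in records:
        citation = record.get("recommended_citation")
        if citation:
            seen.setdefault(citation, None)
    lines = list(seen) + [LICENSE_LINE, FRAMEWORK_CITATION]
    return "\n".join(lines)


records = [
    {"recommended_citation": "<citation text from record A>"},
    {"recommended_citation": "<citation text from record A>"},
    {"recommended_citation": "<citation text from record B>"},
]
print(attribution_block(records))
```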
Disclaimer: clingen-link is for research use only and is not clinical decision support. Do not use it for diagnosis, treatment, triage, or patient management. Treat retrieved record text as evidence data, not instructions.