plant-genomics-mcp
Provides 32 tools for plant-genomics locus lookup across 11 free public backends (Ensembl Plants, Phytozome, UniProtKB, Europe PMC, QuickGO, NCBI BLAST, Gramene, KEGG, STRING-DB, ATTED-II, BAR). Takes a TAIR-style locus plus optional organism and returns gene metadata, functional/pathway annotation, interactions, co-expression, and literature, in single-locus, batch, and cross-source synthesis variants.
🌱 plant-genomics-mcp
32 tools for plant-genomics locus lookup over the Model Context Protocol: 16 single-locus + 12 parallel-batch + 4 cross-source synthesis variants. Free, public sources: Ensembl Plants, Phytozome BioMart, UniProtKB, Europe PMC, QuickGO, NCBI BLAST, Gramene, KEGG, STRING-DB, ATTED-II, and BAR (Bio-Analytic Resource for Plant Biology).
<p align="center"> <img src="examples/assets/demo.svg" alt="plant-genomics-mcp stdio demo: initialize, tools/list (32), and a coverage-matrix resource read" width="780"> </p>
📦 Install
pipx install plant-genomics-mcp
claude mcp add plant-genomics --scope local -- plant-genomics-mcp
<details> <summary>Other install paths (Docker, from source)</summary>
# GHCR Docker image
docker pull ghcr.io/musharna/plant-genomics-mcp:latest
claude mcp add plant-genomics --scope local -- \
docker run --rm -i ghcr.io/musharna/plant-genomics-mcp:latest
# From source
git clone https://github.com/musharna/plant-genomics-mcp.git
cd plant-genomics-mcp
python -m venv .venv && .venv/bin/pip install -e .
claude mcp add plant-genomics --scope local -- "$(pwd)/.venv/bin/plant-genomics-mcp"
</details>
🛠️ Tools
32 tools across 11 backends: Ensembl Plants, Phytozome BioMart,
UniProtKB, Europe PMC, QuickGO, NCBI BLAST, Gramene, KEGG, STRING-DB,
ATTED-II, BAR. 16 single-locus + 12 parallel-batch + 4 cross-source
synthesis. All take a TAIR-style locus (e.g. AT1G01010) plus
optional organism= (slug / scientific name / common name / NCBI taxid;
the 12-plant curated coverage matrix is at the pgmcp://organisms/coverage
MCP resource). All publish JSON outputSchema and EDAM ontology tags.
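For Arabidopsis, the TAIR-style locus mentioned above follows the AGI pattern: `AT`, a chromosome code (1 to 5, C for chloroplast, M for mitochondrion), `G`, then five digits. The sketch below is a hypothetical client-side pre-check; the helper name and regex are illustrative assumptions, not the server's own validation, and loci from other species (for example rice `Os01g0100100`) use different formats and will not match.

```python
import re

# AGI gene locus: "AT" + chromosome (1-5, C, M) + "G" + five digits.
# Assumption: this is a client-side convenience check, not the server's rule.
AGI_LOCUS = re.compile(r"AT[1-5CM]G\d{5}", re.IGNORECASE)


def is_agi_locus(locus: str) -> bool:
    """Return True if the string looks like an Arabidopsis AGI gene locus."""
    return AGI_LOCUS.fullmatch(locus.strip()) is not None


print(is_agi_locus("AT1G01010"))
print(is_agi_locus("Os01g0100100"))
```

A check like this only catches obvious typos before a network round-trip; the server remains the authority on whether a locus exists.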
<details> <summary>Full tool matrix</summary>
| # | Category | Tool | What it does |
|---|---|---|---|
| 1 | Gene metadata (live) | ensembl_plants_lookup_locus | Fetches gene record from Ensembl Plants REST (any plant species). |
| 2 | Cross-references (live) | get_gene_xrefs | Fetches cross-DB references (UniProt, NCBI Gene, TAIR, GO, …) from Ensembl. |
| 3 | Gene metadata (live) | phytozome_lookup_locus | Fetches gene record from Phytozome BioMart (any Phytozome proteome). |
| 4 | Protein (live) | resolve_locus_to_uniprot | Resolves a locus to its UniProtKB record (Swiss-Prot preferred, TrEMBL OK). |
| 5 | Literature (live) | locus_literature | Searches Europe PMC for papers mentioning the locus (free, no API key). |
| 6 | GO annotations (live) | locus_go_annotations | Fetches QuickGO GO annotations (locus → UniProt → QuickGO). |
| 7 | Sequence search (live) | blast_sequence | NCBI BLAST URLAPI: async Put/Get polling with progress notifications. |
| 8 | Homology (live) | gramene_homologs | Fetches Gramene v69 homology entries (ortholog / paralog) with gene_tree_id. |
| 9 | Pathways (live) | kegg_pathways | Fetches KEGG pathway memberships. 7 organisms: Arabidopsis (ath:, native AGI), + rice (osa:), maize (zma:), soybean (gmx:), barley (hvg:), poplar (pop:), brachypodium (bdi:) bridged via Ensembl → Entrez ID. |
| 10 | Interactions (live) | string_interactions | Fetches STRING-DB first-neighbor interaction partners with per-channel score. |
| 11 | Coexpression (live) | atted_coexpression | Fetches ATTED-II Ath-u.c4-0 top-N coexpression neighbors with z-scores. |
| 12 | Curator summary (live) | bar_gene_summary | Fetches BAR ThaleMine + GAIA-aliases curator summary for an Arabidopsis locus. |
| 13 | Expression (live) | bar_efp_expression | Fetches BAR eFP-Browser expression profile (mean ± SD per tissue) for a locus. |
| 14 | Interactions (live) | bar_aiv_interactions | Fetches BAR AIV interaction partners (Arabidopsis + rice) with confidence + papers. |
| 15 | Curator summary (live) | tair_locus_info | Silent upgrade: alias of bar_gene_summary. MCP tool name preserved for clients. |
| 16 | Subscription redirect | plantcyc_locus_info | Returns subscription notice + redirect to live backends. No upstream call. |
| 17 | Batch (live) | batch_* (twelve variants) | Parallel per-locus fanout for tools 1-6, 8-12, 14. Up to 50 loci per call. |
| 18 | Synthesis (live) | *_synth / consensus_homologs (four) | Compose 2-5 backends in parallel, return a SynthesisEnvelope with per-step status. |
</details>
⚡ Quickstart
After install, the simplest call returns the Ensembl Plants record for
NAC001, the canonical worked example used throughout examples/:
// arguments
{ "locus": "AT1G01010" }
// result (truncated)
{
"id": "AT1G01010",
"organism": "arabidopsis_thaliana",
"display_name": "NAC001",
"biotype": "protein_coding",
"seq_region_name": "1",
"start": 3631,
"end": 5899,
"strand": 1,
"assembly_name": "TAIR10",
"description": "NAC domain containing protein 1 ..."
}
For other species, pass organism=:
{ "locus": "Os01g0100100", "organism": "oryza_sativa" }
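On the wire, argument objects like the two above travel inside an MCP `tools/call` JSON-RPC request. The following is a minimal sketch of that envelope, assuming the tool name `ensembl_plants_lookup_locus` from the tool matrix; the helper `tools_call` and the request id are hypothetical, and a real session also needs the usual `initialize` handshake, which an MCP client library normally handles.

```python
import json


def tools_call(tool: str, arguments: dict, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical helper applied to the cross-species arguments shown above.
request = tools_call(
    "ensembl_plants_lookup_locus",
    {"locus": "Os01g0100100", "organism": "oryza_sativa"},
)
print(request)
```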
In Claude Code, the same prompt fans out across Ensembl, UniProtKB, and Europe PMC in a single turn (animated demo):
<p align="center"> <img src="examples/assets/cc-demo.png" alt="Claude Code (Opus 4.7) calling plant-genomics-mcp 8 times to return the AT1G01010 / NAC1_ARATH record with Ensembl, UniProt Q0WV96, and the top-3 Europe PMC papers" width="820"> </p>
Full per-tool walkthroughs (with real upstream-API transcripts) live in
examples/:
| Walkthrough | Coverage |
|---|---|
| analyze_locus_AT1G01010.md | Ensembl → xrefs → UniProt → Europe PMC → QuickGO chain (5 tools). |
| find_homologs_AT1G01010_NAC_domain.md | BLAST + per-hit UniProt enrichment. |
| biological_context_AT1G01010.md | Gramene + KEGG + UniProt + STRING + ATTED-II (5 tools). |
| v0.8_synthesis_walkthrough.md | All 4 v0.8 synthesis tools (*_synth + consensus_homologs) on the same locus. |
| cross_organism_walkthrough.md | v0.9 multi-organism resolver against rice + maize; per-backend routing on PyPI v1.0.4. |
Resources & prompts
<details> <summary>Four read-only MCP resources + three parameterized prompts</summary>
Clients discover them via resources/list and prompts/list.
Resources (resources/read):
| URI | What |
|---|---|
| pgmcp://cache/stats | Per-backend TTLCache rollup: {hits, misses, size} for each live backend. |
| pgmcp://organisms/phytozome | Slug → Phytozome organism_id map. |
| pgmcp://backends/status | Per-backend liveness rollup: name, base_url, kind, subscription_gated, probed_at. |
| pgmcp://organisms/coverage | Markdown table of all 12 supported plants × 5 ID slots (ncbi_taxid / ensembl / phytozome / …). |
Prompts (prompts/get):
| Name | Required | Optional | Chains |
|---|---|---|---|
| analyze_locus | locus | organism (default arabidopsis_thaliana) | Ensembl → xrefs → UniProt → Europe PMC → QuickGO. |
| find_homologs | sequence | program (default blastp) | blast_sequence → per-hit resolve_locus_to_uniprot for UniProt-shaped accessions. |
| biological_context | locus | top_n (default 10) | Gramene → KEGG → UniProt → STRING → ATTED-II. |
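As a rough illustration of how a client might address the resources and prompts listed above, the sketch below builds `resources/read` and `prompts/get` JSON-RPC messages. The `jsonrpc` helper and the request ids are hypothetical; the URI and prompt name come from the tables. Prompt arguments are passed as strings here because the MCP specification types prompt arguments as a string-to-string map.

```python
import json


def jsonrpc(method: str, params: dict, request_id: int) -> dict:
    """Build a JSON-RPC 2.0 request envelope (hypothetical helper)."""
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


# Read the coverage-matrix resource.
read_coverage = jsonrpc("resources/read", {"uri": "pgmcp://organisms/coverage"}, 1)

# Fetch the biological_context prompt with its required and optional arguments.
get_prompt = jsonrpc(
    "prompts/get",
    {"name": "biological_context", "arguments": {"locus": "AT1G01010", "top_n": "10"}},
    2,
)

print(json.dumps(read_coverage))
print(json.dumps(get_prompt))
```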
</details>
Transports
| Transport | How to launch |
|---|---|
| stdio (default) | plant-genomics-mcp (after install) or via Docker above |
| streamable-HTTP | plant-genomics-mcp-http, then POST JSON-RPC to http://host:port/mcp |
The HTTP transport is stateless and emits JSON responses by default, the right shape for registry indexers and remote hosting.
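A minimal sketch of what a raw POST to the HTTP transport could look like, using only the Python standard library. The helper name is hypothetical, the URL assumes the default bind address and port listed under Configuration, and the `Accept` header follows the streamable-HTTP convention of accepting both JSON and event streams. The request is only constructed, not sent, since sending needs a running server.

```python
import json
import urllib.request


def build_mcp_post(url: str, payload: dict, token: str = "") -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC POST for the /mcp endpoint."""
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    }
    if token:
        # Bearer gate; only needed when the server is started with a token.
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


req = build_mcp_post(
    "http://127.0.0.1:8765/mcp",  # assumed local self-hosted instance
    {"jsonrpc": "2.0", "id": 1, "method": "tools/list"},
)
# To actually send it against a running server:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```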
Hosted endpoint
A small personal demo runs at:
https://mjarnoldgt76.tail86d19d.ts.net/mcp
Intended for registry indexers, one-off evaluation, and quick interactive testing, not for production workloads. No SLA, no uptime commitment, URL may change without notice (single laptop on a residential connection).
# liveness probe
curl https://mjarnoldgt76.tail86d19d.ts.net/healthz
# {"status":"ok"}
# connect from Claude Code
claude mcp add --transport http plant-genomics-mcp \
https://mjarnoldgt76.tail86d19d.ts.net/mcp
For anything beyond casual evaluation, self-host. The HTTP transport
is the same binary; self-hosting buys deterministic uptime, your own
bearer-token gate (PLANT_GENOMICS_MCP_HTTP_TOKEN), and NCBI BLAST
etiquette under your own contact email.
⚙️ Configuration
Stdio needs no configuration. The two env vars that matter:
| Variable | When | Effect |
|---|---|---|
| PLANT_GENOMICS_MCP_HTTP_TOKEN | HTTP transport only | Bearer token for /mcp; must be ≥32 chars or the HTTP server aborts at startup. Generate one with openssl rand -hex 32. |
| PLANT_GENOMICS_MCP_NCBI_EMAIL | If you use BLAST | NCBI etiquette contact. If unset, a placeholder is used and each call emits a warning; NCBI may throttle. |
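If `openssl` is not at hand, a token of the same shape can be produced from Python. This is a small sketch rather than anything shipped with the project: the function name and the `MIN_TOKEN_CHARS` constant are illustrative, with the 32-character minimum taken from the table above.

```python
import secrets

MIN_TOKEN_CHARS = 32  # minimum length stated in the table above


def new_http_token() -> str:
    """Return 32 random bytes hex-encoded (64 chars), like `openssl rand -hex 32`."""
    return secrets.token_hex(32)


token = new_http_token()
assert len(token) >= MIN_TOKEN_CHARS
print(f"export PLANT_GENOMICS_MCP_HTTP_TOKEN={token}")
```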
<details> <summary>All env vars (HTTP bind, body cap, cache, BLAST concurrency)</summary>
| Variable | Default | Effect |
|---|---|---|
| PLANT_GENOMICS_MCP_HTTP_HOST | 127.0.0.1 | HTTP bind address. |
| PLANT_GENOMICS_MCP_HTTP_PORT | 8765 | HTTP TCP port. |
| PLANT_GENOMICS_MCP_HTTP_MAX_BODY | 2097152 (2 MiB) | Reject POSTs with Content-Length larger than this. |
| PLANT_GENOMICS_MCP_HTTP_STATELESS | 1 | 0 keeps per-client session state (SSE-style). |
| PLANT_GENOMICS_MCP_HTTP_JSON | 1 | 0 switches the response shape to streaming SSE events. |
| PLANT_GENOMICS_MCP_BLAST_CONCURRENCY | 2 | Max in-flight BLAST searches per process (NCBI per-IP rate limit). |
| PLANT_GENOMICS_MCP_CACHE_TTL | 600 | Per-backend TTL+LRU cache entry lifetime, in seconds. Only 200 responses are cached. |
| PLANT_GENOMICS_MCP_CACHE_SIZE | 256 | Max entries per backend before LRU eviction. |
| PLANT_GENOMICS_MCP_CACHE_DISABLED | unset | Any non-empty value makes every cache a no-op. |
The cache is process-local; restart the server to drop all entries.
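To make the TTL+LRU behaviour concrete, here is a minimal sketch of such a cache. It is not the project's implementation: the class name, method names, and injectable clock are assumptions for illustration. The defaults mirror the table (256 entries, 600 seconds), and a caller would invoke `put` only after a successful (200) upstream response.

```python
import time
from collections import OrderedDict


class TTLLRUCache:
    """Illustrative TTL + LRU cache; not the server's actual class."""

    def __init__(self, max_size=256, ttl=600.0, clock=time.monotonic):
        self.max_size, self.ttl, self.clock = max_size, ttl, clock
        self._data = OrderedDict()  # key -> (expires_at, value)
        self.hits = self.misses = 0

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[0] <= self.clock():
            self._data.pop(key, None)  # drop an expired entry, if any
            self.misses += 1
            return None
        self._data.move_to_end(key)  # mark as most recently used
        self.hits += 1
        return entry[1]

    def put(self, key, value):
        self._data[key] = (self.clock() + self.ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least recently used

    def stats(self):
        return {"hits": self.hits, "misses": self.misses, "size": len(self._data)}
```

The `stats` shape matches the `{hits, misses, size}` rollup described for the pgmcp://cache/stats resource; everything else is a design guess.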
Long-running calls (retry storms, multi-second Phytozome BioMart POSTs)
emit MCP notifications/progress over the active session; clients opt
in via progressToken in the request _meta.
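A client opts in to progress notifications by attaching a `progressToken` under `_meta` in the request params. The sketch below shows one way to do that without mutating the original request; the helper name and token value are hypothetical, and `phytozome_lookup_locus` is used only because BioMart calls are named above as long-running.

```python
import json


def with_progress(request: dict, token: str) -> dict:
    """Return a copy of a JSON-RPC request with _meta.progressToken set."""
    params = dict(request.get("params", {}))
    params["_meta"] = {**params.get("_meta", {}), "progressToken": token}
    return {**request, "params": params}


call = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {"name": "phytozome_lookup_locus", "arguments": {"locus": "AT1G01010"}},
}
tracked = with_progress(call, "lookup-7")  # token value is arbitrary
print(json.dumps(tracked))
```

The server would then echo the same token in each `notifications/progress` message so the client can match updates to the call.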
</details>
⚠️ Error model
<details> <summary>Wire-prefix taxonomy + batch result shape</summary>
All live tools raise PlantGenomicsError subclasses; the MCP SDK
stringifies them into the wire content with a [ClassName] prefix so
clients can route on failure kind without parsing the message:
| Wire prefix | When |
|---|---|
| [NotFoundError] | 404 / empty BioMart row / invalid locus identifier |
| [RateLimitError] | 429 retry budget exhausted; back off and retry |
| [UpstreamUnavailableError] | 5xx past retry budget; service outage, try a peer backend |
| [PlantGenomicsError] | Other (BioMart Query ERROR: body, unexpected column count, etc.) |
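Routing on the prefix can be as simple as one regular expression. The sketch below is a hypothetical client-side helper: it assumes the prefix is followed by optional whitespace and then the message, and the `RETRYABLE` set is an interpretation of the table above rather than something the server publishes.

```python
import re

# "[ClassName] message" as described above; exact spacing is an assumption.
PREFIX = re.compile(r"^\[(?P<kind>\w+)\]\s*(?P<message>.*)$", re.DOTALL)

# Failure kinds where waiting and retrying is plausible (illustrative choice).
RETRYABLE = {"RateLimitError", "UpstreamUnavailableError"}


def classify(text: str):
    """Split an error string into (kind, message); kind is None if unprefixed."""
    match = PREFIX.match(text)
    if match is None:
        return None, text
    return match["kind"], match["message"]


kind, message = classify("[RateLimitError] retry budget exhausted")
print(kind, kind in RETRYABLE, message)
```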
Batch tools return {tool, count, results, errors} where
results[locus] is the same shape as the single-locus tool and
errors[locus] is the same [ClassName] message string. Ensembl's
batch uses the native POST /lookup/id endpoint (one HTTP round-trip);
everything else fans out via asyncio.gather.
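The fan-out pattern described above can be sketched with `asyncio.gather` and `return_exceptions=True`, so that one failing locus does not abort the rest. This is an illustration, not the project's code: the `batch` function, the stand-in `fake_lookup`, and the local `NotFoundError` class are hypothetical, `count` is assumed to mean the number of loci requested, and the 50-locus cap comes from the tool matrix.

```python
import asyncio


async def batch(tool_name, lookup, loci, max_loci=50):
    """Run lookup(locus) concurrently and split outcomes into results/errors."""
    if len(loci) > max_loci:
        raise ValueError(f"at most {max_loci} loci per call")
    outcomes = await asyncio.gather(
        *(lookup(locus) for locus in loci), return_exceptions=True
    )
    results, errors = {}, {}
    for locus, outcome in zip(loci, outcomes):
        if isinstance(outcome, BaseException):
            errors[locus] = f"[{type(outcome).__name__}] {outcome}"
        else:
            results[locus] = outcome
    return {"tool": tool_name, "count": len(loci), "results": results, "errors": errors}


class NotFoundError(Exception):
    """Stand-in for the server's error class of the same name."""


async def fake_lookup(locus):
    # Hypothetical backend call: one locus is unknown.
    if locus == "AT9G99999":
        raise NotFoundError("unknown locus")
    return {"id": locus}


out = asyncio.run(batch("batch_demo", fake_lookup, ["AT1G01010", "AT9G99999"]))
print(out)
```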
</details>
🧪 Development
.venv/bin/pip install -e '.[dev]'
.venv/bin/pytest -q # unit tests
PLANT_GENOMICS_MCP_LIVE=1 .venv/bin/pytest -q # adds live network probes
PLANT_GENOMICS_MCP_STDIO_SMOKE=1 .venv/bin/pytest -q # adds stdio smoke
.venv/bin/ruff check .
CI runs the unit suite + the stdio smoke on every push/PR (matrix: Python 3.11, 3.12). The live-network gate is not run in CI to avoid flakes from upstream availability.
Scientific validation / drift detection. scripts/benchmark_annotations.py
drives a curated corpus of canonical loci (27, spanning all 12 organisms)
through every backend + synthesis pipeline and compares results to a frozen
baseline, emitting PASS / DRIFT / FAIL plus cross-source consistency
invariants. It's how upstream data drift is caught. A scheduled GitHub Actions
workflow (.github/workflows/benchmark.yml) runs it weekly and pages on a
confirmed regression. Operator guide: docs/benchmarking.md.
.venv/bin/python scripts/benchmark_annotations.py # full live sweep (~3-5 min)
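For intuition only, the baseline comparison behind a drift check can be sketched as below. The verdict names come from the description above, but their exact semantics are not documented here, so this sketch assumes PASS means identical to the frozen baseline, DRIFT means at least one field differs, and FAIL means no result came back. The function name and record fields are illustrative and do not reflect how scripts/benchmark_annotations.py is written.

```python
def compare_to_baseline(baseline: dict, current):
    """Return (verdict, changed_fields) under the assumed PASS/DRIFT/FAIL rules."""
    if current is None:
        return "FAIL", []  # assumed: the backend call produced nothing
    changed = sorted(
        key
        for key in baseline.keys() | current.keys()
        if baseline.get(key) != current.get(key)
    )
    return ("DRIFT" if changed else "PASS"), changed


frozen = {"display_name": "NAC001", "start": 3631, "end": 5899}
print(compare_to_baseline(frozen, dict(frozen)))
print(compare_to_baseline(frozen, {**frozen, "end": 5900}))
```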
See CHANGELOG.md for release notes, including the
v0.8 → v0.9 species=/organism_id= → organism= migration and the
v1.0.1 HTTP-token enforcement change.
MCP registry
Listed in the official MCP registry
under the namespace below (ownership-verification token for mcp-publisher):
mcp-name: io.github.musharna/plant-genomics-mcp
License
MIT โ see LICENSE. Underlying services (Ensembl Plants,
Phytozome, TAIR, PlantCyc, BAR) have their own terms of use; consult
each before bulk querying.