GeneChat MCP Server

A local-first MCP server that annotates whole-genome VCF files and lets you query pharmacogenomics, disease risk, and carrier status through natural language.

License: MIT Python 3.11+ MCP

Talk to your genome.

GeneChat is a local-first MCP server that annotates your whole-genome VCF once, then serves it as tools so you can ask questions about pharmacogenomics, disease risk, carrier status, and more — with your raw file never leaving your machine.

Try the live demo →

Things you can ask:

  • "I was just prescribed simvastatin — any genetic concerns?"
  • "I have surgery next month. What should I tell my anesthesiologist?"
  • "Am I a carrier for anything concerning?"

Privacy Notice: GeneChat reads your VCF locally, but tool responses — containing your genotypes, rsIDs, and clinical interpretations — are sent to your LLM provider (e.g. Anthropic, OpenAI) as part of the conversation. Your raw VCF file is never uploaded, but the LLM does see the specific variants and findings returned by each tool call. See Security Recommendations below.

What It Does

You get your genome sequenced ($250–$900 from providers like Nucleus Genomics or Nebula Genomics). You download the raw VCF file. GeneChat annotates it once with open-source tools, then serves it locally via MCP so you can ask questions like:

  • "What does my genome say about cardiovascular risk?"
  • "I play soccer — any genetic injury risk factors?"
  • "How should I think about my diet based on my genetics?"

The LLM calls GeneChat's tools behind the scenes, gets your specific genotypes and annotations, and interprets the results in context.

How It Works

You ask a question in your LLM → it picks the right GeneChat tool → GeneChat queries your local VCF with pysam → returns your genotype + clinical annotations → the LLM interprets the results for you.

GeneChat reads only local files and makes no network calls at runtime. However, tool responses — containing your genotypes and clinical findings — are sent to your LLM provider as part of the conversation. See Security Recommendations for details on cloud vs local LLM options.

Tools

| Tool | Purpose |
| --- | --- |
| list_genomes | List registered genomes so the LLM can ask which to query |
| query_variant | Look up a single variant by rsID or position |
| query_variants | Batch lookup of multiple rsIDs in a single VCF scan |
| query_gene | List notable variants in a gene with smart filter |
| query_genes | Batch query variants across multiple genes at once |
| query_pgx | Pharmacogenomics lookup by drug or gene (CPIC data) |
| query_clinvar | Find clinically significant variants |
| query_gwas | Search the GWAS Catalog by trait, gene, or variant |
| calculate_prs | Polygenic risk scores (PGS Catalog data) |
| genome_summary | High-level overview of your genome |

CLI Commands

| Command | Purpose |
| --- | --- |
| genechat init <vcf> [--label] [--gnomad] [--dbsnp] [--gwas] [--fast] | Full first-time setup: annotate, write config |
| genechat add <vcf> [--label] | Register a VCF file without annotation |
| genechat annotate [--clinvar] [--gnomad] [--snpeff] [--dbsnp] [--all] [--stale] [--force] [--fast] [--genome] | Build or update patch.db (auto-downloads references) |
| genechat install [--gwas] [--seeds] [--force] | Install genome-independent reference databases |
| genechat status [--json] [--check-updates] | Show all registered genomes, annotation state, and caches |
| genechat licenses | Show data source licenses for your installation |
| genechat serve / genechat | Start the MCP server |

Global flags: --version (print version), --no-color (disable colored output). Color output respects the NO_COLOR environment variable and is automatically disabled when stdout or stderr is not a TTY.
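The color rules above can be read as a small decision function. The following is a minimal sketch of that logic, not GeneChat's actual code: the name `should_use_color` and its parameters are hypothetical, and treating an empty NO_COLOR value as "not set" is an assumption.

```python
import os
import sys


def should_use_color(no_color_flag: bool, env: dict, stdout_tty: bool, stderr_tty: bool) -> bool:
    """Return True only when none of the documented rules disables color."""
    if no_color_flag:        # --no-color was passed
        return False
    if env.get("NO_COLOR"):  # NO_COLOR is set (assumed: non-empty value)
        return False
    # Color is disabled when stdout or stderr is not a TTY.
    return stdout_tty and stderr_tty


if __name__ == "__main__":
    print(should_use_color(False, dict(os.environ), sys.stdout.isatty(), sys.stderr.isatty()))
```

Passing the environment and TTY state as arguments keeps the rule easy to check in isolation.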

Running genechat with no subcommand in an interactive terminal shows a help summary. When stdin is piped (e.g. from an MCP client), it starts the server — so existing MCP configurations are unaffected.
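A sketch of how that choice could be made, assuming the check is a plain TTY test on stdin. The function name `default_action` is hypothetical and is not taken from the GeneChat source.

```python
import sys


def default_action(stdin_is_tty: bool) -> str:
    """Pick what bare `genechat` does, per the behavior described above."""
    # Interactive terminal: show help. Piped stdin (e.g. an MCP client): serve.
    return "help" if stdin_is_tty else "serve"


if __name__ == "__main__":
    print(default_action(sys.stdin.isatty()))
```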

Shell Completion

Enable tab completion for your shell:

genechat --install-completion

If auto-detection fails (e.g. running via uv run), specify the shell explicitly: genechat --install-completion zsh

This enables completion for subcommands, flags, and --genome labels.

Exit Codes

| Code | Meaning |
| --- | --- |
| 0 | Success |
| 1 | General/unexpected error |
| 2 | Invalid usage (bad arguments) |
| 3 | Configuration error (missing config, no VCF registered) |
| 4 | VCF error (file not found, invalid, missing index) |
| 5 | External tool error (bcftools/snpEff not found) |
| 6 | Network error (download failed) |
| 130 | Interrupted (Ctrl-C) |
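If you script around the CLI, the codes above can be mapped to readable names. This is a minimal sketch: the numeric values come from the table, while the `GeneChatExit` enum and the helper names are hypothetical and are not part of GeneChat.

```python
import subprocess
from enum import IntEnum


class GeneChatExit(IntEnum):
    """Exit codes copied from the table above; the enum itself is hypothetical."""
    SUCCESS = 0
    GENERAL_ERROR = 1
    INVALID_USAGE = 2
    CONFIG_ERROR = 3
    VCF_ERROR = 4
    EXTERNAL_TOOL_ERROR = 5
    NETWORK_ERROR = 6
    INTERRUPTED = 130


def describe_exit(code: int) -> str:
    """Translate a return code into a readable name, tolerating unknown codes."""
    try:
        return GeneChatExit(code).name
    except ValueError:
        return f"UNKNOWN({code})"


def run_status() -> str:
    """Run `genechat status` (genechat must be on PATH) and describe its exit."""
    result = subprocess.run(["genechat", "status"], check=False)
    return describe_exit(result.returncode)
```

A wrapper script could call `run_status()` and branch on the returned name, for example retrying only on `NETWORK_ERROR`.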

Prerequisites

  • Python 3.11+
  • A consumer WGS VCF file (from Nucleus Genomics, Nebula, Sequencing.com, etc.)
  • Disk space for the reference databases (see the Reference Databases table below), plus ~2 GB for your raw VCF and patch.db

For annotation (used by genechat init and genechat annotate):

# macOS (Homebrew)
brew install bcftools brewsci/bio/snpeff

# Linux (conda)
conda install -c bioconda bcftools snpeff

VCF reading at runtime is handled by pysam, installed automatically via uv sync. No external tools needed at runtime.

Quickstart

Option A: Install from source (recommended for development)

git clone https://github.com/natecostello/genechat-mcp.git
cd genechat-mcp
uv sync

Option B: Install as a tool

uv tool install git+https://github.com/natecostello/genechat-mcp.git
# or: pip install git+https://github.com/natecostello/genechat-mcp.git

Initialize GeneChat

genechat init handles the entire setup in one command — validates your VCF, auto-fixes contig names if needed, downloads references, annotates, builds lookup tables, writes config, and prints the MCP JSON snippet:

# If installed from source:
uv run genechat init /path/to/your/raw.vcf.gz --label personal

# If installed as a tool:
genechat init /path/to/your/raw.vcf.gz --label personal

This will:

  1. Detect and fix bare contig names (e.g. GIAB VCFs use 1, 2 instead of chr1, chr2)
  2. Download ClinVar and SnpEff databases
  3. Build a patch database with functional annotations and clinical significance
  4. Write a config.toml to your OS config directory (~/Library/Application Support/genechat/ on macOS, ~/.config/genechat/ on Linux)
  5. Print the MCP JSON to paste into Claude Desktop or Claude Code
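Step 1 can be illustrated with a short sketch of bare-contig detection. This is not GeneChat's implementation: the helper names are hypothetical, and limiting the check to autosomes plus X and Y is an assumption, since the source does not say how other contigs are handled.

```python
# Assumed scope: autosomes 1-22 plus X and Y. Mitochondrial and decoy
# contigs are left out because the source does not describe them.
BARE_CONTIGS = {str(n) for n in range(1, 23)} | {"X", "Y"}


def needs_chr_fix(contigs: list[str]) -> bool:
    """True when any contig uses a bare name such as '1' instead of 'chr1'."""
    return any(name in BARE_CONTIGS for name in contigs)


def rename_map(contigs: list[str]) -> dict[str, str]:
    """Map each bare contig to its chr-prefixed form; leave other names alone."""
    return {name: f"chr{name}" for name in contigs if name in BARE_CONTIGS}
```

A mapping like this is the kind of input that `bcftools annotate --rename-chrs` accepts (one "old new" pair per line). Whether GeneChat uses that option internally is not stated in the source.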

Optional extras (combine any flags in a single init):

# Include gnomAD population frequencies and/or GWAS trait search (the ~58 MB download figure applies to GWAS only; gnomAD is much larger)
uv run genechat init /path/to/your/raw.vcf.gz --gnomad --gwas

gnomAD is optional; without it, query_gene falls back to ClinVar-only filtering. GWAS enables query_gwas for trait association lookups. Both can be added after init via genechat annotate --gnomad / genechat install --gwas.

What to expect

Benchmarked on the GIAB NA12878 genome (~3.9M variants) on an 8-vCPU machine:

| Mode | Time | Peak disk | Best for |
| --- | --- | --- | --- |
| Default (genechat init) | ~15 min | ~3 GB | Quick start (SnpEff + ClinVar only) |
| Default + --gnomad | several hours | ~17 GB | Low-disk machines (streams one chromosome at a time) |
| --fast --gnomad --dbsnp | ~1h 45m | ~213 GB | Fast full annotation (bulk downloads, parallel workers) |

After annotation, only ~1 GB persists (VCF + patch.db). Reference databases can be deleted.

Where the time goes (--fast mode): SnpEff ~60 min (single-threaded bottleneck), gnomAD ~34 min (8 workers), dbSNP ~6 min (8 workers), ClinVar ~38s.
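As a quick sanity check, the stage timings above can be summed and compared with the ~1h 45m total. The numbers are the ones quoted in the breakdown, and the variable names are illustrative.

```python
# Stage durations in minutes, taken from the --fast breakdown above.
stages = {"SnpEff": 60.0, "gnomAD": 34.0, "dbSNP": 6.0, "ClinVar": 38 / 60}

total_min = sum(stages.values())
hours, minutes = divmod(round(total_min), 60)
snpeff_share = stages["SnpEff"] / total_min

print(f"stages sum to ~{hours}h {minutes}m; SnpEff is {snpeff_share:.0%} of that")
```

The listed stages account for about 1h 41m. The remaining few minutes are presumably setup and download overhead, which the source does not itemize.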

Don't have your genome sequenced?

You can explore GeneChat using the GIAB NA12878 benchmark genome — a well-characterized reference sample with ~3.7M variants:

# Download the benchmark VCF (~120 MB)
curl -L -O https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/release/NA12878_HG001/NISTv4.2.1/GRCh38/HG001_GRCh38_1_22_v4.2.1_benchmark.vcf.gz

# Initialize (auto-fixes contig names, downloads references, annotates)
uv run genechat init HG001_GRCh38_1_22_v4.2.1_benchmark.vcf.gz --label giab

Then ask questions just like you would with your own genome.

Connect to your LLM

genechat init prints MCP config JSON at the end. GeneChat works with any MCP-compatible client — here are common setups:

Claude Desktop:

Copy the JSON printed by genechat init into your claude_desktop_config.json. The output varies by install method (source checkout uses uv run, tool install uses genechat directly) — paste it as-is.

Claude Code:

# Tool install (genechat on PATH):
claude mcp add genechat -- genechat

# Source install (use the command/args from genechat init output):
claude mcp add genechat -- uv run --directory /path/to/genechat-mcp genechat

Cursor / Windsurf / other MCP clients:

Add GeneChat as a stdio MCP server in your client's settings. The command is genechat (or uv run --directory /path/to/genechat-mcp genechat for source installs). Set the GENECHAT_CONFIG environment variable to your config.toml path.

For any other client with an MCP config file (mcp.json, mcp_servers.json, etc.), a tool install can use the snippet below. For source installs, paste the JSON from genechat init instead:

{
  "mcpServers": {
    "genechat": {
      "command": "genechat",
      "env": { "GENECHAT_CONFIG": "/path/to/genechat/config.toml" }
    }
  }
}

Try the remote demo (no installation needed):

For clients that support remote MCP servers, connect to https://genechat-demo.fly.dev/sse. In Claude Desktop, go to Settings > Connectors (requires Pro/Max/Team/Enterprise), or use the mcp-remote bridge:

{
  "mcpServers": {
    "genechat-demo": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "https://genechat-demo.fly.dev/sse"]
    }
  }
}

Claude Code:

claude mcp add genechat-demo --transport sse https://genechat-demo.fly.dev/sse

Multiple genomes

GeneChat supports named genomes for side-by-side comparison:

# Register a second genome
uv run genechat init /path/to/partner.vcf.gz --label partner

# Check what's registered
uv run genechat status

The LLM can then query both genomes using the genome parameter on any tool and the genome2 parameter on most tools for side-by-side comparison.
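For orientation, here is roughly what a comparison request looks like on the wire. MCP clients invoke tools with a JSON-RPC 2.0 `tools/call` message. In this sketch the `genome` and `genome2` parameters come from the text above, while the `rsid` argument name, the example rsID, and the assumption that query_variant accepts `genome2` are illustrative only.

```python
import json


def tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# `genome` and `genome2` come from the README text; `rsid` is an assumed
# argument name and rs4149056 is only an example identifier.
request = tool_call(1, "query_variant", {
    "rsid": "rs4149056",
    "genome": "personal",
    "genome2": "partner",
})
print(request)
```

In normal use the MCP client builds this message for you. The sketch only shows where the genome labels from genechat init end up.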

Architecture

Your raw VCF is never modified. genechat init downloads reference databases (ClinVar, SnpEff, optionally gnomAD/dbSNP) once, annotates your VCF, and stores the results in a separate SQLite patch database (patch.db). At runtime, the MCP server reads your raw VCF and patch.db locally — no network calls, no external tools. Tool responses flow back to the LLM client via the MCP protocol.
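The split between an untouched VCF and a separate patch database can be sketched with the standard library alone. Everything specific here is an assumption made for illustration: the `annotations` table, its columns, and the sample record are invented, and GeneChat's real patch.db schema may look quite different.

```python
import sqlite3

# Hypothetical patch.db layout: one row of annotations per variant key.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE annotations (chrom TEXT, pos INTEGER, ref TEXT, alt TEXT,"
    " gene TEXT, clinvar_sig TEXT, PRIMARY KEY (chrom, pos, ref, alt))"
)
db.execute(
    "INSERT INTO annotations VALUES ('chr1', 1000, 'A', 'G', 'GENE1', 'example_significance')"
)

# A made-up VCF data line standing in for a record read from the raw VCF.
vcf_line = "chr1\t1000\t.\tA\tG\t50\tPASS\t.\tGT\t0/1"


def annotate(line: str, conn: sqlite3.Connection) -> dict:
    """Combine the genotype from a VCF line with its stored annotations."""
    chrom, pos, _id, ref, alt, _qual, _filter, _info, fmt, sample = line.split("\t")
    genotype = sample.split(":")[fmt.split(":").index("GT")]
    row = conn.execute(
        "SELECT gene, clinvar_sig FROM annotations"
        " WHERE chrom = ? AND pos = ? AND ref = ? AND alt = ?",
        (chrom, int(pos), ref, alt),
    ).fetchone()
    gene, sig = row if row else (None, None)
    return {"variant": f"{chrom}:{pos} {ref}>{alt}", "genotype": genotype,
            "gene": gene, "clinvar_sig": sig}


print(annotate(vcf_line, db))
```

Because the annotations are keyed by position and alleles rather than written into the VCF, a layer such as ClinVar can be refreshed without touching the raw file.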

Annotation Pipeline (one-time, handled by genechat init)

| Tool | What it adds | Install |
| --- | --- | --- |
| SnpEff | Functional annotation: gene name, effect type, impact level, protein change | brew install brewsci/bio/snpeff |
| bcftools | Database annotation: transfers ClinVar/gnomAD/dbSNP fields into patch.db | brew install bcftools |

For incremental updates of individual annotation layers (e.g., updating ClinVar without re-running the full pipeline), see docs/annotation-updates.md.

Reference Databases

| Database | What it provides | Size | Flag |
| --- | --- | --- | --- |
| ClinVar | Clinical significance, disease/condition name, review status | ~100 MB | Default |
| SnpEff DB | Gene/transcript models for functional impact prediction | ~1.6 GB | Default |
| gnomAD | Population allele frequencies (global + per-population) | ~150 GB | --gnomad |
| dbSNP | rsID identifiers for each genomic position | ~20 GB | --dbsnp |
| GWAS Catalog | 1M+ genome-wide association study findings | ~58 MB download, ~300 MB on disk | --gwas |

Default genechat init downloads ClinVar + SnpEff (~2 GB). Optional annotation layers are enabled with flags (e.g. genechat annotate --gnomad). Genome-independent databases like GWAS are installed separately (genechat install --gwas).

Seed Data Pipeline

Gene coordinates, PGx guidelines, and PRS weights are pre-built from external APIs. The source repo contains the raw TSVs in data/seed/; pip-installed users get the prebuilt SQLite lookup_tables.db directly. Either way, genechat init ensures the database is ready — no manual steps needed.

To refresh seed data from upstream APIs (fetches latest from HGNC, Ensembl, CPIC, and PGS Catalog):

genechat install --seeds

Runtime Dependencies

At runtime, GeneChat uses only local files — no external tools, no network calls. (Tool responses are returned to the LLM client, which forwards them to the LLM provider — see Security Recommendations.)

| Library | What it does |
| --- | --- |
| pysam | Reads your raw VCF via tabix index |
| mcp | Implements the MCP server protocol |
| SQLite (stdlib) | Queries lookup tables for gene coordinates, drug info, PRS weights |
| pydantic | Validates tool inputs and config |
| typer | CLI framework (subcommands, flags, shell completion) |
| platformdirs | OS-standard config and data directories |

Data Source Licenses

License obligations depend on which annotation layers you install. The base install includes bundled seed data (PGx, PRS, warning genes) using public-domain, CC0, CC BY 4.0, and HPO-licensed data — citations are appreciated. genechat install --seeds refreshes this data from upstream APIs.

| Install path | Sources | License | Key obligation |
| --- | --- | --- | --- |
| Default (genechat init) | ClinVar, SnpEff, CPIC, HGNC, Ensembl | Public domain / MIT / CC0 | None (citation appreciated) |
| --gnomad | gnomAD | ODbL 1.0 | Attribution required; share-alike on derivative databases |
| --dbsnp | dbSNP | Public domain | None |
| --gwas | GWAS Catalog | CC0 | None |
| Bundled (PRS) | PGS Catalog | EBI ToU + per-score | Cite catalog paper + individual score publications |
| Bundled (warnings) | HPO, ACMG SF | Custom / academic | HPO: cite, show version, do not modify; ACMG: cite |

Run genechat licenses to see which licenses apply to your specific installation. See docs/licenses.md for full attribution text, citation DOIs, and the gnomAD ODbL produced-works distinction.

Security Recommendations

| Data | Where it lives | Transmitted? |
| --- | --- | --- |
| Your VCF file | Your machine only | Never |
| Tool responses (genotypes, rsIDs, findings) | Sent to LLM provider per tool call | Yes |
| Conversation history | MCP client logs (local) | Depends on client settings |

GeneChat makes zero network calls at runtime. However, every tool response — containing your genotypes, rsIDs, and clinical interpretations — is returned to the LLM as part of the conversation.

With a cloud LLM (Claude, ChatGPT, etc.): your raw VCF stays local, but tool responses are sent to the provider's servers. The provider's data policies apply to this content.

With a local/self-hosted model (Ollama, llama.cpp, etc.): everything stays on your machine. If you want maximum privacy, use an MCP client configured to run a local model so tool responses never leave your machine.

Store your VCF on an encrypted volume and chmod 600 your VCF and config files. genechat init sets restrictive permissions on the config automatically. MCP clients may log conversation history locally — be aware of cloud sync on those directories. See docs/security.md for platform-specific encryption instructions (APFS, LUKS).
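The chmod 600 advice can also be applied and verified from Python on POSIX systems. This is a small sketch with hypothetical helper names. It demonstrates on a throwaway temporary file, so substitute your own VCF and config paths.

```python
import os
import stat
import tempfile


def restrict_to_owner(path: str) -> int:
    """Apply the equivalent of `chmod 600` and return the resulting mode bits."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return stat.S_IMODE(os.stat(path).st_mode)


def is_owner_only(path: str) -> bool:
    """True when group and other have no permissions on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


# Demonstrate on a throwaway file rather than a real VCF.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    demo_path = tmp.name
demo_mode = restrict_to_owner(demo_path)
demo_ok = is_owner_only(demo_path)
os.remove(demo_path)
print(oct(demo_mode), demo_ok)
```

`is_owner_only` is handy as a periodic check, since backup or sync tools sometimes recreate files with looser permissions.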

Privacy summary: No telemetry, no analytics, no data collection. Your raw VCF file never leaves your machine. Tool responses containing your genetic findings are sent to your LLM provider as part of the conversation — use a local LLM if this is a concern.

Development / Testing

uv sync --extra dev
uv run pytest
uv run ruff check . && uv run ruff format --check .

The test VCF (tests/data/test_sample.vcf.gz) is auto-generated by a pytest fixture on first run.

End-to-End Testing with GIAB NA12878

Optional e2e tests against the GIAB NA12878 benchmark genome (~3.7M variants):

# Download and init GIAB (see "Don't have your genome sequenced?" above)
# Include --dbsnp for rsID-based lookups in e2e tests
uv run genechat init HG001_GRCh38_1_22_v4.2.1_benchmark.vcf.gz --label giab --dbsnp

# Run e2e tests (point to the chrfixed VCF if contig rename was applied):
export GENECHAT_GIAB_VCF=./HG001_GRCh38_1_22_v4.2.1_benchmark_chrfixed.vcf.gz
uv run pytest tests/e2e/ -v

# Fast only (skip full-VCF scans):
uv run pytest tests/e2e/ -v -m "not slow"

E2e tests are automatically skipped when GENECHAT_GIAB_VCF is not set.

Troubleshooting

Missing VCF index (.tbi): tabix -p vcf /path/to/your/raw.vcf.gz
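A script can check for the index before starting the server. The sketch below assumes the index sits next to the VCF with a `.tbi` suffix, which is tabix's default output name. The helper names are hypothetical.

```python
from pathlib import Path


def missing_index(vcf_path: str) -> bool:
    """True when no `.tbi` file sits next to the bgzipped VCF."""
    return not Path(vcf_path + ".tbi").exists()


def index_command(vcf_path: str) -> list[str]:
    """The tabix invocation from the troubleshooting tip, as an argv list."""
    return ["tabix", "-p", "vcf", vcf_path]
```

The argv list from `index_command` can be passed to `subprocess.run` when `missing_index` returns True, provided tabix is installed.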

Wrong genome build: GeneChat expects GRCh38 with chr-prefixed chromosome names (chr1, chr2). genechat init fixes bare contig names automatically, but GRCh37/hg19 VCFs need liftover to GRCh38 first.

Missing lookup_tables.db: The lookup database ships with the package, and genechat init ensures it is ready. If it is missing anyway, rebuild it with genechat install --seeds.

pysam installation issues on macOS: xcode-select --install

Verifying LLM Interpretations

GeneChat's tool responses are derived from curated databases (ClinVar, CPIC, gnomAD, PGS Catalog) and your local VCF. Results may be incomplete due to query limits, missing annotations, or database version differences — always verify against primary sources. The LLM interpreting that data can also hallucinate — inventing clinical significance, overstating risk, fabricating study references, or filling in gaps with training-data knowledge that may be outdated or wrong.

Practices that reduce this risk:

  • Require citations. Ask the LLM to cite specific sources (PMIDs, ClinVar accession numbers, CPIC guideline URLs) for every clinical assertion. If it can't point to a source from the tool response, treat the claim with skepticism.

  • Spot-check against primary sources. Any assertion about a specific variant can be verified in seconds:

    • ClinVar — search by rsID for clinical significance and review status
    • PharmGKB — drug-gene interactions and CPIC guidelines
    • gnomAD — population frequencies
    • OMIM — gene-disease relationships
  • Cross-model verification. For high-stakes findings, run the same query through a second LLM. Agreement on specific claims (citing the same guideline or variant classification) increases confidence. Disagreement is a signal to check primary sources.

  • Distinguish tool data from LLM synthesis. GeneChat tool responses are clearly structured (genotype, annotation, clinical significance, population frequency). Anything the LLM adds beyond that — dietary recommendations, risk quantification, drug dosing suggestions — is LLM-generated interpretation, not database fact.

  • Be wary of confident gap-filling. If GeneChat returns "no ClinVar entry found" for a variant, but the LLM still offers a clinical interpretation, that interpretation came from training data, not from your annotated genome.

Important Disclaimer

GeneChat is an informational tool, not a medical device. It is not a substitute for professional genetic counseling or medical advice. Always discuss genetic findings with a qualified healthcare provider before making health decisions.

License

MIT
