MCP Servers

GeneChat MCP Server

A local-first MCP server that annotates whole-genome VCF files and lets you query pharmacogenomics, disease risk, and carrier status through natural language.

README

GeneChat MCP Server

Talk to your genome.

GeneChat is a local-first MCP server that annotates your whole-genome VCF once, then serves it as tools so you can ask questions about pharmacogenomics, disease risk, carrier status, and more — with your raw file never leaving your machine.

Try the live demo →

GeneChat example conversation

Things you can ask:

"I was just prescribed simvastatin — any genetic concerns?"
"I have surgery next month. What should I tell my anesthesiologist?"
"Am I a carrier for anything concerning?"

Privacy Notice: GeneChat reads your VCF locally, but tool responses — containing your genotypes, rsIDs, and clinical interpretations — are sent to your LLM provider (e.g. Anthropic, OpenAI) as part of the conversation. Your raw VCF file is never uploaded, but the LLM does see the specific variants and findings returned by each tool call. See Security Recommendations below.

What It Does

You get your genome sequenced ($250–$900 from providers like Nucleus Genomics or Nebula Genomics). You download the raw VCF file. GeneChat annotates it once with open-source tools, then serves it locally via MCP so you can ask questions like:

"What does my genome say about cardiovascular risk?"
"I play soccer — any genetic injury risk factors?"
"How should I think about my diet based on my genetics?"

The LLM calls GeneChat's tools behind the scenes, gets your specific genotypes and annotations, and interprets the results in context.

How It Works

You ask a question in your LLM → it picks the right GeneChat tool → GeneChat queries your local VCF with pysam → returns your genotype + clinical annotations → the LLM interprets the results for you.

GeneChat reads only local files and makes no network calls at runtime. However, tool responses — containing your genotypes and clinical findings — are sent to your LLM provider as part of the conversation. See Security Recommendations for details on cloud vs local LLM options.

Tools

Tool	Purpose
`list_genomes`	List registered genomes so the LLM can ask which to query
`query_variant`	Look up a single variant by rsID or position
`query_variants`	Batch lookup of multiple rsIDs in a single VCF scan
`query_gene`	List notable variants in a gene with smart filter
`query_genes`	Batch query variants across multiple genes at once
`query_pgx`	Pharmacogenomics lookup by drug or gene (CPIC data)
`query_clinvar`	Find clinically significant variants
`query_gwas`	Search the GWAS Catalog by trait, gene, or variant
`calculate_prs`	Polygenic risk scores (PGS Catalog data)
`genome_summary`	High-level overview of your genome

CLI Commands

Command	Purpose
`genechat init <vcf> [--label] [--gnomad] [--dbsnp] [--gwas] [--fast]`	Full first-time setup: annotate, write config
`genechat add <vcf> [--label]`	Register a VCF file without annotation
`genechat annotate [--clinvar] [--gnomad] [--snpeff] [--dbsnp] [--all] [--stale] [--force] [--fast] [--genome]`	Build or update patch.db (auto-downloads references)
`genechat install [--gwas] [--seeds] [--force]`	Install genome-independent reference databases
`genechat status [--json] [--check-updates]`	Show all registered genomes, annotation state, and caches
`genechat licenses`	Show data source licenses for your installation
`genechat serve` / `genechat`	Start the MCP server

Global flags: --version (print version), --no-color (disable colored output). Color output respects the NO_COLOR environment variable and is automatically disabled when stdout or stderr is not a TTY.

Running genechat with no subcommand in an interactive terminal shows a help summary. When stdin is piped (e.g. from an MCP client), it starts the server — so existing MCP configurations are unaffected.

Shell Completion

Enable tab completion for your shell:

genechat --install-completion

If auto-detection fails (e.g. running via uv run), specify the shell explicitly: genechat --install-completion zsh

This enables completion for subcommands, flags, and --genome labels.

Exit Codes

Code	Meaning
0	Success
1	General/unexpected error
2	Invalid usage (bad arguments)
3	Configuration error (missing config, no VCF registered)
4	VCF error (file not found, invalid, missing index)
5	External tool error (bcftools/snpEff not found)
6	Network error (download failed)
130	Interrupted (Ctrl-C)

Prerequisites

Python 3.11+
A consumer WGS VCF file (from Nucleus Genomics, Nebula, Sequencing.com, etc.)
Disk for reference databases (see table below), ~2 GB for your raw VCF + patch.db

For annotation (used by genechat init and genechat annotate):

# macOS (Homebrew)
brew install bcftools brewsci/bio/snpeff

# Linux (conda)
conda install -c bioconda bcftools snpeff

VCF reading at runtime is handled by pysam, installed automatically via uv sync. No external tools needed at runtime.

Quickstart

Option A: Install from source (recommended for development)

git clone https://github.com/natecostello/genechat-mcp.git
cd genechat-mcp
uv sync

Option B: Install as a tool

uv tool install git+https://github.com/natecostello/genechat-mcp.git
# or: pip install git+https://github.com/natecostello/genechat-mcp.git

Initialize GeneChat

genechat init handles the entire setup in one command — validates your VCF, auto-fixes contig names if needed, downloads references, annotates, builds lookup tables, writes config, and prints the MCP JSON snippet:

# If installed from source:
uv run genechat init /path/to/your/raw.vcf.gz --label personal

# If installed as a tool:
genechat init /path/to/your/raw.vcf.gz --label personal

This will:

Detect and fix bare contig names (e.g. GIAB VCFs use 1, 2 instead of chr1, chr2)
Download ClinVar and SnpEff databases
Build a patch database with functional annotations and clinical significance
Write a config.toml to your OS config directory (~/Library/Application Support/genechat/ on macOS, ~/.config/genechat/ on Linux)
Print the MCP JSON to paste into Claude Desktop or Claude Code

Optional extras (combine any flags in a single init):

# Include gnomAD population frequencies and/or GWAS trait search (~58 MB download)
uv run genechat init /path/to/your/raw.vcf.gz --gnomad --gwas

gnomAD is optional; without it, query_gene falls back to ClinVar-only filtering. GWAS enables query_gwas for trait association lookups. Both can be added after init via genechat annotate --gnomad / genechat install --gwas.

What to expect

Benchmarked on the GIAB NA12878 genome (~3.9M variants) on an 8-vCPU machine:

Mode	Time	Peak disk	Best for
Default (`genechat init`)	~15 min	~3 GB	Quick start (SnpEff + ClinVar only)
Default + `--gnomad`	several hours	~17 GB	Low-disk machines (streams one chromosome at a time)
`--fast --gnomad --dbsnp`	~1h 45m	~213 GB	Fast full annotation (bulk downloads, parallel workers)

After annotation, only ~1 GB persists (VCF + patch.db). Reference databases can be deleted.

Where the time goes (--fast mode): SnpEff ~60 min (single-threaded bottleneck), gnomAD ~34 min (8 workers), dbSNP ~6 min (8 workers), ClinVar ~38s.

Don't have your genome sequenced?

You can explore GeneChat using the GIAB NA12878 benchmark genome — a well-characterized reference sample with ~3.7M variants:

# Download the benchmark VCF (~120 MB)
curl -L -O https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/release/NA12878_HG001/NISTv4.2.1/GRCh38/HG001_GRCh38_1_22_v4.2.1_benchmark.vcf.gz

# Initialize (auto-fixes contig names, downloads references, annotates)
uv run genechat init HG001_GRCh38_1_22_v4.2.1_benchmark.vcf.gz --label giab

Then ask questions just like you would with your own genome.

Connect to your LLM

genechat init prints MCP config JSON at the end. GeneChat works with any MCP-compatible client — here are common setups:

Claude Desktop:

Copy the JSON printed by genechat init into your claude_desktop_config.json. The output varies by install method (source checkout uses uv run, tool install uses genechat directly) — paste it as-is.

Claude Code:

# Tool install (genechat on PATH):
claude mcp add genechat -- genechat

# Source install (use the command/args from genechat init output):
claude mcp add genechat -- uv run --directory /path/to/genechat-mcp genechat

Cursor / Windsurf / other MCP clients:

Add GeneChat as a stdio MCP server in your client's settings. The command is genechat (or uv run --directory /path/to/genechat-mcp genechat for source installs). Set the GENECHAT_CONFIG environment variable to your config.toml path.

Any client with an MCP config file (mcp.json, mcp_servers.json, etc.) — for source installs, paste the JSON from genechat init instead:

{
  "mcpServers": {
    "genechat": {
      "command": "genechat",
      "env": { "GENECHAT_CONFIG": "/path/to/genechat/config.toml" }
    }
  }
}

Try the remote demo (no installation needed):

For clients that support remote MCP servers, connect to https://genechat-demo.fly.dev/sse. In Claude Desktop, go to Settings > Connectors (requires Pro/Max/Team/Enterprise), or use the mcp-remote bridge:

{
  "mcpServers": {
    "genechat-demo": {
      "command": "npx",
      "args": ["-y", "mcp-remote", "https://genechat-demo.fly.dev/sse"]
    }
  }
}

Claude Code:

claude mcp add genechat-demo --transport sse https://genechat-demo.fly.dev/sse

Multiple genomes

GeneChat supports named genomes for side-by-side comparison:

# Register a second genome
uv run genechat init /path/to/partner.vcf.gz --label partner

# Check what's registered
uv run genechat status

The LLM can then query both genomes using the genome parameter on any tool and the genome2 parameter on most tools for side-by-side comparison.

Architecture

Your raw VCF is never modified. genechat init downloads reference databases (ClinVar, SnpEff, optionally gnomAD/dbSNP) once, annotates your VCF, and stores the results in a separate SQLite patch database (patch.db). At runtime, the MCP server reads your raw VCF and patch.db locally — no network calls, no external tools. Tool responses flow back to the LLM client via the MCP protocol.

Annotation Pipeline (one-time, handled by `genechat init`)

Tool	What it adds	Install
SnpEff	Functional annotation — gene name, effect type, impact level, protein change	`brew install brewsci/bio/snpeff`
bcftools	Database annotation — transfers ClinVar/gnomAD/dbSNP fields into patch.db	`brew install bcftools`

For incremental updates of individual annotation layers (e.g., updating ClinVar without re-running the full pipeline), see docs/annotation-updates.md.

Reference Databases

Database	What it provides	Size	Flag
ClinVar	Clinical significance, disease/condition name, review status	~100 MB	Default
SnpEff DB	Gene/transcript models for functional impact prediction	~1.6 GB	Default
gnomAD	Population allele frequencies (global + per-population)	~150 GB	`--gnomad`
dbSNP	rsID identifiers for each genomic position	~20 GB	`--dbsnp`
GWAS Catalog	1M+ genome-wide association study findings	~58 MB download, ~300 MB on disk	`--gwas`

Default genechat init downloads ClinVar + SnpEff (~2 GB). Optional annotation layers are enabled with flags (e.g. genechat annotate --gnomad). Genome-independent databases like GWAS are installed separately (genechat install --gwas).

Seed Data Pipeline

Gene coordinates, PGx guidelines, and PRS weights are pre-built from external APIs. The source repo contains the raw TSVs in data/seed/; pip-installed users get the prebuilt SQLite lookup_tables.db directly. Either way, genechat init ensures the database is ready — no manual steps needed.

To refresh seed data from upstream APIs (fetches latest from HGNC, Ensembl, CPIC, and PGS Catalog):

genechat install --seeds

Runtime Dependencies

At runtime, GeneChat uses only local files — no external tools, no network calls. (Tool responses are returned to the LLM client, which forwards them to the LLM provider — see Security Recommendations.)

Library	What it does
pysam	Reads your raw VCF via tabix index
mcp	Implements the MCP server protocol
SQLite (stdlib)	Queries lookup tables for gene coordinates, drug info, PRS weights
pydantic	Validates tool inputs and config
typer	CLI framework (subcommands, flags, shell completion)
platformdirs	OS-standard config and data directories

Data Source Licenses

License obligations depend on which annotation layers you install. The base install includes bundled seed data (PGx, PRS, warning genes) using public-domain, CC0, CC BY 4.0, and HPO-licensed data — citations are appreciated. genechat install --seeds refreshes this data from upstream APIs.

Install path	Sources	License	Key obligation
Default (`genechat init`)	ClinVar, SnpEff, CPIC, HGNC, Ensembl	Public domain / MIT / CC0	None (citation appreciated)
`--gnomad`	gnomAD	ODbL 1.0	Attribution required; share-alike on derivative databases
`--dbsnp`	dbSNP	Public domain	None
`--gwas`	GWAS Catalog	CC0	None
Bundled (PRS)	PGS Catalog	EBI ToU + per-score	Cite catalog paper + individual score publications
Bundled (warnings)	HPO, ACMG SF	Custom / academic	HPO: cite, show version, do not modify; ACMG: cite

Run genechat licenses to see which licenses apply to your specific installation. See docs/licenses.md for full attribution text, citation DOIs, and the gnomAD ODbL produced-works distinction.

Security Recommendations

Data	Where it lives	Transmitted?
Your VCF file	Your machine only	Never
Tool responses (genotypes, rsIDs, findings)	Sent to LLM provider per tool call	Yes
Conversation history	MCP client logs (local)	Depends on client settings

GeneChat makes zero network calls at runtime. However, every tool response — containing your genotypes, rsIDs, and clinical interpretations — is returned to the LLM as part of the conversation.

With a cloud LLM (Claude, ChatGPT, etc.): your raw VCF stays local, but tool responses are sent to the provider's servers. The provider's data policies apply to this content.

With a local/self-hosted model (Ollama, llama.cpp, etc.): everything stays on your machine. If you want maximum privacy, use an MCP client configured to run a local model so tool responses never leave your machine.

Store your VCF on an encrypted volume and chmod 600 your VCF and config files. genechat init sets restrictive permissions on the config automatically. MCP clients may log conversation history locally — be aware of cloud sync on those directories. See docs/security.md for platform-specific encryption instructions (APFS, LUKS).

Privacy summary: No telemetry, no analytics, no data collection. Your raw VCF file never leaves your machine. Tool responses containing your genetic findings are sent to your LLM provider as part of the conversation — use a local LLM if this is a concern.

Development / Testing

uv sync --extra dev
uv run pytest
uv run ruff check . && uv run ruff format --check .

The test VCF (tests/data/test_sample.vcf.gz) is auto-generated by a pytest fixture on first run.

End-to-End Testing with GIAB NA12878

Optional e2e tests against the GIAB NA12878 benchmark genome (~3.7M variants):

# Download and init GIAB (see "Don't have your genome sequenced?" above)
# Include --dbsnp for rsID-based lookups in e2e tests
uv run genechat init HG001_GRCh38_1_22_v4.2.1_benchmark.vcf.gz --label giab --dbsnp

# Run e2e tests (point to the chrfixed VCF if contig rename was applied):
export GENECHAT_GIAB_VCF=./HG001_GRCh38_1_22_v4.2.1_benchmark_chrfixed.vcf.gz
uv run pytest tests/e2e/ -v

# Fast only (skip full-VCF scans):
uv run pytest tests/e2e/ -v -m "not slow"

E2e tests are automatically skipped when GENECHAT_GIAB_VCF is not set.

Troubleshooting

Missing VCF index (.tbi): tabix -p vcf /path/to/your/raw.vcf.gz

Wrong genome build: GeneChat expects GRCh38 with chr prefixed chromosomes. GRCh37/hg19 VCFs need liftover first.

Missing lookup_tables.db: The lookup database ships with the package and is built automatically by genechat init. If somehow missing, rebuild with genechat install --seeds.

pysam installation issues on macOS: xcode-select --install

Verifying LLM Interpretations

GeneChat's tool responses are derived from curated databases (ClinVar, CPIC, gnomAD, PGS Catalog) and your local VCF. Results may be incomplete due to query limits, missing annotations, or database version differences — always verify against primary sources. The LLM interpreting that data can also hallucinate — inventing clinical significance, overstating risk, fabricating study references, or filling in gaps with training-data knowledge that may be outdated or wrong.

Practices that reduce this risk:

Require citations. Ask the LLM to cite specific sources (PMIDs, ClinVar accession numbers, CPIC guideline URLs) for every clinical assertion. If it can't point to a source from the tool response, treat the claim with skepticism.
Spot-check against primary sources. Any assertion about a specific variant can be verified in seconds:
- ClinVar — search by rsID for clinical significance and review status
- PharmGKB — drug-gene interactions and CPIC guidelines
- gnomAD — population frequencies
- OMIM — gene-disease relationships
Cross-model verification. For high-stakes findings, run the same query through a second LLM. Agreement on specific claims (citing the same guideline or variant classification) increases confidence. Disagreement is a signal to check primary sources.
Distinguish tool data from LLM synthesis. GeneChat tool responses are clearly structured (genotype, annotation, clinical significance, population frequency). Anything the LLM adds beyond that — dietary recommendations, risk quantification, drug dosing suggestions — is LLM-generated interpretation, not database fact.
Be wary of confident gap-filling. If GeneChat returns "no ClinVar entry found" for a variant, but the LLM still offers a clinical interpretation, that interpretation came from training data, not from your annotated genome.

Important Disclaimer

GeneChat is an informational tool, not a medical device. It is not a substitute for professional genetic counseling or medical advice. Always discuss genetic findings with a qualified healthcare provider before making health decisions.

License

MIT

Recommended Servers

playwright-mcp

A Model Context Protocol server that enables LLMs to interact with web pages through structured accessibility snapshots without requiring vision models or screenshots.

Official

Featured

TypeScript

Magic Component Platform (MCP)

An AI-powered tool that generates modern UI components from natural language descriptions, integrating with popular IDEs to streamline UI development workflow.

Audiense Insights MCP Server

Enables interaction with Audiense Insights accounts via the Model Context Protocol, facilitating the extraction and analysis of marketing insights and audience data including demographics, behavior, and influencer engagement.

VeyraX MCP

Single MCP tool to connect all your favorite tools: Gmail, Calendar and 40 more.

Official

Featured

Local

graphlit-mcp-server

The Model Context Protocol (MCP) Server enables integration between MCP clients and the Graphlit service. Ingest anything from Slack to Gmail to podcast feeds, in addition to web crawling, into a Graphlit project - and then retrieve relevant contents from the MCP client.

Official

Featured

TypeScript

Kagi MCP Server

An MCP server that integrates Kagi search capabilities with Claude AI, enabling Claude to perform real-time web searches when answering questions that require up-to-date information.

Official

Featured

Python

E2B

Using MCP to run code via e2b.

Official

Featured

Neon Database

MCP server for interacting with Neon Management API and databases

Official

Featured

Exa Search

A Model Context Protocol (MCP) server lets AI assistants like Claude use the Exa AI Search API for web searches. This setup allows AI models to get real-time web information in a safe and controlled way.

Official

Featured

Qdrant Server

This repository is an example of how to create a MCP server for Qdrant, a vector search engine.

Official

Featured

GeneChat MCP Server

README

GeneChat MCP Server

What It Does

How It Works

Tools

CLI Commands

Shell Completion

Exit Codes

Prerequisites

Quickstart

Option A: Install from source (recommended for development)

Option B: Install as a tool

Initialize GeneChat

What to expect

Don't have your genome sequenced?

Connect to your LLM

Multiple genomes

Architecture

Annotation Pipeline (one-time, handled by genechat init)

Reference Databases

Seed Data Pipeline

Runtime Dependencies

Data Source Licenses

Security Recommendations

Development / Testing

End-to-End Testing with GIAB NA12878

Troubleshooting

Verifying LLM Interpretations

Important Disclaimer

License

Recommended Servers

Annotation Pipeline (one-time, handled by `genechat init`)