contact-verifier-mcp
MCP server for verifying B2B contact records via email syntax and DNS/MX checks, serving verified data with per-tenant isolation.
contact-verifier
You're about to mail 40,000 people and someone hands you a contact list. One address is j.smith@acme-corp.com — a typo'd domain that doesn't resolve. If your verifier shrugs and calls it valid, you don't just lose one email: it bounces, your bounce rate ticks up, and the mailbox providers that score your sending reputation notice. Enough of those and your good mail starts landing in spam. So when the data can't confirm a domain accepts mail, the honest answer isn't to guess — it's to say risky, and let the caller decide.
contact-verifier ingests B2B contact records, checks whether each email is actually mailable, and serves the verified data three ways — a REST API, an MCP server for agents, and a Parquet warehouse export. It's multi-tenant: many customers' contacts live in one store, and the thing it can't get wrong is letting one tenant see another's data.
Portfolio prototype on synthetic data only — the 15 seed contacts and any tenant you create are made up; no real PII in the tree or git history. "Verification" here means email syntax + DNS/MX deliverability, not a paid validation API or live SMTP probing. Defaults to SQLite so it runs end-to-end from a clean clone; point it at Postgres when you want to.
What "verified" means
Two checks, in order, turned into one status and a one-sentence reason a customer can actually read:
- Syntax (`verify/email.py`): a pragmatic, network-free parse (one `@`, sane local part, dotted domain with a real TLD) that also lowercases/trims to a `normalized_email` so dedup works. Stricter than RFC 5322 on purpose; it catches the malformed addresses real lists actually contain.
- Deliverability (`verify/dns.py`): a DNS MX lookup that asks whether the domain advertises mail exchangers at all.
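The syntax step can be illustrated with a minimal sketch. This is not the code in `verify/email.py`; the function name `normalize_email`, both regular expressions, and the "alphabetic TLD of two or more letters" rule are assumptions chosen to match the description above.

```python
import re
from typing import Optional

# Hypothetical rules: a conservative local part (no leading, trailing, or
# doubled dots) and a dotted domain ending in an alphabetic TLD of 2+ letters.
_LOCAL_RE = re.compile(r"^[a-z0-9_%+-]+(\.[a-z0-9_%+-]+)*$")
_DOMAIN_RE = re.compile(r"^([a-z0-9]([a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}$")


def normalize_email(raw: str) -> Optional[str]:
    """Return the trimmed, lowercased address, or None if the syntax check fails."""
    candidate = raw.strip().lower()
    if candidate.count("@") != 1:
        return None  # zero or several "@" signs
    local, domain = candidate.split("@")
    if not _LOCAL_RE.match(local) or not _DOMAIN_RE.match(domain):
        return None
    return candidate
```

Because the function returns the normalized form, two spellings of the same address (`J.Smith@Acme-Corp.com` and `j.smith@acme-corp.com`) compare equal, which is what makes deduplication possible without any network call.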
The engine (verify/engine.py) collapses those into four statuses:
| status | meaning | confidence |
|---|---|---|
| `valid` | syntax ok and the domain has MX records | 0.9 |
| `invalid` | bad syntax, or the domain can't receive mail (no MX / NXDOMAIN) | 0.1 |
| `risky` | syntax ok, but DNS couldn't confirm deliverability right now | 0.5 |
| `unknown` | not yet verified | n/a |
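As a rough sketch of how the two checks could collapse into the statuses in the table, assuming a hypothetical `MxOutcome` enum and `Verdict` record (neither name comes from `verify/engine.py`, and the reason strings are invented for illustration):

```python
from dataclasses import dataclass
from enum import Enum


class MxOutcome(Enum):
    """Hypothetical result of the deliverability check."""
    HAS_MX = "has_mx"    # the domain advertises mail exchangers
    NO_MAIL = "no_mail"  # definitive: no MX records, or NXDOMAIN
    UNKNOWN = "unknown"  # DNS could not answer (timeout / SERVFAIL)


@dataclass(frozen=True)
class Verdict:
    status: str
    confidence: float
    reason: str


def collapse(syntax_ok: bool, mx: MxOutcome) -> Verdict:
    """Map the two checks onto one status and confidence, following the table.

    The `unknown` status is not produced here: it is the state of a contact
    that has not been through verification at all.
    """
    if not syntax_ok:
        return Verdict("invalid", 0.1, "The address is not a well-formed email.")
    if mx is MxOutcome.HAS_MX:
        return Verdict("valid", 0.9, "The domain accepts mail.")
    if mx is MxOutcome.NO_MAIL:
        return Verdict("invalid", 0.1, "The domain cannot receive mail.")
    return Verdict("risky", 0.5, "DNS could not confirm the domain right now.")
```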
The load-bearing distinction is between definitive and unconfirmed. NXDOMAIN — the domain provably does not exist — is a real, cached negative, returned immediately; retrying a definitive answer just burns time. But a timeout or SERVFAIL is the resolver having a bad moment, not evidence the address is dead. After bounded retries (exponential backoff + jitter) it returns unknown → risky, never a false invalid. A DNS hiccup must not silently condemn a good contact, and that fail-closed branch is unit-tested (tests/test_verify.py) with an injected resolver, clock, and sleep — it fires in CI with no network and no waiting.
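The retry policy described above might look like the following sketch. The exception classes, the `lookup_mx` signature, and the default attempt count and delay are assumptions for illustration, not the contents of `verify/dns.py`. The resolver and the sleep function are injected so the unconfirmed branch can be exercised without a network or real waiting.

```python
import random
from typing import Callable, List


class DomainNotFound(Exception):
    """Definitive negative (NXDOMAIN): never retried."""


class TransientDnsError(Exception):
    """Timeout or SERVFAIL: worth retrying."""


def lookup_mx(
    domain: str,
    resolve: Callable[[str], List[str]],
    sleep: Callable[[float], None],
    max_attempts: int = 3,
    base_delay: float = 0.2,
    rng: Callable[[], float] = random.random,
) -> str:
    """Return "has_mx", "no_mail", or "unknown" for a domain."""
    for attempt in range(max_attempts):
        try:
            records = resolve(domain)
        except DomainNotFound:
            return "no_mail"  # definitive answer: retrying only burns time
        except TransientDnsError:
            if attempt < max_attempts - 1:
                # Exponential backoff scaled by a random jitter factor in [0, 1).
                sleep(base_delay * (2 ** attempt) * rng())
            continue
        return "has_mx" if records else "no_mail"
    # Retries exhausted: report "unknown" (surfaced as risky), never a false invalid.
    return "unknown"
```

A test can pass a resolver that always raises `TransientDnsError` and a list's `append` method as `sleep`, then assert both the returned value and the recorded delays.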
verify/dns.py is where the integration craft lives, since the flaky external call is what breaks in production: per-attempt timeout, retry only on transient failures, a client-side rate limit so a bulk run paces itself, and a bounded LRU+TTL cache (the same domains recur all over a contact list).
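A bounded LRU+TTL cache of the kind mentioned here can be sketched with the standard library alone. The class name `TtlLruCache` and its parameters are hypothetical; the real cache may differ in eviction details.

```python
import time
from collections import OrderedDict
from typing import Callable, Hashable, Optional, Tuple


class TtlLruCache:
    """Bounded cache: evicts the least recently used key and expires old entries."""

    def __init__(self, max_size: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries: "OrderedDict[Hashable, Tuple[float, object]]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[object]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: treat as a miss
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: Hashable, value: object) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # drop the least recently used
```

Keyed by domain, a cache like this means a list with thousands of addresses at the same company triggers one MX lookup per TTL window rather than one per contact.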
Tenant isolation, in one place
Every business row hangs off a tenant_id, and the repository (db/repository.py) is the only layer that touches contacts — by construction it has no method that reads or writes one without a tenant_id in the WHERE clause. Handlers resolve their tenant from the API key (auth.py; keys stored as SHA-256 hashes, plaintext shown once) and pass it down; they can't reach past it because the repository never offers a way to. A cross-tenant fetch returns 404, not 403 (api/routes.py), so the API won't even confirm another tenant's record exists — asserted in tests/test_api.py::test_tenant_isolation. One enforcement point is the design: isolation you have to remember in every handler is isolation you'll eventually forget.
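To make the single enforcement point concrete, here is a minimal sketch using the standard library's `sqlite3`. The schema, the `resolve_tenant` helper, and the `ContactRepository` class are assumptions for illustration; they are not the project's `auth.py` or `db/repository.py`.

```python
import hashlib
import sqlite3
from typing import Optional, Tuple

# Hypothetical two-table schema, just enough to show the scoping.
SCHEMA = """
CREATE TABLE tenants (id INTEGER PRIMARY KEY, key_hash TEXT UNIQUE);
CREATE TABLE contacts (id INTEGER PRIMARY KEY, tenant_id INTEGER, email TEXT);
"""


def hash_key(api_key: str) -> str:
    """Only the SHA-256 digest of an API key is stored, never the plaintext."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def resolve_tenant(conn: sqlite3.Connection, api_key: str) -> Optional[int]:
    """Map an API key to its tenant id, or None if the key is unknown."""
    row = conn.execute(
        "SELECT id FROM tenants WHERE key_hash = ?", (hash_key(api_key),)
    ).fetchone()
    return row[0] if row else None


class ContactRepository:
    """Every method requires tenant_id; there is no unscoped read to misuse."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def get_contact(self, tenant_id: int, contact_id: int) -> Optional[Tuple[int, str]]:
        # A row owned by another tenant looks exactly like a missing row,
        # so a handler can answer 404 in both cases.
        return self._conn.execute(
            "SELECT id, email FROM contacts WHERE tenant_id = ? AND id = ?",
            (tenant_id, contact_id),
        ).fetchone()
```

A handler built on this would call `resolve_tenant` once, then pass the result into every repository call; a `None` from `get_contact` maps to a 404 whether the row is absent or belongs to someone else.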
The flow
REST / CLI ──ingest──▶ verify (syntax → MX → status+confidence) ──▶ store ──┐
                                                                            │
                                                      SQLite / Postgres,    │
                                                      tenant-scoped repo ◀──┘
                                                               │
                               ┌───────────────────────────────┼──────────────────────┐
                               ▼                               ▼                      ▼
                      REST (FastAPI /v1)              MCP server (stdio)       Parquet export
                                                        4 agent tools          warehouse/tenant=<id>/
Verification runs inline and is idempotent — already-verified contacts are skipped — and the same run flags duplicates (a later contact sharing a normalized_email points at the earliest via duplicate_of_id). Each run is recorded in verification_runs.
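The duplicate-flagging rule can be sketched as a single pass over a tenant's contacts. The function name `flag_duplicates` is hypothetical, and treating "earliest" as "lowest id" is an assumption; the project may order by creation time instead.

```python
from typing import Dict, Iterable, Optional, Tuple


def flag_duplicates(
    contacts: Iterable[Tuple[int, Optional[str]]],
) -> Dict[int, int]:
    """Map each later contact id to the earliest id sharing its normalized_email.

    `contacts` yields (contact_id, normalized_email) in ascending id order.
    Rows without a normalized email (failed syntax) are never duplicates.
    """
    first_seen: Dict[str, int] = {}
    duplicate_of: Dict[int, int] = {}
    for contact_id, normalized in contacts:
        if normalized is None:
            continue
        if normalized in first_seen:
            duplicate_of[contact_id] = first_seen[normalized]
        else:
            first_seen[normalized] = contact_id
    return duplicate_of
```

Running the pass twice over the same rows yields the same mapping, which is consistent with the idempotency the verification run relies on.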
Three serving surfaces, one stored truth:
- REST (`api/routes.py`, prefix `/v1`): `POST /contacts`, `POST /contacts/verify`, `GET /contacts` (paginated, status filter), `GET /contacts/{id}`, `GET /stats`, `POST /export`. Every route requires an `X-API-Key`. OpenAPI at `/docs`.
- MCP (`mcp/server.py`): four tools (`search_contacts`, `get_contact`, `contact_stats`, `verify_contacts`) over stdio for AI agents. The tools take an `api_key` (MCP has no headers) that resolves to a tenant exactly as REST auth does, so an agent only ever sees one tenant. Only `verify_contacts` mutates, and it's idempotent.
- Parquet export (`export.py`): writes a tenant's contacts to `warehouse/tenant=<id>/contacts-<timestamp>.parquet`, the partitioned, columnar shape a data lake or external stage expects (CSV offered for quick inspection). Rows stream in batches so memory stays flat for large tenants.
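The partitioned layout and the batch streaming can be sketched without a Parquet dependency. The example below writes CSV using only the standard library; the function names, the timestamp format, and the default batch size are assumptions, and a real Parquet writer (for example one built on pyarrow) would replace the `csv` calls.

```python
import csv
from datetime import datetime
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, List, Sequence


def export_path(root: Path, tenant_id: int, when: datetime, ext: str) -> Path:
    """Build <root>/warehouse/tenant=<id>/contacts-<timestamp>.<ext>."""
    stamp = when.strftime("%Y%m%dT%H%M%SZ")  # timestamp format is an assumption
    return root / "warehouse" / f"tenant={tenant_id}" / f"contacts-{stamp}.{ext}"


def batched(rows: Iterable[Sequence], size: int) -> Iterator[List[Sequence]]:
    """Yield lists of at most `size` rows without materializing the whole input."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def export_csv(rows: Iterable[Sequence], header: Sequence[str], target: Path,
               batch_size: int = 1000) -> int:
    """Write rows batch by batch and return the number of rows written."""
    target.parent.mkdir(parents=True, exist_ok=True)
    written = 0
    with target.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        for batch in batched(rows, batch_size):
            writer.writerows(batch)
            written += len(batch)
    return written
```

Because `rows` can be a generator backed by a database cursor, at most one batch is held in memory at a time, regardless of how many contacts the tenant has.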
Run it (~2 minutes)
pip install -e ".[dev,mcp]" # or: make install
make test # 31 tests, no DB, no network, no keys
# Drive the whole flow from the CLI (SQLite, real DNS):
contact-verifier provision --name "Acme" # prints a one-time API key (cv_...)
contact-verifier seed --key cv_... # load 15 synthetic contacts
contact-verifier verify --key cv_... # syntax + live MX lookup
contact-verifier export --key cv_... --format parquet
contact-verifier serve # REST API on :8000
Or the same over HTTP once serve is up:
curl -s -X POST localhost:8000/v1/contacts/verify -H "X-API-Key: cv_..."
curl -s "localhost:8000/v1/contacts?status=risky" -H "X-API-Key: cv_..."
The MCP server is contact-verifier-mcp (stdio). SQLite is the default; set CV_DATABASE_URL to a Postgres DSN and run alembic upgrade head (make db-up starts one in Docker) to use Postgres. All config is CV_-prefixed env vars with working defaults — see .env.example.
Status
31 tests pass across 5 files on a Python 3.11 / 3.12 / 3.13 CI matrix (ruff + pytest + pip-audit). Synthetic data only; the Parquet export writes a warehouse layout to the local filesystem, not to a live S3/Snowflake stage. The DNS retry/backoff/cache design, the storage and migration model, and what's deliberately out of scope live in ARCHITECTURE.md.
License