production-grade-mcp-agentic-system
A production-grade MCP server designed for multi-tenant, authenticated, and observable AI agent systems, enabling secure tool execution across heterogeneous data sources.
<div align="center">
<img src="https://miro.medium.com/v2/resize:fit:4800/1*vPJ1Xag-f3cgOgSA4QTeXQ.png" alt="Production-Grade MCP Server + Agentic System" width="100%"/>
Production-Grade MCP Server + Agentic System
A reference implementation of an MCP server designed to actually ship
Multi-tenant · Authenticated · Observable · Rate-limited · Cached · Circuit-broken · Governed
Full Step-by-Step Blog Walkthrough
This repository is the companion codebase for a long-form blog post that walks through every single component end to end, with every line of code explained in context. Start there if you want to understand the "why" behind the architecture before reading the code.
Building a Production-Grade MCP Server Architecture with Agentic System
</div>
What This Is
Most MCP tutorials end with a @tool decorator that returns "hello world". That is fine for a demo. It is not what ships.
This repository is a reference implementation of an MCP server designed to run in production: multi-tenant, authenticated, observable, rate-limited, cached, circuit-broken, and governed. It exposes a company's heterogeneous data layer (Postgres, Elasticsearch, S3, vector DB) to AI agents as a single, secure tool surface, and ships with a four-agent support copilot (Planner → Retriever → Synthesizer → Critic) that uses it end to end.
The codebase is deliberately organised around twelve components that keep showing up on the 3 AM pager when teams skip them. Each one lives in its own module and can be read, replaced, or extended independently.
Architecture Overview
<div align="center">
<img src="https://miro.medium.com/v2/resize:fit:4800/1*vPJ1Xag-f3cgOgSA4QTeXQ.png" alt="Full Architecture" width="90%"/>
The complete production-grade system: MCP server dispatch pipeline on the right, four-agent orchestrator on the left, data plane on top, observability on the bottom, identity and governance as crosscutting concerns.
</div>
The 12 Components
| # | Component | Lives in | What it gives you |
|---|---|---|---|
| 1 | Transport & Session Layer | server.py | stdio for local, Streamable HTTP for remote, horizontal-scale-friendly sessions |
| 2 | Authentication Server | auth/oauth.py | OAuth 2.1 + PKCE, short-lived JWTs, JWKS validation |
| 3 | Authorization & Policy Engine | auth/policy.py | Tool-level RBAC, tenant-scoped ABAC, deny-by-default |
| 4 | Tool Registry & Discovery | tools/registry.py | Dynamic toolsets, .well-known capability metadata |
| 5 | Input Validation Layer | validation/schemas.py | Pydantic schemas, enum constraints, agent-adversarial input as default threat model |
| 6 | Tool Execution Engine | tools/base.py | Three-level hierarchy (atomic / composed / workflow) |
| 7 | Circuit Breaker & Retry | reliability/ | Closed → open → half-open, Adaptive Timeout Budget Allocation |
| 8 | Rate Limiting & Quotas | ratelimit/limiter.py | Redis token-bucket (Lua-atomic), per-tenant and per-tool |
| 9 | Caching Layer | cache/manager.py | Two-tier (L1 in-process, L2 Redis), stampede prevention |
| 10 | Structured Error Framework | errors/framework.py | Machine-readable errors with retryable and hint fields |
| 11 | Observability Stack | observability/ | OpenTelemetry traces, Prometheus metrics, audit logs |
| 12 | Governance & Multi-Tenancy | governance/ | Tenant isolation, approval gates, outbound HTTP allowlisting |
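
To make the closed → open → half-open cycle of Component 7 concrete, here is a minimal sketch of a three-state breaker. It is not the code in reliability/circuit_breaker.py: the class name, the thresholds, and the injectable clock are all assumptions chosen to keep the example small and testable.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    """Hypothetical three-state breaker; names and defaults are illustrative."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so tests do not need to sleep
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def allow(self):
        """Return True if a call may proceed right now."""
        if self.state is State.OPEN:
            if self.clock() - self.opened_at >= self.reset_timeout:
                # Cool-down elapsed: move to half-open and let probe calls through.
                self.state = State.HALF_OPEN
                return True
            return False
        return True

    def record_success(self):
        # Any success closes the breaker and clears the failure count.
        self.failures = 0
        self.state = State.CLOSED

    def record_failure(self):
        self.failures += 1
        # A failed probe in half-open reopens immediately; otherwise wait for the threshold.
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()
```

A caller would check `allow()` before dispatching a tool call and report the outcome with `record_success()` or `record_failure()`. Keeping one breaker instance per tool, as the repository layout suggests, stops one failing backend from blocking the others.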
Diving Deeper, Section by Section
Each diagram below links back to the corresponding section in the blog, where every line of code is walked through in detail.
<table> <tr> <td width="50%" align="center">
Data Persistence Layer
<img src="https://miro.medium.com/v2/resize:fit:4800/1*kT_lhnF50R4aM2iXXahMoA.png" alt="Data Persistence Layer" width="100%"/>
Postgres + Row-Level Security · Tenant isolation at the DB layer
</td> <td width="50%" align="center">
Transport & Session Layer
<img src="https://miro.medium.com/v2/resize:fit:4800/1*7GEV6AlegLbxX-dqJXHUdA.png" alt="Transport Layer" width="100%"/>
Dual transport · Stateless session · Middleware chain
</td> </tr> <tr> <td width="50%" align="center">
Authentication, Policy & Governance
<img src="https://miro.medium.com/v2/resize:fit:4800/1*m45EPmIT1_5EmKNR4EEpLQ.png" alt="Auth & Policy" width="100%"/>
OAuth 2.1 · YAML policies · Human-in-the-loop approvals
</td> <td width="50%" align="center">
Tool Execution Engine
<img src="https://miro.medium.com/v2/resize:fit:4800/1*ak49o0j_5qLbvvM-zkkF_A.png" alt="Tool Execution" width="100%"/>
Three-level hierarchy · Atomic · Composed · Workflow
</td> </tr> <tr> <td width="50%" align="center">
Reliability Layer
<img src="https://miro.medium.com/v2/resize:fit:4800/1*rjIJxzUpMhJ9BGffTczvLA.png" alt="Reliability" width="100%"/>
Circuit breakers · Retry with jitter · ATBA budget allocator
</td> <td width="50%" align="center">
Rate Limiting & Caching
<img src="https://miro.medium.com/v2/resize:fit:4800/1*CvfLYyppMTLyU9UalfHmyA.png" alt="Rate Limit & Cache" width="100%"/>
Redis token bucket · Two-tier cache · Stampede lock
</td> </tr> <tr> <td width="50%" align="center">
Observability Stack
<img src="https://miro.medium.com/v2/resize:fit:4800/1*dMi7KXpUfoMMsFpVTS8Acg.png" alt="Observability" width="100%"/>
OpenTelemetry · Prometheus · Audit logs · One trace ID
</td> <td width="50%" align="center">
Multi-Agentic Architecture
<img src="https://miro.medium.com/v2/resize:fit:4800/1*rasNhRMj5Ei93-AEQrbBwQ.png" alt="Multi-Agent" width="100%"/>
Four-agent design · Planner · Retriever · Synthesizer · Critic
</td> </tr> </table>
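
The token-bucket idea behind the rate limiter can be shown without Redis. The sketch below keeps one bucket per (tenant, tool) pair in process memory. The repository describes a Redis-backed, Lua-atomic version, so treat the class name, the parameters, and the in-memory dictionary as assumptions made purely for illustration.

```python
import time


class TokenBucketLimiter:
    """Hypothetical in-process token bucket keyed by (tenant, tool)."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = float(capacity)
        self.refill_per_sec = float(refill_per_sec)
        self.clock = clock  # injectable so tests can control time
        self._buckets = {}  # (tenant, tool) -> (tokens, last_refill_time)

    def try_acquire(self, tenant, tool, cost=1.0):
        """Refill the bucket for this key, then spend `cost` tokens if available."""
        now = self.clock()
        # A key seen for the first time starts with a full bucket.
        tokens, last = self._buckets.get((tenant, tool), (self.capacity, now))
        # Add tokens for the elapsed time, never exceeding capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_sec)
        allowed = tokens >= cost
        if allowed:
            tokens -= cost
        self._buckets[(tenant, tool)] = (tokens, now)
        return allowed
```

In a multi-process deployment the read-refill-spend sequence has to be atomic across workers, which is presumably why the repository runs it as a Lua script inside Redis rather than in application memory.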
<div align="center">
The Orchestrator Flow
<img src="https://miro.medium.com/v2/resize:fit:4800/1*7wyopmnCF_mEdxnI8u02uA.png" alt="Orchestrator" width="80%"/>
End-to-end agent orchestration with one bounded revise loop
</div>
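
The "one bounded revise loop" can be read as follows: the Critic reviews the draft once, and a rejection triggers exactly one more pass through the Synthesizer. The sketch below wires four plain callables together under that reading. The function names, the `Verdict` type, and the argument shapes are hypothetical stand-ins, not the interfaces in agents/orchestrator.py.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    """Hypothetical critic result: approve, or reject with feedback."""
    approved: bool
    feedback: str = ""


def run_copilot(question, plan, retrieve, synthesize, critique, max_revisions=1):
    """Planner -> Retriever -> Synthesizer -> Critic with a bounded revise loop."""
    steps = plan(question)                      # Planner: decide what to fetch
    evidence = retrieve(steps)                  # Retriever: run the tool calls
    draft = synthesize(question, evidence, "")  # Synthesizer: first draft
    for _ in range(max_revisions):
        verdict = critique(question, evidence, draft)
        if verdict.approved:
            break
        # Rejected: redraft once using the critic's feedback, then stop.
        draft = synthesize(question, evidence, verdict.feedback)
    return draft
```

Capping the loop keeps latency and token spend predictable: with `max_revisions=1` the pipeline makes at most two Synthesizer calls and one Critic call per question.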
Quick Start
Prerequisites
- Docker & Docker Compose
- Python 3.11+ (only for running the CLI locally)
- An Anthropic API key (for the agent layer)
1. Clone and Configure
```bash
git clone https://github.com/FareedKhan-dev/production-grade-mcp-agentic-system.git
cd production-grade-mcp-agentic-system
cp .env.example .env
```
Edit .env and set at minimum:
- ANTHROPIC_API_KEY: for the agent layer
- ATLAS_AUTH_JWKS_URL: your OAuth 2.1 provider's JWKS endpoint (or leave the default for dev)
2. Bring Up the Stack
```bash
docker compose up -d
```
That brings up the full local environment:
| Service | URL | What it is |
|---|---|---|
| MCP Server | http://localhost:8080/mcp | Streamable HTTP endpoint |
| Discovery | http://localhost:8080/.well-known/mcp-server | Unauthenticated capability metadata |
| Metrics | http://localhost:8080/metrics | Prometheus scrape target |
| Health | http://localhost:8080/healthz | Liveness probe |
| Jaeger | http://localhost:16686 | Distributed tracing UI |
| Grafana | http://localhost:3000 | Metrics dashboards (admin / admin) |
| MinIO Console | http://localhost:9001 | S3-compatible storage UI |
3. Run the Support Copilot CLI
```bash
pip install -e .
export ATLAS_MCP_URL=http://localhost:8080
export ATLAS_MCP_TOKEN=dev-token
export ATLAS_TENANT=acme
export ANTHROPIC_API_KEY=sk-ant-...
atlas-copilot "Why was the refund on order o_9002 for CUST-1001 delayed?"
```
You will see the four agents run end-to-end, the final draft printed with [S1][S2] citations, and a full trace summary including token counts, tool calls, and the run_id that ties back to Jaeger.
4. Connect from Claude Desktop / Cursor
Add this to your MCP host config:
```json
{
  "mcpServers": {
    "production-mcp": {
      "type": "http",
      "url": "http://localhost:8080/mcp",
      "headers": {
        "Authorization": "Bearer ${ATLAS_MCP_TOKEN}",
        "X-Tenant-Id": "acme"
      }
    }
  }
}
```
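
The same endpoint and headers can be exercised from a script. The sketch below only builds a JSON-RPC 2.0 `tools/list` request with the standard library and does not send it. The helper name is hypothetical, and the server may well expect an `initialize` handshake before it answers `tools/list`, so treat this as a starting point rather than a working client.

```python
import json
import urllib.request


def build_tools_list_request(base_url, token, tenant, request_id=1):
    """Build (but do not send) a JSON-RPC request asking the server for its tools."""
    payload = {"jsonrpc": "2.0", "id": request_id, "method": "tools/list", "params": {}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/mcp",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP servers may reply with plain JSON or an SSE stream.
            "Accept": "application/json, text/event-stream",
            "Authorization": f"Bearer {token}",
            "X-Tenant-Id": tenant,
        },
    )


request = build_tools_list_request("http://localhost:8080", "dev-token", "acme")
```

With the stack running, `urllib.request.urlopen(request)` would send it. The bearer token and tenant header mirror the host config above.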
Repository Structure
```
.
├── README.md
├── docker-compose.yml                  # Full local stack: app + data + observability
├── Dockerfile                          # Two-stage build, non-root runtime
├── LICENSE
├── pyproject.toml                      # Dependencies, dev tools, CLI entry points
├── .env.example                        # Every setting documented by component
│
├── config/                             # Runtime configuration (hot-reloadable)
│   ├── http_allowlist.yaml             # Per-tenant outbound HTTP allowlist
│   └── policy.yaml                     # YAML-driven authorization policies
│
├── deploy/                             # Deployment sidecar configs
│   ├── otel/config.yaml                # OpenTelemetry Collector pipeline
│   ├── prometheus/prometheus.yml       # Prometheus scrape targets
│   └── sql/init.sql                    # Schema + RLS policies + seed data
│
├── docs/                               # Deep-dive documentation
│   ├── AGENT_SYSTEM.md                 # Multi-agent orchestrator internals
│   ├── ARCHITECTURE.md                 # The 12 components in detail
│   └── DEPLOYMENT.md                   # K8s, Cloudflare Workers, bare-metal
│
├── src/atlas_mcp/                      # Main application source
│   ├── config.py                       # Centralized typed settings
│   ├── server.py                       # Component 1: Transport & dispatch
│   │
│   ├── agents/                         # Four-agent support copilot
│   │   ├── planner.py                  # Emits retrieval plan JSON
│   │   ├── retriever.py                # Bounded tool-calling loop
│   │   ├── synthesizer.py              # Drafts reply with citations
│   │   ├── critic.py                   # Approves or sends one revise
│   │   ├── orchestrator.py             # Wires the four agents together
│   │   ├── mcp_client.py               # Thin JSON-RPC MCP client
│   │   ├── memory.py                   # STM (Redis) + LTM (vector)
│   │   └── cli.py                      # atlas-copilot CLI entry point
│   │
│   ├── auth/                           # Components 2 + 3
│   │   ├── oauth.py                    # JWT + JWKS validation
│   │   ├── middleware.py               # Bearer token extraction
│   │   └── policy.py                   # YAML-driven policy engine
│   │
│   ├── governance/                     # Component 12
│   │   ├── tenant.py                   # Tenant pinning middleware
│   │   └── approval.py                 # Human-in-the-loop gate
│   │
│   ├── tools/                          # Components 4 + 6
│   │   ├── registry.py                 # In-memory tool index + discovery
│   │   ├── base.py                     # Tool abstract base + metadata
│   │   ├── atomic/                     # Level 1: one backend each
│   │   ├── composed/                   # Level 2: deterministic chains
│   │   └── workflow/                   # Level 3: multi-step procedures
│   │
│   ├── reliability/                    # Component 7
│   │   ├── circuit_breaker.py          # 3-state machine per tool
│   │   ├── retry.py                    # Exponential backoff + jitter
│   │   └── atba.py                     # Adaptive Timeout Budget Allocation
│   │
│   ├── ratelimit/                      # Component 8
│   │   └── limiter.py                  # Redis token bucket (Lua-atomic)
│   │
│   ├── cache/                          # Component 9
│   │   └── manager.py                  # L1 + L2 cache with stampede lock
│   │
│   ├── errors/                         # Component 10
│   │   └── framework.py                # Structured Error Recovery (SERF)
│   │
│   ├── observability/                  # Component 11
│   │   ├── tracing.py                  # OpenTelemetry spans
│   │   ├── metrics.py                  # Prometheus instruments
│   │   └── audit.py                    # Structured JSONL audit log
│   │
│   └── validation/                     # Component 5
│       └── schemas.py                  # Tool call envelope
│
└── tests/                              # Narrow tests, load-bearing properties
    ├── test_circuit_breaker.py         # State machine transitions
    ├── test_errors.py                  # SERF wire format + retry semantics
    └── test_policy.py                  # Deny-beats-allow + default-deny
```
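
The "L1 + L2 cache with stampede lock" noted for cache/manager.py combines two ideas: check the fast in-process tier before the shared tier, and let only one caller recompute a missing key while the others wait. The sketch below shows both with asyncio. A plain dict stands in for Redis, TTLs and eviction are omitted, and the class and method names are assumptions.

```python
import asyncio


class TwoTierCache:
    """Hypothetical L1/L2 cache; a dict stands in for the Redis tier."""

    def __init__(self):
        self.l1 = {}      # in-process tier
        self.l2 = {}      # stand-in for Redis so the sketch runs without a server
        self._locks = {}  # one lock per key serialises recomputation

    async def get_or_compute(self, key, compute):
        if key in self.l1:
            return self.l1[key]
        lock = self._locks.setdefault(key, asyncio.Lock())
        async with lock:
            # Re-check after acquiring the lock: another task may have filled L1.
            if key in self.l1:
                return self.l1[key]
            if key in self.l2:
                value = self.l2[key]
            else:
                value = await compute()  # only one task per key reaches this line
                self.l2[key] = value
            self.l1[key] = value
            return value
```

If five coroutines request the same cold key at once, `compute` runs a single time and the other four read the stored result after the lock is released. A per-process lock only protects one worker; coordinating across workers would need a lock held in the shared tier.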
Tech Stack
| Layer | Technology |
|---|---|
| Language | Python 3.11+ |
| Web framework | Starlette + Uvicorn |
| MCP SDK | mcp>=1.2.0 |
| Auth | PyJWT + Authlib (OAuth 2.1 resource server) |
| Validation | Pydantic v2 + Pydantic Settings |
| Database | asyncpg (PostgreSQL 16 with RLS) |
| Search | Elasticsearch 8 (async client) |
| Vector DB | Qdrant |
| Object storage | aioboto3 (MinIO / S3) |
| Cache + queues | Redis 7 (redis[hiredis]) |
| Reliability | tenacity (retries) + custom breaker + custom ATBA |
| Tracing | OpenTelemetry SDK + OTLP exporter |
| Metrics | prometheus_client |
| Logging | structlog (JSON) |
| LLM | Anthropic Messages API (Claude) |
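
The reliability and error rows fit together: a structured error tells the caller whether retrying is worthwhile, and the retry layer backs off exponentially with jitter when it is. The repository uses tenacity for retries; the sketch below uses only the standard library, and its `ToolError` class, field names, and defaults are assumptions rather than the SERF wire format.

```python
import random
import time


class ToolError(Exception):
    """Hypothetical structured error carrying `retryable` and `hint` fields."""

    def __init__(self, code, message, retryable=False, hint=""):
        super().__init__(message)
        self.code = code
        self.retryable = retryable
        self.hint = hint

    def to_dict(self):
        # Machine-readable form an agent could inspect before deciding what to do.
        return {"code": self.code, "message": str(self),
                "retryable": self.retryable, "hint": self.hint}


def call_with_retry(fn, max_attempts=3, base_delay=0.1, max_delay=2.0,
                    sleep=time.sleep, rng=random.random):
    """Call fn(), retrying only errors marked retryable, with full-jitter backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ToolError as err:
            # Deterministic failures and the final attempt propagate unchanged.
            if not err.retryable or attempt == max_attempts:
                raise
            # Full jitter: wait a random fraction of the capped exponential delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(rng() * cap)
```

Marking validation or permission failures as non-retryable prevents an agent from repeating a call that can never succeed, while the jitter spreads out retries from many clients hitting the same recovering backend.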
Testing
The test suite is deliberately narrow, covering the three load-bearing safety properties:
```bash
pip install -e ".[dev]"
pytest -v
```
- test_circuit_breaker.py: state machine transitions, retryable vs deterministic error classification
- test_errors.py: SERF wire format, retry semantics, MCP-level error data
- test_policy.py: default-deny, deny-beats-allow, glob matching, PII condition blocking
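
The policy properties named above (default-deny, deny-beats-allow, glob matching) can be illustrated in a few lines. The sketch below evaluates a list of rule dictionaries of the kind that might be loaded from a YAML policy file. The rule shape, the tool names, and the function name are hypothetical and do not reflect the schema of config/policy.yaml.

```python
from fnmatch import fnmatchcase


def is_allowed(rules, role, tool):
    """Return True only if some allow rule matches and no deny rule matches."""
    matched_allow = False
    for rule in rules:
        if role in rule["roles"] and fnmatchcase(tool, rule["tools"]):
            if rule["effect"] == "deny":
                return False      # deny beats allow, regardless of rule order
            matched_allow = True
    return matched_allow          # default-deny when nothing matched


# Illustrative rules: support staff may use every orders tool except refunds.
RULES = [
    {"effect": "allow", "roles": ["support"], "tools": "orders.*"},
    {"effect": "deny", "roles": ["support"], "tools": "orders.refund"},
]
```

Under these rules a support user can call `orders.get` but not `orders.refund`, and any tool or role that matches no rule is refused.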
Production Deployment
For running this in an actual production environment (managed Postgres, real OAuth provider, SIEM integration, Kubernetes), see docs/DEPLOYMENT.md.
Key swaps between local dev and production:
| Local (docker-compose) | Production |
|---|---|
| Dev JWT issuer | WorkOS AuthKit / Auth0 / Keycloak |
| MinIO | AWS S3 / GCS / Azure Blob |
| Local Postgres | AWS RDS / Cloud SQL / Supabase |
| Redis container | Upstash / ElastiCache / MemoryDB |
| Local OTel collector | Datadog / Honeycomb / Grafana Cloud |
| File-based audit log | Splunk / Chronicle / SIEM of choice |
Documentation
- Blog Walkthrough: Building a Production-Grade MCP Server (recommended starting point)
- docs/ARCHITECTURE.md: The 12 components in depth
- docs/AGENT_SYSTEM.md: Multi-agent orchestrator internals
- docs/DEPLOYMENT.md: Production deployment options
License
MIT. See LICENSE.
<div align="center">
If this helped you, please consider starring the repo
Built with a lot of 3 AM debugging
Read the full blog walkthrough · Report an issue · Start a discussion
</div>