LLM Gateway
Centralized gateway for multiple AWS Bedrock models through the Model Context Protocol (MCP), enabling AI agents to interact with 15+ Bedrock models like Nova, Claude, Llama, and Mistral.
LLM Gateway (MCP Server)
This project implements an MCP server that allows AI agents and workflows to communicate with 15+ AWS Bedrock foundation models (Nova, Claude, Llama, Mistral, etc.) through a standardized MCP interface.
Purpose
The LLM Gateway acts as a universal bridge between external AI workflows and Bedrock foundation models, providing:
- Unified access to 15+ Bedrock models (Nova, Claude, Llama, Mistral)
- Each agent chooses its model based on its needs (hardcoded in the agent)
- Intelligent caching of responses to reduce costs and latency
- Detailed metrics for usage, costs, and performance
- Standard MCP interface for universal agent connection
Architecture
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│   Agent A    │     │   Agent B    │     │   Agent C    │
│  (nova-pro)  │     │   (claude)   │     │   (llama)    │
└──────┬───────┘     └──────┬───────┘     └──────┬───────┘
       │                    │                    │
       │        MCP Protocol (stdio/SSE)         │
       └────────────────────┼────────────────────┘
                            ▼
                  ┌───────────────────┐
                  │    LLM Gateway    │
                  │   (MCP Server)    │
                  │                   │
                  │  ┌─────────────┐  │
                  │  │   Router    │  │
                  │  ├─────────────┤  │
                  │  │    Cache    │  │
                  │  ├─────────────┤  │
                  │  │   Metrics   │  │
                  │  └─────────────┘  │
                  └─────────┬─────────┘
                            │
             ┌──────────────┼──────────────┐
             ▼              ▼              ▼
        ┌─────────┐    ┌─────────┐    ┌─────────┐
        │  Nova   │    │ Claude  │    │  Llama  │
        │ Models  │    │ Models  │    │ Models  │
        └─────────┘    └─────────┘    └─────────┘
              AWS Bedrock Foundation Models
Project Structure
llm-gateway/
├── src/
│   ├── server.py              # Main FastMCP server
│   ├── config.py              # Configuration (AWS Bedrock only)
│   │
│   ├── models/                # Bedrock models catalog
│   │   ├── bedrock_models.py  # 15+ FMs with pricing
│   │   └── __init__.py
│   │
│   ├── bedrock/               # Universal Bedrock client
│   │   ├── bedrock_client.py  # Single client for all models
│   │   └── __init__.py
│   │
│   ├── mcp/                   # MCP tools
│   │   ├── tools.py           # generate, list_models, get_stats
│   │   └── __init__.py
│   │
│   ├── core/                  # Business logic
│   │   ├── router.py          # Model routing
│   │   ├── cache.py           # Cache system
│   │   ├── metrics.py         # Metrics tracking
│   │   └── __init__.py
│   │
│   └── utils/                 # Utilities
│       ├── logger.py          # Centralized logging
│       ├── validators.py      # Validations
│       └── __init__.py
│
├── ForAgents/                 # LangGraph agent integration
│   ├── bedrock_client.py      # MCP client (stdio)
│   ├── llm_node.py            # Reusable LangGraph node
│   ├── example_agent.py       # 5 complete examples
│   ├── requirements.txt       # Agent dependencies
│   └── README.md              # Integration guide
│
├── Dockerfile                 # Docker image for production
├── docker-compose.yml         # Easy deployment with Docker Compose
├── .dockerignore              # Build exclusions
│
├── requirements.txt           # Gateway dependencies
├── .env.example               # Environment variables template
└── README.md                  # This file
Installation
1. Clone and install dependencies
cd llm-gateway
pip install -r requirements.txt
2. Configure environment variables
Create a .env file with your AWS credentials:
# AWS Bedrock (only required configuration)
AWS_ACCESS_KEY_ID=your-access-key
AWS_SECRET_ACCESS_KEY=your-secret-key
AWS_REGION=us-east-1
# Cache and metrics
CACHE_ENABLED=true
CACHE_TTL=3600
CACHE_MAX_SIZE=1000
METRICS_ENABLED=true
# Logging
LOG_LEVEL=INFO
3. Run the server
python -m src.server
Available Models
The gateway supports 15+ Bedrock models. Prices are shown as input/output cost in USD per 1K tokens.
Amazon Nova
- nova-pro - Advanced, superior reasoning ($0.0008/$0.0032 per 1K)
- nova-lite - Fast and economical ($0.00006/$0.00024 per 1K)
- nova-micro - Ultra fast, basic ($0.000035/$0.00014 per 1K)
Anthropic Claude
- claude-3-5-sonnet - Smartest ($0.003/$0.015 per 1K)
- claude-3-5-haiku - Fastest ($0.001/$0.005 per 1K)
- claude-3-opus - Most powerful ($0.015/$0.075 per 1K)
- claude-3-sonnet - Balanced ($0.003/$0.015 per 1K)
- claude-3-haiku - Efficient ($0.00025/$0.00125 per 1K)
Meta Llama
- llama-3-3-70b - Latest 70B model ($0.00065/$0.00065 per 1K)
- llama-3-2-90b - Multimodal with vision ($0.0008/$0.0008 per 1K)
- llama-3-2-11b - Small multimodal ($0.00016/$0.00016 per 1K)
- llama-3-1-70b - 70B parameters ($0.00099/$0.00099 per 1K)
- llama-3-1-8b - Small and efficient ($0.00022/$0.00022 per 1K)
Mistral
- mistral-large-2 - Flagship, advanced reasoning ($0.003/$0.009 per 1K)
- mistral-small - Fast and economical ($0.001/$0.003 per 1K)
View complete list: await mcp.call_tool("list_models", {})
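The per-1K prices listed for each model can be turned into a per-request estimate with simple arithmetic. The following is a minimal sketch, assuming the two figures in each entry are the input and output price in USD per 1K tokens; `PRICING` and `estimate_cost_usd` are hypothetical names used for illustration, not the gateway's actual code.

```python
# Illustrative only: prices copied from the model list (USD per 1K tokens).
# PRICING and estimate_cost_usd are hypothetical names, not gateway internals.
PRICING = {
    "nova-pro": (0.0008, 0.0032),
    "claude-3-5-sonnet": (0.003, 0.015),
    "llama-3-3-70b": (0.00065, 0.00065),
}


def estimate_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts."""
    input_price, output_price = PRICING[model]
    return (input_tokens / 1000) * input_price + (output_tokens / 1000) * output_price


if __name__ == "__main__":
    # 10 input tokens and 50 output tokens on nova-pro
    print(f"${estimate_cost_usd('nova-pro', 10, 50):.6f}")  # prints $0.000168
```

Because output tokens are priced higher than input tokens for most models in the list, capping `max_tokens` is usually the most direct way to bound the cost of a request.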
Connecting from Agents
Key Concept
Each agent specifies its model in code, not in configuration. The gateway is a single entry point for all Bedrock models.
MCP Configuration (same for all agents)
{
  "mcpServers": {
    "llm-gateway": {
      "command": "python",
      "args": ["-m", "src.server"],
      "env": {
        "AWS_REGION": "us-east-1",
        "AWS_ACCESS_KEY_ID": "your-key",
        "AWS_SECRET_ACCESS_KEY": "your-secret"
      }
    }
  }
}
Agent A - Uses Nova Pro
# This agent always uses nova-pro
response = await session.call_tool(
    "generate",
    {
        "model": "nova-pro",  # <-- Model hardcoded in agent
        "messages": [{"role": "user", "content": "Complex task..."}],
        "temperature": 0.7,
        "max_tokens": 2000
    }
)
Agent B - Uses Claude 3.5 Sonnet
# This agent always uses claude-3-5-sonnet
response = await session.call_tool(
    "generate",
    {
        "model": "claude-3-5-sonnet",  # <-- Different model
        "messages": [{"role": "user", "content": "Deep analysis..."}]
    }
)
Agent C - Uses Llama 3.3 70B
# This agent always uses llama-3-3-70b
response = await session.call_tool(
    "generate",
    {
        "model": "llama-3-3-70b",  # <-- Another model
        "messages": [{"role": "user", "content": "General task..."}]
    }
)
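Since the three agents differ only in the model name they pass, that choice can be bound once per agent. The sketch below shows one way to do this; `make_generate` and `FakeSession` are hypothetical helpers (the fake session only stands in for an MCP `ClientSession` so the example runs without a server), and it assumes the gateway returns its result as a single JSON text block.

```python
import asyncio
import json
from types import SimpleNamespace


def make_generate(session, model: str):
    """Return a coroutine function that always calls `generate` with `model`."""

    async def generate(messages, **params):
        arguments = {"model": model, "messages": messages, **params}
        result = await session.call_tool("generate", arguments)
        # Assumption: the gateway returns its result as one JSON text block.
        return json.loads(result.content[0].text)

    return generate


class FakeSession:
    """Hypothetical stand-in for an MCP ClientSession (no server needed)."""

    async def call_tool(self, name, arguments):
        payload = {"content": "ok", "model": arguments["model"]}
        return SimpleNamespace(content=[SimpleNamespace(text=json.dumps(payload))])


async def main():
    agent_a = make_generate(FakeSession(), "nova-pro")
    reply = await agent_a(
        [{"role": "user", "content": "Complex task..."}], temperature=0.7
    )
    print(reply["model"])


if __name__ == "__main__":
    asyncio.run(main())
```

With a real session, each agent would build its own `generate` function once at startup and never mention the model name again.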
Complete example from an agent (Python)
import asyncio
import json

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

def parse_result(result):
    """Decode a tool result, assuming the gateway returns JSON text blocks."""
    items = [json.loads(block.text) for block in result.content]
    return items[0] if len(items) == 1 else items

async def use_llm_gateway():
    # Launch the gateway as a subprocess and connect over stdio
    server = StdioServerParameters(command="python", args=["-m", "src.server"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            # Initialize
            await session.initialize()

            # List available models
            models = parse_result(await session.call_tool("list_models", {}))
            print(f"Available: {len(models)} models")

            # Generate completion with agent's specific model
            response = parse_result(await session.call_tool(
                "generate",
                {
                    "model": "nova-pro",  # <-- Agent chooses its model
                    "messages": [
                        {"role": "user", "content": "What is AI?"}
                    ],
                    "temperature": 0.7,
                    "max_tokens": 500
                }
            ))

            print(f"Response: {response['content']}")
            print(f"Tokens: {response['usage']['total_tokens']}")
            print(f"Cost: ${response['estimated_cost_usd']:.6f}")
            print(f"Cached: {response['cached']}")
            print(f"Latency: {response['latency_ms']:.2f}ms")

asyncio.run(use_llm_gateway())
Available MCP Tools
1. generate
Generates completions using any Bedrock model.
Parameters:
- model (str): Short model name (e.g., "nova-pro", "claude-3-5-sonnet", "llama-3-3-70b")
- messages (list): List of messages with 'role' and 'content'
- temperature (float): Sampling temperature (0.0-2.0)
- max_tokens (int): Maximum tokens to generate
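The parameter constraints listed here can be checked before a request is ever sent to Bedrock. The following is a rough sketch of such a check; `validate_generate_args`, the set of accepted roles, and the default values are assumptions for illustration and may differ from what `src/utils/validators.py` actually does.

```python
# Hypothetical validator; roles and defaults are assumptions, not gateway facts.
VALID_ROLES = {"system", "user", "assistant"}


def validate_generate_args(model, messages, temperature=0.7, max_tokens=1024):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if not isinstance(model, str) or not model:
        errors.append("model must be a non-empty string")
    if not isinstance(messages, list) or not messages:
        errors.append("messages must be a non-empty list")
    else:
        for i, msg in enumerate(messages):
            if not isinstance(msg, dict) or "role" not in msg or "content" not in msg:
                errors.append(f"messages[{i}] needs 'role' and 'content'")
            elif msg["role"] not in VALID_ROLES:
                errors.append(f"messages[{i}] has unknown role {msg['role']!r}")
    if not isinstance(temperature, (int, float)) or not 0.0 <= temperature <= 2.0:
        errors.append("temperature must be between 0.0 and 2.0")
    if not isinstance(max_tokens, int) or max_tokens <= 0:
        errors.append("max_tokens must be a positive integer")
    return errors


if __name__ == "__main__":
    print(validate_generate_args("nova-pro", [{"role": "user", "content": "Hi"}], 2.5))
```

Returning every problem at once, rather than raising on the first, lets an agent fix a malformed request in a single round trip.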
Returns:
{
  "content": "Generated response...",
  "model": "nova-pro",
  "model_id": "us.amazon.nova-pro-v1:0",
  "usage": {
    "input_tokens": 10,
    "output_tokens": 50,
    "total_tokens": 60
  },
  "finish_reason": "stop",
  "cached": false,
  "latency_ms": 1234.56,
  "estimated_cost_usd": 0.001234
}
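One way a single client can serve every model family is Bedrock's Converse API, which accepts the same request shape regardless of provider. The source does not say which Bedrock API the gateway uses, so the sketch below is an assumption: `build_converse_request` and `parse_converse_response` are hypothetical helpers that translate between the `generate` tool's shapes and the Converse request and response formats.

```python
def build_converse_request(model_id, messages, temperature=0.7, max_tokens=1024):
    """Translate gateway-style messages into keyword arguments for Converse."""
    # Converse takes system prompts separately from the chat turns.
    system = [{"text": m["content"]} for m in messages if m["role"] == "system"]
    chat = [
        {"role": m["role"], "content": [{"text": m["content"]}]}
        for m in messages
        if m["role"] != "system"
    ]
    request = {
        "modelId": model_id,
        "messages": chat,
        "inferenceConfig": {"temperature": temperature, "maxTokens": max_tokens},
    }
    if system:
        request["system"] = system
    return request


def parse_converse_response(response, model, model_id):
    """Map a Converse response onto part of the `generate` return shape."""
    usage = response["usage"]
    return {
        "content": response["output"]["message"]["content"][0]["text"],
        "model": model,
        "model_id": model_id,
        "usage": {
            "input_tokens": usage["inputTokens"],
            "output_tokens": usage["outputTokens"],
            "total_tokens": usage["totalTokens"],
        },
        # Bedrock reports e.g. "end_turn"; a gateway may normalize this value.
        "finish_reason": response["stopReason"],
    }


# With boto3 installed and credentials configured, the request could be sent as:
#   client = boto3.client("bedrock-runtime", region_name="us-east-1")
#   response = client.converse(**build_converse_request(model_id, messages))
```

The `cached`, `latency_ms`, and `estimated_cost_usd` fields are not part of the Bedrock response and would be added by the gateway itself.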
2. list_models
Lists all available Bedrock models with pricing.
Returns:
[
  {
    "name": "nova-pro",
    "model_id": "us.amazon.nova-pro-v1:0",
    "description": "Advanced multimodal AI model with superior reasoning",
    "context_window": 300000,
    "input_cost_per_1k": 0.0008,
    "output_cost_per_1k": 0.0032,
    "supports_system": true,
    "max_tokens": 5000
  },
  ...
]
3. get_stats
Retrieves gateway statistics (metrics and cache).
Returns:
{
  "metrics": {
    "total_requests": 100,
    "total_tokens": 50000,
    "total_cost_usd": 1.23,
    "cache_hit_rate_percent": 45.5,
    "average_latency_ms": 1234.5,
    "requests_by_model": {
      "nova-pro": 50,
      "claude-3-5-sonnet": 30,
      "llama-3-3-70b": 20
    }
  },
  "cache": {
    "current_size": 50,
    "max_size": 1000,
    "enabled": true
  }
}
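The `metrics` part of this structure can be derived from a handful of running counters. Below is a small in-memory sketch of that bookkeeping; the `Metrics` class and its method names are hypothetical, and the real `src/core/metrics.py` may track more than this.

```python
from collections import Counter


class Metrics:
    """Hypothetical in-memory tracker producing the `metrics` fields above."""

    def __init__(self):
        self.total_requests = 0
        self.total_tokens = 0
        self.total_cost_usd = 0.0
        self.cache_hits = 0
        self.total_latency_ms = 0.0
        self.requests_by_model = Counter()

    def record(self, model, tokens, cost_usd, latency_ms, cached):
        """Account for one completed `generate` call."""
        self.total_requests += 1
        self.total_tokens += tokens
        self.total_cost_usd += cost_usd
        self.total_latency_ms += latency_ms
        self.cache_hits += 1 if cached else 0
        self.requests_by_model[model] += 1

    def snapshot(self):
        """Return the aggregate view; rates are 0.0 before any request."""
        n = self.total_requests
        return {
            "total_requests": n,
            "total_tokens": self.total_tokens,
            "total_cost_usd": round(self.total_cost_usd, 6),
            "cache_hit_rate_percent": round(100 * self.cache_hits / n, 1) if n else 0.0,
            "average_latency_ms": round(self.total_latency_ms / n, 1) if n else 0.0,
            "requests_by_model": dict(self.requests_by_model),
        }


if __name__ == "__main__":
    m = Metrics()
    m.record("nova-pro", tokens=60, cost_usd=0.000168, latency_ms=1200.0, cached=False)
    m.record("nova-pro", tokens=60, cost_usd=0.0, latency_ms=5.0, cached=True)
    print(m.snapshot())
```

Counters like these reset when the process restarts, which is why metrics persistence appears on the roadmap.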
Features
Implemented
- FastMCP server with standard protocol
- 15+ Bedrock models (Nova, Claude, Llama, Mistral)
- Universal Bedrock client - single client for all models
- Each agent chooses its model - hardcoded in agent code
- Cache system with TTL
- Detailed metrics per model (requests, tokens, costs, latencies)
- Centralized validations
- Structured logging
- Automatic cost estimation with real pricing
Future
- Rate limiting per agent
- Metrics persistence (database)
- Web monitoring dashboard
- Response streaming
- Support for image models (multimodal)
Environment Variables
| Variable | Description | Default |
|---|---|---|
| AWS_ACCESS_KEY_ID | AWS Access Key | - |
| AWS_SECRET_ACCESS_KEY | AWS Secret Key | - |
| AWS_REGION | AWS Region | us-east-1 |
| CACHE_ENABLED | Enable cache | true |
| CACHE_TTL | Cache TTL (seconds) | 3600 |
| CACHE_MAX_SIZE | Maximum cache size | 1000 |
| METRICS_ENABLED | Enable metrics | true |
| LOG_LEVEL | Logging level | INFO |
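To make the defaults in this table concrete, here is a sketch of how they might be read into a typed settings object. `Settings`, `load_settings`, and the accepted boolean spellings are assumptions for illustration; the actual `src/config.py` may be organized differently. The two credential variables are left out because boto3 reads them from the environment on its own.

```python
import os
from dataclasses import dataclass


def _as_bool(value: str) -> bool:
    """Interpret common truthy spellings (an assumption, not gateway behavior)."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    aws_region: str
    cache_enabled: bool
    cache_ttl: int
    cache_max_size: int
    metrics_enabled: bool
    log_level: str


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        aws_region=env.get("AWS_REGION", "us-east-1"),
        cache_enabled=_as_bool(env.get("CACHE_ENABLED", "true")),
        cache_ttl=int(env.get("CACHE_TTL", "3600")),
        cache_max_size=int(env.get("CACHE_MAX_SIZE", "1000")),
        metrics_enabled=_as_bool(env.get("METRICS_ENABLED", "true")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )


if __name__ == "__main__":
    print(load_settings())
```

Accepting an explicit mapping instead of always reading `os.environ` keeps the loader easy to test.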
FAQ
Why only Bedrock and not other providers?
This gateway is optimized for enterprise environments where Bedrock offers:
- 15+ models under a single infrastructure
- Enterprise security and compliance
- Rate limits applied per AWS account rather than per user
- Predictable pricing with no hidden charges
How does each agent decide which model to use?
The model is specified in the agent's code, not in configuration:
# Agent A
response = await mcp.call_tool("generate", {"model": "nova-pro", ...})
# Agent B
response = await mcp.call_tool("generate", {"model": "claude-3-5-sonnet", ...})
# Agent C
response = await mcp.call_tool("generate", {"model": "llama-3-3-70b", ...})
Can I have multiple agents connected to the same gateway?
Yes, this is the primary use case. All agents connect to the same MCP gateway, but each specifies its preferred model. The gateway:
- Shares cached responses between agents
- Tracks metrics per model
- Optimizes costs with intelligent caching
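Sharing cached responses between agents works if the cache key depends only on the request itself, not on which agent sent it. The sketch below illustrates that idea with a TTL cache bounded by a maximum size, mirroring the `CACHE_TTL` and `CACHE_MAX_SIZE` settings; `cache_key`, `TTLCache`, and the least-recently-used eviction policy are assumptions, not a description of `src/core/cache.py`.

```python
import hashlib
import json
import time
from collections import OrderedDict


def cache_key(model, messages, temperature, max_tokens):
    """Hash the full request so identical requests from any agent collide."""
    raw = json.dumps(
        {
            "model": model,
            "messages": messages,
            "temperature": temperature,
            "max_tokens": max_tokens,
        },
        sort_keys=True,
    )
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class TTLCache:
    """Hypothetical cache: entries expire after `ttl_seconds`, LRU eviction."""

    def __init__(self, ttl_seconds=3600, max_size=1000, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.max_size = max_size
        self.clock = clock
        self._items = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._items[key]  # expired: drop and report a miss
            return None
        self._items.move_to_end(key)  # mark as recently used
        return value

    def set(self, key, value):
        self._items[key] = (self.clock() + self.ttl, value)
        self._items.move_to_end(key)
        while len(self._items) > self.max_size:
            self._items.popitem(last=False)  # evict least recently used
```

Because the model name is part of the key, two agents only share a cached response when they ask the same model the same question with the same sampling parameters.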
How do I add a new Bedrock model?
Edit src/models/bedrock_models.py and add the model to the BEDROCK_MODELS dictionary. Example:
"my-model": BedrockModel(
    model_id="aws.my-model-v1:0",
    name="My New Model",
    description="Description",
    context_window=128000,
    input_cost_per_1k=0.001,
    output_cost_per_1k=0.002,
    supports_system=True,
    max_tokens=4096
)
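For orientation, the catalog entry shown here suggests a simple record type plus a dictionary lookup. The following is a guess at that structure, reconstructed only from the field names in this README; the real `BedrockModel` class and any lookup helper in `src/models/bedrock_models.py` may differ, and `resolve_model` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BedrockModel:
    """Assumed shape of a catalog entry, inferred from the README fields."""

    model_id: str
    name: str
    description: str
    context_window: int
    input_cost_per_1k: float
    output_cost_per_1k: float
    supports_system: bool = True
    max_tokens: int = 4096


# Only one entry is shown; values are taken from the list_models example.
BEDROCK_MODELS = {
    "nova-pro": BedrockModel(
        model_id="us.amazon.nova-pro-v1:0",
        name="nova-pro",
        description="Advanced multimodal AI model with superior reasoning",
        context_window=300000,
        input_cost_per_1k=0.0008,
        output_cost_per_1k=0.0032,
        supports_system=True,
        max_tokens=5000,
    ),
}


def resolve_model(short_name: str) -> BedrockModel:
    """Map a short name such as "nova-pro" to its catalog entry."""
    try:
        return BEDROCK_MODELS[short_name]
    except KeyError:
        available = ", ".join(sorted(BEDROCK_MODELS))
        raise ValueError(
            f"Unknown model {short_name!r}; available: {available}"
        ) from None
```

Listing the available names in the error message gives an agent with a mistyped model name enough information to correct itself.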
Docker Deployment
Local Quick Start
# 1. Configure environment variables
cp .env.example .env
# Edit .env with your AWS credentials
# 2. Build and run with Docker Compose
docker-compose up -d
# 3. View logs
docker-compose logs -f
Manual Build
# Build the image
docker build -t bedrock-gateway:latest .
# Run with environment variables
docker run -d \
  -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
  -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \
  -e AWS_REGION=us-east-1 \
  -e CACHE_ENABLED=true \
  bedrock-gateway:latest
Docker Features
- Multi-stage build (optimized image ~150MB)
- Non-root user (security)
- Persistent volumes for logs
- Configurable health checks
- Resource limits (CPU/memory)
- Compatible with Docker Compose and Kubernetes
LangGraph Agent Integration
Quick Start
The ForAgents/ folder contains everything needed:
# 1. Install agent dependencies
cd ForAgents
pip install -r requirements.txt
# 2. Use the LLM node in your graph
from llm_node import AgentState, llm_consultation_node
from langgraph.graph import StateGraph, END
workflow = StateGraph(AgentState)
workflow.add_node("llm", llm_consultation_node)
workflow.set_entry_point("llm")
workflow.add_edge("llm", END)
app = workflow.compile()
# 3. Execute
result = await app.ainvoke({
    "messages": [{"role": "user", "content": "Hello"}],
    "model": "nova-pro"
})
print(result["response"])
Additional Documentation
Connection Examples
From LangChain:
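import asyncio
import json
from typing import Any, Optional
from langchain_core.language_models.llms import LLM
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

class BedrockMCPLLM(LLM):
    model: str = "nova-pro"

    @property
    def _llm_type(self) -> str:
        return "bedrock-mcp"

    def _call(self, prompt: str, stop: Optional[list[str]] = None, **kwargs: Any) -> str:
        # Synchronous entry point required by the base class; only valid outside a running event loop
        return asyncio.run(self._acall(prompt, stop=stop, **kwargs))

    async def _acall(self, prompt: str, stop: Optional[list[str]] = None, **kwargs: Any) -> str:
        server = StdioServerParameters(command="python", args=["-m", "src.server"])
        async with stdio_client(server) as (read, write):
            async with ClientSession(read, write) as session:
                await session.initialize()
                result = await session.call_tool(
                    "generate",
                    {"model": self.model, "messages": [{"role": "user", "content": prompt}]},
                )
                # Assumes the gateway returns its result as one JSON text block
                return json.loads(result.content[0].text)["content"]

# Use in chain
llm = BedrockMCPLLM(model="claude-3-5-sonnet")
result = await llm.ainvoke("Explain quantum computing")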
From LlamaIndex:
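import asyncio
import json
from typing import Any
from llama_index.core.llms import CompletionResponse, CustomLLM, LLMMetadata
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

class BedrockMCPLLM(CustomLLM):
    model: str = "nova-pro"

    @property
    def metadata(self) -> LLMMetadata:
        return LLMMetadata(model_name=self.model)

    def complete(self, prompt: str, formatted: bool = False, **kwargs: Any) -> CompletionResponse:
        # Synchronous wrapper; only valid outside a running event loop
        return asyncio.run(self.acomplete(prompt, formatted=formatted, **kwargs))

    def stream_complete(self, prompt: str, formatted: bool = False, **kwargs: Any):
        raise NotImplementedError("The gateway does not support streaming yet")

    async def acomplete(self, prompt: str, formatted: bool = False, **kwargs: Any) -> CompletionResponse:
        server = StdioServerParameters(command="python", args=["-m", "src.server"])
        async with stdio_client(server) as (read, write):
            async with ClientSession(read, write) as session:
                await session.initialize()
                result = await session.call_tool(
                    "generate",
                    {"model": self.model, "messages": [{"role": "user", "content": prompt}]},
                )
                # Assumes the gateway returns its result as one JSON text block
                payload = json.loads(result.content[0].text)
                return CompletionResponse(text=payload["content"])

# Use in index
llm = BedrockMCPLLM(model="llama-3-3-70b")
response = await llm.acomplete("Summarize this document")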
Monitoring and Debugging
Enable verbose logging:
export LOG_LEVEL=DEBUG
python -m src.server
Check gateway health:
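import json

result = await session.call_tool("get_stats", {})
# Assumes the gateway returns its statistics as one JSON text block
stats = json.loads(result.content[0].text)
print(f"Total requests: {stats['metrics']['total_requests']}")
print(f"Cache hit rate: {stats['metrics']['cache_hit_rate_percent']}%")
print(f"Total cost: ${stats['metrics']['total_cost_usd']:.2f}")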
Monitor specific model usage:
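import json
result = await session.call_tool("get_stats", {})
# Assumes the gateway returns its statistics as one JSON text block
stats = json.loads(result.content[0].text)
for model, count in stats['metrics']['requests_by_model'].items():
    print(f"{model}: {count} requests")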
Security Best Practices
- Never commit AWS credentials - Use environment variables or AWS IAM roles
- Use IAM roles when deploying to AWS (EC2, ECS, Lambda)
- Rotate credentials regularly
- Limit Bedrock permissions to only required models
- Use VPC endpoints for Bedrock in production
- Enable CloudTrail for audit logging
Example IAM Policy (Minimal Permissions)
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel"
      ],
      "Resource": [
        "arn:aws:bedrock:*::foundation-model/amazon.nova-*",
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-*",
        "arn:aws:bedrock:*::foundation-model/meta.llama-*",
        "arn:aws:bedrock:*::foundation-model/mistral.*"
      ]
    }
  ]
}
Production Deployment
AWS ECS (Recommended)
# 1. Push image to ECR
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <account>.dkr.ecr.us-east-1.amazonaws.com
docker tag bedrock-gateway:latest <account>.dkr.ecr.us-east-1.amazonaws.com/bedrock-gateway:latest
docker push <account>.dkr.ecr.us-east-1.amazonaws.com/bedrock-gateway:latest
# 2. Create ECS task definition with IAM role
# 3. Deploy to ECS Fargate
Kubernetes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bedrock-gateway
spec:
  replicas: 3
  selector:
    matchLabels:
      app: bedrock-gateway
  template:
    metadata:
      labels:
        app: bedrock-gateway
    spec:
      containers:
        - name: gateway
          image: bedrock-gateway:latest
          env:
            - name: AWS_REGION
              value: "us-east-1"
            - name: CACHE_ENABLED
              value: "true"
          resources:
            requests:
              memory: "256Mi"
              cpu: "500m"
            limits:
              memory: "512Mi"
              cpu: "1000m"
Performance Benchmarks
Typical latencies on a t3.medium instance:
| Model | First Request | Cached Request | Avg Tokens/s |
|---|---|---|---|
| nova-micro | 800ms | 5ms | 120 |
| nova-lite | 1200ms | 5ms | 100 |
| nova-pro | 2000ms | 5ms | 80 |
| claude-3-5-haiku | 900ms | 5ms | 110 |
| claude-3-5-sonnet | 1500ms | 5ms | 90 |
| llama-3-1-8b | 700ms | 5ms | 130 |
| llama-3-3-70b | 1800ms | 5ms | 85 |
Note: Latencies vary based on network, region, and prompt complexity
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
License
This project was developed for the Instrumentation course at PUCP.
Author
Developed by Leon Achata as part of the IoT Holter project - PUCP 2025
Need help? Open an issue on GitHub or contact the development team.